Discover advanced regular expressions in PowerShell

Develop more sophisticated regular expression techniques by learning how lookarounds work with PowerShell operators such as -replace and -split.

Information is power, and knowing how to sift through data quickly is a vital IT skill.

Combining PowerShell with regular expression (regex) syntax lets you sift through logs and configuration files to find what you need fast. PowerShell places no limit on regular expression complexity, but more sophisticated techniques require you to understand the full range of available syntax. The following examples show how to go beyond basic regular expression construction and take on more complex tasks when working with text in your automation efforts.

How to work with lookaheads and lookbehinds

In regular expressions, lookaheads and lookbehinds -- collectively referred to as lookarounds -- confirm characters before or after a match without including that text as part of the match.

For example, when using a regular expression to search a PowerShell script to find all variable names without matching on the dollar sign, you use a lookbehind to tell the regular expression that there must be a dollar sign in front of the string but not to include it in the match.

The lookaround syntax is straightforward:

  • Positive lookahead: (?=regex)
  • Negative lookahead: (?!regex)
  • Positive lookbehind: (?<=regex)
  • Negative lookbehind: (?<!regex)

Positive and negative lookarounds match or reject a pattern based on the surrounding text. A positive lookahead succeeds only if the pattern is followed by a certain string of characters, and a positive lookbehind succeeds only if the pattern is preceded by one. The negative forms do the opposite: they succeed only when the pattern is not followed, or not preceded, by a particular character sequence.
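To make the four forms concrete, here is a minimal sketch that runs each one against a short string. The strings foobar, foobaz and bazbar are placeholders chosen for illustration only.

```powershell
# Positive lookahead: match "foo" only when "bar" follows. "bar" is checked but not captured.
$posAhead = [regex]::Match('foobar', 'foo(?=bar)').Value      # foo

# Negative lookahead: match "foo" only when "bar" does NOT follow.
$negAheadHit  = 'foobaz' -match 'foo(?!bar)'                  # True
$negAheadMiss = 'foobar' -match 'foo(?!bar)'                  # False

# Positive lookbehind: match "bar" only when "foo" comes right before it.
$posBehind = [regex]::Match('foobar', '(?<=foo)bar').Value    # bar

# Negative lookbehind: match "bar" only when "foo" does NOT come right before it.
$negBehindHit  = 'bazbar' -match '(?<!foo)bar'                # True
$negBehindMiss = 'foobar' -match '(?<!foo)bar'                # False
```

In each case, the text inside the lookaround decides whether the match succeeds but never becomes part of the matched value.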

The following regular expression builds on the earlier example to match variables in a PowerShell script:

'(?<=\$)\w+'

This example uses a positive lookbehind, and the regular expression inside the lookbehind is \$. Because the dollar sign is a special character in regular expressions (it normally anchors a match to the end of the string or line), you escape it with a backslash to match the literal character instead of its special meaning.

To use this on a PowerShell script, pipe its contents to the Select-String cmdlet:

Get-Content 'Curl2PS.psm1' | Select-String -Pattern '(?<=\$)\w+' -AllMatches
Figure: Use a positive lookbehind in a PowerShell regular expression to find the variables without matching on the dollar sign in the variable name.
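Select-String returns whole matching lines by default, so a common follow-up step is to pull out only the matched text. The sketch below assumes you want a plain list of variable names. The two sample lines are hypothetical stand-ins for the script contents, so the snippet runs without the Curl2PS.psm1 file.

```powershell
# Hypothetical stand-in for Get-Content 'Curl2PS.psm1'.
$lines = @(
    '$url = "https://example.com"'
    'Write-Output $url $headers'
)

# -AllMatches records every match on a line; .Matches.Value extracts just the matched text.
$names = $lines |
    Select-String -Pattern '(?<=\$)\w+' -AllMatches |
    ForEach-Object { $_.Matches.Value }

# Optional: reduce the list to distinct names.
$uniqueNames = $names | Sort-Object -Unique

$names          # url, url, headers
$uniqueNames    # headers, url
```

Because the dollar sign sits inside the lookbehind, the values come back as url and headers rather than $url and $headers.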

How to use the -replace operator

There's a good reason why I wouldn't just match on the dollar sign and skip the confusing lookbehind. One of the powerful uses for regular expressions is to replace text. Lookbehinds help control the match when using groups in a regex replace expression to rearrange data.

For example, suppose you wrote a script and later learned that it is not ideal to use $null on the right side of a comparison. You then need to move $null from the right side to the left side of each conditional statement. You can write a regular expression to match comparisons that use $null on the right side of the -eq operator:

'(?<=(if|while) ?\()(?<exp>.*) -eq \$null(?=\))'

Here's a breakdown of the regular expression:

  • (?<=(if|while) ?\() -- This lookbehind checks for an if or while keyword, followed by an optional space and then an opening parenthesis. The .NET regex engine is case-sensitive by default, but PowerShell operators such as -match and -replace ignore case unless you use the case-sensitive variants -cmatch and -creplace, so with -replace this pattern also matches If and WHILE.
  • (?<exp>.*) -eq \$null -- The .* matches any sequence of characters on the same line and assigns it to the named group exp. That is followed by the -eq operator and then $null. It is complicated to write a regular expression that accounts for every possible expression on the left side of the conditional statement, but it would be possible with the PowerShell Abstract Syntax Tree instead of a regular expression.
  • (?=\)) -- This is a positive lookahead that checks for the closing parenthesis.
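The breakdown above can be checked piece by piece with the -match operator. This is a small sketch, and the input line with a capitalized If is a hypothetical example chosen to show how case handling differs between -match and -cmatch.

```powershell
$pattern = '(?<=(if|while) ?\()(?<exp>.*) -eq \$null(?=\))'
$line    = 'If ($RealName -eq $null) {'    # hypothetical input with a capital "I"

# -match ignores case by default, so "If" satisfies the lookbehind.
$isMatch    = $line -match $pattern        # True
$expression = $Matches['exp']              # $RealName
$wholeMatch = $Matches[0]                  # $RealName -eq $null

# -cmatch is the case-sensitive variant, so the capitalized "If" no longer qualifies.
$isCaseSensitiveMatch = $line -cmatch $pattern    # False
```

Note that the whole match excludes the keyword and both parentheses, because they appear only inside the lookarounds.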

Understanding the reason for lookarounds in this context requires learning how to work with the -replace operator. You use -replace in PowerShell to swap text based on a regular expression. Because the lookarounds contain the if, the while and the parentheses, those characters are not part of the match, and you don't have to worry about putting them back. You can focus on the matched conditional statements.

The last piece of this puzzle is to understand how to use groups in -replace. You can reference each group in a regular expression by using its name or index. I recommend naming your groups because anything beyond a simple regular expression replacement makes it confusing to remember the order of the matches. In this case, I have one group called exp, which you reference with ${exp}. Keep the replacement string in single quotes so PowerShell does not try to expand ${exp} as a variable before the regular expression engine sees it.
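As a quick illustration of the two ways to reference groups, the following sketch rearranges a name with named groups and then with numbered groups. The sample value and the group names last and first are made up for this example.

```powershell
$name = 'Doe, John'    # hypothetical sample value

# Named groups: the replacement refers to each group by name.
$byName = $name -replace '(?<last>\w+), (?<first>\w+)', '${first} ${last}'    # John Doe

# Numbered groups: the replacement refers to each group by position.
$byIndex = $name -replace '(\w+), (\w+)', '$2 $1'                             # John Doe
```

Both replacement strings are in single quotes. In a double-quoted string, PowerShell would try to expand ${first} or $2 as its own variables before the replacement runs.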

The following example uses a simple if-elseif statement for the string:

$str = @'
if ($RealName -eq $null) {
    $FakeName
} elseif ($FakeName -eq $null) {
    $RealName
}
'@

Next, use -replace to swap the two sides of each comparison so that $null is on the left:

$str -replace '(?<=(if|while) ?\()(?<exp>.*) -eq \$null(?=\))','$null -eq ${exp}'
Figure: The regular expression uses the -replace operator to swap the conditional statements in the code without altering the rest of the script.

The output shows the swapped conditional statements. By using lookarounds, the if statements stayed intact.
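To see what the lookarounds save you, compare the replacement with a version that matches the keyword and parentheses directly. The second pattern and its group names kw and sp are a hypothetical alternative written for this comparison, not part of the original example.

```powershell
$str = @'
if ($RealName -eq $null) {
    $FakeName
} elseif ($FakeName -eq $null) {
    $RealName
}
'@

# With lookarounds: only the comparison is matched, so only the comparison is rewritten.
$withLookarounds = $str -replace '(?<=(if|while) ?\()(?<exp>.*) -eq \$null(?=\))', '$null -eq ${exp}'

# Without lookarounds: the keyword, optional space and parentheses are consumed by the match,
# so the replacement has to capture them and write them back.
$withoutLookarounds = $str -replace '(?<kw>if|while)(?<sp> ?)\((?<exp>.*) -eq \$null\)', '${kw}${sp}($null -eq ${exp})'

$withLookarounds
```

Both versions produce the same text here, but the lookaround version needs only one group and a shorter replacement string.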

How to work with the -split operator

Another place that lookarounds come in handy is with the -split operator, which separates a string based on a regular expression. Text that matches the pattern is removed from the output, so a lookaround lets you check the surrounding characters without removing them.

As an example, take a phone number in a normalized format, and split it into the area code and the rest of the number. This means splitting on a hyphen, but only on the hyphen that is preceded by a closing parenthesis.

For reference, here is the phone number format:

'(123)-456-7890'

This regular expression matches on a hyphen preceded by a closing parenthesis.

'(?<=\))-'

(?<=\)) is the lookbehind to check for a closing parenthesis, and - matches on the hyphen.

The following splits on the hyphen without also removing the closing parenthesis:

'(123)-456-7890' -split '(?<=\))-'
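The effect of the lookbehind is easiest to see next to two patterns that do not use one. The sketch below is for comparison only, and the variable names are arbitrary.

```powershell
$phone = '(123)-456-7890'

# Lookbehind: split only on the hyphen that follows ")", and keep the ")" in the output.
$withLookbehind = $phone -split '(?<=\))-'    # (123)   456-7890

# Plain hyphen: splits on every hyphen.
$everyHyphen = $phone -split '-'              # (123)   456   7890

# Parenthesis inside the match: the ")" is treated as part of the delimiter and is lost.
$consumesParen = $phone -split '\)-'          # (123    456-7890

# Assign the two parts to separate variables.
$areaCode, $number = $withLookbehind
```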

Advanced regular expressions give many ways to solve a problem

With lookarounds, there is usually an alternative that works and may be less confusing. For example, if you need to match on the value of a string between quotes, there are two easy ways. You can use a negated character class inside a group, or you can use lookarounds to match anything that is preceded and followed by a quote.

Here's our example string:

'Hello, my name is "Anthony" and I am the author.'

Two regular expressions would work here:

'"(?<name>[^"]+)"'
'(?<=").+(?=")'

Both deliver the desired data: the first returns it in the name group, and the second returns it as the whole match. You could use a combination for a better regular expression:

'(?<=")(?<name>[^"]+)(?=")'

This returns the name between the quotes as shown in the screenshot.
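The three patterns can be compared side by side with a short sketch. The second string, with two quoted values, is a hypothetical input added to show where the greedy .+ pattern behaves differently.

```powershell
$text = 'Hello, my name is "Anthony" and I am the author.'

# Pattern 1: the quotes are part of the whole match, so read the named group.
$first        = [regex]::Match($text, '"(?<name>[^"]+)"')
$firstWhole   = $first.Value                    # "Anthony" (quotes included)
$firstGroup   = $first.Groups['name'].Value     # Anthony

# Pattern 2: the lookarounds keep the quotes out of the match.
$second = [regex]::Match($text, '(?<=").+(?=")').Value    # Anthony

# Combined pattern: the whole match and the named group are the same text.
$combined      = [regex]::Match($text, '(?<=")(?<name>[^"]+)(?=")')
$combinedWhole = $combined.Value                # Anthony
$combinedGroup = $combined.Groups['name'].Value # Anthony

# Hypothetical input with two quoted values: the greedy .+ runs to the last quote.
$two       = '"a" and "b"'
$greedy    = [regex]::Match($two, '(?<=").+(?=")').Value                    # a" and "b
$firstOnly = [regex]::Match($two, '"(?<name>[^"]+)"').Groups['name'].Value  # a
```

The negated character class stops at the next quote, which is one reason to prefer it when a line may contain more than one quoted value.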

The takeaway is there is no single right way to construct a regular expression.

It helps to know your way around lookarounds

When you expand your expertise with regular expression syntax, you can tackle a wide range of issues in your daily administrative duties.

While lookarounds may seem a bit esoteric, they do have their place in regular expressions, especially when it comes to working with the -replace and -split operators. If you use regular expressions enough, there comes a time when knowing how to work with a lookaround pays off.