How to use regex in PowerShell
A regular expression is a series of characters that defines a matching pattern, which you can use to find or replace text or to validate input. Walk through a regex example in PowerShell.
Trying to find specific information within text or input can be a nightmare. Luckily, there is a way to simplify this process.
Regular expressions (regex) consist of a sequence of characters that collectively define a pattern that is to be matched. For example, regular expressions are commonly used as a means for validating input or for locating specific information within a long string of text. Regular expressions can also be used as a means of performing string manipulation.
Regex is not PowerShell-specific. Most modern programming languages natively support the use of regular expressions.
Regular expressions can be complex. A comprehensive discussion of regular expression syntax is beyond the scope of this article, but some of the most commonly used elements are the following:
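- A period -- . -- is a wildcard; it represents any individual character.
- An asterisk -- * -- is a quantifier; it indicates that the preceding element can appear zero or more times.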
- Brackets -- [] -- indicate that a character must match the characters enclosed in brackets. For example, [abc] indicates that the character must be A, B or C.
- A caret symbol within brackets -- [^] -- works opposite from normal bracketed characters. Rather than declaring that a character must match those characters appearing in brackets, the caret indicates that a character cannot match any of the bracketed characters. If you were validating input and wanted to make sure that the letters A, B or C were not entered, you could use [^abc].
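As a quick check of the bracket syntax, the following sketch tests single characters against [abc] and [^abc]. The ^ and $ anchors and the -cmatch operator are not covered in the list above; they are included here only so that each test applies to the whole string and so that case handling is visible. The variable names are illustrative.

```powershell
# -match ignores case by default; -cmatch is the case-sensitive variant.
# The ^ and $ anchors make each pattern apply to the entire string.
$inSet       = 'b' -match  '^[abc]$'    # b is one of the bracketed characters
$upperLoose  = 'B' -match  '^[abc]$'    # still matches because case is ignored
$upperStrict = 'B' -cmatch '^[abc]$'    # no match when case is respected
$outsideSet  = 'x' -match  '^[^abc]$'   # x is not A, B or C, so the negated set matches
$rejected    = 'a' -match  '^[^abc]$'   # a is excluded by the negated set

"In set: $inSet, upper (loose): $upperLoose, upper (strict): $upperStrict"
"Outside set: $outsideSet, rejected: $rejected"
```

Because -match is case-insensitive, [abc] accepts both uppercase and lowercase input unless you switch to -cmatch.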
The examples that follow are simple and a little silly, but you can adapt the approach to your own needs.
Data extraction
In a PowerShell script, regular expressions enable you to locate and extract data. For example, you might create a script that locates specific data within a log file or one that extracts information from a webpage.
To show how you can use regular expressions for data extraction, I created a text file called SampleParagraph.txt. That text file contains my name and contact information, as well as the first few paragraphs of this article -- as I would deliver it to the editor.
It's simple to create a PowerShell script that reads a text file and looks for a string -- in this case, an email address -- within that file. You don't even need to use regular expressions to accomplish such a task. You could locate a specific email address by using this command:
Select-String -Path SampleParagraph.txt -Pattern '<email address>'
If you look at Figure 1, you can see that this command has located my email address in the second line of the file.
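One caveat is worth knowing here: Select-String interprets the -Pattern value as a regular expression by default, so a period in an address matches any single character rather than only a literal period. The sketch below uses made-up sample lines and a hypothetical address to compare the default behavior with the -SimpleMatch switch, which forces a literal comparison.

```powershell
# Hypothetical sample lines; the second one is not a real address.
$lines = @(
    'Reach me at jane.doe@example.com'
    'Not an address: janeXdoe@exampleYcom'
)

# Default: the pattern is a regex, so each period matches any character.
$regexHits = @($lines | Select-String -Pattern 'jane.doe@example.com')

# -SimpleMatch: the pattern is compared as literal text.
$literalHits = @($lines | Select-String -Pattern 'jane.doe@example.com' -SimpleMatch)

"Regex hits:   $($regexHits.Count)"
"Literal hits: $($literalHits.Count)"
```

For a true literal search, -SimpleMatch avoids accidental matches caused by characters that have a special meaning in regex.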
You do not need regular expressions to locate data within a text file; they become useful in situations where you don't know the exact text that you need to extract. I could use regex if the sample file contained an email address and I needed to find it but I didn't know what the email address was. In that situation, using the previous command doesn't work because there is no literal value -- a specific email address -- that I can search for.
Email addresses adhere to a specific format. They contain a bit of text, an @ sign, another bit of text, a period and a top-level domain name. This standardized formatting makes it possible to locate an email address within a file, even if we don't know exactly what the address is. Here is a command that you can use:
Select-String -Path SampleParagraph.txt -Pattern '\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'
As you can see, the first portion of the command is identical to what I used before. The difference is that, rather than searching for a literal value, I am searching for a pattern. Although the pattern looks cryptic, it has meaning.
The \b that comes at the beginning of the pattern tells PowerShell that the match should occur at a word boundary. This essentially means that any matches should happen at the beginning or end of a word.
The next part of the pattern, [A-Za-z0-9._%+-]+, tells PowerShell that the first part of the email address can contain uppercase or lowercase letters, numbers or any of a few different symbols. As you may recall, this type of pattern matching normally applies to a single character. However, the + sign at the end of this part of the expression indicates that a match can contain one or more characters.
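To see the effect of the + sign in isolation, here is a small sketch that runs the bracketed set against a hypothetical address, first without the quantifier and then with it. It relies on the automatic $Matches variable, which PowerShell fills in after a successful -match, and on a leading ^ anchor (not part of the article's pattern) to pin the match to the start of the string.

```powershell
$address = 'jane.doe+news@example.com'   # hypothetical address

# Without +, the bracketed set matches exactly one character.
$null = $address -match '^[A-Za-z0-9._%+-]'
$single = $Matches[0]

# With +, the match continues for as long as characters belong to the set,
# so it stops just before the @ sign.
$null = $address -match '^[A-Za-z0-9._%+-]+'
$run = $Matches[0]

"Single character: $single"
"One or more:      $run"
```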
After that, the pattern includes an @ symbol, which corresponds to the @ sign within the email address, and then a similar bracketed pattern for the domain name. This second pattern is narrower than the first: it allows only letters, numbers, periods and hyphens.
The last part of the pattern is \.[A-Za-z]{2,}\b. This tells PowerShell that the text string it is searching for -- the email address -- should end with a period, two or more letters, and a word boundary. The reason why we are searching for two or more letters is that some email addresses use three-character top-level domain names, such as .com, .gov or .edu, but others use two characters, such as .ca or .br. Telling PowerShell to look for at least two letters enables the search pattern to find email addresses with either type of top-level domain name.
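Select-String reports the lines that contain a match, but the matched text itself is also available through the Matches property of each result. The following sketch writes a few made-up lines to a temporary file (standing in for SampleParagraph.txt, whose real contents are not shown here) and pulls out only the addresses. The file name and the addresses are hypothetical.

```powershell
# Hypothetical stand-in for SampleParagraph.txt, written to the temp folder.
$samplePath = Join-Path ([System.IO.Path]::GetTempPath()) 'SampleParagraph-demo.txt'
@(
    'Jane Doe'
    'Contact: jane.doe@example.com or j.doe@mail.example.org'
    'Trying to find specific information within text can be a nightmare.'
) | Set-Content -Path $samplePath

$pattern = '\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'

# -AllMatches returns every match on a line, not just the first one.
# Each result's Matches collection holds the matched text in its Value property.
$addresses = @(
    Select-String -Path $samplePath -Pattern $pattern -AllMatches |
        ForEach-Object { $_.Matches } |
        ForEach-Object { $_.Value }
)

$addresses

# Clean up the temporary file.
Remove-Item -Path $samplePath
```

Without -AllMatches, only the first address on the second line would be returned.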
Input validation
You can also use regular expressions as a tool for validating input. Suppose that you created a PowerShell script that asks the user to enter an email address. You could use pattern matching of regular expressions as a way of determining whether the user's input adheres to the format used by email addresses. Here is what such a script might look like:
$Email = Read-Host "Please enter an email address"
# The ^ and $ anchors require the entire input to match the pattern,
# not just a portion of it.
If ($Email -Match '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$')
{
    Write-Host "This looks like an email address"
}
Else
{
    Write-Host "Invalid Email Address"
}
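Anchoring matters for validation because -match succeeds whenever the pattern is found anywhere in the input. As a rough illustration, the sketch below wraps the check in a function with the hypothetical name Test-EmailFormat and compares an unanchored test with an anchored one. The sample inputs are made up, and the pattern is a format check only; it does not confirm that an address actually exists.

```powershell
function Test-EmailFormat {
    param([string]$Address)
    # ^ and $ force the pattern to cover the entire input.
    return $Address -match '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'
}

$sentence = 'call me at jane@example.com please'

# Unanchored: succeeds because an address appears somewhere in the text.
$looseResult = $sentence -match '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}'

# Anchored: succeeds only when the input is an address and nothing else.
$strictResult  = Test-EmailFormat $sentence
$validResult   = Test-EmailFormat 'jane@example.com'
$missingDomain = Test-EmailFormat 'jane@example'

"Loose: $looseResult, strict: $strictResult, valid: $validResult, no TLD: $missingDomain"
```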
String manipulation
Just as you can use regular expressions to validate strings, you can also use them to manipulate strings. There are countless uses for string manipulation. Consider how you might use string manipulation in conjunction with input validation. In some cases, if you detect invalid input, you might be able to use string manipulation to automatically correct it.
For example, my first name has a somewhat unusual spelling: Brien, rather than Brian. As you can imagine, my name gets misspelled a lot. The two most common misspellings are Brian and Brain. Here's a simple PowerShell script that checks for these two misspellings and corrects them:
$Name = Read-Host "Please type Posey's first name"
# Br[ia][ia]n matches Brian and Brain, but not Brien
If ($Name -Match "Br[ia][ia]n")
{
    $Name = 'Brien'
    Write-Host "The name's spelling has been corrected to $Name"
}
Else
{
    Write-Host "The name was spelled correctly"
}
This script asks the user to type my first name and stores the text that they entered in a variable called $Name. The script then performs pattern matching on the third and fourth characters. Specifically, the pattern Br[ia][ia]n matches when each of those positions contains either the letter I or the letter A, which covers both Brian and Brain but not Brien. If a misspelling is detected, it is corrected automatically.
This approach could be useful in a situation where a user must enter a product number into a PowerShell script. If you know that the product number always starts with the letter P, you could validate the input and then use the technique that I just demonstrated to replace an invalid starting character if necessary.
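PowerShell's -replace operator can combine detection and correction in a single step. The sketch below is one possible way to do it, not the only one: it uses a grouped alternation, (ia|ai), which is not covered earlier in this article, to target the two misspellings, and it assumes a hypothetical product number format of the letter P followed by digits.

```powershell
# Correct the two common misspellings; any other input is left alone.
$fromBrian = 'Brian' -replace '^Br(ia|ai)n$', 'Brien'
$fromBrain = 'Brain' -replace '^Br(ia|ai)n$', 'Brien'
$fromBrien = 'Brien' -replace '^Br(ia|ai)n$', 'Brien'

# Hypothetical product number: replace a wrong first character with P.
# ^[^P] matches a first character that is anything other than P.
$productNumber = 'X1234'
$corrected     = $productNumber -replace '^[^P]', 'P'
$alreadyValid  = 'P5678' -replace '^[^P]', 'P'

"$fromBrian, $fromBrain, $fromBrien"
"$corrected, $alreadyValid"
```

When the pattern does not match, -replace returns the input unchanged, so no separate If test is needed.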
Brien Posey is a 22-time Microsoft MVP and a commercial astronaut candidate. In his over 30 years in IT, he has served as a lead network engineer for the U.S. Department of Defense and as a network administrator for some of the largest insurance companies in America.