How to use regular expressions?
-
References:
Guide for basic & extended regular expressions - https://en.wikibooks.org/wiki/Regular_Expressions/POSIX-Extended_Regular_Expressions
Man page for pcre - https://www.lightnetics.com/post/3347
Note: A variety of tools use regular expressions, and different tools may give different results. Test your expressions on sample files first, especially when using them to modify or delete, so that you get predictable results.
Basic regular expressions.
The Anchor -
The basics: an anchor determines the position in the line.
Referred to as the caret, not the eating variety. The caret matches the beginning of the line. (start anchor)
^
The dollar matches the end of the line. (end anchor)
$
Two sample files are used in these examples, the grepfile and the numbersfile.
$ cat grepfile
We're going to need a bigger boat
Go ahead make my day
The brothers in arms
The rain in spain stays mainly on the plain
$ cat numbersfile
56789
23653
457
Search for W at beginning of line.
$ grep ^W grepfile
We're going to need a bigger boat
Search for letters ay at the end of the line.
$ grep ay$ grepfile
Go ahead make my day
Search for the word "the" in the file.
$ grep the grepfile
This displays more than wanted, because "the" also matches inside "brothers".
The brothers in arms
The rain in spain stays mainly on the plain
To get just the word "the", put spaces around the word, using single quotes to preserve the spaces in grep.
$ grep ' the ' grepfile
The rain in spain stays mainly on the plain
The dot matches any single character.
.
Match any single character followed by a double e.
$ grep .ee grepfile
We're going to need a bigger boat
The open/close square brackets match a set or range of characters. Works for numbers and letters. A range has to run from low to high.
[...]
Search for any letters between u and z in the file.
$ grep [u-z] grepfile
Go ahead make my day
The rain in spain stays mainly on the plain
The caret as the first character in square brackets is an exception. It excludes all the characters in the square brackets.
[^...]
Exclude characters a through z and the characters single quote, G and T. The single quote has to be escaped with a backslash, because it's a special character to the shell. Every line is still printed, because each line contains at least one character outside the set: the W in the first line and the spaces in all of them.
$ grep [^a-z\'GT] grepfile
We're going to need a bigger boat
Go ahead make my day
The brothers in arms
The rain in spain stays mainly on the plain
To match zero or more digits. Notice that because it matches "zero or more", it also includes lines with zero matching digits.
$ grep [0-9]* numbersfile
56789
23653
457
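A minimal sketch of why the star on its own matches every line, and how that compares with requiring at least one digit. The sample text below, including its empty line, is an assumption made up for illustration and is not the numbersfile from this guide. The -c option (count matching lines) is used only to make the difference easy to see.

```shell
# Hypothetical sample: two lines of digits, one empty line, one line of letters.
sample=$(printf '56789\n\n457\nabc\n')

# [0-9]* can match zero digits, so every line matches, even the empty one.
zero_or_more=$(printf '%s\n' "$sample" | grep -c '[0-9]*')

# [0-9][0-9]* needs at least one digit, so only the digit lines match.
one_or_more=$(printf '%s\n' "$sample" | grep -c '[0-9][0-9]*')

echo "zero or more: $zero_or_more lines, one or more: $one_or_more lines"
```

Quoting the pattern also stops the shell from treating the brackets and star as a filename pattern.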
To match one or more digits, repeat the character set. Notice the blank lines are no longer shown.
$ grep [0-9][0-9]* numbersfile
56789
23653
457
The curly brackets, escaped with a backslash, match a number of repetitions of the preceding character set.
\{n,m\}
Match the character set a through z repeated 4, 5, 6, 7, or 8 times.
$ grep '[a-z]\{4,8\}' grepfile
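To see which parts of a line the interval actually matches, here is a small sketch. It assumes a grep that supports the -o option (GNU and BSD grep do), which prints each match on its own line, and it sets LC_ALL=C so that a-z means only the plain lowercase letters.

```shell
# Print only the runs of 4 to 8 lowercase letters from one sample line.
matches=$(echo 'Go ahead make my day' | LC_ALL=C grep -o '[a-z]\{4,8\}')
echo "$matches"
```

Only ahead and make are printed; the other words have fewer than four lowercase letters in a row.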
Match one uppercase T at the beginning of the line.
$ grep '^T\{1\}' grepfile
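One thing worth checking with this pattern: it is only anchored at the start, so a line beginning with two T characters still matches. The sketch below uses made-up input lines (an assumption, not the grepfile) and adds [^T] as one possible way to insist on a single T.

```shell
# Hypothetical input: one T, two Ts, and no uppercase T at the start.
lines=$(printf 'The\nTTop\ntop\n')

# Matches any line that starts with T, including TTop.
loose=$(printf '%s\n' "$lines" | grep '^T\{1\}')

# Requires the character after the first T to be something other than T.
strict=$(printf '%s\n' "$lines" | grep '^T\{1\}[^T]')

echo "loose:"; echo "$loose"
echo "strict:"; echo "$strict"
```

The stricter pattern needs a second character on the line, so a line containing only T would not match it.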
The left and right angle brackets, escaped with a backslash, match the start and end of a word.
\<...\>
Match all "The" or "the" words in a file.
$ grep '\<[tT]he\>' grepfile
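A short sketch comparing the pattern with and without the word anchors. It assumes a grep that supports \< and \> and the -o option (GNU grep does), and it keeps the grepfile text in a shell variable instead of a file.

```shell
# The grepfile contents from this guide, held in a variable.
sample="We're going to need a bigger boat
Go ahead make my day
The brothers in arms
The rain in spain stays mainly on the plain"

# Without word anchors, the "the" inside "brothers" is matched too.
plain=$(printf '%s\n' "$sample" | grep -o '[tT]he')

# With word anchors, only the standalone words are matched.
words=$(printf '%s\n' "$sample" | grep -o '\<[tT]he\>')

echo "without anchors:"; echo "$plain"
echo "with anchors:"; echo "$words"
```

Unlike putting spaces around the word, the anchors also match a word at the very start or end of a line.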
The round brackets, escaped with a backslash, remember a pattern; \1, \2 and so on refer back to what was remembered.
\(...\)
Match only the word level, using the remembered patterns. grep still prints the whole line; the match itself is just level.
$ echo test level pop | grep '\([a-z]\)\([a-z]\)[a-z]\2\1'
test level pop
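To confirm that only level is matched, the same pattern can be run with -o, which prints just the matched text. This is a sketch assuming a grep with the -o option (GNU and BSD grep have it). LC_ALL=C is set so that a-z covers only the plain lowercase letters.

```shell
# \1 is the first remembered letter, \2 the second, so the pattern
# describes a five-letter palindrome: letter1 letter2 any letter2 letter1.
match=$(echo 'test level pop' | LC_ALL=C grep -o '\([a-z]\)\([a-z]\)[a-z]\2\1')
echo "$match"
```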
Extended regular expressions.
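In extended regular expressions the characters {, }, (, and ) are special on their own; they do not use the backslash. POSIX does not define \<, \>, or \digit back-references for extended expressions, although GNU grep accepts them as extensions.
The following examples just demonstrate what the characters do in regular expressions.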
The dot matches any single character.
The dot prints the entire file because it matches any single character. With colored output the whole match shows in red.
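A minimal sketch of the dot, using grep -E for extended regular expressions. The three input lines are made up for illustration; the middle one is empty to show the one kind of line the dot cannot match.

```shell
# Count the lines that contain at least one character.
# The empty middle line has no character for the dot to match.
count=$(printf 'abc\n\nx\n' | grep -c -E '.')
echo "$count"
```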
The asterisk matches zero or more of the single character that precedes it.
Here we match the letter "a" and anything after it.
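One way to write "the letter a and anything after it" is a.*, which is an assumption about the pattern meant here. The sketch uses -o (available in GNU and BSD grep) to print only the matched part of the line.

```shell
# a.* starts at the first "a" and the .* runs to the end of the line.
rest=$(echo 'Go ahead make my day' | grep -o -E 'a.*')
echo "$rest"
```

The match starts at the a in ahead, so the leading "Go " is left out.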
The caret matches the regular expression that follows it at the beginning of the line.
Using "^" on its own prints the entire file, because every line has a beginning.
If we add a letter to it, like G, it matches any line beginning with the letter G.
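The two caret cases can be sketched as follows, with the grepfile text kept in a shell variable and grep -E used for extended regular expressions.

```shell
# The grepfile contents from this guide, held in a variable.
sample="We're going to need a bigger boat
Go ahead make my day
The brothers in arms
The rain in spain stays mainly on the plain"

# "^" alone matches the start of every line, so all four lines count.
all_lines=$(printf '%s\n' "$sample" | grep -c -E '^')

# "^G" matches only lines whose first character is G.
g_lines=$(printf '%s\n' "$sample" | grep -E '^G')

echo "$all_lines"
echo "$g_lines"
```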
The dollar matches the regular expression that precedes it at the end of the line.
Using "$" on its own demonstrates that every line has an end of line.
Print any line which has a "y" at the end of it.
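A matching sketch for the dollar, again with the grepfile text in a variable. The pattern is kept in single quotes so the shell does not try to expand the dollar sign.

```shell
# The grepfile contents from this guide, held in a variable.
sample="We're going to need a bigger boat
Go ahead make my day
The brothers in arms
The rain in spain stays mainly on the plain"

# "$" alone matches the end of every line, so all four lines count.
all_lines=$(printf '%s\n' "$sample" | grep -c -E '$')

# "y$" matches only lines whose last character is y.
y_lines=$(printf '%s\n' "$sample" | grep -E 'y$')

echo "$all_lines"
echo "$y_lines"
```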
The [...] matches a range of characters.
Here we match the "a" and "i" characters in the file.
Here we match any characters from a through d.
Here we match any characters that are not (using the caret) a through d.
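The three bracket forms above can be sketched on short made-up words. The sketch assumes a grep with the -o option, which prints each matched character on its own line; tr then joins them back together. LC_ALL=C keeps the a-d range to the plain letters.

```shell
# A set: only the characters a and i are picked out of "plain".
set_match=$(echo 'plain' | LC_ALL=C grep -o -E '[ai]' | tr -d '\n')

# A range: a through d.
range_match=$(echo 'abcdef' | LC_ALL=C grep -o -E '[a-d]' | tr -d '\n')

# A negated range: everything that is not a through d.
negated=$(echo 'abcdef' | LC_ALL=C grep -o -E '[^a-d]' | tr -d '\n')

echo "$set_match $range_match $negated"
```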
? matches zero or one instance of the preceding regular expression.
Here we match an "n" optionally preceded by an "a" or "i". Because it's zero or one, an "n" on its own matches too, and in a string such as aiin or aain only the last "in" or "an" is matched.
+ matches one or more instances of the preceding regular expression.
Notice the difference between the ? and +.
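The difference can be sketched on a small made-up string. The patterns a?n and a+n are simplified stand-ins chosen for illustration, and -o (GNU and BSD grep) prints each match on its own line.

```shell
# a?n: an "n" with zero or one "a" in front of it.
# In "aan" only the last "an" is matched, and the bare "n" matches as well.
optional=$(echo 'n an aan' | grep -o -E 'a?n')

# a+n: an "n" with one or more "a" in front of it.
# The bare "n" no longer matches, and "aan" is matched whole.
repeated=$(echo 'n an aan' | grep -o -E 'a+n')

echo "with ?:"; echo "$optional"
echo "with +:"; echo "$repeated"
```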
| matches the regular expression specified before or after it.
Here we match one or more "a" or "i" followed by an "n", or anything with "go".
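A sketch of the alternation, assuming the pattern described is [ai]+n|go. The grepfile text is kept in a shell variable, and -v is used to show the one line that neither alternative matches.

```shell
# The grepfile contents from this guide, held in a variable.
sample="We're going to need a bigger boat
Go ahead make my day
The brothers in arms
The rain in spain stays mainly on the plain"

# Lines with "an"/"in" style matches, or with a lowercase "go".
matched=$(printf '%s\n' "$sample" | grep -c -E '[ai]+n|go')

# The line left over: "Go" has a capital G and no a or i is followed by n.
skipped=$(printf '%s\n' "$sample" | grep -v -E '[ai]+n|go')

echo "$matched lines matched, skipped: $skipped"
```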
© Lightnetics 2024