How to use regular expressions?



  • References:
    Guide for basic & extended regular expressions - https://en.wikibooks.org/wiki/Regular_Expressions/POSIX-Extended_Regular_Expressions
    Man page for pcre - https://www.lightnetics.com/post/3347

    Note: Many tools use regular expressions, and different tools may give different results. Test your expressions on sample files first, especially before using them to modify or delete data, so that the results are predictable.

    Basic regular expressions.


    The Anchor -

    The basics: an anchor determines the position of the match within the line.

    The caret (not the carrot you eat) matches the beginning of the line (start anchor).

    ^

    The dollar matches the end of the line (end anchor).

    $

    Two sample files are used in these examples: grepfile and numbersfile.

    $ cat grepfile
    We're going to need a bigger boat
    Go ahead make my day
    The brothers in arms
    The rain in spain stays mainly on the plain
    
    $ cat numbersfile 
    56789
    
    23653
    
    457
    

    Search for W at the beginning of the line.

    $ grep ^W grepfile
    

    We're going to need a bigger boat

    Search for letters ay at the end of the line.

    $ grep ay$ grepfile
    

    Go ahead make my day

    Search for the word "the" in the file.

    $ grep the grepfile
    

    This is not quite the wanted output: it also matches "the" inside other words, such as "brothers".
    The brothers in arms
    The rain in spain stays mainly on the plain

    To match just the word "the", put spaces around it, using single quotes so that grep receives the spaces as part of the pattern. This will not match "the" at the very start or end of a line.

    $ grep ' the ' grepfile
    

    The rain in spain stays mainly on the plain

    The dot matches any single character.

    .

    Match any character followed by a double e.

    $ grep .ee grepfile
    

    We're going to need a bigger boat

    The open/close square brackets match any one character from a set or range. Ranges work for numbers and letters, and have to run from low to high.

    [...]

    Search for lines containing any letter from u through z. The pattern is quoted so that the shell cannot expand the brackets as a filename pattern.

    $ grep '[u-z]' grepfile
    

    Go ahead make my day
    The rain in spain stays mainly on the plain

    A caret as the first character inside square brackets negates the set: it matches any single character that is not listed in the brackets.

    [^...]

    Exclude the characters a through z, the single quote, G, and T. The single quote has to be escaped with a backslash because it is a special character to the shell. Every line is still printed, because each one contains at least one character outside the set: the spaces between the words, and the W on the first line.

    $ grep [^a-z\'GT] grepfile
    

    We're going to need a bigger boat
    Go ahead make my day
    The brothers in arms
    The rain in spain stays mainly on the plain

    Match zero or more digits. Because "zero or more" includes zero, the lines with no digits at all also match and are printed.

    $ grep '[0-9]*' numbersfile
    

    56789

    23653

    457

    To match one or more digits, repeat the character set. Notice that the blank lines are no longer shown.

    $ grep '[0-9][0-9]*' numbersfile
    

    56789
    23653
    457
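
    The difference between the two patterns above can be checked by counting matching lines instead of printing them. This is only a sketch: the scratch directory and its copy of numbersfile (three numbers separated by empty lines) are assumptions made for the example.

```shell
# Hypothetical scratch copy of numbersfile: three numbers, each followed by an empty line.
dir=$(mktemp -d)
printf '56789\n\n23653\n\n457\n\n' > "$dir/numbersfile"

# Quote the pattern so the shell cannot expand [0-9]* as a filename pattern.
zero_or_more=$(grep -c '[0-9]*' "$dir/numbersfile")      # empty lines match too
one_or_more=$(grep -c '[0-9][0-9]*' "$dir/numbersfile")  # needs at least one digit

echo "zero or more: $zero_or_more lines"
echo "one or more:  $one_or_more lines"
rm -r "$dir"
```

    The first count covers all six lines, the second only the three lines that contain digits.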

    The curly brackets, escaped with a backslash, match a given number of repetitions of the preceding character set: at least n and at most m.

    \{n,m\}

    Match a run of 4, 5, 6, 7, or 8 consecutive characters from the set a through z.

    $ grep '[a-z]\{4,8\}' grepfile
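
    To see which parts of a line the interval actually matches, the -o option (available in GNU and BSD grep, but not required by POSIX) prints each match on its own line. A small sketch on the first line of grepfile; LC_ALL=C is set here as an assumption so that a-z means plain lowercase ASCII letters.

```shell
# Assumption: force the C locale so [a-z] covers only lowercase ASCII letters.
export LC_ALL=C

# -o prints only the matched text, one match per line.
matches=$(echo "We're going to need a bigger boat" | grep -o '[a-z]\{4,8\}')
echo "$matches"
```

    Only the runs of at least four lowercase letters are printed: going, need, bigger, and boat.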
    

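    Match lines beginning with an uppercase T. The \{1\} asks for exactly one repetition, but since nothing follows it in the pattern, a line beginning with TT would match as well.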

    $ grep '^T\{1\}' grepfile
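
    A quick way to check what \{1\} does and does not guarantee is to feed grep a few made-up lines. The input lines here (The, TTop, top) are hypothetical and only serve the illustration.

```shell
# Hypothetical input: one line starting with T, one with TT, one with a lowercase t.
input=$(printf 'The\nTTop\ntop\n')

# ^T\{1\} only requires one T at the start; what follows is not constrained.
one_t=$(echo "$input" | grep -c '^T\{1\}')

# To insist that the second character is not another T, say so explicitly.
single_t=$(echo "$input" | grep -c '^T[^T]')

echo "lines matching ^T\\{1\\}: $one_t"
echo "lines matching ^T[^T]:  $single_t"
```

    The first pattern matches both The and TTop; the second matches only The.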
    

    The left and right angle brackets, escaped with a backslash, match the beginning and the end of a word.

    \<...\>

    Match all occurrences of the word "The" or "the" in the file.

    $ grep '\<[tT]he\>' grepfile
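
    The effect of the word anchors can be compared with the plain search for "the" used earlier. This sketch recreates grepfile in a scratch directory (an assumption for the example) and counts matching lines; \< and \> are supported by GNU grep and may not exist in every grep.

```shell
# Hypothetical scratch copy of grepfile.
dir=$(mktemp -d)
printf '%s\n' "We're going to need a bigger boat" 'Go ahead make my day' \
    'The brothers in arms' 'The rain in spain stays mainly on the plain' > "$dir/grepfile"

plain=$(grep -c 'the' "$dir/grepfile")            # also counts "brothers"
word=$(grep -c '\<the\>' "$dir/grepfile")         # whole word, lowercase only
either=$(grep -c '\<[tT]he\>' "$dir/grepfile")    # whole word, either case of t

echo "plain: $plain  word: $word  either: $either"
rm -r "$dir"
```

    The plain search counts two lines because of "brothers", the anchored lowercase search counts one, and the [tT] version counts the two lines that begin with "The".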
    

    The round brackets, escaped with a backslash, remember the text they match; \1, \2, and so on refer back to the remembered text.

    \(...\)

    Match the word level, using the remembered patterns. grep prints the whole line; only "level" is the part that matches.

    $ echo test level pop | grep '\([a-z]\)\([a-z]\)[a-z]\2\1'
    

    test level pop
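
    To confirm that "level" is the only part of the line that matches, the same pattern can be run with -o (a GNU and BSD grep option, not required by POSIX), which prints just the matched text. LC_ALL=C is an assumption to keep a-z limited to lowercase ASCII letters.

```shell
# Assumption: force the C locale so [a-z] covers only lowercase ASCII letters.
export LC_ALL=C

# \2 must repeat the second remembered letter, \1 the first: a five-letter palindrome.
palindrome=$(echo test level pop | grep -o '\([a-z]\)\([a-z]\)[a-z]\2\1')
echo "$palindrome"
```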

    Extended regular expressions.


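    In extended regular expressions (grep -E), the characters {, }, (, and ) are special without a backslash; put a backslash in front of them to match them literally. With GNU grep, the word anchors \< and \> and the back-references \digit are still written with a backslash.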
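
    The same expression can be written in both syntaxes and compared. The sketch below assumes a scratch copy of grepfile and a made-up input string f(x); it counts matching lines with a basic expression and with its extended equivalent.

```shell
# Assumption: C locale, and a hypothetical scratch copy of grepfile.
export LC_ALL=C
dir=$(mktemp -d)
printf '%s\n' "We're going to need a bigger boat" 'Go ahead make my day' \
    'The brothers in arms' 'The rain in spain stays mainly on the plain' > "$dir/grepfile"

# Interval: backslashes in a basic expression, none in an extended one.
bre_count=$(grep -c '[a-z]\{4,8\}' "$dir/grepfile")
ere_count=$(grep -E -c '[a-z]{4,8}' "$dir/grepfile")

# Literal parentheses: plain in a basic expression, escaped in an extended one.
bre_literal=$(echo 'f(x)' | grep -c '(x)')
ere_literal=$(echo 'f(x)' | grep -E -c '\(x\)')

echo "interval: basic=$bre_count extended=$ere_count"
echo "literal parentheses: basic=$bre_literal extended=$ere_literal"
rm -r "$dir"
```

    Both forms of each pattern report the same number of matching lines.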

    The following examples just demonstrate what the characters do in regular expressions.

    The dot matches any single character.

    A dot on its own prints every non-empty line of the file, because it matches any single character. With grep's colored output, the whole of each line is highlighted as the match.


    The asterisk matches zero or more occurrences of the single character that precedes it.

    Here we match the letter "a" and anything after it.

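
    The original command for this example is not preserved, so the following is a sketch under the assumption that the pattern was a.* (an "a", then zero or more of any character). The second pattern, big*, is a made-up extra to show the asterisk applying to a single preceding character. -o prints only the matched text.

```shell
# Assumed pattern: "a" followed by zero or more of any character (the dot).
from_a=$(echo 'Go ahead make my day' | grep -E -o 'a.*')

# Hypothetical extra: the asterisk repeats only the "g" that precedes it.
big=$(echo 'a bigger boat' | grep -E -o 'big*')

echo "$from_a"
echo "$big"
```

    The first match runs from the first "a" to the end of the line; the second stops after the last consecutive "g".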

    The caret matches the regular expression that follows it at the beginning of the line.

    Using "^" on its own prints the entire file, because every line has a beginning.


    If we add a letter to it, like G, it matches any line beginning with the letter G.

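
    The two caret examples can be reproduced along these lines, assuming a scratch copy of grepfile (the directory is created only for this sketch).

```shell
# Hypothetical scratch copy of grepfile.
dir=$(mktemp -d)
printf '%s\n' "We're going to need a bigger boat" 'Go ahead make my day' \
    'The brothers in arms' 'The rain in spain stays mainly on the plain' > "$dir/grepfile"

all_lines=$(grep -E -c '^' "$dir/grepfile")   # every line has a beginning
starts_g=$(grep -E '^G' "$dir/grepfile")      # only lines beginning with G

echo "lines matching ^: $all_lines"
echo "$starts_g"
rm -r "$dir"
```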

    The dollar matches the regular expression that precedes it at the end of the line.

    To demonstrate: every line has an end of line, so "$" on its own prints the entire file.


    Print any line which has a "y" at the end of it.

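
    The dollar examples can be sketched the same way, again assuming a scratch copy of grepfile. The patterns are in single quotes so that the shell leaves the dollar sign alone.

```shell
# Hypothetical scratch copy of grepfile.
dir=$(mktemp -d)
printf '%s\n' "We're going to need a bigger boat" 'Go ahead make my day' \
    'The brothers in arms' 'The rain in spain stays mainly on the plain' > "$dir/grepfile"

all_lines=$(grep -E -c '$' "$dir/grepfile")   # every line has an end
ends_y=$(grep -E 'y$' "$dir/grepfile")        # only lines ending in y

echo "lines matching \$: $all_lines"
echo "$ends_y"
rm -r "$dir"
```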

    The [...] matches any one character from a set or range of characters.

    Here we match the "a" and "i" characters in the file.


    Here we match any characters from a through d.


    Here we match any characters that are not (using the caret) a through d.

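
    The three bracket examples can be tried on short strings with -o, which prints each matched character; tr then joins the matches onto one line. The input strings are made up for the sketch, and LC_ALL=C is assumed so that a-d means the ASCII letters a, b, c, and d.

```shell
# Assumption: force the C locale so the range a-d is plain ASCII.
export LC_ALL=C
line='The rain in spain'

listed=$(echo "$line" | grep -E -o '[ai]' | tr -d '\n')       # every a or i
ranged=$(echo "$line" | grep -E -o '[a-d]' | tr -d '\n')      # every a, b, c, or d
negated=$(echo 'abcdef' | grep -E -o '[^a-d]' | tr -d '\n')   # everything except a to d

echo "listed: $listed  ranged: $ranged  negated: $negated"
```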

    ? matches zero or one instance of the preceding regular expression.

    Here we match an optional "a" or "i" followed by an "n". Because zero instances are allowed, an "n" with no "a" or "i" in front of it also matches. In a string such as aiin or aain, at most one "a" or "i" before the "n" is part of the match.


    + matches one or more instances of preceding regular expression.

    Notice the difference between ? and +: with +, at least one instance of the preceding expression is required.

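
    The original commands are not preserved, so this sketch assumes the patterns were [ai]?n and [ai]+n, and uses a made-up input string to show the difference between ? and +. -o prints each match on its own line.

```shell
# Hypothetical input containing a bare "n", "in", and "aain".
text='n in aain'

# Assumed pattern: an optional a or i, then n.
optional=$(echo "$text" | grep -E -o '[ai]?n')

# Assumed pattern: one or more a or i, then n.
required=$(echo "$text" | grep -E -o '[ai]+n')

echo "$optional"
echo "---"
echo "$required"
```

    With ? the bare "n" matches and "aain" contributes only "in"; with + the bare "n" is skipped and "aain" matches as a whole.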

    | matches either the regular expression before it or the one after it.

    Here we match one or more "a" or "i" followed by an "n", or anything with "go".

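
    As the original command is not preserved, the sketch below assumes the pattern was [ai]+n|go and runs it against a scratch copy of grepfile (created only for the example).

```shell
# Hypothetical scratch copy of grepfile.
dir=$(mktemp -d)
printf '%s\n' "We're going to need a bigger boat" 'Go ahead make my day' \
    'The brothers in arms' 'The rain in spain stays mainly on the plain' > "$dir/grepfile"

# Assumed pattern: one or more a or i followed by n, or the letters go.
line_count=$(grep -E -c '[ai]+n|go' "$dir/grepfile")

# Show which parts of the first line match.
parts=$(head -n 1 "$dir/grepfile" | grep -E -o '[ai]+n|go')

echo "matching lines: $line_count"
echo "$parts"
rm -r "$dir"
```

    Three lines match; "Go ahead make my day" does not, because the search is case sensitive and the line has no "n". On the first line, "going" supplies both a "go" match and an "in" match.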



© Lightnetics 2024