What are the commonly used regular expression characters?



  • Background information: https://en.wikipedia.org/wiki/Regular_expression

    BRE - Basic Regular Expressions.
    ERE - Extended Regular Expressions.
    PCRE - Perl Compatible Regular Expressions.

    Basics.


    Literals. Alphanumeric characters (A-Z, a-z, 0-9) match themselves.

    The . (dot) matches any single character (in many engines, any character except a newline).

    Square brackets match any one character from the list inside them. For example, [abcd] matches a, b, c, or d.
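    One way to try these basics is Python's re module, whose syntax is close to, but not identical to, PCRE. The sketch below uses made-up sample strings to show a literal, the dot, and a bracket expression.

```python
import re

# Literals match themselves: "cat" is found inside "concatenate".
literal = re.search(r"cat", "concatenate")
print(literal.group())  # cat

# "." matches any single character; in Python it skips "\n" unless re.DOTALL is set.
dots = re.findall(r"c.t", "cat cot cut c\nt")
print(dots)  # ['cat', 'cot', 'cut']

# [abcd] matches exactly one character that is a, b, c, or d.
chars = re.findall(r"[abcd]", "a quick brown dog")
print(chars)  # ['a', 'c', 'b', 'd']
```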

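    POSIX character classes are another way to write common lists. They are used inside a bracket expression, so [[:alnum:]] is the same as [A-Za-z0-9] in the C/POSIX locale (other locales may include additional letters). For the other POSIX character classes and a good comparison, see the Wikipedia article: https://en.wikipedia.org/wiki/Regular_expression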

    Match at the beginning of the line. ^

    Match at the end of the line. $
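    A minimal Python sketch of the two anchors, using made-up log lines. Python applies ^ and $ to the whole string by default; the re.MULTILINE flag makes them apply at every line, which is closer to how line-oriented tools such as grep behave.

```python
import re

# Made-up sample lines for illustration.
lines = ["error: disk full", "no error here", "all done"]

# ^ anchors the match to the start of the string.
starts = [s for s in lines if re.search(r"^error", s)]
print(starts)  # ['error: disk full']

# $ anchors the match to the end of the string.
ends = [s for s in lines if re.search(r"done$", s)]
print(ends)  # ['all done']

# With re.MULTILINE, ^ also matches right after every newline.
text = "\n".join(lines)
first_words = re.findall(r"^\w+", text, re.MULTILINE)
print(first_words)  # ['error', 'no', 'all']
```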

    Repetition.


    The ? matches the preceding character zero or one time.

    The * matches the preceding character zero or more times.

    The + matches the preceding character one or more times. In GNU grep's BRE mode, ? and + are written \? and \+; in ERE (grep -E) and PCRE they are used as shown.
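    The three quantifiers can be compared side by side in Python (sample strings made up for illustration):

```python
import re

# ? : zero or one "u", so both spellings match but "colouur" does not.
optional = re.findall(r"colou?r", "color colour colouur")
print(optional)  # ['color', 'colour']

# * : zero or more "b" characters between "a" and "c".
star = re.findall(r"ab*c", "ac abc abbbc")
print(star)  # ['ac', 'abc', 'abbbc']

# + : one or more "b" characters, so "ac" no longer matches.
plus = re.findall(r"ab+c", "ac abc abbbc")
print(plus)  # ['abc', 'abbbc']
```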

    {n} Matches the preceding character exactly n times.

    {n,} Matches the preceding character n or more times.

    {,m} Matches the preceding character no more than m times (zero to m). This form is a GNU extension and is not supported by every engine.

    {n,m} Matches the preceding character at least n times but not more than m times.
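    A small sketch of the interval forms. It uses Python's re.fullmatch so the whole word must match, and a hypothetical helper named matching() that filters a made-up word list. Python happens to accept {,m}; in BRE the braces are written \{ and \}.

```python
import re

words = ["a", "aa", "aaa", "aaaa", "aaaaa"]

def matching(pattern):
    """Hypothetical helper: return the words the pattern matches in full."""
    return [w for w in words if re.fullmatch(pattern, w)]

print(matching(r"a{3}"))    # ['aaa']
print(matching(r"a{3,}"))   # ['aaa', 'aaaa', 'aaaaa']
print(matching(r"a{,2}"))   # ['a', 'aa']
print(matching(r"a{2,4}"))  # ['aa', 'aaa', 'aaaa']
```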

    Metacharacter Modifiers.


    \< matches the empty string at the beginning of a word.

    \> matches the empty string at the end of a word. Both are GNU extensions; PCRE and many other engines use \b (word boundary) instead.
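    Python has no \< or \>, so this sketch swaps in \b, which matches at either edge of a word, and places it on the side that matters. The comments name the roughly equivalent GNU patterns; the sample text is made up.

```python
import re

text = "cat concat category bobcat"

starts_with_cat = re.findall(r"\bcat\w*", text)  # like \<cat\w* in GNU grep
ends_with_cat = re.findall(r"\w*cat\b", text)    # like \w*cat\> in GNU grep
whole_word = re.findall(r"\bcat\b", text)        # like \<cat\> in GNU grep

print(starts_with_cat)  # ['cat', 'category']
print(ends_with_cat)    # ['cat', 'concat', 'bobcat']
print(whole_word)       # ['cat']
```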

    When ^ is the first character inside square brackets, it negates the list: [^abcd] matches any single character except a, b, c, or d.
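    A short Python sketch of bracket negation, and of ^ losing that meaning when it is not the first character in the brackets:

```python
import re

# [^aeiou ] matches one character that is NOT a lowercase vowel or a space.
consonants = re.findall(r"[^aeiou ]", "regex is fun")
print(consonants)  # ['r', 'g', 'x', 's', 'f', 'n']

# ^ only negates when it comes first; here it is a literal caret.
caret = re.findall(r"[a^]", "a^b")
print(caret)  # ['a', '^']
```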

    | means logical OR (alternation): cat|dog matches either cat or dog. In GNU grep's BRE mode it is written \|.

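    ( and ) group part of a pattern so it is treated as a single unit, for example to repeat it with a quantifier or to limit the scope of |. In BRE they are written \( and \).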
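    The sketch below shows alternation and grouping in Python. Because re.findall returns only the group text when a pattern has a capturing group, the grouped example uses re.finditer with group(0) to get the whole match. Sample strings are made up.

```python
import re

# | : match either alternative.
pets = re.findall(r"cat|dog", "cat bird dog")
print(pets)  # ['cat', 'dog']

# Without parentheses, | splits the whole pattern: "gra" OR "ey".
loose = re.findall(r"gra|ey", "gray grey")
print(loose)  # ['gra', 'ey']

# With parentheses, the alternation is limited to the group: gray or grey.
tight = [m.group(0) for m in re.finditer(r"gr(a|e)y", "gray grey gruy")]
print(tight)  # ['gray', 'grey']

# A group is repeated as a unit: (ab)+ matches "ab", "abab", "ababab", ...
print(bool(re.fullmatch(r"(ab)+", "ababab")))  # True
print(bool(re.fullmatch(r"ab+", "ababab")))    # False
```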

    Shorthand character classes.


    Read the documentation for your regular expression tool to see what it supports. For example, \d is supported in Perl Compatible Regular Expression mode (such as grep -P), but not in POSIX BRE/ERE or in GNU grep's default mode.

    \d is short for [0-9]

    \w is a word character, short for [A-Za-z0-9_]

    \s is a whitespace character, short for [ \t\r\n\f] (space, tab, carriage return, newline, form feed)
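    The shorthand classes can be tried in Python on a made-up string. One caveat: for str patterns, Python's \d, \w, and \s are Unicode-aware, so they can match more than the ASCII sets listed above; the re.ASCII flag restricts them.

```python
import re

sample = "Order #42 shipped\ton 2024-05-01"

digits = re.findall(r"\d+", sample)
words = re.findall(r"\w+", sample)
spaces = re.findall(r"\s", sample)

print(digits)  # ['42', '2024', '05', '01']
print(words)   # ['Order', '42', 'shipped', 'on', '2024', '05', '01']
print(spaces)  # [' ', ' ', '\t', ' ']

# U+0663 is ARABIC-INDIC DIGIT THREE: Unicode-aware \d matches it, ASCII \d does not.
three = "\u0663"
print(len(re.findall(r"\d", three)))            # 1
print(len(re.findall(r"\d", three, re.ASCII)))  # 0
```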


    Each of the shorthand classes above has a negated counterpart, written with an uppercase letter.

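    \W matches any character that is not a word character (logical NOT \w).

    \S matches any character that is not whitespace (logical NOT \s).

    \D matches any character that is not a digit (logical NOT \d).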
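    A minimal Python sketch of the negated classes in common tasks: stripping non-digits, splitting on non-word characters, and collecting non-whitespace runs. The inputs are made up for illustration.

```python
import re

# \D : keep only the digits by deleting every non-digit.
phone = re.sub(r"\D", "", "Tel: 555-0100")
print(phone)  # 5550100

# \W+ : split on runs of non-word characters.
parts = re.split(r"\W+", "one, two;three")
print(parts)  # ['one', 'two', 'three']

# \S+ : runs of non-whitespace characters.
tokens = re.findall(r"\S+", "  a  bc ")
print(tokens)  # ['a', 'bc']
```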



© Lightnetics 2024