Bash Regular Expression Cheatsheet

grep vs grep -E

The difference between grep and grep -E is that grep uses basic regular expressions while grep -E uses extended regular expressions. In basic regular expressions, the characters “?”, “+”, “{”, “|”, “(“,”)” lose their special meaning; instead, use “?”, “+”, “{”, “|”, “(”, “)”.

$ echo "987-123-4567" | grep "^[0-9]\{3\}-[0-9]\{3\}-[0-9]\{4\}$"
$ echo "987-123-4567" | grep -E "^[0-9]{3}-[0-9]{3}-[0-9]{4}$"

tr Substitutes Strings

# tr substitutes white spaces to newline
$ echo "a b c" | tr "[:space:]+" "\n"
a
b
c

# tr spueeze multiple spaces
$ echo "a    b   c" | tr -s " "
a b c

uniq Filters out Repeated Lines

$ echo "a a b b c" | tr " " "\n" | sort | uniq
a
b
c

# display count
$ echo "a a b b a c" | tr " " "\n" | sort | uniq -c
   3 a
   2 b
   1 c

Note that uniq only filters out lines continuously. However, if characters are equal but they does not appear continually, uniq does not squeeze them. Therefore, a programmer needs to use sort to categorizes lines before uniq.

$ echo "a a b b a c" | tr " " "\n" | uniq
a
b
a
c

sort lines

# sort by lines
$ echo "b a c d" | tr " " "\n" | sort

# sort by lines reversely
$ echo "b a c d" | tr " " "\n" | sort -r

# sort by a field
$ echo "b a b c d" | tr " " "\n" | sort | uniq -c | sort -k1