Regular Expressions Using Egrep
Background
I have been studying regular expressions lately and thought I would share some of my discoveries with you.
Regular expressions may not appear to make any sense when you first encounter them. After a bit of use, however, they start to click, and you discover that these expressions let you solve unusual and complex problems.
In this post I will demonstrate some of the basics.
Since we are going to search the dictionary as an example, please download words.txt from the following source:
https://github.com/dwyl/english-words
The file contains about 355,000 words that we can search through.
egrep – Intro
egrep is a variant of grep, a program designed to find the lines in a file that match a specified pattern. With egrep, that pattern is an extended regular expression (it behaves like grep -E).
The program is available on Macs and Linux distributions with no installation necessary.
Here’s how to use the program:
$ egrep 'regular expression' filename
For our example the filename would be words.txt.
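If you want to try the command before downloading the full dictionary, here is a minimal sketch that uses a tiny made-up word list. The sample words and the temporary file are my own assumptions for illustration, and I use grep -E, which behaves the same as egrep.

```shell
# Hypothetical stand-in for words.txt: one word per line.
sample=$(mktemp)
printf '%s\n' apple savvy vivid revive over > "$sample"

# General form: grep -E 'regular expression' filename
# Here the pattern is just the plain string "app".
matches=$(grep -E 'app' "$sample")
echo "$matches"

# Clean up the temporary file.
rm -f "$sample"
```

Only the lines that contain the pattern are printed, so with this sample list you should see just apple.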
First Regular Expression
I was playing Scrabble with my friend and he wanted to know all the words that had a double v. As soon as he said this, I knew I could find the answer quickly using egrep.
$ egrep vv words.txt
This will give you all the words that contain a double v. However, there is a limitation: the search only finds words that have two v’s side by side, and it misses words where the v’s are separated by other letters.
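To see that limitation on a small scale, here is a rough sketch with a made-up five-word list (the words and the temporary file are assumptions of mine, not part of words.txt). It uses grep -E, which is equivalent to egrep.

```shell
# Hypothetical sample list: some words have adjacent v's, some have separated v's.
sample=$(mktemp)
printf '%s\n' savvy vivid revive navvy over > "$sample"

# The literal pattern "vv" only matches two v's that sit side by side.
adjacent=$(grep -E 'vv' "$sample")
echo "$adjacent"

# Clean up the temporary file.
rm -f "$sample"
```

With this list, savvy and navvy are printed, while vivid and revive are skipped even though each contains two v’s.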
This is where regular expressions become very powerful. To match any word that contains two or more v’s anywhere within it, use the following expression:
$ egrep 'v.*v' words.txt
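Let’s break down this expression. The first v matches a literal v character. The second character is a . (dot or period), which matches any single character. The third character is a *, which means match zero or more repetitions of whatever comes right before it. The combination .* is therefore a catch-all that matches any run of characters, including none at all, and the final v matches a second v later in the word. Put together, the pattern matches any word that contains at least two v’s, whether or not they are side by side.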
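The breakdown above can be checked with a short sketch. Again, the five sample words and the temporary file are my own assumptions, and grep -E stands in for egrep. Note that the pattern is wrapped in single quotes so the shell does not try to expand the * itself.

```shell
# Hypothetical sample list: one word per line.
sample=$(mktemp)
printf '%s\n' savvy vivid revive navvy over > "$sample"

# 'v.*v' = a v, then any run of characters (possibly empty), then another v.
two_or_more=$(grep -E 'v.*v' "$sample")
echo "$two_or_more"

# -c prints the number of matching lines instead of the lines themselves.
count=$(grep -c -E 'v.*v' "$sample")
echo "$count"

# Clean up the temporary file.
rm -f "$sample"
```

Here savvy, vivid, revive and navvy all match (the .* matches nothing at all in savvy and navvy), while over has only one v and is left out, giving a count of 4.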