Regular expressions in Python

I was writing scripts to parse a document in Python, and I found it hard to find a good, quick tutorial on Python's regular expression package. This is my best effort at showing how to use regular expressions in Python.

The first thing you need to do is import the regular expression package, re:

import re

To build a regular expression object, you call re.compile(). Here, I am trying to find expressions in the form of a Rice University net id, such as “yz17” (2-3 lowercase letters followed by 2-3 digits).

netidExpr = re.compile(r'\b[a-z]{2,3}[0-9]{2,3}\b')
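A quick way to check that a pattern describes the shape you have in mind is to test it against a few candidate strings with fullmatch(), which only succeeds if the whole string fits the pattern. This is just a sketch: it assumes a net id is 2-3 lowercase letters followed by 2-3 digits, and the names netid_shape and is_netid are made up for illustration.

```python
import re

# Assumed shape of a net id: 2-3 lowercase letters, then 2-3 digits.
netid_shape = re.compile(r'[a-z]{2,3}[0-9]{2,3}')

def is_netid(candidate):
    """Return True if the whole string has the assumed net id shape."""
    return netid_shape.fullmatch(candidate) is not None

for candidate in ["yz17", "abc123", "y17", "yz1", "yunming"]:
    print(candidate, is_netid(candidate))
```

Note that the quantifier `*` means "zero or more", so a pattern like `[a-z]*[0-9]*` would also accept an empty string; `{2,3}` is what enforces the 2-3 count.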

Once you have a regular expression object, there are a few useful methods you can call to find the matching pattern. The descriptions below are from the official Python documentation.

  • The first one is re.search()
    • documentation: Scan through string looking for a location where the regular expression pattern produces a match, and return a corresponding MatchObject instance. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.
  • The second method is re.match()
    • documentation: If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding MatchObject instance. Return None if the string does not match the pattern; note that this is different from a zero-length match.
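The difference between the two methods is easiest to see side by side. The sketch below uses an illustrative pattern and sample strings of my own (2-3 lowercase letters followed by 2-3 digits); they are assumptions, not part of the documentation quoted above.

```python
import re

# Illustrative pattern: 2-3 lowercase letters followed by 2-3 digits.
netid_expr = re.compile(r'[a-z]{2,3}[0-9]{2,3}')

line = "Yunming Zhang, (yz17) Active Student"

# search() scans the whole string, so it finds the id in the middle.
found = netid_expr.search(line)
print(found.group())

# match() only tries at the beginning of the string, so it fails here.
at_start = netid_expr.match(line)
print(at_start)

# match() succeeds when the string itself starts with the pattern.
at_start_ok = netid_expr.match("yz17 is a net id")
print(at_start_ok.group())
```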

For me, search() is a lot more useful, since I am trying to extract a pattern from a line of text, but I can see where match() can come in handy as well. I will demonstrate how to use search() in this article.

Assuming I have a line of text, “ Yunming Zhang, (  yz17 ) Active Student”, this is the code I would write to extract yz17:

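import re

# 2-3 lowercase letters followed by 2-3 digits, as a whole word
netidExpr = re.compile(r'\b[a-z]{2,3}[0-9]{2,3}\b')

# search() returns a match object; group() gives the matched text
netid = netidExpr.search(" Yunming Zhang, (  yz17 ) Active Student").group()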


The search method returns a match object, and I use group() to read out the actual match. The above code would return “yz17”.
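One thing to watch out for: search() returns None when nothing matches, so calling group() on the result directly raises an AttributeError on lines without a net id. Here is a minimal sketch of a safer version; the function name extract_netid and the exact pattern are my own assumptions for illustration.

```python
import re

# Assumed net id shape: 2-3 lowercase letters followed by 2-3 digits.
NETID_EXPR = re.compile(r'\b[a-z]{2,3}[0-9]{2,3}\b')

def extract_netid(text_line):
    """Return the first net id found in text_line, or None if there is none."""
    match = NETID_EXPR.search(text_line)
    if match is None:
        return None
    return match.group()

print(extract_netid(" Yunming Zhang, (  yz17 ) Active Student"))
print(extract_netid("No id on this line"))
```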

Hope this helps!

As a side note, you can also split a string on multiple delimiters, although that might be a good case for using a regular expression instead. A Stack Overflow post summarizes this very well.
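For what it's worth, here is a small sketch of the regular expression route using re.split(). The sample record and the choice of comma and semicolon as delimiters are made up for illustration.

```python
import re

# Hypothetical record using two different delimiters.
record = "yz17, Yunming Zhang; Active Student"

# Split on a comma or a semicolon, swallowing any surrounding whitespace.
fields = re.split(r'\s*[,;]\s*', record)
print(fields)
```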
