Learning to Use Regular Expressions by Example

Dario F. Gomes

The site I'm working on relies heavily on user input through forms, and all that data needs to be checked before being sent to a database. I knew PHP3's regular expression functions should solve my problem, but I didn't know how to form the regular expressions in the first place. What I needed were some sample strings -- obviously the first places I looked were the PHP3 manual and the POSIX 1003.2 specification, but neither offers much in the way of examples.
Adding to that, I had a really hard time finding good literature on the Web about the subject. I eventually got to know how to do it, mostly through experimenting, and seeing there wasn't much to it, I decided to write down this straight-out introduction to the syntax and a step-by-step on building regular expressions to validate money and e-mail address strings. I just hope it manages to clear the fog around the subject for all you fellow programmers.


Basic Syntax of Regular Expressions

First of all, let's take a look at two special symbols: '^' and '$'. What they do is indicate the start and the end of a string, respectively, like this:

  • "^The": matches any string that starts with
    "The";

  • "of despair$": matches a string that ends in the
    substring "of despair";

  • "^abc$": a string that starts and ends with "abc"
    -- that could only be "abc" itself!

  • "notice": a string that has the text "notice"
    in it.

As the last example shows, if you use neither of these two characters, you're saying that the pattern may occur anywhere inside the string -- you're not "hooking" it to either edge.
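
As a quick way to try these four patterns, here is a minimal sketch in Python rather than PHP3. The helper name matches() and the sample strings are made up for illustration. Python's re module is not a POSIX implementation, but '^' and '$' behave the same way for these cases (one difference: Python's '$' also matches just before a trailing newline).

```python
import re

def matches(pattern, text):
    """Return True if pattern is found somewhere in text (hypothetical helper)."""
    # re.search scans the whole string, so only '^' and '$'
    # tie the pattern to an edge.
    return re.search(pattern, text) is not None

print(matches(r"^The", "The end"))                   # True
print(matches(r"^The", "At The end"))                # False
print(matches(r"of despair$", "pit of despair"))     # True
print(matches(r"of despair$", "of despair, again"))  # False
print(matches(r"^abc$", "abc"))                      # True
print(matches(r"^abc$", "abcabc"))                   # False
print(matches(r"notice", "take notice now"))         # True
```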

There are also the symbols '*', '+', and '?', which denote the number of times a character or a sequence of characters may occur. They mean "zero or more", "one or more", and "zero or one", respectively.

Here are some examples:

  • "ab*": matches a string that has an a followed
    by zero or more b's ("a", "ab", "abbb",
    etc.);

  • "ab+": same, but there's at least one b ("ab",
    "abbb", etc.);

  • "ab?": there might be a b or not;

  • "a?b+$": a possible a followed by one or more
    b's ending a string.
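
To see what each of these quantifiers actually grabs, the following sketch (again in Python, with a made-up helper first_match() and made-up sample strings) prints the substring each pattern matches. It is an illustration under the assumption that Python's re treats '*', '+', and '?' like the POSIX functions do for these simple cases.

```python
import re

def first_match(pattern, text):
    """Return the first substring of text matched by pattern, or None."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

print(first_match(r"ab*", "a"))         # a     (zero b's is fine)
print(first_match(r"ab*", "xabbby"))    # abbb  (takes as many b's as it can)
print(first_match(r"ab+", "a"))         # None  (needs at least one b)
print(first_match(r"ab+", "xabbby"))    # abbb
print(first_match(r"ab?", "abbb"))      # ab    (at most one b)
print(first_match(r"a?b+$", "xxabbb"))  # abbb  (optional a, b's end the string)
print(first_match(r"a?b+$", "abba"))    # None  (the string does not end in b)
```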

You can also use bounds, which come inside braces and indicate ranges in the number of occurrences:

  • "ab{2}": matches a string that has an a followed
    by exactly two b's ("abb");

  • "ab{2,}": there are at least two b's ("abb",
    "abbbb", etc.);

  • "ab{3,5}": from three to five b's ("abbb",
    "abbbb", or "abbbbb").