Learning to Use Regular Expressions by Example

Validating E-mail Addresses

Ok, let's take on e-mail addresses. An e-mail address has three parts: the
user name (everything to the left of the '@'), the '@' itself, and the server
name (the rest). The user name may contain upper or lowercase letters, digits,
periods ('.'), minus signs ('-'), and underscores ('_'). The same goes for the
server name, except that underscores are not allowed.

Now, it doesn't seem reasonable for a user name to start or end with a period,
and the same goes for the server name. Nor should there be two consecutive
periods: there must be at least one other character between them. Let's see
how we would write an expression to validate the user name part:

^[_a-zA-Z0-9-]+$

That doesn't allow a period yet. Let's change it:

^[_a-zA-Z0-9-]+(\.[_a-zA-Z0-9-]+)*$

That says: "at least one valid character, followed by zero or more groups,
each consisting of a period and one or more valid characters." Note that the
period is written '\.': an unescaped '.' matches any character at all.
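A quick way to try this pattern on a current PHP is a sketch like the one below, assuming PHP 7.4 or later. ereg() was deprecated in PHP 5.3 and removed in PHP 7.0, so the sketch uses preg_match() from the PCRE functions instead; PCRE patterns need delimiters, here '/'. It compares the pattern with the period escaped ('\.') against the same pattern with a bare '.', which matches any character, so that second version wrongly accepts a user name like "john doe". The helper isMatch() is a name made up for this example.

```php
<?php
// Minimal sketch: the user-name pattern tested with preg_match() (PCRE),
// used here as a stand-in for ereg(), which no longer exists in PHP 7+.
$escaped   = '/^[_a-zA-Z0-9-]+(\.[_a-zA-Z0-9-]+)*$/'; // "\." = literal period
$unescaped = '/^[_a-zA-Z0-9-]+(.[_a-zA-Z0-9-]+)*$/';  // "."  = any character

// Hypothetical helper: true when $subject matches $pattern
function isMatch(string $pattern, string $subject): bool
{
    // preg_match() returns 1 on a match, 0 on no match, false on error
    return preg_match($pattern, $subject) === 1;
}

foreach (['john.doe', 'john..doe', '.john', 'john.', 'john doe'] as $name) {
    printf("%-10s escaped: %-4s unescaped: %s\n",
        $name,
        isMatch($escaped, $name) ? 'yes' : 'no',
        isMatch($unescaped, $name) ? 'yes' : 'no');
}
```

Only "john.doe" passes the escaped version; the unescaped one also lets "john doe" through, because its '.' happily matches the space.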

To simplify things a bit, we can use the expression above with eregi()
instead of ereg(). Because eregi() is case-insensitive, we don't have to
specify both ranges "a-z" and "A-Z"; one of them is enough:

^[_a-z0-9-]+(\.[_a-z0-9-]+)*$
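PCRE has no separate case-insensitive function. Instead, an "i" modifier after the closing delimiter does what eregi() did. A sketch, assuming PHP 7.4+ for the arrow function; $isValidUser is a name chosen for illustration:

```php
<?php
// Sketch: the trailing "i" modifier makes PCRE ignore case, the way eregi()
// did, so the single range "a-z" also covers "A-Z".
$userPattern   = '/^[_a-z0-9-]+(\.[_a-z0-9-]+)*$/i';
$caseSensitive = '/^[_a-z0-9-]+(\.[_a-z0-9-]+)*$/';  // same pattern, no "i"

// Arrow function (PHP 7.4+); it captures $userPattern by value
$isValidUser = fn(string $s): bool => preg_match($userPattern, $s) === 1;

var_dump($isValidUser('John.Doe'));                      // bool(true)
var_dump(preg_match($caseSensitive, 'John.Doe') === 1);  // bool(false)
```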

For the server name it's the same, but without the underscores:

^[a-z0-9-]+(\.[a-z0-9-]+)*$
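A sketch that checks a few host names against the server-name pattern, with preg_match() and the "i" modifier again standing in for eregi(). Note that a single name such as "localhost" still passes, which is what the first exercise at the end of this article is about.

```php
<?php
// Sketch: the server-name pattern, i.e. the user-name pattern minus "_".
$serverPattern = '/^[a-z0-9-]+(\.[a-z0-9-]+)*$/i';

foreach (['example.com', 'mail.Example.org', 'bad_host.com', 'localhost'] as $host) {
    // preg_match() returns 1 on a match, 0 otherwise
    $ok = preg_match($serverPattern, $host) === 1;
    echo $host, ': ', $ok ? 'valid' : 'invalid', "\n";
}
```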

Done. Now, joining both expressions around the 'at' sign, we get:

^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$
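Wrapped up as a function, the whole check might look like the sketch below. isValidEmail() is a hypothetical name, and the "D" modifier is an addition of ours: without it, PCRE's '$' also matches just before a trailing newline, so "dario@example.com\n" would pass. Keep in mind that these are the simplified rules from this article, not the full address grammar of RFC 5322. For production use, PHP's filter_var() with FILTER_VALIDATE_EMAIL is worth a look.

```php
<?php
// Sketch of the complete check. "i" = ignore case (like eregi()),
// "D" = '$' matches only at the very end, not before a final "\n".
function isValidEmail(string $address): bool
{
    $pattern = '/^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$/iD';
    return preg_match($pattern, $address) === 1;
}

$samples = ['dario@example.com', 'first.last@mail.example.org',
            'no-at-sign.example.com', 'two@@example.com', 'user@bad_host.com'];
foreach ($samples as $addr) {
    echo $addr, ': ', isValidEmail($addr) ? 'valid' : 'invalid', "\n";
}
```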

Other uses
Extracting Parts of a String

ereg() and eregi() can also extract the parts of a string that match
parenthesized subpatterns (see the manual for details). For instance, say we
want to get the file name from a path or URL string; this code is all we need:

// ([^/]*)$ captures everything after the last '/'
ereg("([^/]*)$", $pathOrUrl, $regs);
echo $regs[1]; // the text matched by the first (...) group
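Here is a sketch of the same extraction with preg_match(), which fills the matches array the same way: index 0 holds the whole match, index 1 the first parenthesized group. Using '#' as the delimiter means the '/' inside the pattern needs no escaping. lastPathSegment() is a name made up for this example.

```php
<?php
// Sketch: same idea as the ereg() call, using preg_match().
function lastPathSegment(string $pathOrUrl): string
{
    // "[^/]*$" = the run of non-slash characters at the end of the string;
    // it can match the empty string, so group 1 is always set.
    preg_match('#([^/]*)$#', $pathOrUrl, $regs);
    return $regs[1];
}

echo lastPathSegment('/var/www/html/index.php'), "\n";           // index.php
echo lastPathSegment('http://example.com/docs/page.html'), "\n"; // page.html
```

A path ending in '/' yields an empty string, because nothing follows the last slash.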

Advanced Replacing

ereg_replace() and eregi_replace() are also very useful. Suppose we want to
separate all the words in a string with commas:

// replace each run of spaces, newlines, carriage returns and tabs with one comma
$str = ereg_replace("[ \n\r\t]+", ",", trim($str));
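With preg_replace() the call carries over almost unchanged, as in this sketch (wordsToCsv() is an illustrative name). The character class lists a space, newline, carriage return and tab, so each run of whitespace becomes a single comma, and trim() runs first so the result has no leading or trailing comma.

```php
<?php
// Sketch: preg_replace() standing in for ereg_replace().
function wordsToCsv(string $str): string
{
    // In a single-quoted string "\n" stays a backslash + n,
    // which PCRE itself reads as a newline.
    return preg_replace('/[ \n\r\t]+/', ',', trim($str));
}

echo wordsToCsv("  the quick\tbrown\n\nfox  "), "\n"; // the,quick,brown,fox
```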

Some exercises

Now here's something to keep you busy:

  • modify our e-mail-validating regular expression to force the server name
    part to consist of at least two names (hint: only one character needs to be
    changed);

  • build a function call to ereg_replace() that emulates trim();
  • make up another function call to ereg_replace() that escapes the
    characters '#', '@', '&', and '%' of a string with a '~'.
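If you want to check your work on the last two exercises, here is one possible take, again using preg_replace() rather than ereg_replace(). It reads "escape with a '~'" as putting a '~' in front of each of those characters, and the function names are our own.

```php
<?php
// One possible take on the last two exercises (function names are ours).

// trim() emulation: strip the characters trim() removes by default
// (space, tab, newline, carriage return, NUL, vertical tab) at both ends.
function myTrim(string $s): string
{
    return preg_replace('/^[ \t\n\r\x00\x0B]+|[ \t\n\r\x00\x0B]+$/', '', $s);
}

// Put a '~' in front of every '#', '@', '&' or '%'.
function escapeWithTilde(string $s): string
{
    // $1 in the replacement refers to the captured character
    return preg_replace('/([#@&%])/', '~$1', $s);
}

var_dump(myTrim("  \tpadded text\n"));              // string(11) "padded text"
echo escapeWithTilde('50% off & more @ #1'), "\n";   // 50~% off ~& more ~@ ~#1
```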

-dario

Originally posted at
PHPBuilder.com