
Saturday, March 27, 2010

Regex Match/Group syntax in .NET

I'm used to doing this in UNIX and there it's a lot easier. But I live in a .NET world.

To pull the pieces out of a regex match in .NET you need to use the "Groups" collection of the "Match" object. An example says it all -- say I'm looking at a string of letters followed (optionally) by an integer:

private Regex regex1 = new Regex (
"([a-zA-Z]+)([0-9]*)",
RegexOptions.IgnoreCase
| RegexOptions.CultureInvariant
| RegexOptions.IgnorePatternWhitespace
| RegexOptions.Compiled
);


This was used for creating a unique name in an Active Directory database -- a name is suggested and, if there's a collision, the code takes the name and tacks a number onto the end (e.g., "SSMITH" collides; suggest "SSMITH1"). It then checks that one against the Active Directory, and if it senses a collision again it breaks the name apart and increments the trailing number ... lather, rinse, repeat until a unique name is found. Here's a chunk that indicates the use of REGEX groups in .NET:

// Break current name down into text and integer increment components
Match mc1 = regex1.Match(ProposedSAMAccountName);
// Groups[0] is the whole match; the text part is the first capture group
ProposedSAMAccountNameText = mc1.Groups[1].Value;
// Bump the trailing number if one is present
if (mc1.Groups[2].Length > 0) ProposedSAMAccountIncrement =
Convert.ToInt32(mc1.Groups[2].Value) + 1;
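
For the cheatsheet, here is a rough sketch of how the whole lather-rinse-repeat loop might fit together. It is not the original production code: the class name, the method name, and the nameExists callback (a stand-in for the real Active Directory lookup) are all made up for illustration, and the pattern is anchored with ^ and $ as an assumption so that names in an unexpected format are rejected rather than partially matched.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UniqueNameSketch
{
    // Anchored variant of the pattern above (an assumption): letters, then optional digits.
    private static readonly Regex NamePattern =
        new Regex("^([a-zA-Z]+)([0-9]*)$", RegexOptions.CultureInvariant);

    // nameExists is a hypothetical placeholder for the directory collision check.
    public static string FindUniqueName(string proposed, Func<string, bool> nameExists)
    {
        string candidate = proposed;
        while (nameExists(candidate))
        {
            Match m = NamePattern.Match(candidate);
            if (!m.Success)
                throw new ArgumentException("Unexpected name format: " + candidate);

            string text = m.Groups[1].Value;        // letters only, e.g. "SSMITH"
            int increment = m.Groups[2].Length > 0  // no digits yet means start from 0
                ? int.Parse(m.Groups[2].Value)
                : 0;
            candidate = text + (increment + 1);     // next name to try
        }
        return candidate;
    }
}
```

With a set containing "SSMITH" and "SSMITH1" passed in as the existence check (for example, a HashSet's Contains method), a proposed name of "SSMITH" should come back as "SSMITH2".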


The first member of the "Groups" collection (index 0) is the entire match, the second (index 1) is the first parenthesized group, the third (index 2) is the second parenthesized group, etc. So the numbering lines up with Perl's $1, $2, and so on.
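
To make the indexing concrete, here is a small sketch that returns the three members for the pattern used above. The class and method names are hypothetical and exist only for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupIndexDemo
{
    // Returns { whole match, first group, second group } for the letters-then-digits pattern.
    public static string[] Describe(string input)
    {
        Match m = Regex.Match(input, "([a-zA-Z]+)([0-9]*)");
        return new[]
        {
            m.Groups[0].Value,  // entire match, same as m.Value
            m.Groups[1].Value,  // first parenthesized group: the letters
            m.Groups[2].Value   // second parenthesized group: the digits, or "" if none
        };
    }
}
```

For "SSMITH12" this gives "SSMITH12", "SSMITH" and "12". For plain "SSMITH" the last entry is an empty string, which is why the chunk above checks Groups[2].Length before converting to an integer.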

This'll be the third time I've wasted an hour re-figuring out something so intuitive in Perl and UNIX. This syntax in .NET just seems awkward; I don't think I'll ever speak it without an accent.

I'll just add this to this cheatsheet because I'm sure this'll come up again.
