Regex Basics - Named Groups, Backreferences, and Regex.Replace
A U.S. Social Security Number is a 9-digit number, but it can be formatted in several different ways. Sometimes it appears with no separation, like 123456789. Other times a space or a dash may be used, giving 123 45 6789 or 123-45-6789. We may write an expression to cover these different separators that looks like this:
\d{3}[ -]?\d{2}[ -]?\d{4}
This matches the preceding examples, but it also matches strings like 12345-6789 or 123-45 6789, which do not look like well-formatted SSNs. To ensure that the separator is the same in both positions (a space in both, a dash in both, or nothing in both), we can use a named group and a backreference.
\d{3}(?<separator>[ -]?)\d{2}\k<separator>\d{4}
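To see the difference between the two expressions, here is a minimal sketch in Python rather than .NET. It assumes Python's re module, which spells a named group (?P<name>...) and a named backreference (?P=name) where .NET uses (?<name>...) and \k<name>. The names LOOSE, STRICT, and samples are ours, and fullmatch is used so that the whole string has to match.

```python
import re

# Python spelling of the two patterns above:
# .NET (?<separator>...) becomes (?P<separator>...), \k<separator> becomes (?P=separator).
LOOSE = re.compile(r"\d{3}[ -]?\d{2}[ -]?\d{4}")
STRICT = re.compile(r"\d{3}(?P<separator>[ -]?)\d{2}(?P=separator)\d{4}")

samples = ["123456789", "123 45 6789", "123-45-6789", "12345-6789", "123-45 6789"]

for text in samples:
    # fullmatch requires the entire string to match the pattern.
    loose = LOOSE.fullmatch(text) is not None
    strict = STRICT.fullmatch(text) is not None
    print(f"{text!r:15} loose={loose} strict={strict}")
```

The first three samples match both patterns. The last two match only the loose pattern, because the backreference must repeat whatever the separator group captured, including an empty string.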
The (?<separator>[ -]?) captures the separator in a named group, which lets us refer to it later with \k<separator>. A backreference matches exactly the text that the group captured: if the group captured a dash, \k<separator> matches only a dash, and if it captured nothing, \k<separator> matches nothing. Rather than just matching well-formatted SSNs, we could use additional named groups to convert those SSNs into the format we desire. If we are storing these SSNs in a database, for example, we may want to put them into a numeric field without separators, regardless of what separators were in the original. This is accomplished by grouping the numeric portions of the SSN as well as the separators.
(?<first>\d{3})(?<separator>[ -]?)(?<second>\d{2})\k<separator>(?<third>\d{4})
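As a rough illustration of what each named group holds after a match, the following sketch again uses Python's re module (an assumption on our part; the article's own examples are .NET), so the groups are written (?P<name>...) and the backreference (?P=separator). The variable names SSN and match are ours.

```python
import re

# Same pattern as above in Python's named-group syntax.
SSN = re.compile(
    r"(?P<first>\d{3})(?P<separator>[ -]?)"
    r"(?P<second>\d{2})(?P=separator)(?P<third>\d{4})"
)

match = SSN.fullmatch("123 45 6789")
if match:
    # groupdict() maps each group name to the text it captured.
    for name, value in match.groupdict().items():
        print(f"{name}: {value!r}")
```

Each numeric portion is available by name, and the separator group holds the single space that the backreference then had to repeat.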
Using the Replace method, we can get rid of any separators that may be in the SSN:
Regex.Replace("123-45-6789","(?<first>\d{3})(?<separator>[ -]?)(?<second>\d{2})\k<separator>(?<third>\d{4})","${first}${second}${third}")
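For comparison, the same replacement can be sketched in Python, assuming re.sub as the counterpart of Regex.Replace. In Python's replacement string, \g<name> plays the role that ${name} plays in .NET. The helper name strip_separators is hypothetical.

```python
import re

# Python spelling of the pattern passed to Regex.Replace above.
SSN = re.compile(
    r"(?P<first>\d{3})(?P<separator>[ -]?)"
    r"(?P<second>\d{2})(?P=separator)(?P<third>\d{4})"
)

def strip_separators(text: str) -> str:
    """Rewrite every well-formatted SSN in text as nine bare digits."""
    # \g<first>\g<second>\g<third> keeps the digit groups and drops the separator.
    return SSN.sub(r"\g<first>\g<second>\g<third>", text)

print(strip_separators("123-45-6789"))  # 123456789
print(strip_separators("123 45 6789"))  # 123456789
print(strip_separators("123-45 6789"))  # unchanged: mixed separators do not match
```

Input that does not match the pattern is left as it was, so mixed separators pass through untouched rather than being silently normalized.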
Named groups, backreferences, and replacement expressions are also useful for formatting dates, reading comma-delimited files exported from Microsoft Excel or databases, and rearranging items in a list.
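As one illustration of the date case, here is a small sketch that rearranges a date with the same three tools. The input format (two-digit month, two-digit day, four-digit year, separated by matching slashes or dashes) and the names DATE and to_iso are assumptions made for this example, and the code uses Python's re syntax rather than .NET's.

```python
import re

# A named separator group plus a backreference requires the same
# separator in both positions, as in the SSN pattern.
DATE = re.compile(
    r"(?P<month>\d{2})(?P<sep>[/-])(?P<day>\d{2})(?P=sep)(?P<year>\d{4})"
)

def to_iso(text: str) -> str:
    """Rewrite MM/DD/YYYY or MM-DD-YYYY dates in text as YYYY-MM-DD."""
    # The replacement reorders the named groups and supplies its own dashes.
    return DATE.sub(r"\g<year>-\g<month>-\g<day>", text)

print(to_iso("07/04/1976"))  # 1976-07-04
print(to_iso("07-04-1976"))  # 1976-07-04
print(to_iso("07/04-1976"))  # unchanged: separators differ
```

The pattern checks only the shape of the date, not whether the month and day values are valid.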