Need to parse a truly comma delimited text line? It is not easy without the use of regular expressions. This article demonstrates the use of Regular Expression to parse a truly Comma Delimited line that has optional double quotes with the possibility of nested quotes such as Excel can handle.
I was faced with parsing a truly comma delimited text file in a project recently and started writing a regular expression to handle it. After several iterations, that were not giving me what I needed, I asked my partner Brian Davis to give me a Regex that would handle the problem.
Brian is the guru on Regular Expressions, and he sent me one that did not look anything like what I had been using. He said he would explain it to me if I needed him to. At the time, I was in the heat of the battle with the project and the Regex worked, so I am using the Regex shown below without being able to explain it. It works great and if it will help you without having to understand it, then great! If you need an explanation, drop Brian an email at BDavis@KnowDotNet.com.
The code is illustrated in an abbreviated method, shown below.
The file that I am working with, optionally uses double quotes to enclose fields that could potentially have embedded commas, but does not use them for fields that can't have commas. Right away, there is no easy way to parse this without a Regex.
| ''' <summary> ''' Use an enum to access the fields of the comma delimited ''' line so that if the fields change, you only change the enum ''' and not the rest of the code. ''' </summary> ''' <remarks></remarks> Public Enum FD SSN FirstName LastName End Enum |
''' <summary> ''' This method accepts a comma delimited line and parses it ''' </summary> ''' <param name="line" ></param> ''' <remarks></remarks> Public Sub ProcessLine(ByVal line As String) Dim patt As String = _ "(?m)^((?("")""(?<quote>)|(?<=(^|,)))(?<field>(?(quote)(""""|[^""]|""(?!,))|[^,])*?)(?(quote)""(?<-quote>))(?($)|,))+" |
Dim m As Match = Regex.Match(line, patt) With m.Groups("field") Dim ssn As String = .Captures(FD.SSN).Value Dim FirstName As String = .Captures(FD.FirstName).Value Dim lastName As String = .Captures(FD.LastName).Value End With End Sub |