KnowDotNet

Combining Regular Expressions and RichTextBox Methods to Parse Code

Enumerating String Occurrences and Location

by Les Smith and Brian Davis

You can combine Regular Expressions with RichTextBox functionality to locate the occurrences of expressions in a string and, at the same time, capture the line numbers of the lines containing those expressions.  Regular Expressions are a powerful addition to the developer's toolkit in .NET.  With a little ingenuity, you can combine them with the functionality of the RichTextBox class to make them even more productive.

To illustrate this, I am going to create an instance of the RichTextBox, and call it "rtb".  

   Public Shared rtb As New Windows.Forms.RichTextBox


Next, I will place the following text into the textbox.  Please disregard the fact that it appears to be a mixture of VB code and some other syntax (Setup " Test Set 1 ").  I have placed this syntax in the RichTextBox text for demonstration purposes.

   Public Class Test
   Setup " Test Set 1 "
   ' some code
   Public Sub Test()
   End Sub
   Setup "Test Set 2"
   ' some more code


Finally, I will call the Sub shown below.  This method creates a MatchCollection of all of the expressions, enclosed in quotes, in the lines that start with the word "Setup".  


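   ' Requires: Imports System.Text.RegularExpressions
   Public Sub EnumerateExpressions()
      ' Match lines that start with Setup and capture the quoted text.
      ' The lazy +? keeps an optional trailing space out of the "name" group.
      Dim sExp As String = "^[ \t]*Setup "" ?(?<name>[^""]+?) ?"""
      Dim m As Match

      ' Normalize line endings so the match indexes agree with the RichTextBox.
      Dim mc As MatchCollection = _
         Regex.Matches(rtb.Text.Replace(vbCrLf, vbLf), _
         sExp, RegexOptions.Multiline Or RegexOptions.IgnoreCase)

      For Each m In mc
         Debug.WriteLine(m.Groups("name").Value)              ' text inside the quotes
         Debug.WriteLine(m.Index)                             ' character index of the match
         Debug.WriteLine(rtb.GetLineFromCharIndex(m.Index))   ' line number of the match
      Next
   End Sub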

The output from the method goes to the Output Window and I have shown it below.  

   Test Set 1
   18
   2
   Test Set 2
   86
   8

As a byproduct of executing the Regular Expression, two values result for each instance of the specified search expression, Setup "....".  First, the Value property of the "name" group of the Match object contains the expression that was within the quotes.  Second, the Index property of the Match object is the character index of the match within the String contained in RichTextBox.Text.  I am not only looking for the expression (Setup); I also want to know the line numbers on which it occurs.  However, the Regular Expression yields a character index rather than a line number.

That's where the RichTextBox comes into play.  Using the GetLineFromCharIndex method of the RichTextBox class, I can convert the character index from the Match object to a line number within the textbox.  When scanning source code for the purpose of writing add-ins, we are interested in line numbers because they can be used to extract code from code windows in the IDE.
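
As an illustration of using the resulting line number, here is a minimal sketch that looks up the text of the matching line through the Lines property of the RichTextBox.  The module name LineLookupSketch and the sample text are hypothetical, chosen only for this example.  The sketch also assumes that word wrap is turned off, since GetLineFromCharIndex reports physical (wrapped) lines while Lines holds logical lines.

   Imports System
   Imports System.Text.RegularExpressions
   Imports System.Windows.Forms
   Imports Microsoft.VisualBasic

   ' Hypothetical module, for illustration only.
   Module LineLookupSketch
      Sub Main()
         Dim rtb As New RichTextBox()

         ' Assumption: with word wrap off, physical and logical lines coincide.
         rtb.WordWrap = False
         rtb.Text = "Public Class Test" & vbLf & _
                    "Setup ""Test Set 1""" & vbLf & _
                    "End Class"

         ' Find the first line that starts with Setup.
         Dim m As Match = Regex.Match(rtb.Text.Replace(vbCrLf, vbLf), _
            "^[ \t]*Setup", RegexOptions.Multiline)

         If m.Success Then
            ' Convert the character index to a line number,
            ' then use that number to fetch the text of the line.
            Dim lineNo As Integer = rtb.GetLineFromCharIndex(m.Index)
            Console.WriteLine(rtb.Lines(lineNo))
         End If
      End Sub
   End Module

In an add-in, the same line number would instead be used against the code window, but the conversion step is the same.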

There is one other subtle nuance.  GetLineFromCharIndex treats the vbCrLf that the IDE Text Editor uses to delimit lines as a single newline character.  That's why I put the .Replace(vbCrLf, vbLf) in the following statement.  Otherwise, the indexes produced by the Regular Expression will be inconsistent with the way GetLineFromCharIndex works, the line numbers will "creep" on you, and you will get invalid data when you attempt to retrieve the specified lines at a later point.

      Dim mc As MatchCollection = _
         Regex.Matches(rtb.Text.Replace(vbCrLf, vbLf), _
         sExp, RegexOptions.Multiline Or RegexOptions.IgnoreCase)
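
To make the "creep" concrete, the following sketch runs the same pattern against a vbCrLf-delimited string and against its vbLf-normalized copy, with no RichTextBox involved.  The module name IndexCreepSketch and the three-line sample text are hypothetical.  The point is only that each preceding vbCrLf pushes the match index one character further than the normalized text would.

   Imports System
   Imports System.Text.RegularExpressions
   Imports Microsoft.VisualBasic

   ' Hypothetical module, for illustration only.
   Module IndexCreepSketch
      Sub Main()
         Dim pattern As String = "^[ \t]*Setup"

         ' Two line breaks precede the Setup line.
         Dim crlfText As String = "Public Class Test" & vbCrLf & _
                                  "' some code" & vbCrLf & _
                                  "Setup ""Test Set 1"""
         Dim lfText As String = crlfText.Replace(vbCrLf, vbLf)

         Dim crlfIndex As Integer = _
            Regex.Match(crlfText, pattern, RegexOptions.Multiline).Index
         Dim lfIndex As Integer = _
            Regex.Match(lfText, pattern, RegexOptions.Multiline).Index

         Console.WriteLine(crlfIndex)   ' 32: each vbCrLf counts as two characters
         Console.WriteLine(lfIndex)     ' 30: each vbLf counts as one character
      End Sub
   End Module

The difference between the two indexes equals the number of line breaks before the match, so the error grows the further down the text the match occurs.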