KnowDotNet

Parsing with Regular Expressions - CountOccurrences in a String

by Brian Davis and Les Smith

Regular Expression support is one of the most powerful features to debut with Visual Studio .NET.  This article shows another parsing method: counting occurrences of an expression in a string.

This series of articles highlights how Regular Expressions can greatly reduce the amount of code in a library of text parsing functions while also improving its performance.

Figure 1 shows the code for CountOccurrences using standard VB.NET code.  As you can see, not only is there a fair amount of code, but the code has to loop once for every occurrence of the search expression.

Figure 1 - CountOccurrences Using Regular VB.NET Code.

   Friend Function CountOccurrences(ByVal rsExp As String, _
         ByVal rsStr As String, _
         Optional ByVal cs As Boolean = False) As Integer
      ' Returns the number of occurrences of rsExp (expression)
      ' found in rsStr (string).
      ' Returns 0 if no occurrences are found.
      ' cs = True requests a case-sensitive comparison.
      Dim lPos As Integer      ' position of the current match (1-based)
      Dim nPos As Integer = 1  ' position at which the next search starts
      Dim lCnt As Integer = 0  ' number of matches found so far
      Dim cmp As CompareMethod

      ' CompareMethod.Binary is case sensitive; CompareMethod.Text is not
      If cs Then
         cmp = CompareMethod.Binary
      Else
         cmp = CompareMethod.Text
      End If

      ' Search repeatedly, restarting one character past each match,
      ' until InStr reports no further occurrence
      Do
         lPos = InStr(nPos, rsStr, rsExp, cmp)
         If lPos > 0 Then
            nPos = lPos + 1
            lCnt += 1
         Else
            Exit Do
         End If
      Loop

      Return lCnt
   End Function


Figure 2 demonstrates the CountOccurrences code using Regular Expressions.  Not only is the amount of code reduced, but the performance should be improved as well.  If you want case sensitivity, call the second overloaded function; otherwise call the first.  Since we do not know what characters are in the Target string, we must use Regex.Escape to ensure that characters the Regular Expression engine treats as metacharacters, such as $ and \, are matched literally.  Calling the Regex.Escape method, passing the Target string, automatically takes care of this nuance of Regular Expressions.
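
To see why the escaping step matters, here is a minimal sketch that counts the same target with and without Regex.Escape. The module name EscapeDemo and the helpers CountRaw and CountLiteral are hypothetical names used only for this illustration; they are not part of the article's library.

```vb
Imports System
Imports System.Text.RegularExpressions

Module EscapeDemo
   ' Hypothetical helper: treats the target as a raw regex pattern,
   ' so "." matches any character
   Function CountRaw(ByVal pattern As String, ByVal source As String) As Integer
      Return Regex.Matches(source, pattern).Count
   End Function

   ' Hypothetical helper: escapes the target first, so every
   ' character in it is matched literally
   Function CountLiteral(ByVal target As String, ByVal source As String) As Integer
      Return Regex.Matches(source, Regex.Escape(target)).Count
   End Function

   Sub Main()
      Dim source As String = "a.b axb a.b"
      Console.WriteLine(Regex.Escape("a.b"))          ' a\.b
      Console.WriteLine(CountRaw("a.b", source))      ' 3 ("axb" also matches)
      Console.WriteLine(CountLiteral("a.b", source))  ' 2
   End Sub
End Module
```

Without the escape, the unescaped "." lets "axb" count as an occurrence, which is not what a caller searching for the literal text "a.b" would expect.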

Figure 2 - CountOccurrences Using Regular Expressions.

   ' Requires Imports System.Text.RegularExpressions at the top of the file

   Public Overloads Function CountOccurrences( _
         ByVal Target As String, _
         ByVal Source As String) As Integer
      'This is case insensitive by default
      '- use the overloaded method to consider case
      Return CountOccurrences(Target, Source, False)
   End Function

   Public Overloads Function CountOccurrences( _
         ByVal Target As String, _
         ByVal Source As String, _
         ByVal CaseSensitive As Boolean) As Integer
      'This overloaded version allows the caller to
      'specify case-sensitivity
      If CaseSensitive Then
         Return Regex.Matches(Source, Regex.Escape(Target)).Count
      Else
         Return Regex.Matches(Source, Regex.Escape(Target), _
            RegexOptions.IgnoreCase).Count
      End If
   End Function
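
One difference between the two figures is worth keeping in mind. The InStr loop in Figure 1 restarts its search one character past each match, so it counts overlapping occurrences, while Regex.Matches returns non-overlapping matches. The sketch below illustrates the difference and shows one possible way to count overlapping occurrences with a zero-width lookahead. The module name OverlapDemo and both function names are hypothetical, chosen only for this example.

```vb
Imports System
Imports System.Text.RegularExpressions

Module OverlapDemo
   ' Non-overlapping count, following the case-sensitive branch of Figure 2
   Function CountNonOverlapping(ByVal target As String, ByVal source As String) As Integer
      Return Regex.Matches(source, Regex.Escape(target)).Count
   End Function

   ' Hypothetical variant: the lookahead (?=...) consumes no characters,
   ' so a match can begin at every position where the target starts
   Function CountOverlapping(ByVal target As String, ByVal source As String) As Integer
      Return Regex.Matches(source, "(?=" & Regex.Escape(target) & ")").Count
   End Function

   Sub Main()
      Console.WriteLine(CountNonOverlapping("aa", "aaaa"))  ' 2
      Console.WriteLine(CountOverlapping("aa", "aaaa"))     ' 3
   End Sub
End Module
```

For targets that cannot overlap themselves, such as most whole words, both approaches should return the same count.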
