KnowDotNet

Extending the CodeModel

Overcoming Missing CodeModel Features

by Les Smith

Do you want to pick up every code element in a code window, including comments and compiler directives? The CodeModel gets most of them, but not all. This article shows how to extend the CodeModel to pick up the lines and code elements that it misses.

When Microsoft first introduced the CodeModel in Visual Studio .NET 1.0, it was, from the standpoint of a developer writing add-ins, a long-overdue feature. For several years, I had been writing add-ins, or at least extensions to the VB IDE, before they were even called add-ins. Full-blown add-ins were introduced in VB5, but until the CodeModel arrived, parsing the code in a code window was still a daunting task.

With the advent of Visual Studio 2005, we now have CodeModel2, with new features that this article will not attempt to cover. However, like the CodeModel, CodeModel2 still has a few limitations. Like most developers, I am never satisfied: give us a great new feature, and it only whets our appetite for more.

The CodeModel and the FileCodeModel (the latter is the central object of this article) are designed to give add-in developers the ability to browse, and in some cases modify, the structure of the code in a class, namespace, or ProjectItem. However, there are still a few needs that the FileCodeModel does not meet. Some are by design; others appear to be unimplemented features. For example, even though Inherits and Implements statements are listed in the vsCMElement enumeration, the FileCodeModel does not return them in VB projects. Because these two items are simply part of the class declaration in C#, they are not missed in C# programs. In VB projects, this appears to be either a bug or an unimplemented feature. The other, more subtle shortcoming is that the FileCodeModel does not pick up compiler directives. This becomes a real problem when you have a sequence of code such as the following:

#If DEBUG Then
   Private var2 As Integer
#Else
   Private var2 As Int32
#End If

In the code snippet above, if DEBUG is True, the FileCodeModel sees the first declaration of var2 but not the second. This is by design: only the active branch is compiled, and the same variable name cannot be declared twice in the same scope. So if you use the FileCodeModel to retrieve all of the code in a ProjectItem, you will miss the second declaration of var2: it is excluded from compilation, so the FileCodeModel never sees it and cannot return it to you.

Each branch, between #If and #Else and between #Else and #End If, can contain any number of lines of code, and the FileCodeModel returns the lines from only one of the two branches, never both.

Similarly, if you use #Regions as much as I do, they present a problem too: they are a type of compiler directive, not compilable code, so they are not returned by the FileCodeModel either. Finally, comments outside of a function are not picked up by the CodeModel.
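Full-line comments, at least, can be located with plain text scanning. The following is a minimal sketch, assuming VB.NET source and counting only lines whose first non-blank text is an apostrophe or REM; the CommentFinder module and FindCommentLines function are hypothetical names, not part of the article's add-in.

```vbnet
Imports System.Collections.Generic
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

' Hypothetical helper: returns the 1-based line numbers of full-line
' VB.NET comments (lines whose first non-blank text is ' or REM).
' Trailing comments after code on the same line are not reported.
Public Module CommentFinder
   Public Function FindCommentLines(ByVal code As String) As List(Of Integer)
      Dim result As New List(Of Integer)
      ' Normalize line endings, then inspect each line on its own.
      Dim lines() As String = code.Replace(vbCrLf, vbLf).Split(ControlChars.Lf)
      For i As Integer = 0 To lines.Length - 1
         ' REM\b keeps identifiers such as "Remove" from matching.
         If Regex.IsMatch(lines(i), "^\s*('|REM\b)", RegexOptions.IgnoreCase) Then
            result.Add(i + 1)
         End If
      Next
      Return result
   End Function
End Module
```

In an add-in, each returned line number could then be checked against the start and end lines of the functions already collected, so that comments inside functions, which the FileCodeModel returns as part of the function body, are skipped.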

I am currently working on an add-in in which I need to retrieve every code element in the ProjectItem along with its name and scope. I pull all of the code out of the window and place each code element in a ListView, where the user can rearrange the sequence of the code elements; I then rebuild the contents of the code window from the ListView. Consequently, I needed to extend the FileCodeModel by writing several new classes that address the shortcomings described above.

First, let me show you how to pull the Inherits and Implements statements from the code in the code window. I use this code while retrieving all of the code that the FileCodeModel is capable of returning. When I encounter a class element, I call the method shown below to get its Inherits and Implements statements. I already have a collection of CodeElt objects, and the method retrieves the Inherits and Implements statements and adds them to that collection. Note that the regex pattern only finds Inherits and Implements at the beginning of a line, because I do not want to match the Implements clauses at the end of implementing member declarations, such as "Public Sub Dispose() Implements IDisposable.Dispose".

   Private Sub GetInheritsAndImplements(ByVal classCode As String, _
      ByVal elt As CodeElement)
      ' Match Inherits/Implements only at the start of a line, so that
      ' member-level "... Implements IFoo.Bar" clauses are skipped.
      ' [^\r\n]* keeps a trailing carriage return out of the name.
      Dim re As String = _
         "^\s*(?<IH>(Inherits|Implements))\s+(?<name>[^\r\n]*)"
      Dim mc As MatchCollection = Regex.Matches(classCode, re, _
         RegexOptions.Multiline)
      For Each m As Match In mc
         Dim ce As CodeElt = New CodeElt
         With ce
            .StPt = elt.StartPoint.Line
            .EndPt = elt.StartPoint.Line
            .FName = elt.Name
            .kind = elt.Kind
            .SKind = m.Groups("IH").Value
            .FCode = m.Groups("IH").Value & " " & _
               m.Groups("name").Value
            .id = .EndPt
            SharedCode.AddCEToCollection(colElts, ce, _
               SharedCode.order.Below)
         End With
      Next m
   End Sub

In building add-ins that work with code, regular expressions are a necessity. The regex shown above returns every Inherits or Implements statement in a MatchCollection, so that I can add each one to my CodeElt collection.
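To see the line-start restriction at work outside an add-in, the sketch below applies the same kind of Inherits/Implements pattern to a plain string. It is a minimal illustration, assuming the class body is available as text; the InheritsScanner module and FindInheritsAndImplements function are hypothetical, and the name group uses [^\r\n]* so that a trailing carriage return in CRLF text is not captured.

```vbnet
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

' Hypothetical sketch: return each class-level Inherits/Implements
' statement as "Keyword Name". Member-level Implements clauses are
' not matched because the keyword must begin the line.
Public Module InheritsScanner
   Public Function FindInheritsAndImplements(ByVal classCode As String) _
      As List(Of String)
      Dim pattern As String = _
         "^\s*(?<IH>(Inherits|Implements))\s+(?<name>[^\r\n]*)"
      Dim result As New List(Of String)
      ' Multiline makes ^ match at the start of every line.
      For Each m As Match In Regex.Matches(classCode, pattern, _
         RegexOptions.Multiline)
         result.Add(m.Groups("IH").Value & " " & m.Groups("name").Value)
      Next
      Return result
   End Function
End Module
```

For a class that inherits Base, implements IDisposable, and declares "Public Sub Dispose() Implements IDisposable.Dispose", only the two class-level statements are returned.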

Regions and directives are a different animal and present a different set of problems. For brevity, I will not cover the gathering of #Regions: they are retrieved with the same technique as compiler directives, but they do not present the same challenges.
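Although region gathering is not covered here, a minimal sketch of the idea might look like the following. It assumes VB.NET-style #Region "Name" and #End Region lines and simply reports each region's name with its start and end lines; the RegionScanner module and FindRegions function are hypothetical.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

' Hypothetical sketch: pair each #Region "Name" with its #End Region
' and report "Name:startLine-endLine" (1-based line numbers).
' A stack handles nested regions, so inner regions close first.
Public Module RegionScanner
   Public Function FindRegions(ByVal code As String) As List(Of String)
      Dim result As New List(Of String)
      Dim openRegions As New Stack(Of KeyValuePair(Of String, Integer))
      Dim lines() As String = code.Replace(vbCrLf, vbLf).Split(ControlChars.Lf)
      For i As Integer = 0 To lines.Length - 1
         Dim startMatch As Match = Regex.Match(lines(i), _
            "^\s*#Region\s+""(?<name>[^""]*)""", RegexOptions.IgnoreCase)
         If startMatch.Success Then
            openRegions.Push(New KeyValuePair(Of String, Integer)( _
               startMatch.Groups("name").Value, i + 1))
         ElseIf Regex.IsMatch(lines(i), "^\s*#End\s+Region\b", _
            RegexOptions.IgnoreCase) AndAlso openRegions.Count > 0 Then
            Dim region As KeyValuePair(Of String, Integer) = openRegions.Pop()
            result.Add(String.Format("{0}:{1}-{2}", region.Key, region.Value, i + 1))
         End If
      Next
      Return result
   End Function
End Module
```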

Earlier, I demonstrated the problem that conditional compilation directives present. Now I will show you the code that retrieves them along with the code they enclose. Although the FileCodeModel has already retrieved the code on the active ("True") side of an #If/#Else/#End If construct, the methods that retrieve the directives will pick up that code again. This is because I have no way of knowing the value of the conditional compilation constant, and therefore no way of knowing which branch the FileCodeModel has already retrieved. So I retrieve all of the code, and when adding it to the ListView, I check which code is already there and simply skip the duplicates found by the following methods.
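The duplicate check can be sketched independently of the ListView. This is a minimal illustration, assuming two lines count as the same code when their trimmed text is identical; the Dedupe module and RemoveDuplicates function are hypothetical, and the add-in itself performs the check against the items already in its ListView.

```vbnet
Imports System
Imports System.Collections.Generic

' Hypothetical sketch: keep only the candidate lines (found via the
' directive methods) whose trimmed text is not already among the
' lines returned by the FileCodeModel. Order is preserved, and
' repeated candidates such as several "#Else" lines are all kept.
Public Module Dedupe
   Public Function RemoveDuplicates(ByVal existing As IEnumerable(Of String), _
      ByVal candidates As IEnumerable(Of String)) As List(Of String)
      Dim seen As New HashSet(Of String)(StringComparer.Ordinal)
      For Each line As String In existing
         seen.Add(line.Trim())
      Next
      Dim result As New List(Of String)
      For Each line As String In candidates
         If Not seen.Contains(line.Trim()) Then result.Add(line)
      Next
      Return result
   End Function
End Module
```

With the earlier var2 example and DEBUG set, the FileCodeModel already holds "Private var2 As Integer", so only the "Private var2 As Int32" line from the #Else branch survives.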

The method below finds all compiler directives. Some of them are simply placed in the collection of directives that I am building; however, a #If (#if in C#) requires special handling. First, I declare the class-level variables that are needed. The language type (langType) has already been set to 8 for VB.NET projects or 9 for C# projects.

   Public DirectiveList As Collection
   Public rtb As RichTextBox
   Private langType As Integer

Next is the method that enumerates the directives. For each directive found, I must make sure that it is not within the bounds of a function, so I call the IsDirectiveInFunction method. If the directive is inside a function, I ignore it, because the FileCodeModel has already included it in the body of the function. If the directive is a #If, I must capture each line of code separately until I reach the next #End If, which I do by calling the GetCodeLinesUntilNextEndIf method. To know where each directive sits, I first place the code from the CodeElement into a RichTextBox and then use its GetLineFromCharIndex method to determine the line on which the directive was found. I pass that line number (+1) to GetCodeLinesUntilNextEndIf, because I have already added the #If directive itself to the collection.

   Public Sub EnumerateCompileDirectivesFromCode(ByVal code _
      As String, ByVal col As Collection)
      Try
         ' Load the code into a RichTextBox so that a character index
         ' from a regex match can be mapped to a line number.
         rtb = New RichTextBox()
         rtb.WordWrap = False
         rtb.Text = code.Replace(vbCrLf, vbLf)

         DirectiveList = New Collection
         ' [ \t]* (rather than \s*) keeps a match from starting on a
         ' preceding blank line, which would skew the line number.
         Dim exp As String = _
            "^[ \t]*#(?<name>\w+)[ \t]*(?<remLine>.*?)(\r\n|\r|\n)"
         Dim mc As MatchCollection = Regex.Matches(rtb.Text, _
            exp, RegexOptions.Multiline)

         For Each m As Match In mc
            Dim dirName As String = m.Groups("name").Value.ToUpper
            ' #Region/#End Region (VB) and #region/#endregion (C#)
            ' are gathered separately.
            Dim isRegion As Boolean = dirName.Equals("REGION") OrElse _
               dirName.Equals("ENDREGION") OrElse _
               (dirName.Equals("END") AndAlso _
               m.Groups("remLine").Value.ToUpper.StartsWith("REGION"))
            If Not isRegion Then
               Dim lineNbr As Integer = _
                  rtb.GetLineFromCharIndex(m.Index) + 1
               If Not IsDirectiveInFunction(lineNbr, col) Then
                  If dirName.Equals("IF") Then
                     AddDirectiveLineToCollection(m)
                     GetCodeLinesUntilNextEndIf(code, lineNbr + 1)
                  ElseIf dirName.StartsWith("ELSE") OrElse _
                     dirName.Equals("ELIF") Then
                     ' #Else, #ElseIf and #elif were already picked
                     ' up by GetCodeLinesUntilNextEndIf
                  Else
                     ' all other directives, including #End If and
                     ' #endif, are standalone lines
                     AddDirectiveLineToCollection(m)
                  End If
               End If
            End If
         Next
      Catch ex As System.Exception
         MsgBox(ex.ToString)
      End Try
   End Sub
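The RichTextBox above serves only to turn a match index into a line number. As an alternative sketch, not what the add-in itself does, the same number can be computed by counting line-feed characters before the match once the code has been normalized to LF line endings, which avoids creating a Windows Forms control. The DirectiveLines module, LineFromIndex and ListDirectives are hypothetical names.

```vbnet
Imports System.Collections.Generic
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

' Hypothetical sketch: list each compiler directive as "NAME@line",
' computing the 1-based line number by counting LF characters before
' the match instead of calling RichTextBox.GetLineFromCharIndex.
Public Module DirectiveLines
   Public Function LineFromIndex(ByVal text As String, _
      ByVal index As Integer) As Integer
      Dim line As Integer = 1
      For i As Integer = 0 To index - 1
         If text(i) = ControlChars.Lf Then line += 1
      Next
      Return line
   End Function

   Public Function ListDirectives(ByVal code As String) As List(Of String)
      Dim text As String = code.Replace(vbCrLf, vbLf)
      Dim result As New List(Of String)
      ' [ \t]* keeps the match, and therefore its index, on the
      ' directive's own line.
      For Each m As Match In Regex.Matches(text, _
         "^[ \t]*#(?<name>\w+)", RegexOptions.Multiline)
         result.Add(m.Groups("name").Value.ToUpper() & "@" & _
            LineFromIndex(text, m.Index).ToString())
      Next
      Return result
   End Function
End Module
```

Applied to the var2 example preceded by an Imports line and a blank line, this yields IF@3, ELSE@5 and END@7; the "END" entry is the #End If, whose remainder ("If") would be inspected the same way the article's method inspects remLine.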

The following method adds every code line to the directive collection until it reaches a #End If. The Match Collection contains only the #If directive itself, so to retrieve the enclosed code I need a way to get from the index in the Match object to the lines that follow it. This could be done with an EditPoint object, but I have found that moving the code of the CodeElement into a RichTextBox is easier. I have created a TBMemoLine class that reads lines from a block of code passed to it. Note that I pass the line number of the first line of code after the #If directive and start reading from that point. I stop reading when I reach #End If (#endif in C#), because that directive is also in the Match Collection and will be picked up by the previous method when control returns to it.

As I mentioned earlier, I need to know the name and scope of each code element that I retrieve. The regular expressions in the method below parse each captured line for that information. Because directives found within a function are ignored, I know that every code line here is some type of global variable.


   Private Sub GetCodeLinesUntilNextEndIf(ByVal code As String, _
      ByVal j As Integer)
      Dim ml As New TBMemoLine()
      Dim nL As Integer = ml.MLCount(code)

      ' VB.NET: accessor, optional modifiers, one or more
      ' comma-separated names (optionally with array bounds), then As.
      Const vbGlobalVarPatt As String = _
         "^\s*(?<accessor>(Public|Protected " & _
         "Friend|Protected|Friend|Private|Dim))\s+" & _
         "(\w+\s+)*(((?<var>\w+(\((\d*,)*" & _
         "\d*\))*)\s*,\s*)*(?<var>\w+" & _
         "(\((\d*,)*\d*\))*)\s+As\s+)"

      ' C#: reject lines that start with a statement keyword, then
      ' match optional accessors and modifiers, a type, and one or
      ' more comma-separated names.
      Const csGlobalVarPatt As String = _
         "\b(?((using|return|if|case|switch|do|for|while)" & _
         "\b)(?!)|)\s*(?<accessor>(private|public|protected " & _
         "internal|protected|internal)\s+)*" & _
         "(?<modifier>\w+\s+)*(?<type>([A-Za-z]\w*\.)*[A-Za-" & _
         "z]\w*(\s*\[,*])*)\s+((?<var>\w+" & _
         "(\[(\d*,)*\d*\])*)\s*,\s*)*(?<var>\w+(\[(\d*,)*\d*\])*)"

      For i As Integer = j To nL - 1
         Dim line As String = ml.MemoLine(i)
         If line.Trim.Length = 0 AndAlso i = nL - 1 Then Exit Sub
         ' Stop at the closing directive; the caller picks it up
         ' from its own Match Collection.
         If line.Trim.ToUpper.StartsWith("#ENDIF") OrElse _
            line.Trim.ToUpper.StartsWith("#END IF") Then
            Exit Sub
         End If
         If line.Trim.StartsWith("#") Then
            ' any other directive inside the block (#Else, #ElseIf, ...)
            AddDirectiveLineToCollection(line, i + 1, "")
         Else
            ' code line must be parsed for its scope and name
            Dim m As Match
            If langType.Equals(8) Then
               m = Regex.Match(line, vbGlobalVarPatt, _
                  RegexOptions.ExplicitCapture Or _
                  RegexOptions.RightToLeft Or _
                  RegexOptions.IgnoreCase)
            Else ' C#
               m = Regex.Match(line, csGlobalVarPatt)
            End If
            If m.Success Then
               AddDirectiveLineToCollection(line, i + 1, _
                  m.Groups("accessor").Value, _
                  m.Groups("var").Value)
            Else
               AddDirectiveLineToCollection(line, i + 1, line)
            End If
         End If
      Next
   End Sub
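The core of this method, reading forward from the line after #If until the closing directive and pulling an accessor and a name from each declaration, can be tried in isolation. The sketch below assumes VB.NET code, 0-based line indexes, plain string splitting in place of the TBMemoLine class, and a deliberately simplified declaration pattern (one name per line, a few common modifiers) rather than the full vbGlobalVarPatt; the IfBlockReader module and ReadUntilEndIf function are hypothetical.

```vbnet
Imports System.Collections.Generic
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

' Hypothetical sketch: starting at 0-based line index startIndex,
' collect lines until #End If / #endif and describe each one as
' "accessor|name" for a declaration, "#" for another directive, or
' the trimmed text otherwise. Blank lines are skipped.
Public Module IfBlockReader
   ' Simplified: accessor, optional Shared/ReadOnly/WithEvents,
   ' a single name, then As.
   Private Const DeclPattern As String = _
      "^\s*(?<accessor>Public|Protected\s+Friend|Protected|Friend|Private|Dim)\s+" & _
      "(?:(?:Shared|ReadOnly|WithEvents)\s+)*(?<var>\w+)\s+As\b"

   Public Function ReadUntilEndIf(ByVal code As String, _
      ByVal startIndex As Integer) As List(Of String)
      Dim lines() As String = code.Replace(vbCrLf, vbLf).Split(ControlChars.Lf)
      Dim result As New List(Of String)
      For i As Integer = startIndex To lines.Length - 1
         Dim t As String = lines(i).Trim().ToUpper()
         ' The closing directive ends the block; the caller records it.
         If t.StartsWith("#END IF") OrElse t.StartsWith("#ENDIF") Then Exit For
         If t.StartsWith("#") Then
            result.Add("#")
         Else
            Dim m As Match = Regex.Match(lines(i), DeclPattern, RegexOptions.IgnoreCase)
            If m.Success Then
               result.Add(m.Groups("accessor").Value & "|" & m.Groups("var").Value)
            ElseIf t.Length > 0 Then
               result.Add(lines(i).Trim())
            End If
         End If
      Next
      Return result
   End Function
End Module
```

For the var2 example (with the #Else branch declaring a Shared variable var3), starting at index 1 returns "Private|var2", "#" and "Private|var3", and stops before the #End If so that the line after the block is never read.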

The final method returns True if a line of code lies within a function. The FileCodeModel provides the start and end lines of each CodeElement in a ProjectItem, so I can check whether the current line number falls between the start and end lines of any function.

   Private Function IsDirectiveInFunction(ByVal line As Integer, _
      ByVal col As Collection) As Boolean
      ' A line is "in a function" if it falls between the start and
      ' end lines of any function element already collected.
      For Each ce As CodeElt In col
         With ce
            If .kind.Equals(vsCMElement.vsCMElementFunction) Then
               If line >= .StPt AndAlso line <= .EndPt Then
                  Return True
               End If
            End If
         End With
      Next
      Return False
   End Function
