KnowDotNet NetRefactor

Performance Issues in ADO.NET DataTable Usage.

Use Enum to Access Items in DataRow

by Les Smith
Print this Article Discuss in Forums

Use Enum to Access DataColumns in a DataRow for Performance.  Integer indexes are faster than String indexes for accessing an Item in a DataRow.

I am in the midst of refactoring some code written in the early days of VB.NET, when I and other project members were just learning .NET strategies.  The particular method is walking through the rows of a DataTable.  In the first place, it had many lines of code like the following line.

   jlength = dt.Rows(i).Item("job_length")

The first step that I took was to place the some code at the top of the For Loop so that I could remove all of the references to "dt.Rows(i).Item(...)".  The reason is obvious; for every line where this type of reference is made, the Compiler has to generate code to evaluate dt, rows(i), and Item("name").  I corrected this problem by placing this code.

  Dim dr As DataRow
  
For i = 0 To inbrRows
     dr = dt.Rows(i)

Now I can change the reference to the Item in the DataRow by using the
dr object instead of dt.Rows(i), as follows.

   jlength = dr("job_length")

That will give you better performance as you access the DataColumns in the DataRow.

Now, assume that I have many columns selected in the DataTable.  It is faster to use an Integer index to the column than to use a String index, as shown in the next line.  

   jlength = dr(12)

However, if I have many columns in the table, it will be very easy for me to use the wrong number, e.g., 12 instead of 13, etc., and if there is such a column, and it is the right data type, I may not raise an error, but I sure do have a bug!  Even if I am lucky enough to code it properly, the code is not very readable, to put it lightly.  The solution is to use an Enum.  Even without an add-in (which I might stop and write very soon), I can create the Enum with a couple of simple steps.

First, I will create an empty Enum shell as shown below.

   Private Enum MDF

  
End Enum


In this case, I am working in a MissingDataFields Report method, thus the name "MDF".  Next, I will go to the Stored Procedure and copy the Select clause of the procedure and paste it into the Enum shell as shown next.

   Private Enum MDF
      
Select author_id, date_dictated, transcribed_file_time,
      work_type,excel_job_number,foreign_batch,
      job_number, author_name, excel_author_name, department,
      date_transcribed,dictation_date_time,transcription_date_time,job_length
  
End Enum

Obviously, the Enum is no good at this point, so I simply remove the
Select and and then start highlighting the commas (,) and press return until all of them are gone.  The result will be as follows.

   Private Enum MDF
      author_id
      date_dictated
      transcribed_file_time
      work_type
      foreign_batch
      job_number
      author_name
      department
      date_transcribed
      dictation_date_time
      transcription_date_time
      job_length
  
End Enum

And guess what, the first name in the Enum will have a value of "0", the second will be "1", etc.  The Enum values are automatically ordered the same way the the columns will appear in the DataTable.  Now, I can start referencing the desired columns, using their real name, but the code generated will be using an integer value, and Intellisense will kick in as soon as I type "MDF.".  An example is shown next.

   sLastDept = CStr(dr(MDF.department))
   sLastWT =
CStr(dr(MDF.work_type))

While I am on the subject of performance, again, while doing some refactoring of some very long methods, I created a smaller method and passed a DataTable object from the calling method to the new method.  The new method is being called in a For..Loop and only processing one DataRow of the DataTable each time it is called.  I found that my new method was causing the report to run 8-10 times longer than it had before I refactored the larger method.  Immediately, I suspected that passing the DataTable (with several hundred rows in it) was the problem.  I changed the call to pass only one DataRow and the performance reverted to normal.  Even though the DataTable was passed ByRef, there was still a huge performance hit when passing the whole DataTable.  
  

Writing Add-Ins for Visual Studio .NET
Writing Add-ins for Visual Studio .NET
by Les Smith
Apress Publishing