Thursday, June 23, 2011

Normalizing Repeated Columns: VBA

In my earlier blog series (The Problem of Repeated Columns), I defined repeated columns and talked about the data integrity problems associated with them. I also showed several different examples of repeated columns and the how difficult it is to query repeated columns as compared to the normalized equivalent. If you haven't read these yet, it would be worthwhile to read first. Similarly, if you are not familiar with the concept of Normalization, you should read my blog series What is Normalization?

One of the comments suggested I should discuss how to convert a table with repeated columns into a normalized table structure. I thought that was a really good idea, so I'm going to spend the next few posts doing so. The difficulty in discussing this issue, however, is that the exact solution differs with each table, so there is no single solution. Fortunately, in the last series, I showed 5 different tables, each with a slightly different structure, so while I cannot show a single solution that will work for every table with repeated columns, hopefully, one of the following will work for most cases.

Others in this series:
In my previous examples, I used purely SQL solutions. This time, I thought I'd illustrate how to do this with VBA. There's no real advantage to using VBA over the SQL solutions. In fact, in large databases, it will almost always be slower. Nevertheless, there may be situations with complex data that VBA would be the best solution, and besides, it represents another tool in your Access toolkit.

In my post Aggregating Across Repeated Columns: Summing, I discussed an example of table with repeated columns that looked like this:


Figure 1: Student Scores table with repeated columns

Normalized to the First Normal Form (1NF) to remove the repeated columns, the table would look like this:

Figure 2: Patient Symptom table (1NF)

Figure 2, however does not represent the final form of normalization. Because the student names are repeated, they should be removed to a separate table. So normalizing to Third Normal Form (3NF) I would have something like this:

Figure 3: Normalized to Third Normal Form (3NF) - Tables and Relationships Views

(For a further explanation of both First and Third Normal Forms, see The Normal Forms , What is Normalization, and Entity-Relationship Diagramming).

Normalizing to First Normal Form (1NF)

I'll start with the 1NF because it's less complex. Overall, it's a matter of looping through the records in the denormalized table (i.e. StudentScores_RepeatedColumns) and writing each column value into a new record in the 1NF table (i.e. StudentScores_1NF).

Public Sub Normalize_RepeatedColumns_VBA_1NF(ParamArray FieldNames())
'   This routine writes data from a table with repeated columns into a
'   table normalized to the First Normal Form (1NF)
'   The field names which are send in via the parameter list of the call
'   are written into an array called FieldNames.

'   declare variablesDim i As Integer
Dim db As DAO.Database
Dim rsSource As DAO.Recordset
Dim rsTarget As DAO.Recordset

'   Open database object to current database
Set db = CurrentDb
'   Open the denormalized table to read the values
Set rsSource = db.OpenRecordset("StudentScores_RepeatedColumns")
'   Open the normalized table write the values
Set rsTarget = db.OpenRecordset("StudentScores_1NF")
'   Loop through the denormalized Source table
Do While Not rsSource.EOF
    '   Loop through the fields, i.e. the values in the parameter array    
For i = LBound(FieldNames) To UBound(FieldNames)
        '   Add a New record to the target table, write the values,
        '   and save (update) the record
        rsTarget.AddNew
            '   write the student name
            rsTarget!StudentID = rsSource!Student
            '   write the field name
            rsTarget!TestNum = FieldNames(i)
            '   write the field value
            rsTarget!Score = rsSource(FieldNames(i))
        rsTarget.Update
     Next i
     rsSource.MoveNext

Loop
End Sub

You can call the subroutine like so:
Sub test1NF()
'   To run the subroutine, place the cursor in this sub and click "run"
'   The arguments are the field names of the columns you want to
'   normalize

  Call Normalize_RepeatedColumns_VBA_1N _
       ("Test1", "Test2", "Test3", "Test4")
   
End Sub


The end result looks like Figure 2 above.


Normalizing to Third Normal Form (3NF)

The process for normalizing repeated column data into 3NF is similar to the 1NF process. It does, however, require two loops, one to add records to the "one-side" table ("Student") , and an inner loop (For...Next) to write records to the "many-side" table ("StudentScores").

It is important to note that the order in which the data is moved is vital. Data must be written into the "one-side" table first, and then data can be moved into the "many-side" table. Overall, the process is to loop through the records in the denormalized table, write common values to the a record in the one-side table, store the primary key value from the new record, and then for each field in the parameter array, create a new record in the many-side table.

Public Sub Normalize_RepeatedColumns_VBA_3NF(ParamArray FieldNames())
'   This routine writes data from a table with repeated columns into two
'   normalized tables: Students and StudentScores
'   The field names which are send in via the parameter list of the call
'   are written into an array called FieldNames.

'   declare variables
Dim i As Integer
Dim db As DAO.Database
Dim rsSource As DAO.Recordset
Dim rsTargetOneSide As DAO.Recordset
Dim rsTargetManySide As DAO.Recordset
Dim StudentIDtemp As Long

'   Open database object to current database
Set db = CurrentDb
'   Open the denormalized table to read the values
Set rsSource = db.OpenRecordset("StudentScores_RepeatedColumns")
'   Open the "one-side" table to write the values
Set rsTargetOneSide = db.OpenRecordset("Student")
'   Open the "Many-side" table to write the values
Set rsTargetManySide = db.OpenRecordset("StudentScores")
'   Loop through the denormalized Source table
Do While Not rsSource.EOF
    '   Add a New record to the "one-side" target table, save
    '   the primary key value (autonumber) for use later,
    '   and save (update) the record
    rsTargetOneSide.AddNew
        rsTargetOneSide!Student = rsSource!Student
        '   save the StudentID created by the autonumber field
        StudentIDtemp = CLng(rsSource("StudentID"))
    rsTargetOneSide.Update
   
    '   Loop through the fields in the Parameter Array    
For i = LBound(FieldNames) To UBound(FieldNames)
   
        '   Add a New record to the "many-side" target table,
        '   write the values, and save (update) the record       
             rsTargetManySide.AddNew
            '   write the saved student id\
            rsTargetManySide!StudentID = StudentIDtemp
            '   write the field name
            rsTargetManySide!TestNum = FieldNames(i)
            '   write the field value
            rsTargetManySide!Score = rsSource(FieldNames(i))
        rsTargetManySide.Update
       
    Next i
    rsSource.MoveNext

Loop
End Sub

You can call the subroutine like so:
Sub test3NF()
'   To run the subroutine, place the cursor in this sub and click "run"
'   The arguments are the field names of the columns you want to
'   normalize  

  Call Normalize_RepeatedColumns_VBA_3NF _
      ("Test1", "Test2", "Test3", "Test4")
   
End Sub

The end result looks like Figure 3 above.

You can find a sample which illustrates the above here:
http://www.rogersaccesslibrary.com/forum/normalizingrepeatingcolumnsvbamdb_topic567.html















No comments: