Parse tags that look sort of like XML

Started by nandagopal, Nov 14, 2008, 04:10 PM

Previous topic - Next topic

nandagopal

The TagParser class parses some text and raises events when it reads tags. It raises the events ParseError, InteriorTag, and LeafTag.
The public ParseTags subroutine starts the process by calling ParseTheTags.

ParseTheTags looks at the beginning of the text to find the start tag. It then looks for the next corresponding closing tag and pulls out the text between.


' Parse the input text for tags.
Public Sub ParseTags(ByVal input_text As String)
    ParseTheTags input_text, 0
End Sub

' Pull off the next level of tag.
Private Sub ParseTheTags(ByRef input_text As String, ByVal _
    depth As Integer)
Dim pos1 As Integer
Dim pos2 As Integer
Dim tag_name As String
Dim tag_contents As String

    ' Trim the text.
    input_text = TrimWhitespace(input_text)

    ' Repeat while there is input text left.
    Do While Len(input_text) > 0
        ' Make sure the text begins with a tag.
        If Left$(input_text, 1) <> "<" Then
            RaiseEvent ParseError( _
                "Expected '<' but found '" & _
                input_text & "'")
            Exit Sub
        End If

        ' Find the start tag.
        pos1 = InStr(input_text, ">")
        tag_name = Mid$(input_text, 2, pos1 - 2)

        ' Find the end tag.
        pos2 = InStr(input_text, "</" & tag_name & ">")
        tag_contents = TrimWhitespace(Mid$(input_text, pos1 _
            + 1, pos2 - pos1 - 1))

        ' See if the tag_contents begins with a tag.
        If Left$(tag_contents, 1) = "<" Then
            ' This is an interior tag.
            ' Raise the InteriorTag event.
            RaiseEvent InteriorTag(tag_name, tag_contents, _
                depth)

            ' Parse the tag for child tags.
            ParseTheTags tag_contents, depth + 1
        Else
            ' This is a leaf tag.
            ' Raise the LeafTag event.
            RaiseEvent LeafTag(tag_name, tag_contents, _
                depth)
        End If

        ' Remove the tag from the input text.
        input_text = TrimWhitespace( _
            Mid$(input_text, pos2 + Len(tag_name) + 3))
    Loop
End Sub


This class is similar to a stripped down version of the SAX (Simple API for XML) parser.