A year ago, I released the first version of my
xml parser. I recently had an actual need for it, so I made a few modifications and packaged it up to where I feel it is finally production ready. Here's a list of the available functions:
xmlReadFile(string)
xmlGetTagName$(integer)
xmlGetAttributeValue$(integer, string)
xmlGetElementContent$(integer, boolean)
xmlClear()
I wouldn't call this a 100% complete XML package, but it should be enough to get the job done. The earlier version was still unable to extract attributes, this one does. As I can't really use arrays inside a UDT, the attributes are only exacted from a tag when you attempt to retrieve it. Because the attributes are not stored independently anywhere internally, you should store this value yourself when you call it so the text parsing doesn't have to be used more than necessary.
I've attached a sample XML file to be used in this example. The output of this example is specific to using this XML file, but you should see how easy it is to use. I plan to add a few other functions for retrieving parent and child nodes. It's technically possible now, but the simplified functions haven't been implemented yet.
Edit: Jan. 8, 2011
Updated to version 1.2 which fixes a bug that stored leading spaces and tab characters in an element's content. This only became visible when using Cloggy's DLL which obeys the tabs in strings, whereas DB's default text/print commands ignore them.
Edit: Jan 10, 2011
Updated to 1.21, which fixed a bug in the xmlGetAttributeValue() function that creeps up in Windows 7 and possibly Vista.
REM **************************************************************************************************************
REM \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
REM Title: XML Parser
REM Author: Phaelax
REM Version: 1.21
REM
REM
REM To use these functions, you must include the following four constants,
REM the ElementObject UDT, and the two array definitions.
REM
REM
REM
REM Available functions:
REM
REM xmlReadFile(string)
REM xmlGetTagName$(integer)
REM xmlGetAttributeValue$(integer, string)
REM xmlGetElementContent$(integer, boolean)
REM xmlGetTagCount(string)
REM xmlClear()
REM
REM
REM
REM \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
REM **************************************************************************************************************
REM tags which have both matching open and close tags
#CONSTANT NODE_TYPE_CONTAINER = 1
REM a single tag which has no closing tag (empty element tag)
#CONSTANT NODE_TYPE_EMPTY = 2
REM duh!
#CONSTANT TRUE 1
#CONSTANT FALSE 0
REM An XML tag element
Type ElementObject
tagName as string
parentElementId as integer
attributes as string
content as string
pos as integer
parentPos as integer
unparsedAttributes as string
EndType
REM The internal array used by the xml functions to store all xml tags
dim xmlTags() as ElementObject
REM An internal array used by the functions to keep track of containers.
REM This array should always be empty unless the readXML function is being processed
dim parseStack() as integer
` **************************************************************************************
` START EXAMPLE
` **************************************************************************************
REM these are only used for displaying this example and
REM are not needed to use the XML functions
#CONSTANT TAG_CATEGORY = "category"
#CONSTANT TAG_DIFF_EASY = "easy"
#CONSTANT TAG_DIFF_MED = "medium"
#CONSTANT TAG_DIFF_HARD = "hard"
#CONSTANT TAG_QUESTION = "question"
#CONSTANT TAG_CHOICE = "choice"
rem load the xml file
xmlReadFile("H:\Programming\Dark Basic\DBPro Source\quiz\quizes\quiz.xml")
rem loop through all elements contained in the xml file.
rem View the xml file to understand the structure of how
rem this data is being outputted to the screen.
for i = 0 to array count(xmlTags())
if xmlGetTagName$(i) = TAG_CATEGORY then print xmlGetAttributeValue$(i, "name")
if xmlGetTagName$(i) = TAG_DIFF_EASY then difficulty$ = " (Easy)"
if xmlGetTagName$(i) = TAG_DIFF_MED then difficulty$ = " (Medium)"
if xmlGetTagName$(i) = TAG_DIFF_HARD then difficulty$ = " (Hard)"
if xmlGetTagName$(i) = TAG_QUESTION then print " "+xmlGetElementContent$(i, FALSE)+difficulty$
if xmlGetTagName$(i) = TAG_CHOICE
a$ = ""
if xmlGetAttributeValue$(i, "answer") = "yes" then a$ = " (correct)"
print " "+xmlGetElementContent$(i, FALSE)+a$
endif
next i
wait key
end
` **************************************************************************************
` END EXAMPLE
` **************************************************************************************
` **************************************************************************************************************
` \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
`
` These are the 'public' functions the user may use to interact with an XML file.
`
` \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
` **************************************************************************************************************
` \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
` \\
` \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
function xmlReadFile(xmlFile$)
xmlFileNo = 1
open to read xmlFileNo, xmlFile$
` \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
` \\ Read each line of the XML file until it reaches the end
` \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
while file end(xmlFileNo) = 0
read string xmlFileNo, L$
tagName$ = ""
matchOpenBracket = 0
tagType = 0
strLength = len(L$)
for i = 1 to strLength
c$ = mid$(L$, i)
` \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
` \\ open backet found for new tag
` \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
if c$ = "<"
matchOpenBracket = i
tagType = 0
endif
` \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
` \\ forward slash can either be part of a closing container tag,
` \\ or closing an empty tag
` \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
if c$ = "/"
` \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
` \\ If part of a closing tag, the slash will be prefixed
` \\ by the bracket (less-than sign)
` \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
if oldChar$ = "<"
tagType = 1
endif
endif
` \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
` \\ Closing bracket for a tag
` \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
if c$ = ">"
` \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
` \\ if character before closing bracket was a slash,
` \\ then this bracket closed off an empty tag
` \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
if oldChar$ = "/"
tagType = 2
else
` \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
` \\ "<? ?>" is part of the XML declaration
` \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
if oldChar$ = "?"
tagType = 2
else
` \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
` Normal close bracket, standard container element
` \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
endif
endif
` \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
` \\ If we closed off (completed) the opening tag's bracket,
` \\ then it's open as the current container. Add this tag
` \\ to the container stack for tracking the hierarchy and
` \\ store a new tag element in the array
` \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
if tagType = 0
e as ElementObject
e.pos = matchOpenBracket
temp$ = pSubstr$(L$,matchOpenBracket+1,i-1)
e.tagName = lower$(left$(temp$, pFindTagNameEndIndex(temp$)))
e.unparsedAttributes = right$(temp$, len(temp$)-len(e.tagName))
e.content = ""
` \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
` \\ A parent ID of -1 means it is the root node; no parents
` \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
if array count(parseStack()) < 0
e.parentElementId = -1
else
e.parentElementId = parseStack()
` \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
` \\ The position within the parent tag's content where this
` \\ tag's data shows up
` \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
e.parentPos = len(xmlTags(e.parentElementId).content)
endif
array insert at bottom xmlTags()
xmlTags() = e
` \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
` \\ Add the index of the last tag element added to the tags
` \\ array to the stack. This keeps track of what container
` \\ we're in.
` \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
add to stack parseStack()
parseStack() = array count(xmlTags())
endif
` \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
` \\ Closing tag was found, remove last container from stack
` \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
if tagType = 1
remove from stack parseStack()
endif
` \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
` \\ This was an empty tag element. As they are not containers
` \\ nothing is added to the stack and nothing needs removed.
` \\ Create a new element and add it to the tag array
` \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
if tagType = 2
e as Element
e.tagName = pSubstr$(L$,matchOpenBracket+1,i-2)
e.content = ""
if array count(parseStack()) < 0
e.parentElementId = -1
else
e.parentElementId = parseStack()
` \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
` \\ The position within the parent tag's content where this
` \\ tag's data shows up
` \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
e.parentPos = len(xmlTags(e.parentElementId).content)
endif
array insert at bottom xmlTags()
xmlTags() = e
endif
` \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
` \\ Start the whole process over, the block segment has been closed
` \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
matchOpenBracket = 0
else
if matchOpenBracket = 0
currentTag = parseStack()
if currentTag > 0 and currentTag <= array count(xmlTags())
` \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
` \\ These checks make sure we don't add any leading spaces
` \\ or tabs to the element content.
` \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
if len(xmlTags(currentTag).content) > 0
xmlTags(currentTag).content = xmlTags(currentTag).content + c$
else
if asc(c$) <> 32 and asc(c$) <> 9 then xmlTags(currentTag).content = xmlTags(currentTag).content + c$
endif
endif
endif
endif
` \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
` \\ Helps keep track of previous characters when checking for
` \\ for slashes, which are used to determine the type of tag
` \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
oldChar$ = c$
next i
endwhile
` \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
` \\ Close the file, we are done with it
` \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
close file xmlFileNo
endfunction
` \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
` \\ Returns the number of tag elements of the given tag name
` \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
function xmlGetTagCount(tagName$)
tagName$ = lower$(tagName$)
count = 0
L = array count(xmlTags())
for i = 0 to L
if xmlGetTagName$(i) = tagName$ then inc count
next i
endfunction count
` \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
` \\ Returns the tag name for an xml element
` \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
function xmlGetTagName$(elementId)
t$ = xmlTags(elementId).tagName
endfunction t$
` \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
` \\ Returns the value of a specific attribute of a given
` \\ xml element
` \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
function xmlGetAttributeValue$(elementId, attName$)
temp$ = xmlTags(elementId).unparsedAttributes
i = pGetIndexOf(xmlTags(elementId).unparsedAttributes, attName$)
attValue$ = ""
if i = -1 then exitfunction attValue$
j = i+len(attName$)
L = len(temp$)
foundEqual = 0
quoteIndex = 0
for i = j to L
if foundEqual = 0
if mid$(temp$, i) = "=" then foundEqual = 1
else
if asc(mid$(temp$, i)) = 34
if quoteIndex = 0
quoteIndex = i+1
else
attValue$ = pSubstr$(temp$, quoteIndex, i-1)
exit
endif
endif
endif
next i
endfunction attValue$
` \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
` \\ Returns the inner content of an xml element. If the
` \\ optional argument 'includeChildContent' is set TRUE, it
` \\ will include the inner content of all child elements from
` \\ within this container as well
` \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
function xmlGetElementContent$(elementId, includeChildContent)
content$ = xmlTags(elementId).content
if includeChildContent = TRUE
extendedLength = 0
for i = 1 to array count(xmlTags())
if xmlTags(i).parentElementId = elementId
content$ = pInsertStr$(content$, xmlTags(i).content, xmlTags(i).parentPos + extendedLength)
extendedLength = extendedLength + len(xmlTags(i).content)
endif
next i
endif
endfunction content$
` \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
` \\ This will remove all elements currently loaded into
` \\ memory. Useful when working with multiple XML files or
` \\ when a loaded XML file is no longer needed.
` \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
function xmlClear()
empty array xmlTags()
endfunction
` **************************************************************************************************************
` \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
`
` These are what you might call 'private' functions. If you're not familiar with OOP,
` it basically means the user never calls these functions themself, they are strictly
` used internally by 'publicly' accessible XML functions.
`
` \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
` **************************************************************************************************************
` \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
` \\ Returns the index of the first occurrence of 'find$'
` \\ within the string 'content$'
` \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
function pGetIndexOf(content$, find$)
J = len(find$)
L = len(content$)-J
find$ = lower$(find$)
for i = 1 to L
temp$ = lower$(pSubstr$(content$, i, i+j-1))
if temp$ = find$ then exitfunction i
next i
endfunction -1
` \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
` \\ Returns a substring from 'strSource$' from the starting
` \\ index 'startInc' to the ending index 'endInc'
` \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
function pSubstr$(strSource$, startInc, endInc)
strSource$ = left$(strSource$, endInc)
strSource$ = right$(strSource$,endInc-startInc+1)
endfunction strSource$
` \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
` \\ Inserts 'seg$' into 'strSource$' at string index 'pos'
` \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
function pInsertStr$(strSource$, seg$, pos)
t$ = left$(strSource$, pos)
strSource$ = t$ + seg$ + right$(strSource$,len(strSource$)-len(t$))
endfunction strSource$
` \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
` \\ This is used when processing a tag's attributes. It finds
` \\ the index to the end of a specific attribute's name
` \\ and content within the string
` \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
function pFindTagNameEndIndex(tagline$)
L = len(tagline$)
for i = 1 to L
if mid$(tagline$, i) = " "
exitfunction i-1
endif
next i
endfunction L
"Only the educated are free" ~Epictetus
"Imagination is more important than knowledge..." ~Einstein