What I do is read in every character and apply the following rules:
If we're not reading in a token (the default):
- if char is numeric, set a flag indicating we are reading a number, and store the token start position
- if char is alphanumeric, set a flag indicating we are reading a text string, and store the token start position
If we are currently reading in a token:
- if we are reading NUMBER and the current char is NOT numeric and is NOT a ".", grab the value of the token we have been reading and store it, then set the reading flag to false
- if we are reading TEXT and the current char is not alphanumeric, grab the token value and store it, then reset the reading flag
It may be easier to understand in pseudocode:
isreading = 0
for i = 0 to len(string$)
  char$ = mid$(string$,i)
  if isreading = 0
    if IsNumeric(char$)
      tokenstart = i
      isreading = 1
      tokentype = "num"
    endif
  else
    if tokentype = "num" and char$<>"." and IsNumeric(char$)=false
      tokenval$ = mid$(string$,tokenstart,i-tokenstart)
      isreading = 0
      store token
    endif
  endif
next i
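For anyone who wants something runnable, here is a rough Python sketch of the same number-reading loop. The function name scan_numbers and the trick of looping one position past the end (so a number at the very end of the input still gets stored) are my own assumptions, not part of the pseudocode above.

```python
def scan_numbers(text):
    """Collect numeric tokens (digits plus '.') using a 'reading' flag."""
    tokens = []
    is_reading = False
    token_start = 0
    # Assumption: go one position past the end so a trailing number is flushed.
    for i in range(len(text) + 1):
        char = text[i] if i < len(text) else ""
        if not is_reading:
            if char.isdigit():
                # Start of a number: remember where it begins.
                token_start = i
                is_reading = True
        elif char != "." and not char.isdigit():
            # First char that cannot belong to the number: store the token.
            tokens.append(text[token_start:i])
            is_reading = False
    return tokens


print(scan_numbers("x = 3.14 + 42"))  # ['3.14', '42']
```

Like the pseudocode, this accepts any mix of digits and dots, so something like "1.2.3" comes out as a single token; validating the number would be a separate step.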
The same rules can be applied for:
STRING:
starts with: "
ends with: "
TEXT:
starts with: [a-z]
continues with: [a-z,0-9] (ends at the first character outside that set)
OPERATORS:
starts with [=,-,*,/,+] and is stored straight away
ARRAY:
starts with: [a-z]
ends with: [ (as in php's array[key])
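As a rough illustration, the start/end rules listed above could be combined into one loop along these lines. This is only a sketch under my own assumptions: the names tokenize and OPERATORS, the (type, value) tuples, and the token type labels are invented for the example. It also re-checks the character that ended a token as a possible start of the next one, so that input like "3+4" works without spaces, and it silently drops a string that is never closed.

```python
OPERATORS = "=-*/+"


def tokenize(text):
    """Split text into (type, value) tokens using start/end rules per type."""
    tokens = []
    kind = None  # type of the token being read, or None when not reading
    start = 0
    # Loop one past the end so the last token is flushed.
    for i in range(len(text) + 1):
        char = text[i] if i < len(text) else ""
        if kind == "STRING":
            if char == '"':
                # Closing quote: store the text between the quotes.
                tokens.append(("STRING", text[start + 1:i]))
                kind = None
            continue
        if kind == "NUM" and (char == "." or char.isdigit()):
            continue
        if kind == "TEXT":
            if char.isalnum():
                continue
            if char == "[":
                # A name followed by "[" is treated as an array name.
                tokens.append(("ARRAY", text[start:i]))
                kind = None
                continue
        if kind is not None:
            # Current char ends the token being read.
            tokens.append((kind, text[start:i]))
            kind = None
        # Not reading a token: see whether this char starts one.
        if char.isdigit():
            kind, start = "NUM", i
        elif char.isalpha():
            kind, start = "TEXT", i
        elif char == '"':
            kind, start = "STRING", i
        elif char and char in OPERATORS:
            # Operators are a single char and are stored straight away.
            tokens.append(("OP", char))
    return tokens


for token in tokenize('total = price[2] * 1.5 + "tax"'):
    print(token)
```

Note that isalpha and isalnum accept more than [a-z] and [0-9] (uppercase and non-ASCII letters, for example), so swap in a stricter check if you need exactly those ranges.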
It's a pretty effective way to parse. My C# parser can get very complex.
If you want to preserve spaces, I suppose you could have a "space" token that's stored every time you hit a space, and when you go through your tokens afterwards you just skip the space tokens.
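A minimal sketch of that idea, assuming a simplified splitter that only knows two token types, WORD and SPACE (the function name split_with_spaces and both type names are made up for this example):

```python
def split_with_spaces(text):
    """Split text into ("WORD", run) tokens and one ("SPACE", " ") per space."""
    tokens = []
    start = None  # start index of the word being read, or None
    # Loop one past the end so the last word is flushed.
    for i in range(len(text) + 1):
        char = text[i] if i < len(text) else ""
        if char not in ("", " "):
            if start is None:
                start = i
            continue
        if start is not None:
            tokens.append(("WORD", text[start:i]))
            start = None
        if char == " ":
            tokens.append(("SPACE", char))
    return tokens


tokens = split_with_spaces("a  = 1")
# Skip the space tokens when you only care about the meaningful ones.
meaningful = [t for t in tokens if t[0] != "SPACE"]
# Because the spaces were kept, the original text can be rebuilt exactly.
rebuilt = "".join(value for _, value in tokens)
print(meaningful)
print(repr(rebuilt))
```

Keeping the spaces as tokens costs a little memory, but it means you can write the source back out unchanged.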