How do I write a simple XML parser from scratch?
Rather than code samples, I'd like to know the simplified, basic steps in plain English.
How is a good parser structured?
For an event-based parser, the user needs to pass it some callback functions (startNode(name, attrs), endNode(name) and someText(txt), likely through an interface), and the parser calls them as it passes over the file.
The parser has a while loop that alternates between reading up to the next < (everything before it is text) and reading up to the next > (everything before it is a tag), and converts what it read into the callbacks' parameter types:
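A minimal sketch of that interface in D, the language the code in this answer is written in. The name EventParser is taken from the parse() function; the exact declaration and the LoggingParser class are assumptions for illustration:

```d
import std.conv : to;

// Hypothetical declaration of the callback interface the parser calls into.
interface EventParser {
    void startNode(string name, string[string] attrs); // <name attr="...">
    void endNode(string name);                         // </name>
    void someText(string txt);                         // text between tags
}

// Hypothetical example implementation: records every event as a line of text.
class LoggingParser : EventParser {
    string[] log;

    void startNode(string name, string[string] attrs) {
        // record the element name and how many attributes it had
        log ~= "start " ~ name ~ " " ~ attrs.length.to!string;
    }

    void endNode(string name) { log ~= "end " ~ name; }

    void someText(string txt) { log ~= "text " ~ txt; }
}
```

Any class implementing these three methods can be handed to the parser; one that just records the events, like this, is handy for testing.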
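import std.array : split;
import std.stdio : File;
import std.string : chomp, strip;

void parse(EventParser p, File file){
    string str;
    while((str = file.readln('<')).length != 0){
        //not using a rewritable buffer to take advantage of slicing
        //but it's a quick conversion to an implementation with a rewritable buffer though
        auto text = str.chomp("<"); //everything before the next tag
        if(text.length != 0) p.someText(text);
        str = file.readln('>').chomp(">").strip();
        if(str.length == 0) continue; //stray '<' at the end of the file
        bool closing = str[0] == '/';                   //</name>
        bool selfClosing = !closing && str[$-1] == '/'; //<name/>
        if(closing) str = str[1 .. $];
        if(selfClosing) str = str[0 .. $-1];
        //split str in name and attrs
        auto parts = str.split();
        if(parts.length == 0) continue;
        string name = parts[0];
        string[string] attrs;
        foreach(attribute; parts[1..$]){
            auto splitAttr = attribute.split("=");
            if(splitAttr.length == 2) attrs[splitAttr[0]] = splitAttr[1];
        }
        if(closing) p.endNode(name);
        else {
            p.startNode(name, attrs);
            if(selfClosing) p.endNode(name); //self-closing tag
        }
    }
}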
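Because parse() reads from a File, it is awkward to try out. The sketch below runs the same alternation (text up to '<', tag up to '>') over an in-memory string, slicing it without copying. parseString and Recorder are hypothetical names, the EventParser declaration is assumed, and the attribute handling is the same naive whitespace split:

```d
import std.array : split;
import std.string : indexOf, strip;

// Assumed callback interface (same three callbacks as in the answer).
interface EventParser {
    void startNode(string name, string[string] attrs);
    void endNode(string name);
    void someText(string txt);
}

// Same loop as parse(), but over a string: text up to '<', then a tag up to '>'.
void parseString(EventParser p, string src) {
    while (src.length != 0) {
        auto lt = src.indexOf('<');
        string text = lt < 0 ? src : src[0 .. lt];
        if (text.length) p.someText(text);
        if (lt < 0) break;                 // no more tags
        src = src[lt + 1 .. $];
        auto gt = src.indexOf('>');
        if (gt < 0) break;                 // unterminated tag: no verification, just stop
        string tag = src[0 .. gt].strip();
        src = src[gt + 1 .. $];
        if (tag.length == 0) continue;
        bool closing = tag[0] == '/';
        bool selfClosing = !closing && tag[$ - 1] == '/';
        if (closing) tag = tag[1 .. $];
        if (selfClosing) tag = tag[0 .. $ - 1];
        auto parts = tag.split();          // naive: breaks on quoted values with spaces
        if (parts.length == 0) continue;
        string[string] attrs;
        foreach (attribute; parts[1 .. $]) {
            auto kv = attribute.split("=");
            if (kv.length == 2) attrs[kv[0]] = kv[1];
        }
        if (closing) p.endNode(parts[0]);
        else {
            p.startNode(parts[0], attrs);
            if (selfClosing) p.endNode(parts[0]);
        }
    }
}

// Hypothetical test helper: records each event as a string.
class Recorder : EventParser {
    string[] events;
    void startNode(string name, string[string] attrs) {
        string e = "start " ~ name;
        foreach (k, v; attrs) e ~= " " ~ k ~ "=" ~ v;
        events ~= e;
    }
    void endNode(string name) { events ~= "end " ~ name; }
    void someText(string txt) { events ~= "text " ~ txt; }
}
```

For example, parsing `<a x=1>hi<br/></a>` with a Recorder yields the events start a x=1, text hi, start br, end br, end a.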
You can build a DOM parser on top of an event-based parser. The basic functionality each node needs is getChildren, getParent, getName and getAttributes (with setters for building the tree ;) ).
The DOM-building event handler, using the methods described above:
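The DOM-building handler uses DOMNode, RootNode, ElementNode(parent, name, attrs) and TextNode(txt) without defining them. One possible sketch of those classes, assuming plain D fields stand in for the getters and appendChild is the setter used while building; the field names and the textContent helper are hypothetical:

```d
// Hypothetical base class: fields play the role of getParent/getName/
// getAttributes/getChildren.
class DOMNode {
    DOMNode parent;              // getParent
    string name;                 // getName
    string[string] attributes;   // getAttributes
    DOMNode[] children;          // getChildren

    this(DOMNode parent, string name, string[string] attributes) {
        this.parent = parent;
        this.name = name;
        this.attributes = attributes;
    }

    // Setter used while building: links the child back to this node.
    void appendChild(DOMNode child) {
        child.parent = this;
        children ~= child;
    }
}

// The document root has no parent and no name.
class RootNode : DOMNode {
    this() { super(null, "", null); }
}

class ElementNode : DOMNode {
    this(DOMNode parent, string name, string[string] attrs) {
        super(parent, name, attrs);
    }
}

// Text nodes carry their text; "#text" is the W3C DOM's nodeName for text.
class TextNode : DOMNode {
    string text;
    this(string text) {
        super(null, "#text", null);
        this.text = text;
    }
}

// Example use of the accessors: concatenate all text below a node.
string textContent(DOMNode node) {
    if (auto t = cast(TextNode) node) return t.text;
    string result;
    foreach (child; node.children) result ~= textContent(child);
    return result;
}
```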
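class DOMEventParser : EventParser{
    DOMNode root;    //the finished tree hangs off this node
    DOMNode current; //node that new children are appended to
    this(){
        //allocate here: a `new` in a field initializer would create one object
        //at compile time that every DOMEventParser instance shares
        root = current = new RootNode();
    }
    override void startNode(string name, string[string] attrs){
        DOMNode tmp = new ElementNode(current, name, attrs);
        current.appendChild(tmp);
        current = tmp;
    }
    override void endNode(string name){
        assert(name == current.name); //end tag must match the open element
        current = current.parent;
    }
    override void someText(string txt){
        current.appendChild(new TextNode(txt));
    }
}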
When parsing ends, the root node holds the whole DOM tree.
Note: I didn't put any verification code in there to ensure the XML is well-formed.
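One small piece of verification that fits the event model is a nesting check: keep a stack of open element names, and require every end tag to close the most recently opened one. A sketch, assuming the same EventParser callbacks; NestingChecker and ok() are hypothetical names, and this covers nesting only, not the rest of XML well-formedness:

```d
// Assumed callback interface (same three callbacks as in the answer).
interface EventParser {
    void startNode(string name, string[string] attrs);
    void endNode(string name);
    void someText(string txt);
}

// Hypothetical checker: reports end tags that don't match the open element.
class NestingChecker : EventParser {
    string[] open;    // stack of currently open element names
    string[] errors;  // problems found so far

    void startNode(string name, string[string] attrs) { open ~= name; }

    void endNode(string name) {
        if (open.length == 0)
            errors ~= "unexpected </" ~ name ~ ">";
        else if (open[$ - 1] != name)
            errors ~= "expected </" ~ open[$ - 1] ~ ">, got </" ~ name ~ ">";
        else
            open = open[0 .. $ - 1]; // pop the matched element
    }

    void someText(string txt) {}     // text doesn't affect nesting

    // Call after parsing: true if no mismatches and nothing left open.
    bool ok() const { return errors.length == 0 && open.length == 0; }
}
```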
Edit: the attribute parsing has a bug: splitting on whitespace breaks quoted values that contain spaces (e.g. title="a b") and leaves the quotes in the values, so a regex is better for that.
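A sketch of the regex approach using D's std.regex. The helper name parseAttributes and the exact pattern are assumptions; the pattern accepts name="value" or name='value' (optionally with spaces around =), and it does not decode entity references such as &amp;:

```d
import std.regex : matchAll, regex;

// Hypothetical helper: parses the attribute part of a tag body.
string[string] parseAttributes(string tagBody) {
    // group 1: attribute name; group 2: the quoted value including quotes;
    // group 3: contents of a "..." value; group 4: contents of a '...' value.
    // regex() compiles the pattern on every call; a real parser would build it once.
    auto attrRe = regex(`([^\s=]+)\s*=\s*("([^"]*)"|'([^']*)')`);
    string[string] attrs;
    foreach (m; tagBody.matchAll(attrRe)) {
        // pick whichever quoted group matched, judging by the opening quote
        attrs[m[1]] = m[2][0] == '"' ? m[3] : m[4];
    }
    return attrs;
}
```

In parse() this would take the place of the foreach loop over parts[1..$], applied to the tag text after the element name.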