Using BeautifulSoup to parse my XML
import BeautifulSoup
soup = BeautifulSoup.BeautifulStoneSoup("""<alan x="y" /><anne>hello</anne>""")
You are asking what was in the mind of an author who gives classes and modules names like Beautiful[Stone]Soup :-)
Here are two more examples of the behaviour of BeautifulStoneSoup:
>>> soup = BeautifulSoup.BeautifulStoneSoup(
...     """<alan x="y" ><anne>hello</anne>"""
...     )
>>> print soup.prettify()
<alan x="y">
 <anne>
  hello
 </anne>
</alan>
>>> soup = BeautifulSoup.BeautifulStoneSoup(
...     """<alan x="y" ><anne>hello</anne>""",
...     selfClosingTags=['alan'])
>>> print soup.prettify()
<alan x="y" />
<anne>
 hello
</anne>
>>>
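To make the difference between those two sessions concrete, here is a toy nesting sketch. It is not BeautifulSoup 3's code: it uses Python 3's standard html.parser, and the names `ToyNester`, `outline` and the `self_closing` parameter are hypothetical, loosely modelled on `selfClosingTags`. A trailing `/>` is honoured only for declared names; for any other tag the slash is ignored and the tag stays open, which is my assumption about the behaviour seen above.

```python
from html.parser import HTMLParser


def _render(tag, attrs, end=">"):
    # Rebuild a start tag such as <alan x="y"> from html.parser's (name, value) pairs.
    parts = [tag] + ['%s="%s"' % (k, v) if v is not None else k for k, v in attrs]
    return "<" + " ".join(parts) + end


class ToyNester(HTMLParser):
    """Hypothetical sketch: records tag nesting as an indented outline.

    A trailing "/>" is honoured only for names in self_closing (loosely
    like selfClosingTags); otherwise the tag stays open. Not BS3's code.
    """

    def __init__(self, self_closing=()):
        super().__init__()
        self.self_closing = set(self_closing)
        self.stack = []  # names of currently open tags
        self.lines = []  # outline, one entry per rendered line

    def _emit(self, text):
        # Indent one space per currently open tag.
        self.lines.append(" " * len(self.stack) + text)

    def handle_starttag(self, tag, attrs):
        if tag in self.self_closing:
            self._emit(_render(tag, attrs, " />"))  # declared empty: never opened
        else:
            self._emit(_render(tag, attrs))
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Undeclared "<tag ... />": treat the "/" as a mistake and keep the tag open.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        # Close open tags down to the matching one; ignore stray end tags.
        if tag in self.stack:
            while True:
                name = self.stack.pop()
                self._emit("</%s>" % name)
                if name == tag:
                    break

    def handle_data(self, data):
        if data.strip():
            self._emit(data.strip())

    def close(self):
        super().close()
        while self.stack:  # anything still open is closed at end of input
            self._emit("</%s>" % self.stack.pop())


def outline(markup, self_closing=()):
    parser = ToyNester(self_closing)
    parser.feed(markup)
    parser.close()
    return "\n".join(parser.lines)


print(outline('<alan x="y" ><anne>hello</anne>'))
print(outline('<alan x="y" ><anne>hello</anne>', self_closing=["alan"]))
```

Feeding it `<alan x="y" /><anne>hello</anne>` without `self_closing` gives the same nested outline as the first session, because the undeclared `/>` is ignored.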
My take: a self-closing tag is not legal unless it has been declared to the parser (e.g. via selfClosingTags). So the author had several choices when deciding how to handle an illegal fragment like <alan x="y" />:
(1) assume that the / was a mistake;
(2) treat alan as a self-closing tag regardless of how it is used elsewhere in the input;
(3) make two passes over the input, working out in the first pass how each tag is used.
Which choice do you prefer?
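The first pass of choice (3) could look something like this. It is a minimal sketch assuming Python 3's standard html.parser (in Python 2, which BS3 requires, the module is called `HTMLParser` and `super()` needs explicit arguments); `TagUsage` and `find_self_closing` are hypothetical names. It reports tag names that appear as `<name ... />` and never with an explicit `</name>`. Note that html.parser lower-cases tag names, so mixed-case XML names would need extra care.

```python
from html.parser import HTMLParser


class TagUsage(HTMLParser):
    """Hypothetical first pass: records how each tag name is written."""

    def __init__(self):
        super().__init__()
        self.empty_form = set()   # names seen as <name ... />
        self.closed_form = set()  # names seen with an explicit </name>

    def handle_startendtag(self, tag, attrs):
        self.empty_form.add(tag)

    def handle_endtag(self, tag):
        self.closed_form.add(tag)


def find_self_closing(markup):
    usage = TagUsage()
    usage.feed(markup)
    usage.close()
    # Only names that never appear with an end tag are treated as self-closing.
    return sorted(usage.empty_form - usage.closed_form)


print(find_self_closing('<alan x="y" /><anne>hello</anne>'))
```

The returned list could then be handed to the second pass, for example as the `selfClosingTags` argument of `BeautifulStoneSoup` shown above; I have not tested that combination.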
I don't have a "why", but this might be of interest to you. If you use BeautifulSoup (no Stone) to parse XML with a self-closing tag, it works. Sort of:
>>> soup = BeautifulSoup.BeautifulSoup("""<alan x="y" /><anne>hello</anne>""")
>>> print soup.prettify()
<alan x="y">
</alan>
<anne>
 hello
</anne>
The nesting is right, even if alan is rendered as a start and an end tag.
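For comparison (a swapped-in tool, not BeautifulSoup): a real XML parser such as Python 3's standard `xml.etree.ElementTree` treats `<alan x="y" />` as an empty element, so `anne` comes out as its sibling rather than its child. The `<root>` wrapper is my addition, since well-formed XML allows only one top-level element.

```python
import xml.etree.ElementTree as ET

# The two elements from the question, wrapped in an added <root> element
# because a well-formed XML document has exactly one top-level element.
markup = '<root><alan x="y" /><anne>hello</anne></root>'
root = ET.fromstring(markup)

for child in root:
    # Tag name, attributes, number of sub-elements, and text content.
    print(child.tag, child.attrib, len(child), repr(child.text))
```

Without the wrapper, `ET.fromstring` rejects the fragment with a `ParseError` (junk after document element).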