Question
I use Beautiful Soup often to parse HTML files, so when I recently needed to parse an XML file, I chose to use it. However, because I'm parsing an extremely large file, it failed. When researching why it failed, I was led to this question: Loading huge XML files and dealing with MemoryError.
This leads me to my question: If lxml can handle large files and Beautiful Soup cannot, are there any benefits to using Beautiful Soup instead of simply using lxml directly?
Answer 1:
If you look at this link about BeautifulSoup Parser:
"BeautifulSoup" is a Python package that parses broken HTML, while "lxml" does so faster but with high quality HTML/XML. So if you're dealing with the first one you're better off with BS... but the advantage of having "lxml" is that you're able to get the soupparser
.
The link I provided at the top shows how you can combine the capabilities of "BS" with "lxml"; a sketch of that approach follows.
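For illustration, here is a minimal sketch of the soupparser approach, assuming both beautifulsoup4 and lxml are installed (the broken markup is a toy example):

    from lxml.html.soupparser import fromstring

    # A fragment of broken HTML that a strict parser might reject.
    tag_soup = "<p>Some<b>bad<i>HTML"

    # Beautiful Soup does the lenient parsing; the result is an
    # ordinary lxml element tree, so XPath works on it afterwards.
    root = fromstring(tag_soup)
    print(root.xpath("//b/text()"))  # ['bad']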
So in the end... you are better off with "lxml".
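And since the original problem was a huge XML file, it's worth noting that the feature letting lxml cope with such files is incremental parsing via iterparse. A minimal sketch, assuming a hypothetical records.xml built from repeated <item> elements:

    from lxml import etree

    # Stream through the file instead of loading it all into memory.
    # "records.xml" and the "item" tag are placeholders.
    for event, elem in etree.iterparse("records.xml", tag="item"):
        print(elem.tag, elem.text)  # stand-in for real per-element work
        # Release each element (and its already-processed siblings);
        # otherwise the tree still accumulates and memory grows.
        elem.clear()
        while elem.getprevious() is not None:
            del elem.getparent()[0]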
Answer 2:
lxml is very fast and relatively memory-efficient. BeautifulSoup by itself scores less well on the efficiency end, but is built to handle non-standard / broken HTML and XML, meaning it is ultimately more versatile.
Which you choose really just depends on your use case -- web scraping? Probably BS. Parsing machine-written structured metadata? lxml is a great choice.
There is also the learning curve to consider when making the switch: the two systems implement search and navigation in slightly different ways -- enough to make learning one after starting with the other a non-trivial task. The sketch below shows the same lookup in each.
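To make that concrete, here is a minimal sketch of the same lookup in both libraries (the toy XML is invented for illustration; note that BeautifulSoup's "xml" mode itself relies on lxml being installed):

    from bs4 import BeautifulSoup
    from lxml import etree

    xml = "<library><book id='1'><title>Dune</title></book></library>"

    # Beautiful Soup: navigate with find()/find_all()
    soup = BeautifulSoup(xml, "xml")
    print(soup.find("title").text)          # Dune

    # lxml: navigate with XPath over an element tree
    root = etree.fromstring(xml)
    print(root.xpath("//title/text()")[0])  # Dune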
Source: https://stackoverflow.com/questions/31351856/are-there-any-benefits-of-using-beautiful-soup-to-parse-xml-over-using-lxml-alon