Question
I use Beautiful Soup often to parse HTML files, so when I recently needed to parse an XML file, I chose to use it. However, because I'm parsing an extremely large file, it failed. When researching why it failed, I was led to this question: Loading huge XML files and dealing with MemoryError.
This leads me to my question: If lxml can handle large files and Beautiful Soup cannot, are there any benefits to using Beautiful Soup instead of simply using lxml directly?
Answer 1:
If you look at this link about BeautifulSoup Parser:
"BeautifulSoup" is a Python package that parses broken HTML, while "lxml" does so faster but with high quality HTML/XML. So if you're dealing with the first one you're better off with BS... but the advantage of having "lxml" is that you're able to get the soupparser
.
The link I provided at the top shows how you can combine the capabilities of "BS" with "lxml"; a sketch of that approach follows.
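For illustration, here is a minimal sketch of the soupparser approach, assuming both beautifulsoup4 and lxml are installed (the broken markup is a toy example):

    from lxml.html.soupparser import fromstring

    # A fragment of broken HTML that a strict parser might reject.
    tag_soup = "<p>Some<b>bad<i>HTML"

    # Beautiful Soup does the lenient parsing; the result is an
    # ordinary lxml element tree, so XPath works on it afterwards.
    root = fromstring(tag_soup)
    print(root.xpath("//b/text()"))  # ['bad']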
So in the end... you are better off with "lxml".
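And since the original problem was a huge XML file, it's worth noting that the feature letting lxml cope with such files is incremental parsing via iterparse. A minimal sketch, assuming a hypothetical records.xml built from repeated <item> elements:

    from lxml import etree

    # Stream through the file instead of loading it all into memory.
    # "records.xml" and the "item" tag are placeholders.
    for event, elem in etree.iterparse("records.xml", tag="item"):
        print(elem.tag, elem.text)  # stand-in for real per-element work
        # Release each element (and its already-processed siblings);
        # otherwise the tree still accumulates and memory grows.
        elem.clear()
        while elem.getprevious() is not None:
            del elem.getparent()[0]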
Answer 2:
lxml is very fast and relatively memory-efficient. BeautifulSoup by itself scores less well on the efficiency end, but is built to handle non-standard / broken HTML and XML, meaning it is ultimately more versatile.
Which you choose really just depends on your use case -- web scraping? Probably BS. Parsing machine-written structured metadata? lxml is a great choice.
There is also the learning curve to consider when making the switch: the two systems implement search and navigation in slightly different ways -- enough to make learning one after starting with the other a non-trivial task. The sketch below shows the same lookup in each.
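To make that concrete, here is a minimal sketch of the same lookup in both libraries (the toy XML is invented for illustration; note that BeautifulSoup's "xml" mode itself relies on lxml being installed):

    from bs4 import BeautifulSoup
    from lxml import etree

    xml = "<library><book id='1'><title>Dune</title></book></library>"

    # Beautiful Soup: navigate with find()/find_all()
    soup = BeautifulSoup(xml, "xml")
    print(soup.find("title").text)          # Dune

    # lxml: navigate with XPath over an element tree
    root = etree.fromstring(xml)
    print(root.xpath("//title/text()")[0])  # Dune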
Source: https://stackoverflow.com/questions/31351856/are-there-any-benefits-of-using-beautiful-soup-to-parse-xml-over-using-lxml-alon