Can I get lxml to ignore non-XML content before and after the root tag?
Question

I'm trying to use lxml to process a file that may have non-XML junk both before and after the XML content. Imagine someone captured a terminal buffer, so I have something like this:

    user@host: cat /tmp/log.xml
    <log>
      <foo>...</foo>
      <bar>..
      ...
      </bar>
    </log>
    user@host:

If I hand etree.parse the filename, it chokes on the leading content. I can delete the first lines until I find one starting with '<' and hand the rest to etree.parse, but then it chokes on the trailing content. The