Question
So I have been trying to figure out how to use BeautifulSoup and did a quick search and found that lxml can evaluate XPath expressions against an HTML page. I would LOVE to do that, but the tutorial isn't very intuitive.
I know how to use Firebug to grab the XPath of an element, and was curious if anyone has used lxml and can explain how I can use it to extract specific XPaths and print the results, say 5 per line, or if that's even possible?!
Selenium is using Chrome and loads the page properly, just need help moving forward.
Thanks!
Answer 1:
lxml's ElementTree has an .xpath() method (note that the ElementTree in the xml package of the Python standard library doesn't have that!)
e.g.
# see http://lxml.de/xpathxslt.html
from lxml import etree

# root = etree.parse('/tmp/stack-overflow-questions.xml')
root = etree.XML('''
<answers>
  <answer author="dlam" question-id="13965403">AAA</answer>
</answers>
''')

all_answers = root.xpath('.//answer')
for i, answer in enumerate(all_answers):
    who_answered = answer.attrib['author']
    question_id = answer.attrib['question-id']
    answer_text = answer.text
    print('Answer #{0} by {1}: {2}'.format(i, who_answered, answer_text))
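Since the question also asks about printing results "5 per line", here is a minimal sketch of the same idea against HTML rather than XML; the markup and the XPath are made up for illustration (in practice you would paste an XPath copied from Firebug or the browser dev tools):

```python
# Hypothetical example: pull link texts out of an HTML snippet with an
# XPath, then print the matches five per line.
from lxml import html

page = html.fromstring('''
<html><body><ul>
  <li><a href="/q/1">one</a></li>
  <li><a href="/q/2">two</a></li>
  <li><a href="/q/3">three</a></li>
  <li><a href="/q/4">four</a></li>
  <li><a href="/q/5">five</a></li>
  <li><a href="/q/6">six</a></li>
  <li><a href="/q/7">seven</a></li>
</ul></body></html>
''')

links = page.xpath('//ul/li/a/text()')
for start in range(0, len(links), 5):
    print(' '.join(links[start:start + 5]))
```

The slicing in steps of 5 is what gives you "5 per line"; lxml itself just returns a flat list of matches.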
Answer 2:
I prefer to use lxml, because lxml is more efficient than selenium when extracting a large number of elements. You can use selenium to get the page source and then parse that source with lxml's xpath instead of selenium's native find_elements_by_xpath.
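A minimal sketch of that combination: the Selenium lines are commented out because they need a running browser, so the `source` string below stands in for what `driver.page_source` would return; the div class and URL are hypothetical.

```python
# Fetch with Selenium, parse with lxml (Selenium part commented out,
# since it requires a browser/driver to run):
# from selenium import webdriver
# driver = webdriver.Chrome()
# driver.get('https://stackoverflow.com/questions/13965403')
# source = driver.page_source

from lxml import etree

# Stand-in for driver.page_source:
source = '<html><body><div class="answer">A</div><div class="answer">B</div></body></html>'

tree = etree.HTML(source)  # lenient HTML parser, tolerates real-world markup
answers = tree.xpath('//div[@class="answer"]/text()')
print(answers)  # ['A', 'B']
```

Doing one `page_source` round-trip and running all the XPath queries in lxml avoids the per-element WebDriver call overhead that makes `find_elements_by_xpath` slow for large extractions.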
Source: https://stackoverflow.com/questions/13965403/can-i-parse-xpath-using-python-selenium-and-lxml