Question
So I have been trying to figure out how to use BeautifulSoup and did a quick search and found that lxml can evaluate XPath expressions against an HTML page. I would LOVE to do that, but the tutorial isn't very intuitive.
I know how to use Firebug to grab the XPath of an element, and was curious if anyone has used lxml and can explain how I can use it to extract specific XPaths and print the results, say 5 per line, or if that's even possible?!
Selenium is using Chrome and loads the page properly, just need help moving forward.
Thanks!
Answer 1:
lxml's ElementTree has an .xpath() method (note that the ElementTree in the xml package of the Python standard library doesn't have that!)
e.g.
# see http://lxml.de/xpathxslt.html
from lxml import etree

# root = etree.parse('/tmp/stack-overflow-questions.xml')
root = etree.XML('''
<answers>
  <answer author="dlam" question-id="13965403">AAA</answer>
</answers>
''')

all_answers = root.xpath('.//answer')
for i, answer in enumerate(all_answers):
    who_answered = answer.attrib['author']
    question_id = answer.attrib['question-id']
    answer_text = answer.text
    print('Answer #{0} by {1}: {2}'.format(i, who_answered, answer_text))
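Since the question also asks about printing results "5 per line", here is a minimal sketch of the same idea against HTML rather than XML; the markup and the XPath are made up for illustration (in practice you would paste an XPath copied from Firebug or the browser dev tools):

```python
# Hypothetical example: pull link texts out of an HTML snippet with an
# XPath, then print the matches five per line.
from lxml import html

page = html.fromstring('''
<html><body><ul>
  <li><a href="/q/1">one</a></li>
  <li><a href="/q/2">two</a></li>
  <li><a href="/q/3">three</a></li>
  <li><a href="/q/4">four</a></li>
  <li><a href="/q/5">five</a></li>
  <li><a href="/q/6">six</a></li>
  <li><a href="/q/7">seven</a></li>
</ul></body></html>
''')

links = page.xpath('//ul/li/a/text()')
for start in range(0, len(links), 5):
    print(' '.join(links[start:start + 5]))
```

The slicing in steps of 5 is what gives you "5 per line"; lxml itself just returns a flat list of matches.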
Answer 2:
I prefer to use lxml, because lxml is more efficient than selenium when extracting a large number of elements. You can use selenium to get the page source and then parse that source with lxml's xpath instead of selenium's native find_elements_by_xpath.
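A minimal sketch of that combination: the Selenium lines are commented out because they need a running browser, so the `source` string below stands in for what `driver.page_source` would return; the div class and URL are hypothetical.

```python
# Fetch with Selenium, parse with lxml (Selenium part commented out,
# since it requires a browser/driver to run):
# from selenium import webdriver
# driver = webdriver.Chrome()
# driver.get('https://stackoverflow.com/questions/13965403')
# source = driver.page_source

from lxml import etree

# Stand-in for driver.page_source:
source = '<html><body><div class="answer">A</div><div class="answer">B</div></body></html>'

tree = etree.HTML(source)  # lenient HTML parser, tolerates real-world markup
answers = tree.xpath('//div[@class="answer"]/text()')
print(answers)  # ['A', 'B']
```

Doing one `page_source` round-trip and running all the XPath queries in lxml avoids the per-element WebDriver call overhead that makes `find_elements_by_xpath` slow for large extractions.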
Source: https://stackoverflow.com/questions/13965403/can-i-parse-xpath-using-python-selenium-and-lxml