Why does this XPath fail using lxml in Python?

爱一瞬间的悲伤 2020-12-03 12:56

Here is an example web page I am trying to get data from. http://www.makospearguns.com/product-p/mcffgb.htm

The XPath was taken from Chrome's developer tools, and f

3 Answers
  •  余生分开走
    2020-12-03 13:23

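    I had a similar issue: Chrome's DOM contains tbody elements that are not in the page source, so the XPath you get from Copy > Copy XPath includes them. The browser adds an implicit <tbody> when it parses a table that lacks one, but lxml parses the raw HTML and does not. As others answered, you have to look at the actual page source, though the browser-given XPath is a good place to start. I've found that removing the tbody steps often fixes it, so I wrote a small Python script for trying XPaths against a live page: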

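    #!/usr/bin/env python
    # Try an XPath expression against a live page with lxml
    import sys
    import requests
    from lxml import html

    if len(sys.argv) < 3:
        print 'Usage: ' + sys.argv[0] + ' url xpath'
        sys.exit(1)

    url = sys.argv[1]
    xp = sys.argv[2]

    # lxml sees the raw server HTML, not the DOM Chrome shows (no implied tbody)
    page = requests.get(url)
    tree = html.fromstring(page.text)
    nodes = tree.xpath(xp)

    if len(nodes) == 0:
        print 'XPath did not match any nodes'
    else:
        # tree.xpath(xp) produces a list, so always just take first item
        first = nodes[0]
        # text() and @attr expressions return strings, which lack text_content()
        if hasattr(first, 'text_content'):
            first = first.text_content()
        print first.encode('ascii', 'ignore')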
    

    (that's Python 2.7, in case the non-function "print" didn't give it away)
