You can see that the element with id "v65-product-parent" is of type `table` and has a `tr` subelement.
There can be only one element with that id (otherwise the document would be invalid).
The XPath expects a tbody as a child of that element (the table), and there is none anywhere in the page.
This can be tested with:
>>> "tbody" in page.text
False
How did Chrome come up with that XPath?
If you simply download the page with
$ wget http://www.makospearguns.com/product-p/mcffgb.htm
and review its content, you will not find a single element named tbody.
But if you use Chrome Developer Tools, you will find some.
How did they get there?
This often happens when JavaScript comes into play and generates some page content in the browser. But as LegoStormtroopr noted, that is not the case here: this time it is the browser itself that modifies the document to make it correct.
How to get the content of a page dynamically modified within the browser?
You have to give some sort of browser a chance. E.g. if you use selenium, you would get it.
byselenium.py
from selenium import webdriver
from lxml import html
url = "http://www.makospearguns.com/product-p/mcffgb.htm"
xpath = '//*[@id="v65-product-parent"]/tbody/tr[2]/td[2]/table[1]/tbody/tr/td/table/tbody/tr[2]/td[2]/table/tbody/tr[1]/td[1]/div/table/tbody/tr/td/font/div/b/span/text()'
browser = webdriver.Firefox()
try:
    browser.get(url)
    # page_source is the document as the browser sees it, with tbody inserted
    html_source = browser.page_source
finally:
    # always close the browser window, even if loading the page fails
    browser.quit()
print("test tbody", "tbody" in html_source)
tree = html.fromstring(html_source)
text = tree.xpath(xpath)
print(text)
which prints
$ python byselenium.py
test tbody True
['$149.95']
Conclusions
Selenium is great when it comes to changes made within the browser. However, it is a rather heavy tool, and if you can do it a simpler way, do it that way. LegoStormtroopr has proposed such a simpler solution, which works on the plainly fetched web page.
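To illustrate what a lighter approach might look like, here is a minimal sketch that removes the browser-inserted tbody steps from an XPath copied out of Chrome, so it can be applied to the page as downloaded. The helper name strip_tbody is hypothetical (it is not part of lxml or selenium), the XPath is shortened for readability, and this is not necessarily the exact solution LegoStormtroopr proposed. It also assumes the raw page contains no real tbody elements, which is the case here.

```python
import re


def strip_tbody(xpath):
    """Remove tbody steps such as /tbody or /tbody[1] from an XPath.

    Hypothetical helper: only safe when the raw HTML has no tbody
    elements of its own, i.e. all of them were added by the browser.
    """
    return re.sub(r"/tbody(\[\d+\])?(?=/|$)", "", xpath)


# Shortened version of the XPath copied from Chrome Developer Tools
chrome_xpath = '//*[@id="v65-product-parent"]/tbody/tr[2]/td[2]/table[1]/tbody/tr/td'
print(strip_tbody(chrome_xpath))
# → //*[@id="v65-product-parent"]/tr[2]/td[2]/table[1]/tr/td
```

The resulting expression can then be passed to tree.xpath() on a tree parsed from the plainly fetched page, without starting a browser.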