Question
I want to do the same thing I do in Beautiful Soup: find_all elements and then iterate through them to find other elements within each one. For example:
import bs4

soup = bs4.BeautifulSoup(source)
articles = soup.find_all('div', class_='v-card')
for article in articles:
    name = article.find('span', itemprop='name').text
    address = article.find('p', itemprop='address').text
Now I try to do the same thing in lxml:
from lxml import html

tree = html.fromstring(source)
items = tree.xpath('//div[@class="v-card"]')
for item in items:
    name = item.xpath('//span[@itemprop="name"]/text()')
    address = item.xpath('//p[@itemprop="address"]/text()')
...but this finds all matches anywhere in the tree, regardless of whether they are under the current item. How can I approach this?
Answer 1:
Don't use // as the prefix in the follow-up queries; that explicitly tells the query to start from the document root rather than from your current element. Instead, use .// for queries relative to the current element:
for item in tree.xpath('//div[@class="v-card"]'):
    name = item.xpath('.//span[@itemprop="name"]/text()')
    address = item.xpath('.//p[@itemprop="address"]/text()')
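For reference, here is a minimal, self-contained sketch of the relative-query approach; the sample markup (the two v-card blocks and their names and addresses) is made up purely for illustration:

from lxml import html

# Made-up sample markup with two cards, just to show the scoping.
source = """
<div class="v-card">
  <span itemprop="name">Alice</span>
  <p itemprop="address">1 First St</p>
</div>
<div class="v-card">
  <span itemprop="name">Bob</span>
  <p itemprop="address">2 Second St</p>
</div>
"""

tree = html.fromstring(source)
for item in tree.xpath('//div[@class="v-card"]'):
    # .// restricts the search to descendants of the current item,
    # so each iteration yields only that card's own name and address.
    name = item.xpath('.//span[@itemprop="name"]/text()')[0]
    address = item.xpath('.//p[@itemprop="address"]/text()')[0]
    print(name, address)

Each iteration prints only that card's own name and address; with the //-prefixed queries from the question, every iteration would instead return the names and addresses of all cards in the document.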
Source: https://stackoverflow.com/questions/24787054/xpath-lookup-via-lxml-starting-from-root-rather-than-element