Question
Suppose we have a web page with the following markup:
<div class="specific-row" data-id="101736782"></div>
<div class="yellow-box-row" data-id="112376244"></div>
<div class="specific-row" data-id="179218312"></div>
<div class="vip-row" data-id="123749014"></div>
How can I get all of the data-id values?
For example: ['101736782', '112376244', '179218312', '123749014']
I tried tree.xpath, but my expression is not valid XPath:
import requests
from lxml import html
r = requests.get(url)
tree = html.fromstring(r.content)
tree.xpath("//div@data-id=['any']")
Answer 1:
I tried this:
from lxml import etree, html
doc = '<root><div class="specific-row" data-id="101736782"></div><div class="yellow-box-row" data-id="112376244"></div><div class="specific-row" data-id="179218312"></div><div class="vip-row" data-id="123749014"></div></root>'
root = etree.XML(doc)  # for this input, root = html.fromstring(doc) should work similarly
xpatheval = etree.XPathEvaluator(root)
divs = xpatheval('//div')
ids = [el.get('data-id') for el in divs]
# If you have cssselect installed, you can do this instead:
divs = root.cssselect('[data-id]')
ids = [el.get('data-id') for el in divs]
# cssselect uses the same CSS selector syntax as element.querySelectorAll('[data-id]') in browsers
# ElementPath may also be what you are looking for: https://lxml.de/tutorial.html#elementpath
# findall returns the matching elements; here, the div children of root that have a data-id attribute
root.findall('div[@data-id]')
I used the link above as a reference.
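The attribute values can also be selected directly in XPath by ending the path with an attribute step, which avoids the separate el.get() loop. This is only a minimal sketch: the page string is a stand-in built from the markup in the question, and in the real case it would presumably be r.content from requests.get(url).

```python
from lxml import html

# Stand-in for the downloaded page (assumption: same markup as in the question).
page = (
    '<html><body>'
    '<div class="specific-row" data-id="101736782"></div>'
    '<div class="yellow-box-row" data-id="112376244"></div>'
    '<div class="specific-row" data-id="179218312"></div>'
    '<div class="vip-row" data-id="123749014"></div>'
    '</body></html>'
)

tree = html.fromstring(page)

# "//div/@data-id" selects the data-id attribute of every div in the document;
# lxml returns the attribute values as a list of strings.
ids = tree.xpath("//div/@data-id")
print(ids)
```

Divs without a data-id attribute are skipped, so the list contains no None entries. To collect the attribute from any element rather than only from divs, "//@data-id" should work the same way.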
Source: https://stackoverflow.com/questions/61494994/get-all-values-of-specific-key-with-xpath-python-web-scraping