How to scrape ID-less website elements with XPath-only regex patterns

Submitted by 空扰寡人 on 2019-12-13 03:57:28

Question


There are several similar questions about using regex in XPath searches. However, some were not very illuminating to me, and others did not work for my specific problem. Therefore, and for future users who might run into the same issue, I am posting the following question:

Using one call in Python/Selenium, I want to be able to scrape all elements below at once (for readability without code formatting):

/html/body/div[6]/div/div[1]/div/div[3]/div[2]/div[2]/div[**1**]/div/div[2]/div[1]
/html/body/div[6]/div/div[1]/div/div[3]/div[2]/div[2]/div[**2**]/div/div[2]/div[1]
/html/body/div[6]/div/div[1]/div/div[3]/div[2]/div[2]/div[**3**]/div/div[2]/div[1]
/html/body/div[6]/div/div[1]/div/div[3]/div[2]/div[2]/div[**4**]/div/div[2]/div[1]
/html/body/div[6]/div/div[1]/div/div[3]/div[2]/div[2]/div[**5**]/div/div[2]/div[1]
/html/body/div[6]/div/div[1]/div/div[3]/div[2]/div[2]/div[**6**]/div/div[2]/div[1]

Note that the number of matching elements is variable among target websites (can be more than 6, but at least one) and that the associated elements do not have a specific ID assigned (which excludes many solutions explained elsewhere on StackOverflow, according to my understanding).

What I am looking for is something like:

website = driver.get(URL)
html = WebDriverWait(driver, 1).until(EC.presence_of_element_located((By.XPATH, "/html/body/div[6]/div/div[1]/div/div[3]/div[2]/div[2]/div[[0-9]{1}]/div/div[2]/div[1]", regex = True)))

What doesn't work is:

website = driver.get(URL)
html = WebDriverWait(driver, 1).until(EC.presence_of_element_located((By.XPATH, "/html/body/div[6]/div/div[1]/div/div[3]/div[2]/div[2]/div[matchers['[0-9]{1}']]/div/div[2]/div[1]")))
TimeoutException: Message: 
Screenshot: available via screen

How to scrape all website elements without ID whose XPath matches a regex pattern in Python + Selenium?


Answer 1:


You don't want a regex for this; you want the predicate [position()<=6] on the step whose index varies.
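
A minimal sketch of how that predicate might be used from Python/Selenium, under a few assumptions. The names build_xpath, scrape_texts, BASE and TAIL are hypothetical helpers introduced here for illustration, and the 10-second timeout is an arbitrary choice. Because the question says the number of rows varies and can exceed 6, the sketch also allows dropping the predicate entirely: a bare div step matches every sibling div. XPath 1.0, which is what browsers evaluate, has no regex functions, so a positional predicate (or none) is the usual substitute. It uses presence_of_all_elements_located rather than presence_of_element_located, since the latter returns only the first match.

```python
# Fixed parts of the path, copied from the question.
BASE = "/html/body/div[6]/div/div[1]/div/div[3]/div[2]/div[2]"
TAIL = "/div/div[2]/div[1]"


def build_xpath(max_items=None):
    """Return an XPath for the repeated rows.

    max_items=None leaves the varying step as a bare "div", which matches
    every sibling div; an integer caps it with [position()<=N].
    """
    step = "/div" if max_items is None else f"/div[position()<={max_items}]"
    return BASE + step + TAIL


def scrape_texts(driver, url, max_items=None, timeout=10):
    """Load url and return the text of every matching element."""
    # Imported lazily so build_xpath() works without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver.get(url)  # get() returns None, so there is nothing to assign
    elements = WebDriverWait(driver, timeout).until(
        EC.presence_of_all_elements_located((By.XPATH, build_xpath(max_items)))
    )
    return [element.text for element in elements]
```

Calling scrape_texts(driver, URL) would collect all rows in one call, while scrape_texts(driver, URL, max_items=6) mirrors the predicate from the answer.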



Source: https://stackoverflow.com/questions/48144574/how-to-scrape-id-less-website-elements-with-xpath-only-regex-patterns
