Question
While writing an XPath to extract data from the HTML nodes given below, I'm unable to pair the text of one element with the text of the corresponding element inside the same div.
<div class="Main">
<div class="Sub">
<div class="Birth">Jack</div>
<span class="Date">
<div><span class="Date">6 June 2018</span></div></span></div>
<div class="Sub">
<div class="Birth">Hurley</div>
<span class="Date"><div><span class="Date">21 June 2011</span></div></span></div>
<div class="Sub">
<div class="Birth">Kate</div>
<span class="Date">
<div><span class="Date">11 May 2013</span></div></span></div>
<div class="Sub">
<div class="Birth">John</div>
<span class="Date">
<div><span class="Date">5 March 2001</span></div></span></div>
What I want is to extract the date text in <div><span class="Date"> that corresponds to the text in <div class="Birth">.
The problem is in mapping the extracted data. I get
['Jack','Hurley','Kate','John']
via
xpath('//*[@class="Birth"]/text()').extract()
and
['6 June 2018','21 June 2011','11 May 2013','5 March 2001'] via
xpath('//*[@class="Date"]/text()').extract()
but the two lists are not necessarily going to be in the same order. A relative mapping within each div is therefore required, since the div class names are the same for all segments.
To be sure, the result must read like this: for the text element Kate, the date is 11 May 2013.
Answer 1:
I'm not sure about siblings, but iterating over the two lists by index can work:
for i in range(len(list_search)):
    if list_search[i] == "Jack":
        # corresponding_values holds the Date texts, in the same order as list_search
        updated = corresponding_values[i]
        break
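To make the index-based idea concrete, here is a minimal sketch in plain Python. The two lists are the ones quoted in the question; the helper name pair_by_position is made up for illustration. Pairing by position is only safe when both queries return exactly one value per block, so the sketch refuses to pair lists of different lengths rather than silently misaligning them.

```python
# The two result lists quoted in the question.
names = ["Jack", "Hurley", "Kate", "John"]
dates = ["6 June 2018", "21 June 2011", "11 May 2013", "5 March 2001"]


def pair_by_position(names, dates):
    """Pair names and dates by index (hypothetical helper, not part of Scrapy)."""
    if len(names) != len(dates):
        # A missing or extra value would shift every later pair by one.
        raise ValueError("lists differ in length, positional pairing is unsafe")
    return dict(zip(names, dates))


dates_by_name = pair_by_position(names, dates)
print(dates_by_name["Kate"])  # 11 May 2013
```

If a block can lack a date, or if a query returns extra whitespace-only text nodes, the length check fails and a per-block lookup is the safer route.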
Answer 2:
You can first get the list of <div class="Sub"> elements, iterate over them, and use a relative XPath to get the elements inside each div.
Here is an example:
subs = response.xpath('//div[@class="Sub"]')
for sub in subs:
    print(sub.xpath('.//div[@class="Birth"]/text()').extract_first())
    print(sub.xpath('.//div/span[@class="Date"]/text()').extract_first())
This will print:
Jack
6 June 2018
Hurley
21 June 2011
Kate
11 May 2013
John
5 March 2001
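For anyone who wants to try the per-block approach without a Scrapy response at hand, here is a minimal sketch that uses only Python's standard library. Note the assumptions: it swaps in xml.etree.ElementTree for Scrapy's selectors (ElementTree supports only a subset of XPath), the markup is a trimmed copy of the question's HTML with the closing </div> tags added so that it parses as XML, and the names HTML and birth_dates are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's markup, closed so that it parses as XML.
HTML = """
<div class="Main">
  <div class="Sub">
    <div class="Birth">Jack</div>
    <span class="Date"><div><span class="Date">6 June 2018</span></div></span>
  </div>
  <div class="Sub">
    <div class="Birth">Kate</div>
    <span class="Date"><div><span class="Date">11 May 2013</span></div></span>
  </div>
</div>
"""


def birth_dates(markup):
    """Map each Birth text to the Date text found inside the same Sub block."""
    root = ET.fromstring(markup)
    result = {}
    for sub in root.iterfind(".//div[@class='Sub']"):
        # Both lookups are relative to the current Sub element.
        name = sub.find("div[@class='Birth']")
        date = sub.find(".//div/span[@class='Date']")
        if name is not None and date is not None:
            result[name.text.strip()] = date.text.strip()
    return result


dates_by_name = birth_dates(HTML)
print(dates_by_name["Kate"])  # 11 May 2013
```

Because each name and date is looked up inside its own Sub block, a block with a missing date is skipped instead of shifting the dates of the blocks after it.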
Answer 3:
Please check the following code. Instead of hard-coding the name "Jack", you can also write another XPath that supplies it.
response.xpath('//div[contains(text(),"Jack")]/following-sibling::span/div//text()')
Source: https://stackoverflow.com/questions/50966481/how-to-extract-the-corresponding-text-of-a-div-via-xpath