How to extract the corresponding text of a Div via xpath?

Question

While writing XPath expressions to extract data from the HTML nodes below, I'm unable to extract each piece of text together with the text of its matching element inside the same div.

<div class="Main">
    <div class="Sub">
        <div class="Birth">Jack</div>
        <span class="Date">
            <div><span class="Date">6 June 2018</span></div>
        </span>
    </div>
    <div class="Sub">
        <div class="Birth">Hurley</div>
        <span class="Date">
            <div><span class="Date">21 June 2011</span></div>
        </span>
    </div>
    <div class="Sub">
        <div class="Birth">Kate</div>
        <span class="Date">
            <div><span class="Date">11 May 2013</span></div>
        </span>
    </div>
    <div class="Sub">
        <div class="Birth">John</div>
        <span class="Date">
            <div><span class="Date">5 March 2001</span></div>
        </span>
    </div>
</div>

What I want is to extract the date text in <div><span class="Date"> that belongs to the text in <div class="Birth">. I can get the names ['Jack','Hurley','Kate','John'] with xpath('//*[@class="Birth"]/text()').extract() and the dates ['6 June 2018','21 June 2011','11 May 2013','5 March 2001'] with xpath('//*[@class="Date"]/text()').extract(), but the two lists are not necessarily in the same order, so I can't pair them by position. Since every block uses the same class names, I need a mapping that is relative to each div. For example, the result must guarantee that for the text element Kate, the date is 11 May 2013.

Answer 1:

I'm not sure about the sibling axes, but a plain iteration over the two lists can work it out:

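# names and dates: the two lists extracted separately, as in the question
updated = None
for i, name in enumerate(names):
    if name == "Jack":
        updated = dates[i]  # only correct if dates[i] belongs to names[i]
        break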
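The index approach only works if the two lists line up one-to-one, which is exactly what the question says is not guaranteed. A slightly safer sketch of the same idea, assuming `names` and `dates` are the two lists already returned by `.extract()` (hardcoded here so it runs standalone), pairs them with `zip` and refuses to continue when the counts differ. The helper name `pair_by_position` is hypothetical. It still cannot detect two lists of equal length in different orders.

```python
def pair_by_position(names, dates):
    """Map each name to the date at the same index.

    Assumes both lists follow the same document order; only a length
    mismatch is detected, not a reordering.
    """
    if len(names) != len(dates):
        raise ValueError(f"{len(names)} names but {len(dates)} dates")
    return dict(zip(names, dates))


# Sample lists from the question (hardcoded for illustration)
names = ["Jack", "Hurley", "Kate", "John"]
dates = ["6 June 2018", "21 June 2011", "11 May 2013", "5 March 2001"]

birthdays = pair_by_position(names, dates)
print(birthdays["Jack"])  # 6 June 2018
```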

Answer 2:

You can first get the list of <div class="Sub"> elements, iterate over them, and use a relative XPath to get the elements inside each div.

Here is an example:

subs = response.xpath('//div[@class="Sub"]')
for sub in subs:
    # The leading "." keeps each lookup inside the current Sub div
    print(sub.xpath('.//div[@class="Birth"]/text()').extract_first())
    print(sub.xpath('.//div/span[@class="Date"]/text()').extract_first())

This prints:

Jack
6 June 2018
Hurley
21 June 2011
Kate
11 May 2013
John
5 March 2001
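The per-Sub approach can be tried without a Scrapy `response` object. The sketch below swaps in Python's standard-library `xml.etree.ElementTree`, which supports only a subset of XPath (enough for these relative paths) and requires well-formed markup; the function name `birth_dates` is hypothetical. It builds a name-to-date dictionary, and because both lookups are scoped to the same `Sub` div, the pairing does not depend on list order.

```python
import xml.etree.ElementTree as ET

# The question's markup, with the outer Main div closed so it parses as XML
HTML = """
<div class="Main">
  <div class="Sub">
    <div class="Birth">Jack</div>
    <span class="Date">
      <div><span class="Date">6 June 2018</span></div></span></div>
  <div class="Sub">
    <div class="Birth">Hurley</div>
    <span class="Date"><div><span class="Date">21 June 2011</span></div></span></div>
  <div class="Sub">
    <div class="Birth">Kate</div>
    <span class="Date">
      <div><span class="Date">11 May 2013</span></div></span></div>
  <div class="Sub">
    <div class="Birth">John</div>
    <span class="Date">
      <div><span class="Date">5 March 2001</span></div></span></div>
</div>
"""


def birth_dates(markup):
    """Return {name: date}, reading both values from the same Sub div."""
    root = ET.fromstring(markup.strip())
    result = {}
    # Mirrors response.xpath('//div[@class="Sub"]') from the answer
    for sub in root.findall(".//div[@class='Sub']"):
        # Relative lookups: only search inside this Sub
        name = sub.findtext("div[@class='Birth']")
        date = sub.findtext(".//div/span[@class='Date']")
        if name is not None and date is not None:
            result[name.strip()] = date.strip()
    return result


for name, date in birth_dates(HTML).items():
    print(name, date)
```

If two blocks share a name, the later date overwrites the earlier one; collecting a list of `(name, date)` pairs instead avoids that.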

Answer 3:

Try the following XPath. Instead of hard-coding the name "Jack", you can also obtain it from another XPath.

response.xpath('//div[contains(text(),"Jack")]/following-sibling::span/div//text()')
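ElementTree has no `following-sibling` axis, so the sketch below emulates this expression with the standard library by walking each element's children: it finds a `div` whose first text node contains the name, then takes the first `span` after it. That is closer to `following-sibling::span[1]` than to the full node-set the original returns. The helper `date_for` and the shortened sample markup are illustrative assumptions, not part of the answer.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the question's markup (two Sub blocks only)
HTML = """<div class="Main">
  <div class="Sub">
    <div class="Birth">Jack</div>
    <span class="Date"><div><span class="Date">6 June 2018</span></div></span>
  </div>
  <div class="Sub">
    <div class="Birth">Kate</div>
    <span class="Date"><div><span class="Date">11 May 2013</span></div></span>
  </div>
</div>"""


def date_for(markup, name):
    """Return the date text that follows the div containing `name`, or None."""
    root = ET.fromstring(markup)
    # No parent/sibling axes in ElementTree, so scan each parent's children
    for parent in root.iter():
        children = list(parent)
        for i, child in enumerate(children):
            # Like //div[contains(text(), name)]: test the first text node
            if child.tag != "div" or name not in (child.text or ""):
                continue
            # Like following-sibling::span[1]: first <span> after the match
            span = next((s for s in children[i + 1:] if s.tag == "span"), None)
            inner = span.find("div") if span is not None else None
            if inner is not None:
                # Like /div//text(): all text under that div, joined
                return "".join(inner.itertext()).strip()
    return None


print(date_for(HTML, "Kate"))
```

Like `contains()`, the substring test would also match "Jackson" when searching for "Jack"; comparing `(child.text or "").strip() == name` gives an exact match.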

Source: https://stackoverflow.com/questions/50966481/how-to-extract-the-corresponding-text-of-a-div-via-xpath

Tags

html

xpath

scrapy