How to extract the corresponding text of a Div via xpath?

余生颓废 posted on 2021-01-29 09:41:11

Question


I'm writing XPath expressions to extract data from the HTML nodes below, but I can't pair each piece of text with the matching text from the other elements inside the same div.

<div class="Main">
    <div class="Sub">
        <div class="Birth">Jack</div>
        <span class="Date">
            <div><span class="Date">6 June 2018</span></div></span></div>
    <div class="Sub">
        <div class="Birth">Hurley</div>
        <span class="Date"><div><span class="Date">21 June 2011</span></div></span></div>
    <div class="Sub">
        <div class="Birth">Kate</div>
        <span class="Date">
            <div><span class="Date">11 May 2013</span></div></span></div>
    <div class="Sub">
        <div class="Birth">John</div>
        <span class="Date">
            <div><span class="Date">5 March 2001</span></div></span></div>

What I want is to extract the date text in <div><span class="Date"> that corresponds to the text in <div class="Birth">. I can get ['Jack','Hurley','Kate','John'] via xpath('//*[@class="Birth"]/text()').extract() and ['6 June 2018','21 June 2011','11 May 2013','5 March 2001'] via xpath('//*[@class="Date"]/text()').extract(), but the two lists are not necessarily in the same order. The mapping therefore has to be made relative to each div, since the class names are the same in every segment. To be certain, the result should read like: for the text element Kate, the date is 11 May 2013.


Answer 1:


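I'm not sure about a sibling-based XPath, but iterating over the two lists by index can work:

names = response.xpath('//*[@class="Birth"]/text()').extract()
dates = response.xpath('//div/span[@class="Date"]/text()').extract()
# Assumes both lists are in document order, so index i refers to the same block.
for i in range(len(names)):
    if names[i] == "Jack":
        updated = dates[i]
        break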



Answer 2:


You can first select all the <div class="Sub"> elements, iterate over them, and use a relative XPath to get the elements inside each one.

Here is an example:

# Select each "Sub" block, then query relative to it (note the leading dot).
subs = response.xpath('//div[@class="Sub"]')
for sub in subs:
    print(sub.xpath('.//div[@class="Birth"]/text()').extract_first())
    print(sub.xpath('.//div/span[@class="Date"]/text()').extract_first())

This prints:

Jack
6 June 2018
Hurley
21 June 2011
Kate
11 May 2013
John
5 March 2001
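
To try the same per-block idea without a live Scrapy response, here is a minimal sketch that swaps in the standard library's xml.etree.ElementTree, which supports only a small XPath subset. It assumes the markup is well-formed, so a closing </div> for "Main" has been added that the question's snippet omits. The names MARKUP, birth_dates and dates_by_name are hypothetical, chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Markup from the question. Assumption: the closing </div> for "Main" is added
# here so the snippet parses as well-formed XML.
MARKUP = """
<div class="Main">
    <div class="Sub">
        <div class="Birth">Jack</div>
        <span class="Date"><div><span class="Date">6 June 2018</span></div></span>
    </div>
    <div class="Sub">
        <div class="Birth">Hurley</div>
        <span class="Date"><div><span class="Date">21 June 2011</span></div></span>
    </div>
    <div class="Sub">
        <div class="Birth">Kate</div>
        <span class="Date"><div><span class="Date">11 May 2013</span></div></span>
    </div>
    <div class="Sub">
        <div class="Birth">John</div>
        <span class="Date"><div><span class="Date">5 March 2001</span></div></span>
    </div>
</div>
"""


def birth_dates(markup):
    """Return a {name: date} dict, pairing values found inside the same "Sub" block."""
    root = ET.fromstring(markup)
    pairs = {}
    for sub in root.findall(".//div[@class='Sub']"):
        # Both lookups are relative to this block, so list order never matters.
        name = sub.findtext("div[@class='Birth']")
        date = sub.findtext(".//div/span[@class='Date']")
        if name is not None and date is not None:
            pairs[name.strip()] = date.strip()
    return pairs


dates_by_name = birth_dates(MARKUP)
print(dates_by_name["Kate"])  # 11 May 2013
```

Because each name and date is read from the same enclosing block, the pairing holds even if the blocks appear in a different order on the page.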




Answer 3:


Check the following code. Instead of hard-coding the name "Jack", you can supply it from another XPath expression.

response.xpath('//div[contains(text(),"Jack")]//following-sibling::span/div//text()')
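
The name can also be passed in as a parameter. The sketch below is one way to do that with the standard library's xml.etree.ElementTree rather than Scrapy. ElementTree has no following-sibling axis, so it instead selects the "Sub" block whose child div has exactly the given text (an exact match, not contains()). The helper date_for and the shortened two-entry MARKUP are hypothetical, and the markup is assumed to be well-formed.

```python
import xml.etree.ElementTree as ET

# Shortened sample based on the question's markup (assumed well-formed).
MARKUP = """
<div class="Main">
    <div class="Sub">
        <div class="Birth">Jack</div>
        <span class="Date"><div><span class="Date">6 June 2018</span></div></span>
    </div>
    <div class="Sub">
        <div class="Birth">Kate</div>
        <span class="Date"><div><span class="Date">11 May 2013</span></div></span>
    </div>
</div>
"""


def date_for(root, name):
    """Hypothetical helper: return the date for `name`, or None if absent.

    Limitation: a name containing a single quote would break this simple
    quoting of the predicate value.
    """
    # Pick the "Sub" block that has a child div whose text equals `name`.
    sub = root.find(f".//div[@class='Sub'][div='{name}']")
    if sub is None:
        return None
    # Read the inner span, relative to that block only.
    return sub.findtext(".//div/span[@class='Date']")


root = ET.fromstring(MARKUP)
print(date_for(root, "Kate"))    # 11 May 2013
print(date_for(root, "Sawyer"))  # None
```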


Source: https://stackoverflow.com/questions/50966481/how-to-extract-the-corresponding-text-of-a-div-via-xpath
