Extracting p within h1 with Python/Scrapy

夕颜 2021-01-28 18:02

I am using Scrapy to extract some data about musical concerts from websites. At least one website I'm working with uses (incorrectly, according to W3C - Is it valid to have par

2 Answers
  •  逝去的感伤
    2021-01-28 18:15

    That was quite baffling, and to be frank I still do not understand why it happens. I found that the <p> tag, which should be contained within the <h1> tag, is not. Curl for the site shows the <p> nested inside the <h1>, whereas the response obtained from the site shows the <p> outside it, as:

    \n

    Bernard Haitink conducts Brahms and\xa0Dvo\u0159\xe1k featuring\npianist Emanuel Ax

    As I mentioned, I have my doubts about the cause but nothing concrete. In any case, the XPath for getting the text inside the <p> tag is:

    response.xpath('//h1[@class="performance-title"]/following-sibling::p/text()').extract()
    

    This works by using the <h1> as a landmark and selecting the <p> tag that follows it as a sibling.
