Extracting p within h1 with Python/Scrapy

夕颜 2021-01-28 18:02

I am using Scrapy to extract some data about musical concerts from websites. At least one website I'm working with uses (incorrectly, according to W3C - Is it valid to have par

2 Answers

逝去的感伤 (OP)

2021-01-28 18:15
That was quite baffling. To be frank, I still do not understand why this is happening. I found that the `<p>` tag that should be contained within the `<h1>` tag is not. Curl for the site shows the `<p>` nested inside the `<h1>`, whereas the response obtained from the site shows it as:
```
\n
Bernard Haitink conducts Brahms and\xa0Dvo\u0159\xe1k featuring\npianist Emanuel Ax
```
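The text node shown above still contains a no-break space (`\xa0`) and an embedded newline; `\u0159` and `\xe1` are just the escaped forms of "ř" and "á". If you want a clean single-line title, here is a minimal standard-library sketch. `clean_text` is a hypothetical helper (not part of Scrapy), and it assumes that collapsing all whitespace into single spaces is acceptable for your data.

```python
def clean_text(raw: str) -> str:
    """Collapse every run of Unicode whitespace (no-break spaces, newlines, ...) into one space."""
    # str.split() with no arguments treats U+00A0 (no-break space) and "\n"
    # as whitespace, and it also drops leading/trailing whitespace.
    return " ".join(raw.split())


# The string from the response above, written with the same escapes.
raw = "Bernard Haitink conducts Brahms and\xa0Dvo\u0159\xe1k featuring\npianist Emanuel Ax"
cleaned = clean_text(raw)
# cleaned == "Bernard Haitink conducts Brahms and Dvořák featuring pianist Emanuel Ax"
```

In a spider you would apply it to each string returned by the XPath query.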
As I mentioned, I have my doubts but nothing concrete. Anyway, the XPath for getting the text inside the `<p>` tag is therefore:
```
response.xpath('//h1[@class="performance-title"]/following-sibling::p/text()').extract()
```
This uses the `<h1>` as a landmark and finds its sibling `<p>` tag.