I am using Scrapy to extract some data about musical concerts from websites. At least one website I'm working with uses (incorrectly, according to W3C - Is it valid to have par
That was quite baffling. To be frank, I still do not understand why this is happening. I found that the `<p>` tag, which should be contained within the `<h1>` tag, is not. Fetching the site with curl shows the `<p>` nested inside the `<h1>`, whereas the response obtained in Scrapy shows the `<p>` after the `<h1>`, as its sibling. The `<p>` holds the concert title:
Bernard Haitink conducts Brahms and Dvořák featuring pianist Emanuel Ax
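When this title is extracted as a Python string, it carries a non-breaking space (`\xa0`) before "Dvořák" and a newline before "pianist". A minimal sketch of cleaning that up after extraction; the helper name `normalize_whitespace` is hypothetical, not part of Scrapy:

```python
def normalize_whitespace(text: str) -> str:
    """Collapse runs of whitespace into single regular spaces and trim the ends.

    str.split() with no argument splits on any Unicode whitespace, which
    includes the non-breaking space '\xa0' as well as '\n'.
    """
    return " ".join(text.split())


raw = "Bernard Haitink conducts Brahms and\xa0Dvo\u0159\xe1k featuring\npianist Emanuel Ax"
print(normalize_whitespace(raw))
```

Doing this in Python rather than with XPath's `normalize-space()` is deliberate: XPath 1.0 only treats spaces, tabs, carriage returns and line feeds as whitespace, so the non-breaking space would survive.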
As I mentioned, I have my suspicions about the cause, but nothing concrete. Anyway, the XPath for getting the text inside the `<p>` tag is therefore:
response.xpath('//h1[@class="performance-title"]/following-sibling::p/text()').extract()
This uses the `<h1>` as a landmark and selects the `<p>` that follows it as a sibling.