How to iterate over divs in Scrapy?

Question

This is probably a very trivial question, but I am new to Scrapy. I've tried to find a solution to my problem, but I just can't see what is wrong with this code.

My goal is to scrape all of the opera shows from a given website. The data for every show is inside one div with class "row-fluid row-performance ". I am trying to iterate over these divs to retrieve it, but it doesn't work: each iteration gives me the content of the first div (I get the same show 19 times instead of different items).

Thanks for any advice!

import scrapy
from ..items import ShowItem

class OperaSpider(scrapy.Spider):
    name = "opera"
    allowed_domains = ["www.opera.krakow.pl"]
    start_urls = [
        "http://www.opera.krakow.pl/pl/repertuar/na-afiszu/listopad"
    ]

    def parse(self, response):
        divs = response.xpath('//div[@class="row-fluid row-performance    "]')
        for div in divs:
            item = ShowItem()
            item['title'] = div.xpath('//h2[@class="item-title"]/a/text()').extract()
            item['time'] = div.xpath('//div[@class="item-time vertical-center"]/div[@class="vcentered"]/text()').extract()
            item['date'] = div.xpath('//div[@class="item-date vertical-center"]/div[@class="vcentered"]/text()').extract()
            yield item

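Answer 1:

Change the XPaths inside the for loop to start with .//, that is, put a dot in front of the double forward slash. An XPath that begins with // searches the whole document even when it is called on div, so every iteration matches the same elements; the leading dot makes the search relative to the current div. You can also try extract_first() instead of extract(): extract() returns a list of all matches, while extract_first() returns only the first match as a string (or None if nothing matches).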

Source: https://stackoverflow.com/questions/47399985/how-to-iterate-over-divs-in-scrapy

Tags

python

web-scraping

scrapy