Scrapy: scraping a list of links

Front-end · Unresolved · 1 answer · 1863 views
离开以前 2021-01-15 12:03

This question is something of a follow-up to one that I asked previously.

I am trying to scrape a website which contains some links on the first page.

1 Answer
  •  温柔的废话
    2021-01-15 12:24

    I'm assuming that the URLs you want to follow lead to pages with the same or similar structure. If that's the case, you should do something like this:

    from scrapy import Spider
    from scrapy.http import Request

    class YourCrawler(Spider):
        # Note: use a plain Spider rather than CrawlSpider here.
        # CrawlSpider uses parse() internally, so overriding it on
        # a CrawlSpider subclass would break the crawl.

        name = 'yourCrawler'
        allowed_domains = ['domain.com']  # must be a list, not a string
        start_urls = ["http://www.domain.com/example/url"]

        def parse(self, response):
            # Parse any elements you need from the start_urls and,
            # optionally, store them as Items.
            # See https://docs.scrapy.org/en/latest/topics/items.html
            urls = response.xpath('//div[@id="example"]//a/@href').extract()
            for url in urls:
                # urljoin resolves relative hrefs against the current page
                yield Request(response.urljoin(url),
                              callback=self.parse_following_urls,
                              dont_filter=True)

        def parse_following_urls(self, response):
            # Parsing rules for the followed pages go here
            pass
    

    Otherwise, if the URLs you want to follow lead to pages with different structures, you can define a specific callback method for each of them (something like parse1, parse2, parse3...).
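    One way to keep that dispatch tidy is a small routing table that maps URL patterns to callback names. The sketch below is a hypothetical, Scrapy-free illustration of the idea: CALLBACK_RULES, pick_callback, and the URL patterns are all made-up names, and in a real spider you would pass the chosen method as Request(url, callback=...).

    ```python
    import re

    # Hypothetical routing table mapping URL patterns to callback names,
    # standing in for parse1/parse2/parse3-style methods on the spider.
    CALLBACK_RULES = [
        (re.compile(r"/product/"), "parse_product"),
        (re.compile(r"/review/"), "parse_review"),
    ]

    def pick_callback(url, default="parse_generic"):
        """Return the name of the callback whose pattern matches url."""
        for pattern, callback_name in CALLBACK_RULES:
            if pattern.search(url):
                return callback_name
        return default

    # Inside parse() you would then yield, for example:
    #   yield Request(url, callback=getattr(self, pick_callback(url)))
    ```

    This keeps all routing decisions in one place instead of scattering if/else checks through parse().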
