Selenium with Scrapy for dynamic pages

清酒与你 2020-11-22 06:04

I'm trying to scrape product information from a webpage using Scrapy. The page I want to scrape is structured like this:

  • starts with a product_list page with 10 produ
2 answers
  •  长发绾君心
    2020-11-22 06:29

    It really depends on how you need to scrape the site and on what data you want to get.

    Here's an example of how you can follow pagination on eBay using Scrapy and Selenium:

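    import scrapy
    from selenium import webdriver
    from selenium.common.exceptions import NoSuchElementException
    from selenium.webdriver.common.by import By

    class ProductSpider(scrapy.Spider):
        name = "product_spider"
        allowed_domains = ['ebay.com']
        start_urls = ['http://www.ebay.com/sch/i.html?_odkw=books&_osacat=0&_trksid=p2045573.m570.l1313.TR0.TRC0.Xpython&_nkw=python&_sacat=0&_from=R40']

        def __init__(self, *args, **kwargs):
            # Let scrapy.Spider finish its own setup before opening the browser.
            super().__init__(*args, **kwargs)
            self.driver = webdriver.Firefox()

        def parse(self, response):
            self.driver.get(response.url)

            while True:
                try:
                    # Raises NoSuchElementException once there is no "next" link.
                    next_link = self.driver.find_element(By.XPATH, '//td[@class="pagn-next"]/a')
                except NoSuchElementException:
                    break

                next_link.click()

                # get the data and write it to scrapy items

            # quit() ends the whole browser session, not just the current window.
            self.driver.quit()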
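
    The pagination loop above can also be pulled out into a small generator, which keeps the clicking separate from the data extraction. This is only a sketch: iter_rendered_pages and max_pages are hypothetical names that do not come from the original answer, and the driver is assumed to be a Selenium WebDriver that has already loaded the first results page. The string "xpath" is the value behind Selenium's By.XPATH, so the function itself needs no imports.

```python
def iter_rendered_pages(driver, next_xpath, max_pages=50):
    """Yield the rendered HTML of each results page, clicking "next" in between.

    `driver` is assumed to behave like a Selenium WebDriver that has already
    loaded the first page. `max_pages` is a hypothetical safety cap so that a
    page whose "next" link never disappears cannot loop forever.
    """
    for _ in range(max_pages):
        # Hand the current page's HTML to the caller before moving on.
        yield driver.page_source

        # find_elements returns an empty list when nothing matches,
        # so no exception handling is needed to detect the last page.
        links = driver.find_elements("xpath", next_xpath)
        if not links:
            break
        links[0].click()
```

    Inside parse, each yielded string could then be wrapped with scrapy.Selector(text=html) and queried with the usual .xpath() or .css() calls to build items. If the site loads the next page asynchronously, an explicit wait after the click is probably needed before page_source reflects the new content.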
    

    Here are some examples of "selenium spiders":

    • Executing Javascript Submit form functions using scrapy in python
    • https://gist.github.com/cheekybastard/4944914
    • https://gist.github.com/irfani/1045108
    • http://snipplr.com/view/66998/

    There is also an alternative to using Selenium with Scrapy. In some cases, the ScrapyJS middleware is enough to handle the dynamic parts of a page. A real-world example:

    • Scraping dynamic content using python-Scrapy
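
    For reference, ScrapyJS has since been renamed scrapy-splash. A minimal sketch of the settings.py entries that its README describes might look like the following. The SPLASH_URL value is an assumption: it presumes a Splash instance is already running locally on port 8050, which the original answer does not mention.

```python
# settings.py fragment for scrapy-splash (the successor to ScrapyJS).

# Assumption: a Splash rendering service is reachable at this address.
SPLASH_URL = "http://localhost:8050"

# Route requests through Splash and keep response compression handling after it.
DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}

# Avoids storing duplicate Splash arguments in the request queue.
SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}

# Make duplicate filtering and HTTP caching aware of Splash arguments.
DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
HTTPCACHE_STORAGE = "scrapy_splash.SplashAwareFSCacheStorage"
```

    With settings like these in place, a spider would yield scrapy_splash.SplashRequest objects instead of plain scrapy.Request objects for the pages that need JavaScript rendering.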
