I'm trying to scrape product information from a webpage using scrapy. My to-be-scraped webpage looks like this:
It really depends on how you need to scrape the site and what data you want to get.
Here's an example of how you can follow pagination on eBay using Scrapy + Selenium:
import scrapy
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException


class ProductSpider(scrapy.Spider):
    name = "product_spider"
    allowed_domains = ['ebay.com']
    start_urls = ['http://www.ebay.com/sch/i.html?_odkw=books&_osacat=0&_trksid=p2045573.m570.l1313.TR0.TRC0.Xpython&_nkw=python&_sacat=0&_from=R40']

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.driver = webdriver.Firefox()

    def parse(self, response):
        self.driver.get(response.url)

        while True:
            try:
                # Selenium 4 syntax; older versions used find_element_by_xpath()
                next_button = self.driver.find_element(
                    By.XPATH, '//td[@class="pagn-next"]/a')
                next_button.click()
                # get the data and write it to scrapy items
            except NoSuchElementException:
                # no "next" link left -- we reached the last page
                break

        self.driver.close()
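For the "get the data" step, you would typically parse self.driver.page_source after each click. As a standalone illustration of that idea, here is a minimal stdlib-only sketch using html.parser; the h3 element with class "lvtitle" is a hypothetical listing-title markup (eBay's real class names may differ), and in an actual spider you'd more likely use Scrapy selectors on the rendered source:

```python
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Collects the text of every <h3 class="lvtitle"> element
    (hypothetical listing markup used purely for illustration)."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "h3" and ("class", "lvtitle") in attrs:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "h3":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title and data.strip():
            self.titles.append(data.strip())


# stand-in for self.driver.page_source after a page has rendered
page_source = """
<ul>
  <li><h3 class="lvtitle">Learning Python</h3></li>
  <li><h3 class="lvtitle">Fluent Python</h3></li>
</ul>
"""

parser = TitleExtractor()
parser.feed(page_source)
print(parser.titles)  # ['Learning Python', 'Fluent Python']
```

Inside the spider's while loop you would feed the extractor with self.driver.page_source and yield one item per collected title.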
Here are some examples of "selenium spiders":
There is also an alternative to using Selenium with Scrapy. In some cases, using ScrapyJS middleware is enough to handle the dynamic parts of a page. Sample real-world usage: