I'm trying to scrape product information from a webpage using scrapy. My to-be-scraped webpage looks like this:
It really depends on how you need to scrape the site and what data you want to get.
Here's an example of how you can follow pagination on eBay using Scrapy + Selenium:
import scrapy
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException


class ProductSpider(scrapy.Spider):
    name = "product_spider"
    allowed_domains = ['ebay.com']
    start_urls = ['http://www.ebay.com/sch/i.html?_odkw=books&_osacat=0&_trksid=p2045573.m570.l1313.TR0.TRC0.Xpython&_nkw=python&_sacat=0&_from=R40']

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.driver = webdriver.Firefox()

    def parse(self, response):
        self.driver.get(response.url)

        while True:
            try:
                # Selenium 4 syntax; older versions used find_element_by_xpath()
                next_button = self.driver.find_element(
                    By.XPATH, '//td[@class="pagn-next"]/a')
                next_button.click()
                # get the data and write it to scrapy items
            except NoSuchElementException:
                # no "next" link left -- we reached the last page
                break

        self.driver.close()
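For the "get the data" step, you would typically parse self.driver.page_source after each click. As a standalone illustration of that idea, here is a minimal stdlib-only sketch using html.parser; the h3 element with class "lvtitle" is a hypothetical listing-title markup (eBay's real class names may differ), and in an actual spider you'd more likely use Scrapy selectors on the rendered source:

```python
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Collects the text of every <h3 class="lvtitle"> element
    (hypothetical listing markup used purely for illustration)."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "h3" and ("class", "lvtitle") in attrs:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "h3":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title and data.strip():
            self.titles.append(data.strip())


# stand-in for self.driver.page_source after a page has rendered
page_source = """
<ul>
  <li><h3 class="lvtitle">Learning Python</h3></li>
  <li><h3 class="lvtitle">Fluent Python</h3></li>
</ul>
"""

parser = TitleExtractor()
parser.feed(page_source)
print(parser.titles)  # ['Learning Python', 'Fluent Python']
```

Inside the spider's while loop you would feed the extractor with self.driver.page_source and yield one item per collected title.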
Here are some examples of "selenium spiders":
There is also an alternative to using Selenium with Scrapy. In some cases, using ScrapyJS middleware is enough to handle the dynamic parts of a page. Sample real-world usage: