Scrape Products from scrapy and follow pagination

Submitted by 徘徊边缘 on 2019-12-24 01:18:20

Question


I am trying to use Scrapy to scrape data from the Agricultural Growing Media category on Alibaba. The page is the one listed in start_urls in the code below.

The fields I want to scrape for each product are the product name, price, minimum order, company name, and image URL.

(Screenshot in the original post: the product fields I want to scrape.)

My Python code

# -*- coding: utf-8 -*-
import scrapy


class AlibabaSpider(scrapy.Spider):
    name = 'alibaba'
    allowed_domains = ['alibaba.com']
    start_urls = ['https://www.alibaba.com/catalog/agricultural-growing-media_cid144?spm=a2700.9161164.1.2.4a934e02VlSXiW']

    def parse(self, response):
        # Each matched node is one product card in the gallery listing.
        for products in response.xpath('.//div[contains(@class, "m-gallery-product-item-wrap")]/div/div'):
            item = {
                'product_name': products.xpath('.//h2/a/@title').extract_first(),
                'price': products.xpath('(.//div[@class="price"]/b/text())').extract_first().strip(),
                'min_order': products.xpath('.//div[@class="min-order"]/b/text()').extract_first(),
                'company_name': products.xpath('.//div[@class="stitle util-ellipsis"]/a/@title').extract_first(),
                'prod_detail_link': products.xpath('.//div[@class="item-img-inner"]/a/@href').extract_first()
                #'response_rate': products.xpath('.//i[@class="ui2-icon ui2-icon-skip"]/text()').extract_first(),
                #'image_url': products.xpath('.//div[@class=""]/').extract_first(),
            }
            yield item

Problems

  • This code scrapes only 21 of the 36 items on the page.
  • How do I follow the pagination link?
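
One possible cause of the missing items, offered as a guess rather than a diagnosis: extract_first() returns None when an XPath matches nothing, so calling .strip() on a missing price raises AttributeError, and an exception inside a generator callback ends it, dropping every product after the first one without a price. The plain-Python sketch below reproduces that mechanism with made-up price strings. parse_unsafe, parse_safe and collect are hypothetical names for illustration, not Scrapy APIs. Lazy loading of the remaining products by JavaScript is another possibility that this sketch does not cover.

```python
def parse_unsafe(raw_prices):
    """Mimics the original loop: .strip() is called without a None check."""
    for raw in raw_prices:
        yield {'price': raw.strip()}  # AttributeError when raw is None


def parse_safe(raw_prices):
    """Same loop, but a missing price is kept as None instead of crashing."""
    for raw in raw_prices:
        yield {'price': raw.strip() if raw is not None else None}


def collect(generator):
    """Drain a generator, stopping quietly at the first AttributeError."""
    items = []
    try:
        for item in generator:
            items.append(item)
    except AttributeError:
        pass  # everything after the failing element is never produced
    return items


# Made-up sample: the second product has no price element, so its value is None.
sample = [' US $1.00 ', None, ' US $2.50 ']
unsafe_items = collect(parse_unsafe(sample))
safe_items = collect(parse_safe(sample))
print(len(unsafe_items), len(safe_items))  # prints: 1 3
```

If the Scrapy log shows an AttributeError traceback for the page, this is likely what is happening. If it does not, the missing products are probably not in the downloaded HTML at all.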

How you can help

  • Modify the code so that all the data is scraped from the page.
  • Modify the code to follow the pagination and keep scraping.

Source: https://stackoverflow.com/questions/52235721/scrape-products-from-scrapy-and-follow-pagination
