Scraping all mobiles of Flipkart.com

Posted by 这一生的挚爱 on 2020-01-03 19:34:35

Question


I am trying to scrape all the mobile phones listed on www.flipkart.com. My plan is to scrape them from this listing page:

http://www.flipkart.com/mobiles/pr?p[]=sort%3Dprice_asc&sid=tyy%2C4io&layout=grid 

The problem is that on this page I have to press 'show more results' to load further results. How can I do this in code? I am using the BeautifulSoup package in Python.

My code so far:

import bs4
import urllib2

# Fetch the first page of the mobile listing (Python 2, urllib2)
link = 'http://www.flipkart.com/mobiles/pr?p[]=sort%3Dprice_asc&sid=tyy%2C4io&layout=grid'
response = urllib2.urlopen(link)
thePage = response.read()

# Parse the HTML and look up the div with id "products"
soup = bs4.BeautifulSoup(thePage, 'html.parser')
allMobiles = soup.find('div', attrs={'id': 'products'})

I only get the first page in the output. How can I access the other pages?


Answer 1:


You can play around with the GET parameters. The regular URL is:

http://www.flipkart.com/mobiles/pr?p[]=sort%3Dprice_asc&sid=tyy%2C4io&layout=grid

Once you hit the 'more results' button (or scroll down), the next page is loaded using AJAX with the following URL:

http://www.flipkart.com/mobiles/pr?p%5B%5D=sort%3Dprice_asc&sid=tyy%2C4io&layout=grid&start=41&ajax=true

The URL consists of the following parts:

  • path: http://www.flipkart.com/mobiles/pr
  • querystring:
    • p[]: sort=price_asc
    • sid: tyy,4io
    • layout: grid
    • start: 41
    • ajax: true
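
The breakdown above can be reproduced with the standard library. This is only a small sketch in Python 3 (it uses urllib.parse rather than the urllib2 module from the question), and the variable names are chosen here for illustration:

```python
from urllib.parse import urlsplit, parse_qs

# The AJAX URL quoted in the answer
url = ("http://www.flipkart.com/mobiles/pr?p%5B%5D=sort%3Dprice_asc"
       "&sid=tyy%2C4io&layout=grid&start=41&ajax=true")

parts = urlsplit(url)
# parse_qs percent-decodes names and values and returns a dict of lists
params = parse_qs(parts.query)

print(parts.path)
for name, values in params.items():
    print(name, values)
```

Each value comes back as a list because a query string may repeat a parameter name.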

If you want all phones, keep increasing the 'start' argument by one page at a time (40 items in the loop below). Something like this:

item_count = 600  # assumed total number of phones; adjust to the real count
for i in range(0, item_count, 40):
    # start takes the values 1, 41, 81, ...
    link = "http://www.flipkart.com/mobiles/pr?p%5B%5D=sort%3Dprice_asc&sid=tyy%2C4io&layout=grid&ajax=true&start=%d" % (i+1)

    # Do something with the link
    print(link)
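
As a variation on the loop above, the query string can be assembled with urlencode instead of hand-written percent escapes. This is a sketch in Python 3, not the answer's own code: page_urls, BASE_URL and PAGE_SIZE are hypothetical names introduced here, and the page size of 40 is taken from the step used in the loop.

```python
from urllib.parse import urlencode

BASE_URL = "http://www.flipkart.com/mobiles/pr"
PAGE_SIZE = 40  # assumption: same step as the loop in the answer


def page_urls(item_count, page_size=PAGE_SIZE):
    """Yield one AJAX URL per page of results."""
    for offset in range(0, item_count, page_size):
        # urlencode percent-escapes "[]", "=" and "," for us
        query = urlencode([
            ("p[]", "sort=price_asc"),
            ("sid", "tyy,4io"),
            ("layout", "grid"),
            ("ajax", "true"),
            ("start", offset + 1),
        ])
        yield BASE_URL + "?" + query


urls = list(page_urls(600))
for url in urls[:3]:
    print(url)
```

Each URL can then be fetched and passed to BeautifulSoup the same way as the first page in the question.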

Enjoy, Wout



Source: https://stackoverflow.com/questions/13775742/scraping-all-mobiles-of-flipkart-com
