Scraping AJAX e-commerce site using python

Submitted by 亡梦爱人 on 2019-12-04 01:59:51

Question


I have a problem scraping an e-commerce site using BeautifulSoup. I did some googling, but I still can't solve it.

Please refer to the screenshots: (1) the Chrome F12 / Inspect Element view, (2) the Python result.

Here is the site that I tried to scrape: "https://shopee.com.my/search?keyword=h370m"

Problem:

  1. When I open Inspect Element in Google Chrome (F12), I can see the HTML for the product names, prices, etc. But when I run my Python program, I do not get the same code and tags in the result. After some googling, I found out that this website uses AJAX queries to fetch the data.

  2. Can anyone help me with the best method to get this product data by scraping an AJAX site? I would like to display the data in a table.

My code:

import requests
from bs4 import BeautifulSoup
source = requests.get('https://shopee.com.my/search?keyword=h370m')
soup = BeautifulSoup(source.text, 'html.parser')
print(soup)

Answer 1:


Welcome to StackOverflow! You can inspect where the AJAX request is being sent and replicate it.

In this case the request goes to the API URL used in the code below. You can then use requests to perform a similar request. Note, however, that this API endpoint requires a correct User-Agent header. You can use a package like fake-useragent or just hardcode a string for the agent.

import requests

# Option 1: generate a user agent with fake-useragent
from fake_useragent import UserAgent
user_agent = UserAgent().chrome

# Option 2: hardcode one (this overrides the line above)
user_agent = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1468.0 Safari/537.36'

# The endpoint that the page's AJAX call hits; it returns JSON
url = 'https://shopee.com.my/api/v2/search_items/?by=relevancy&keyword=h370m&limit=50&newest=0&order=desc&page_type=search'
resp = requests.get(url, headers={
    'User-Agent': user_agent
})
data = resp.json()
products = data.get('items')
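
Since the question also asks to display the data in a table, here is a minimal sketch that continues from the products list above. The 'name' and 'price' keys are assumptions about the JSON this endpoint returns; inspect resp.json() yourself and adjust them to the actual field names.

# Minimal table-style printout of the scraped products.
# NOTE: 'name' and 'price' are assumed JSON keys -- check the real
# response structure and rename them if needed.
for item in products or []:
    name = str(item.get('name', 'n/a'))
    price = item.get('price', 'n/a')
    print(f'{name[:60]:<60} {price}')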



Answer 2:


Welcome to StackOverflow! :)

As an alternative, you can check out Selenium.

See this example usage from the documentation:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys

driver = webdriver.Firefox()
driver.get("http://www.python.org")
assert "Python" in driver.title
elem = driver.find_element_by_name("q")
elem.clear()
elem.send_keys("pycon")
elem.send_keys(Keys.RETURN)
assert "No results found." not in driver.page_source
driver.close()

When you use requests (or libraries like Scrapy), JavaScript is usually not executed, so AJAX-loaded content never appears in the response. As @dmitrybelyakov mentioned, you can replay those API calls yourself or imitate normal user interaction using Selenium.
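
For this particular page, a rough Selenium sketch could look like the following. The '.shopee-search-item-result__item' CSS class is an assumption about Shopee's markup, so verify the real class names in Chrome's inspector and adjust the selector.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup

driver = webdriver.Chrome()
driver.get('https://shopee.com.my/search?keyword=h370m')

# Wait until JavaScript has rendered at least one search result.
# The CSS class below is an assumed selector -- verify it in the inspector.
WebDriverWait(driver, 15).until(
    EC.presence_of_element_located(
        (By.CSS_SELECTOR, '.shopee-search-item-result__item')))

# Hand the fully rendered HTML to BeautifulSoup, as in the question.
soup = BeautifulSoup(driver.page_source, 'html.parser')
items = soup.select('.shopee-search-item-result__item')
print(len(items), 'items rendered')

driver.quit()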



Source: https://stackoverflow.com/questions/54401612/scraping-ajax-e-commerce-site-using-python
