Retrieve Amazon Reviews for a particular product

会有一股神秘感。 提交于 2021-02-08 08:08:09

问题


I'm currently working on a research project which need to analyze reviews of a particular product and get an overall idea about the product.

I heard that amazon is a good place to get product reviews/comments. Is there any way to retrieve those user reviews/comments from Amazon via an API?? I tried several python codes but it doesn't work.. Do i need to write a spider if there is no API to retrieve data?

Are there any approaches/places to retrieve user reviews for a given product?


回答1:


www.Scrapehero.com has a great tutorial on how to scrape Amazon product details: How To Scrape Amazon Product Details and Pricing using Python

The complete plain text code they use is ... Products are identified by their ASIN so change the array values to the products you are interested in watching.

from lxml import html  
import csv,os,json
import requests
from exceptions import ValueError
from time import sleep

def AmzonParser(url):
    headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64)    AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36'}
page = requests.get(url,headers=headers)
while True:
    sleep(3)
    try:
        doc = html.fromstring(page.content)
        XPATH_NAME = '//h1[@id="title"]//text()'
        XPATH_SALE_PRICE = '//span[contains(@id,"ourprice") or contains(@id,"saleprice")]/text()'
        XPATH_ORIGINAL_PRICE = '//td[contains(text(),"List Price") or contains(text(),"M.R.P") or contains(text(),"Price")]/following-sibling::td/text()'
        XPATH_CATEGORY = '//a[@class="a-link-normal a-color-tertiary"]//text()'
        XPATH_AVAILABILITY = '//div[@id="availability"]//text()'

        RAW_NAME = doc.xpath(XPATH_NAME)
        RAW_SALE_PRICE = doc.xpath(XPATH_SALE_PRICE)
        RAW_CATEGORY = doc.xpath(XPATH_CATEGORY)
        RAW_ORIGINAL_PRICE = doc.xpath(XPATH_ORIGINAL_PRICE)
        RAw_AVAILABILITY = doc.xpath(XPATH_AVAILABILITY)

        NAME = ' '.join(''.join(RAW_NAME).split()) if RAW_NAME else None
        SALE_PRICE = ' '.join(''.join(RAW_SALE_PRICE).split()).strip() if RAW_SALE_PRICE else None
        CATEGORY = ' > '.join([i.strip() for i in RAW_CATEGORY]) if RAW_CATEGORY else None
        ORIGINAL_PRICE = ''.join(RAW_ORIGINAL_PRICE).strip() if RAW_ORIGINAL_PRICE else None
        AVAILABILITY = ''.join(RAw_AVAILABILITY).strip() if RAw_AVAILABILITY else None

        if not ORIGINAL_PRICE:
            ORIGINAL_PRICE = SALE_PRICE

        if page.status_code!=200:
            raise ValueError('captha')
        data = {
                'NAME':NAME,
                'SALE_PRICE':SALE_PRICE,
                'CATEGORY':CATEGORY,
                'ORIGINAL_PRICE':ORIGINAL_PRICE,
                'AVAILABILITY':AVAILABILITY,
                'URL':url,
                }

        return data
    except Exception as e:
        print e

def ReadAsin():
# AsinList = csv.DictReader(open(os.path.join(os.path.dirname(__file__),"Asinfeed.csv")))
AsinList = ['B0046UR4F4',
'B00JGTVU5A',
'B00GJYCIVK',
'B00EPGK7CQ',
'B00EPGKA4G',
'B00YW5DLB4',
'B00KGD0628',
'B00O9A48N2',
'B00O9A4MEW',
'B00UZKG8QU',]
extracted_data = []
for i in AsinList:
    url = "http://www.amazon.com/dp/"+i
    print "Processing: "+url
    extracted_data.append(AmzonParser(url))
    sleep(5)
f=open('data.json','w')
json.dump(extracted_data,f,indent=4)


if __name__ == "__main__":
ReadAsin()



回答2:


If you need to scrape regularly and go through several product pages, I would suggest to do an addition to the scape hero script and use https://pypi.org/project/fake-useragent/ for the "headers" in request.

Otherwise if it's just once in a while that you need to download the reviews you can use https://reviewi.me . It's a free, web bases tool, that works for several Amazon websites and allows to export in CSV, XLSX and JSON




回答3:


Beautiful Soup is a Python API that will allow you to rip the html data from a page, like an Amazon product page, and parse through the file. This API should allow for the review sections to be taken directly from the page. Here is a link to the documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc/



来源:https://stackoverflow.com/questions/41917734/retrieve-amazon-reviews-for-a-particular-product

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!