Question
Over the past year or so I have written a number of scripts to scrape Android app reviews from Google Play. Until recently this worked fine: the scripts mimicked the Google Play interface by calling https://play.google.com/store/getreviews with the necessary parameters and parsing the HTML results.
The recent updates to the Google Play interface changed the HTML structure, and they also seem to have introduced some kind of protection against scraping. There is now a "token" parameter whose value changes, presumably some kind of session ID, which I have not been able to generate because I'm not sure what seeds it. It also seems to block clients that make multiple calls that don't conform to the interface: after an unsuccessful call I can't even load the Google Play interface in any browser. The block seems to time out after a while. I'm not certain of any of this, but it's what I've concluded from what I'm seeing.
Has anyone run into a similar problem and found a way around it?
Thanks
Answer 1:
Give this a try: www.scrape4me.com
It does show an error, but it outputs content:
http://scrape4me.com/api?url=https%3A%2F%2Fplay.google.com%2Fstore%2Fapps%2Fdetails%3Fid%3Dcom.com2us.golfstarworldtour.normal.freefull.google.global.android.common&elm=&ch=ch
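If you want to build that request URL from a script rather than by hand, here is a minimal sketch. It only assumes what is visible in the URL above: the target page goes URL-encoded into `url`, `elm` is left empty, and `ch` is set to `ch`. I don't know what `elm` and `ch` actually mean, and the function name `build_scrape_url` is just made up for illustration. It does not make any network call.

```python
from urllib.parse import urlencode

# Endpoint taken from the example URL above.
API_ENDPOINT = "http://scrape4me.com/api"


def build_scrape_url(target_url, elm="", ch="ch"):
    """Return the API URL for target_url (hypothetical helper).

    The elm and ch defaults are copied from the example URL;
    their meaning is an assumption, not documented behaviour.
    """
    # urlencode percent-encodes ':', '/', '?' and '=' in the target URL.
    query = urlencode({"url": target_url, "elm": elm, "ch": ch})
    return API_ENDPOINT + "?" + query


if __name__ == "__main__":
    app_page = (
        "https://play.google.com/store/apps/details"
        "?id=com.com2us.golfstarworldtour.normal.freefull"
        ".google.global.android.common"
    )
    print(build_scrape_url(app_page))
```

Swapping in a different package ID in `app_page` should give the matching URL for another app, assuming the service treats all app pages the same way.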
Source: https://stackoverflow.com/questions/18482660/google-play-review-scraping-changes