Submitting queries to, and scraping results from, ASPX pages using Python?

心已入冬 submitted on 2019-12-11 05:27:54

Question


I am trying to get results for a batch of queries to this demographics tools page: http://adlab.microsoft.com/Demographics-Prediction/DPUI.aspx

The form's POST action targets the same page (_self) and is probably posting some event data. I read in another post here on Stack Overflow that ASPX pages typically need some viewstate and validation data. Do I simply save these from one request and re-send them in a POST request?

Or is there a cleaner way to do this? One of those ASPX viewstate parameters is about 1,000 characters long, and the sheer ugliness of pasting that into my code makes me think there HAS to be a better way. Any references I can read up on would be helpful. Thanks!


Answer 1:


Perhaps mechanize would be of use.
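A minimal sketch of the mechanize approach, assuming a standard ASP.NET form that is the first form on the page and whose query inputs are plain text controls. The function name `submit_aspx_form` and any field names you pass to it are hypothetical; inspect the real page's `<input name="...">` attributes. mechanize's `Browser` parses every control in the selected form, hidden viewstate fields included, re-sends them on submit, and keeps cookies between requests, so nothing has to be pasted into your code.

```python
try:
    import mechanize  # third-party package: pip install mechanize
except ImportError:  # keep this sketch importable when mechanize is absent
    mechanize = None


def submit_aspx_form(url, field_values, form_index=0):
    """Open a page, fill in one of its forms, submit it, and return the HTML.

    Hidden fields (e.g. __VIEWSTATE on ASP.NET pages) are parsed from the
    page and sent back automatically; only visible inputs need values here.
    """
    if mechanize is None:
        raise RuntimeError("mechanize is not installed")
    br = mechanize.Browser()      # handles cookies and redirects for us
    br.open(url)                  # the initial GET that seeds hidden fields
    br.select_form(nr=form_index) # assumption: the query form's position
    for name, value in field_values.items():
        br.form[name] = value     # text controls take a plain string
    response = br.submit()        # POST back with hidden + visible fields
    return response.read()
```

If the page has several submit buttons, `br.submit()` may need to be told which one to press; check the mechanize documentation for your version.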




Answer 2:


Use urllib2. Your POST data is a simple Python dictionary, which is very easy to edit and maintain.
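A small illustration of "POST data is a dictionary". urllib2 is Python 2 only; in Python 3 its functionality lives in `urllib.request` and `urllib.parse`, which this sketch uses. The field names `txtQuery` and `btnSubmit` are hypothetical placeholders, not the real names on the demographics page.

```python
from urllib.parse import urlencode

# Hypothetical field names: read the real ones from the form's HTML.
form_data = {
    "txtQuery": "some query",
    "btnSubmit": "Submit",
}

# urlencode produces an application/x-www-form-urlencoded string;
# urlopen expects the request body as bytes, hence the encode().
post_body = urlencode(form_data).encode("ascii")
print(post_body)  # b'txtQuery=some+query&btnSubmit=Submit'
```

Changing a query is then just a matter of editing the dictionary.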

If your form contains hidden fields -- some of which are encoded -- then you need to do a GET to get the form and the various hidden field seed values.
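One way to pick up those hidden seed values from the GET response instead of pasting them into the code, sketched with the standard-library `html.parser`. The class and function names are hypothetical; `__VIEWSTATE` and `__EVENTVALIDATION` in the usage note are the usual ASP.NET hidden field names, but the real page may carry others.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs from every <input type="hidden"> tag."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <input ... /> tags are routed here as well.
        if tag != "input":
            return
        attributes = dict(attrs)  # values may be None when absent
        if (attributes.get("type") or "").lower() == "hidden" and attributes.get("name"):
            self.fields[attributes["name"]] = attributes.get("value") or ""


def extract_hidden_fields(html):
    """Return a dict of hidden form fields found in an HTML string."""
    parser = HiddenFieldParser()
    parser.feed(html)
    parser.close()
    return parser.fields
```

For example, feeding it a page containing `<input type="hidden" name="__VIEWSTATE" value="dDwtMTA" />` yields `{"__VIEWSTATE": "dDwtMTA"}`, however long the value is.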

Once you GET the form, you can add the necessary input values to the given hidden values and POST the form back.
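A minimal sketch of that merge step in Python 3's `urllib`, assuming you already have the hidden fields as a dict (for instance from parsing the GET response). `build_postback` is a hypothetical helper name; it only builds the request, so you still pass the result to `urlopen` or an opener to send it.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_postback(url, hidden_fields, user_inputs):
    """Merge a page's hidden fields with our own inputs into a POST request."""
    data = dict(hidden_fields)  # copy so the caller's dict is left untouched
    data.update(user_inputs)    # our values win if a name appears in both
    body = urlencode(data).encode("ascii")
    # Supplying data= makes urllib issue a POST instead of a GET.
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```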

Also, you'll have to make sure that you handle any cookies. urllib2 will help with that too.
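A sketch of cookie handling with the standard library, shown for Python 3 (in Python 2 the same pieces are `cookielib.CookieJar` and `urllib2.HTTPCookieProcessor`). `make_session_opener` is a hypothetical helper name. Every request made through the returned opener stores cookies it receives in the jar and sends them back on later requests, so the GET and the following POST share one session.

```python
from http.cookiejar import CookieJar
from urllib.request import HTTPCookieProcessor, build_opener


def make_session_opener():
    """Return (opener, jar): an opener that remembers and re-sends cookies."""
    jar = CookieJar()
    opener = build_opener(HTTPCookieProcessor(jar))
    return opener, jar
```

Usage would be `opener.open(url)` for the initial GET, then `opener.open(request)` for the POST, using the same `opener` both times.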

After all, that's all a browser does, and it works in a browser. Browsers don't know ASPX from CGI from WSGI, so there's no magic just because it's ASPX. You sometimes have to do a GET before a POST to get values and cookies set up properly.




Answer 3:


I've used a combination of requests and BeautifulSoup4 for a similar task.
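A minimal sketch of how requests and BeautifulSoup4 can be combined for this, assuming the target is a single ASP.NET form that posts back to its own URL. The function names are hypothetical, and the `user_inputs` keys must be the real input names from the page. A `requests.Session` keeps cookies between the GET and the POST, and BeautifulSoup pulls out the hidden fields and later the results.

```python
import requests
from bs4 import BeautifulSoup


def hidden_fields(html):
    """Collect every <input type="hidden"> name/value pair from a page."""
    soup = BeautifulSoup(html, "html.parser")
    return {
        tag["name"]: tag.get("value", "")
        for tag in soup.find_all("input", type="hidden")
        if tag.has_attr("name")
    }


def query_aspx(url, user_inputs):
    """GET the form, then POST it back with its hidden fields plus our inputs."""
    with requests.Session() as session:  # the session carries cookies
        page = session.get(url)
        page.raise_for_status()
        data = hidden_fields(page.text)
        data.update(user_inputs)          # visible inputs, e.g. the query text
        result = session.post(url, data=data)
        result.raise_for_status()
        # Return parsed HTML; locating the result element is page-specific.
        return BeautifulSoup(result.text, "html.parser")
```

For a batch of queries, you could loop over the queries inside one session, re-reading the hidden fields from each response, since ASP.NET pages usually issue fresh viewstate values on every postback.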



Source: https://stackoverflow.com/questions/2059822/submitting-queries-to-and-scraping-results-from-aspx-pages-using-python
