How can I input data into a webpage to scrape the resulting output using Python?

前端未结

I am familiar with using BeautifulSoup and urllib2 to scrape data from a webpage. However, what if a parameter needs to be entered into the page before the result that I want to scrape is returned?

5 Answers

南笙

2020-12-15 13:05

In addition to the answers already given, you could simply send a request to that page directly. In your browser's developer tools (Tools > Web Developer), the Network tab shows the requests the page makes when you interact with it. For example, http://www.freemaptools.com/ajax/getaandb.php?a=Florida_Usa&b=New%20York_Usa&c=6052 is the request query that returns the results you are expecting. Request that URL and scrape the field you want. IMHO, requesting pages directly is much faster than screen scraping, though it depends on the case.
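A minimal sketch of the direct-request approach using only Python 3's standard library (`urllib.request` is the Python 3 successor to `urllib2`). The query parameters are the ones from the example URL above; whether that endpoint still responds, and what it returns, is an assumption. The helper names (`build_query_url`, `fetch`, `text_by_id`), the `User-Agent` value, and the idea that the wanted field sits in an element with a known `id` are all hypothetical. With BeautifulSoup you would use `soup.find(id=...)` instead of the small `HTMLParser` subclass.

```python
from html.parser import HTMLParser
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

# Elements that never have a closing tag; they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


def build_query_url(base_url, params):
    """Encode params as a GET query string, with spaces as %20 like the example URL."""
    return base_url + "?" + urlencode(params, quote_via=quote)


def fetch(url, timeout=10.0):
    """Download a page as text. The browser-like User-Agent is an assumption;
    some sites reject urllib's default one."""
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


class TextById(HTMLParser):
    """Collect the text inside the first element whose id equals target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # > 0 while inside the target element
        self.done = False   # set once the target element has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS or self.done:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.done = True

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def text_by_id(html, target_id):
    """Return the stripped text content of the element with the given id ('' if absent)."""
    parser = TextById(target_id)
    parser.feed(html)
    return "".join(parser.parts).strip()
```

Usage would look like `text_by_id(fetch(build_query_url("http://www.freemaptools.com/ajax/getaandb.php", {"a": "Florida_Usa", "b": "New York_Usa", "c": 6052})), "some-id")`, where `"some-id"` stands for whatever id you find when inspecting the response.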

Of course, you can also do screen scraping/browser simulation (Mechanize, Splinter), using headless browsers (PhantomJS, etc.) or the driver for whichever browser you want to use.

自闭症患者

2020-12-15 13:05

This question may already be resolved, but you can also use Selenium WebDriver for this purpose. It lets you interact with a web page programmatically: every operation a human user can perform (typing into fields, clicking buttons, and so on) can be scripted.
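A minimal sketch, assuming Selenium 4 with Firefox and geckodriver available (recent Selenium 4 releases can usually locate or download the driver themselves; older ones need it on `PATH`). The function name and the idea that the input is found by its `name` attribute are hypothetical; adapt the locator to the page you are scraping. Selenium is imported inside the function so the module can be loaded without it installed.

```python
def scrape_after_submit(url, field_name, value, headless=True, timeout=10):
    """Open url, type value into the input named field_name, submit the form,
    and return the HTML of the resulting page."""
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.FirefoxOptions()
    if headless:
        options.add_argument("-headless")  # run Firefox without a visible window
    driver = webdriver.Firefox(options=options)
    try:
        driver.get(url)
        box = driver.find_element(By.NAME, field_name)
        box.clear()
        box.send_keys(value)
        box.submit()  # submits the form that contains this input
        # Wait until the old input is detached, i.e. a new page has loaded.
        # If the page updates via AJAX instead, wait for the result element
        # with EC.presence_of_element_located((By.ID, "...")) instead.
        WebDriverWait(driver, timeout).until(EC.staleness_of(box))
        return driver.page_source  # hand this to BeautifulSoup as usual
    finally:
        driver.quit()
```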

轻奢々

2020-12-15 13:20

Yes! Try mechanize for this kind of Web screen-scraping task.
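A minimal mechanize sketch, assuming the target page serves a plain HTML form (mechanize does not execute JavaScript), that the first form on the page is the one you need, and that the result page is UTF-8. The function name and the field names passed in are hypothetical. mechanize is imported inside the function so the module loads even where it is not installed.

```python
def submit_first_form(url, fields):
    """Open url, fill the first form on the page with fields, submit it,
    and return the response HTML as text."""
    import mechanize

    br = mechanize.Browser()
    # mechanize honours robots.txt by default and raises an error if the page is disallowed.
    br.open(url)
    br.select_form(nr=0)  # first <form>; select_form(name="...") also works
    for name, value in fields.items():
        # Text inputs take a string; select/radio/checkbox controls expect a list of values.
        br[name] = value
    response = br.submit()
    return response.read().decode("utf-8", errors="replace")
```

The returned HTML can then be parsed with BeautifulSoup exactly as you would parse a page fetched with urllib2.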

旧巷少年郎

2020-12-15 13:23
Take a look at tools like mechanize or scrape:
- http://pypi.python.org/pypi/mechanize
- http://stockrt.github.com/p/emulating-a-browser-in-python-with-mechanize/
- http://www.ibm.com/developerworks/linux/library/l-python-mechanize-beautiful-soup/
- http://zesty.ca/scrape/
Packt Publishing has an article on that matter, too:
- http://www.packtpub.com/article/web-scraping-with-python

北海茫月

2020-12-15 13:30

You can also use PySide/PyQt: they include a browser engine (QtWebKit), so you can control the browser to open pages, simulate human actions (filling in fields, clicking, and so on), and then scrape data from the pages. FMiner, a web scraping application I developed with PySide, works this way.

Alternatively, try PhantomJS, an easy-to-use library for controlling a headless browser; note, though, that it is scripted in JavaScript, not Python.
