How can I input data into a webpage to scrape the resulting output using Python?

予麋鹿 2020-12-15 12:43

I am familiar with BeautifulSoup and urllib2 to scrape data from a webpage. However, what if a parameter needs to be entered into the page before the result that I want to

5 Answers
  • 2020-12-15 13:05

    In addition to the answers already given, you could simply request the results page directly. Using your browser's developer tools (the Network tab under Tools/Web Developer), you can inspect the requests the page sends when you interact with it. E.g. http://www.freemaptools.com/ajax/getaandb.php?a=Florida_Usa&b=New%20York_Usa&c=6052 is the request that returns the results you are expecting. Request that URL and scrape the field you want. IMHO, direct page requests are much faster than screen scraping (though it depends on the case).

    Of course, you could also do screen scraping or browser simulation (Mechanize, Splinter), or use a headless browser (PhantomJS, etc.) or the driver for the browser you want to use.
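
    The direct-request approach described above can be sketched as follows. This is a minimal sketch, assuming the endpoint from the example URL still answers plain GET requests; the helper names `build_query_url` and `fetch` are hypothetical, and the value of `c` is simply copied from the example URL (its meaning is not stated in this thread).

```python
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

# Endpoint taken from the example URL above; it may have changed since.
BASE_URL = "http://www.freemaptools.com/ajax/getaandb.php"


def build_query_url(origin, destination, c="6052"):
    """Build the query URL seen in the browser's Network tab.

    The parameter names a, b and c come from the example URL; the
    default for c is copied from it and its meaning is an assumption.
    """
    params = {"a": origin, "b": destination, "c": c}
    # quote_via=quote encodes spaces as %20, matching the example URL.
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


def fetch(url, timeout=10):
    """Request the URL and return the response body as text."""
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


url = build_query_url("Florida_Usa", "New York_Usa")
print(url)
# body = fetch(url)  # uncomment to send the request, then parse body
```

    The returned text can then be handed to BeautifulSoup or a JSON parser, depending on what the endpoint sends back.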

  • 2020-12-15 13:05

    The query may have been resolved.

    You can use Selenium WebDriver for this purpose. It lets you interact with a web page from a programming language, and operations are performed as if a human user were accessing the page.
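
    A rough sketch of what that might look like with the Selenium 4 Python bindings. The function name `scrape_with_selenium` and its parameters (the input's `name` attribute, the CSS selector of the result element) are hypothetical placeholders that depend on the page being scraped; Firefox with its driver is assumed to be installed.

```python
def scrape_with_selenium(url, field_name, value, result_css, timeout=10):
    """Type a value into a form field, submit, and return the result text.

    field_name and result_css are placeholders for the target page's
    input name and result selector (assumptions, not from the thread).
    """
    # Imported here so the module loads even without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        box = driver.find_element(By.NAME, field_name)
        box.clear()
        box.send_keys(value)
        box.send_keys(Keys.RETURN)  # submit the form like a user would
        # Wait until the result element is present in the DOM.
        result = WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, result_css))
        )
        return result.text
    finally:
        driver.quit()  # always close the browser
```

    The explicit wait matters when the result is filled in by JavaScript after the form is submitted.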

  • 2020-12-15 13:20

    Yes! Try mechanize for this kind of Web screen-scraping task.
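
    For illustration, a minimal sketch of filling in and submitting a form with mechanize. The function name `submit_form` and the idea that the first form on the page is the one you want are assumptions; adjust the form index and field name to the actual page.

```python
def submit_form(url, field_name, value, form_index=0):
    """Open a page, fill one form field, submit, and return the HTML.

    form_index and field_name are placeholders for the target page's
    form and input (assumptions, not from the thread).
    """
    # Imported here so the module loads even without mechanize installed.
    import mechanize

    br = mechanize.Browser()
    br.open(url)
    br.select_form(nr=form_index)  # pick the form by position on the page
    br[field_name] = value         # set the text input's value
    response = br.submit()
    return response.read()         # raw bytes of the resulting page
```

    The returned HTML can be passed straight to BeautifulSoup. Note that mechanize does not execute JavaScript, so it only helps when the result comes back in the server's response.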

  • 2020-12-15 13:23

    Take a look at tools like mechanize or scrape:

    • http://pypi.python.org/pypi/mechanize
    • http://stockrt.github.com/p/emulating-a-browser-in-python-with-mechanize/
    • http://www.ibm.com/developerworks/linux/library/l-python-mechanize-beautiful-soup/

    • http://zesty.ca/scrape/

    Packt Publishing has an article on that matter, too:

    • http://www.packtpub.com/article/web-scraping-with-python
  • 2020-12-15 13:30

    I think you can also use PySide/PyQt, because they include the QtWebKit browser core. You can control the browser to open pages, simulate human actions (fill, click, ...), and then scrape data from the pages. FMiner works this way; it's web scraping software I developed with PySide.

    Or you can try PhantomJS, which makes it easy to control a browser, but it is scripted in JavaScript, not Python.
