Raw HTML vs. DOM scraping in Python using mechanize and Beautiful Soup

扶醉桌前 submitted on 2019-12-04 15:03:34
Yugal Jindle

Mechanize and Beautiful Soup are hard-to-beat tools for web scraping in Python.

But you need to understand what each one is meant for:

Mechanize: mimics browser functionality on a web page (fetching pages, following links, submitting forms).

BeautifulSoup: an HTML parser that works well even when the HTML is not well-formed.
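As a rough illustration of the "not well-formed" point, here is a minimal sketch that feeds Beautiful Soup a fragment with unclosed tags. The markup, the variable names, and the choice of the built-in "html.parser" backend are assumptions made for this example; they are not taken from the original question.

```python
from bs4 import BeautifulSoup

# Deliberately malformed: <b> and <i> are opened but never closed.
BROKEN_HTML = "<p>Price: <b><i>19.99</p><p>In stock</p>"

# "html.parser" is the parser bundled with Python; lxml or html5lib
# would also work but may repair the markup slightly differently.
soup = BeautifulSoup(BROKEN_HTML, "html.parser")

# Both paragraphs are still reachable despite the missing end tags.
paragraphs = [p.get_text() for p in soup.find_all("p")]
print(paragraphs)
```

The parser closes the dangling tags itself, so the text can be queried as if the page had been valid.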

Your problem seems to be JavaScript. The price is populated by an AJAX call made from JavaScript after the page loads. Mechanize, however, does not execute JavaScript, so any content produced by JavaScript will remain invisible to mechanize.
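The effect can be reproduced without any network access. In the sketch below, a hypothetical page (the markup, the id "price" and the value are all invented for illustration) ships an empty element that a script would fill in inside a browser. Parsing the raw HTML, which is all mechanize would hand over, finds the element but not the price.

```python
from bs4 import BeautifulSoup

# Hypothetical raw HTML as the server sends it. The <span> is empty;
# only a browser running the script would put "19.99" into the DOM.
RAW_HTML = """
<html><body>
  <span id="price"></span>
  <script>
    document.getElementById("price").textContent = "19.99";
  </script>
</body></html>
"""

soup = BeautifulSoup(RAW_HTML, "html.parser")
price_tag = soup.find(id="price")

# The tag exists, but the script never ran, so its text is empty.
price_text = price_tag.get_text(strip=True)
print(repr(price_text))
```

This is the gap between the raw HTML and the DOM: the element is present in both, but its content only exists after script execution.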

Take a look at python-spidermonkey: http://github.com/davisp/python-spidermonkey/tree/master

It is a Python bridge to Mozilla's SpiderMonkey JavaScript engine, so it can be used alongside mechanize and Beautiful Soup to add JavaScript execution.

Answering my own question because, in the years since asking this, I have learned a lot. Today I would use Selenium WebDriver to do this job. Selenium drives a real browser, so JavaScript runs and the rendered DOM can be read; it is exactly the tool I was looking for back in 2012 for this type of web scraping project.

https://www.seleniumhq.org/download/

http://chromedriver.chromium.org/
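A minimal sketch of how that might look with Selenium and Chrome follows. The helper names, the selector, and the simple polling loop are assumptions made for this example (Selenium's own WebDriverWait is the more usual way to wait); only `driver.get`, `driver.find_elements`, `element.text`, `driver.quit` and `webdriver.Chrome` are Selenium's API.

```python
import time


def read_rendered_text(driver, url, css_selector, timeout=10.0, poll=0.25):
    """Open url and return the element's text once JavaScript has filled it in."""
    driver.get(url)
    deadline = time.monotonic() + timeout
    while True:
        # "css selector" is the string value of selenium's By.CSS_SELECTOR.
        for element in driver.find_elements("css selector", css_selector):
            text = element.text.strip()
            if text:
                return text
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no text appeared in {css_selector!r}")
        time.sleep(poll)


def scrape_with_chrome(url, css_selector):
    """Hypothetical wrapper: start Chrome, read one element, always quit."""
    # Imported here so the helper above does not require Selenium to be loaded.
    from selenium import webdriver

    driver = webdriver.Chrome()  # needs Chrome and a matching chromedriver
    try:
        return read_rendered_text(driver, url, css_selector)
    finally:
        driver.quit()
```

Calling something like `scrape_with_chrome(page_url, "#price")` with the real page URL and selector would then return the price as the browser displays it, after the AJAX call has completed.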
