How to make mechanize wait for web-page 'full' load?

喜欢而已, posted on 2019-12-05 03:00:39
FallenAngel

Working with a JavaScript-heavy web page in mechanize is not easy, but there are ways to get what you want, depending on the situation.

  • If the content is built from JSON requests, you can call those URLs yourself, parse the responses, and assemble the content from them.

  • If you need to submit forms, you can create the form fields and set their values within mechanize. Or simply write a function that encodes your POST or GET data (quoting special characters, etc.) and send it with the mechanize.Browser.open method.

  • If the page has JavaScript-based security functions (such as a special encoding applied to form data before posting), you can use a JavaScript runtime such as node.js to run the relevant JavaScript code blocks.
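
The first two options can be sketched roughly as follows. This is only an illustration under assumptions: the {"items": [{"title": ...}]} payload shape, the field names, and the helper names (encode_form, extract_titles, fetch_titles) are made up for the example, and the real endpoint has to be found by watching the page's network requests.

```python
import json
from urllib.parse import urlencode


def encode_form(fields):
    """Percent-encode form fields into a query string (quotes special characters)."""
    return urlencode(fields)


def extract_titles(json_text):
    """Collect each entry's 'title' from a payload shaped like {"items": [...]}.

    The payload shape is an assumption for this sketch, not a real API.
    """
    payload = json.loads(json_text)
    return [item["title"] for item in payload.get("items", [])]


def fetch_titles(base_url, fields):
    """Call the JSON endpoint directly with mechanize and parse the response."""
    # Imported here so the two helpers above can be used without mechanize installed.
    import mechanize

    browser = mechanize.Browser()
    response = browser.open(base_url + "?" + encode_form(fields))  # GET request
    return extract_titles(response.read().decode("utf-8"))
```

Passing the encoded data as the second argument to open instead of appending it to the URL should send it as a POST body.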

In practice, some of these options are not easy to implement, so think twice before using mechanize for such projects.

jcollado

The problem is that the web page is rendered in your web browser by its JavaScript engine. mechanize cannot execute JavaScript on its own, so no matter how long you wait, you won't get the missing HTML using mechanize alone.
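
A small stand-in shows why waiting does not help. It uses Python's standard html.parser rather than mechanize itself, and the PAGE markup, the DivText class, and content_text are made up for illustration. The point carries over, though: a client that only reads the HTML sees the script's source text, never its result.

```python
from html.parser import HTMLParser

# Hypothetical page: the div is empty in the HTML and filled in by the script.
PAGE = """<html><body>
<div id="content"></div>
<script>document.getElementById("content").textContent = "Loaded by JavaScript";</script>
</body></html>"""


class DivText(HTMLParser):
    """Collect the text inside the div with id="content"."""

    def __init__(self):
        super().__init__()
        self.inside = False
        self.text = ""

    def handle_starttag(self, tag, attrs):
        if tag == "div" and ("id", "content") in attrs:
            self.inside = True

    def handle_endtag(self, tag):
        if tag == "div":
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.text += data


def content_text(html):
    """Return the text of the content div as a non-JavaScript client sees it."""
    parser = DivText()
    parser.feed(html)
    return parser.text
```

Here content_text(PAGE) returns an empty string, because the script is never run.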

For more information about how to scrape dynamically generated content, please have a look at this question.
