Is there any Python module that helps to crawl data from a DOM loaded by JavaScript?

杀马特。学长 韩版系。学妹 submitted on 2019-12-13 12:38:43

Question


I want to scrape data from a page that loads its DOM elements through Ajax calls.

I have tried the older approach of PyQt4-based scraping, which reads the DOM once the page has fully loaded, but the problem is that I need to make a POST request and that approach only supports GET.

The newer Python module ghost.py has timeout issues: when it fetches a large DOM tree, it raises a timeout exception.

If anyone knows a specific technique or tool that lets me make a POST request and grab the data once the DOM has fully loaded, that would help me a lot.


Answer 1:


You can use Selenium to automate a browser and access the DOM. Selenium has a Python driver, so you can write Python code to navigate to the page, click buttons, and wait for the Ajax calls to complete before you start scraping.
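
A minimal sketch of that flow, under a few stated assumptions: the page contains a form whose method is POST, so submitting it in the browser sends the POST request, and the Ajax-loaded data shows up as elements matching a CSS selector. The helper names `wait_for_elements` and `fetch_rows_after_post`, and every URL or selector passed to them, are hypothetical. The functions only rely on the driver's `get`, `find_element` and `find_elements` methods. Selenium also ships its own `WebDriverWait` helper for waiting; the hand-rolled polling loop is used here only to keep the sketch self-contained.

```python
import time

# Selenium's locator strategy string for CSS selectors
# (the same value as selenium.webdriver.common.by.By.CSS_SELECTOR).
CSS = "css selector"


def wait_for_elements(driver, selector, timeout=30.0, poll=0.5):
    """Poll the live DOM until at least one element matches `selector`.

    Returns the matching elements, or raises TimeoutError once `timeout`
    seconds have passed without a match.
    """
    deadline = time.monotonic() + timeout
    while True:
        elements = driver.find_elements(CSS, selector)
        if elements:
            return elements
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"no element matched {selector!r} within {timeout}s"
            )
        time.sleep(poll)


def fetch_rows_after_post(driver, url, form_selector, row_selector,
                          timeout=30.0, poll=0.5):
    """Open `url`, submit a form, and return the text of the loaded rows.

    Assumption: the form's method is POST, so submitting it makes the
    browser send the POST request on our behalf.
    """
    driver.get(url)
    driver.find_element(CSS, form_selector).submit()
    # Wait until the Ajax-populated elements are present before reading them.
    rows = wait_for_elements(driver, row_selector, timeout, poll)
    return [row.text for row in rows]
```

With Selenium installed, `driver` would be a real browser session, for example one created with `webdriver.Firefox()` from `selenium.webdriver` and closed with `driver.quit()` when done. Raising `timeout` is the knob to turn if a large DOM tree takes a long time to appear.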




Answer 2:


For emulating JavaScript and automating a browser, I recommend Spynner. You can run it with or without an X server, and the syntax is quite simple. You can load jQuery too.



Source: https://stackoverflow.com/questions/10360817/is-there-any-python-module-that-helps-to-crawl-data-from-dom-loaded-by-javascrip
