Parsing web page containing dynamic javascript objects

Submitted by 霸气de小男生 on 2019-12-13 00:23:32

Question


Currently I'm using Python with urllib2 and urllib to retrieve a simple static web page. Everything went smoothly until the site's developers added JavaScript. Now the most interesting information is hidden behind scripts such as this link:

<a href="javascript://" class="event-more-view" id="view-moreid-12311" onclick="Markets.applyView(this);return false;" treeid="1291266" eventstate ="false" > add table </a>

The browser preloads the data and shows it when the link is clicked. My short research turned up jsoup and HtmlUnit. Am I digging in the right direction? What are the pros and cons?
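With urllib alone you receive only the markup, so the most you can extract is the trigger link itself, not the data it reveals. A minimal sketch using the standard library's `html.parser` to list such links and their `treeid` values from the snippet above; `LazyLinkFinder` is a hypothetical name, and matching on `Markets.applyView` simply mirrors the question's example.

```python
from html.parser import HTMLParser


class LazyLinkFinder(HTMLParser):
    """Collect <a> tags whose onclick handler calls Markets.applyView (hypothetical filter)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)  # html.parser lowercases attribute names
        if attributes.get("onclick", "").startswith("Markets.applyView"):
            self.links.append({"id": attributes.get("id"), "treeid": attributes.get("treeid")})
```

Feeding it the page fetched with urllib shows which links exist, but the table behind each one is still missing, which is why a JavaScript-capable client is needed.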

Will Python help, or should I be using Java? Which packages can handle dynamic content, and which is simpler?

In my case I have to create some sort of virtual browser, because the page's built-in scripts refresh the data over time and that data has to be processed.
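For the virtual-browser requirement above, the processing side can be a simple polling loop that hands each changed snapshot of the page to your code. A sketch under the assumption that `get_html` is any zero-argument callable returning the current HTML (for example `lambda: driver.page_source` with Selenium); `watch_snapshots` is a hypothetical name and the 5-second interval is an arbitrary default.

```python
import hashlib
import time
from typing import Callable, Iterator, Optional


def watch_snapshots(get_html: Callable[[], str], interval: float = 5.0,
                    max_polls: Optional[int] = None) -> Iterator[str]:
    """Poll get_html() and yield the HTML only when it differs from the last yield."""
    last_digest = None
    polls = 0
    while max_polls is None or polls < max_polls:
        html = get_html()
        # Compare digests rather than keeping the full previous page in memory.
        digest = hashlib.sha256(html.encode("utf-8")).hexdigest()
        if digest != last_digest:
            last_digest = digest
            yield html
        polls += 1
        time.sleep(interval)
```

Hashing the whole page also reports changes you may not care about (timestamps, counters); hashing only the part you actually parse would reduce false updates.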


Answer 1:


You are digging in the right direction.

Here are some options/tools to consider:

  • ghost.py
  • HtmlUnit under Jython
  • Selenium
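A minimal sketch of the Selenium route, assuming Selenium's Python bindings and a browser driver (for example Firefox with geckodriver) are installed. `click_and_capture` is a hypothetical helper: it takes an already-created driver, clicks the link by its `id` (`view-moreid-12311` comes from the question's snippet) and waits until the page source changes. Passing the string `"id"` is equivalent to Selenium's `By.ID`, which keeps the helper free of Selenium imports.

```python
import time


def click_and_capture(driver, link_id, timeout=10.0, poll=0.25):
    """Click the element with id `link_id` and return the page HTML once it changes.

    `driver` is assumed to be a Selenium WebDriver, or any object with a
    `page_source` attribute and a `find_element(by, value)` method.
    """
    before = driver.page_source
    # "id" is the string value behind Selenium's By.ID locator strategy.
    driver.find_element("id", link_id).click()
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # The browser's current page source, which normally reflects script changes.
        html = driver.page_source
        if html != before:
            return html
        time.sleep(poll)
    raise TimeoutError(f"page did not change within {timeout}s after clicking #{link_id}")
```

Typical use would be `driver = webdriver.Firefox()`, `driver.get(url)` with the page's URL (not given in the question), then `click_and_capture(driver, "view-moreid-12311")` and finally `driver.quit()`. Waiting for any change is a crude condition; Selenium's `WebDriverWait` with a condition on the specific table is usually more robust.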

See also:

  • Click on a javascript link within python?
  • Simulating clicking on a javascript link in python

Hope that helps.



Source: https://stackoverflow.com/questions/17424385/parsing-web-page-containing-dynamic-javascript-objects
