Java HTML parser for reading JavaScript-generated content

好久不见. Submitted on 2019-12-07 18:40:18

Question


I am using jsoup to read a web page with the following function:

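import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Fetches and parses the page at the given URL; returns null if the request fails.
public Document getDocument(String url) {
    try {
        // 20-second timeout; "Mozilla" is sent as the User-Agent header
        return Jsoup.connect(url).timeout(20 * 1000).userAgent("Mozilla").get();
    } catch (Exception e) {
        return null;
    }
}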

But whenever I try to read a web page that contains JavaScript-generated content, jsoup does not read that content. That is, the actual content of the page is loaded by JavaScript calls, so it is not present in the page source of that link. For example, this blog: http://blog.rapporter.net/search/label/r. Is there a way to also get JavaScript-generated content when parsing a page with jsoup? If not, please suggest a Java HTML parser that can solve this problem.


Answer 1:


You cannot do this with Jsoup. Jsoup only parses HTML. To wait for AJAX requests, or to handle JavaScript-generated content in general, you would need a browser that can execute the JavaScript and produce its output. JavaScript logic can be complex, so executing it and loading the resulting content is not a trivial thing (just take a look at how complicated browsers, JS and the DOM are).
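
To make the point concrete, here is a minimal sketch using only the Java standard library. The page source, the `posts` id, and the helper `rawDivContent` are all assumptions made up for illustration; the helper is a toy string lookup, not a real HTML parser. It shows what any static parser has to work with: the markup as served, in which the target element is still empty because the script that would fill it has not been executed.

```java
public class StaticSourceDemo {

    // Hypothetical page source (an assumption for illustration): the <div> is empty
    // as served, and the inline script would fill it only when a browser runs it.
    static final String PAGE_SOURCE =
            "<html><body>"
            + "<div id=\"posts\"></div>"
            + "<script>document.getElementById('posts').textContent = 'Loaded by JS';</script>"
            + "</body></html>";

    // Returns the text between the opening <div id="..."> tag and the next </div>,
    // exactly as it appears in the raw source; null if the div is not found.
    // This is a toy lookup, not a real HTML parser.
    static String rawDivContent(String html, String id) {
        String open = "<div id=\"" + id + "\">";
        int start = html.indexOf(open);
        if (start < 0) {
            return null;
        }
        int contentStart = start + open.length();
        int end = html.indexOf("</div>", contentStart);
        if (end < 0) {
            return null;
        }
        return html.substring(contentStart, end);
    }

    public static void main(String[] args) {
        // The div is empty in the source: nothing has executed the script.
        String content = rawDivContent(PAGE_SOURCE, "posts");
        System.out.println("div content in source: \"" + content + "\"");
        // The script is present only as unexecuted text.
        System.out.println("script present: " + PAGE_SOURCE.contains("<script>"));
    }
}
```

The text "Loaded by JS" exists only inside the script source, never inside the element. One common workaround, under the same reasoning, is to let a tool that actually executes JavaScript (a headless browser, for example) render the page first and then hand the resulting HTML string to `Jsoup.parse(String)`.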



Source: https://stackoverflow.com/questions/23510172/java-html-parser-for-reading-javascript-generated-contents
