Java parsing html elements generated by JS

不知归路 2020-12-20 06:45

I'm very new to HTML parsing with Java. I previously used Jsoup to parse simple HTML that did not change dynamically, but I now need to parse a web page that has dynami

1 answer
  • 2020-12-20 07:27

    The problem you are facing is that Jsoup retrieves the static source code, exactly as it would be delivered to a browser, and never executes scripts. What you want is the DOM after the JavaScript has run. For this, you can use HtmlUnit to get the rendered page and then pass its content to Jsoup for parsing.

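    // HtmlUnit 2.x imports (HtmlUnit 3.x uses the org.htmlunit package instead)
    import com.gargoylesoftware.htmlunit.WebClient;
    import com.gargoylesoftware.htmlunit.html.HtmlPage;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    // try-with-resources closes the WebClient even if getPage throws an IOException
    try (WebClient webClient = new WebClient()) {
        // capture the rendered page
        HtmlPage myPage = webClient.getPage("https://pokevision.com");

        // convert to a Jsoup DOM
        Document doc = Jsoup.parse(myPage.asXml());

        // extract data using Jsoup selectors
        Elements images = doc.select("img[src~=(?i)\\.(png|jpe?g|gif)]");
        for (Element image : images) {
            System.out.println("src : " + image.attr("src"));
        }
    }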
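
To see which `src` values the selector in the snippet above would keep, the regular expression can be tried on its own with `java.util.regex`. This is only a sketch: the class name `ImageSrcFilter`, the method `isImageSrc`, and the sample values are made up for illustration, and it assumes that Jsoup's `[attr~=regex]` does a find-style match (the pattern may match anywhere in the attribute value), which is my understanding of its behaviour.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Jsoup or HtmlUnit.
class ImageSrcFilter {
    // Same regular expression as in the selector, written as a Java string literal.
    static final Pattern IMAGE_SRC = Pattern.compile("(?i)\\.(png|jpe?g|gif)");

    // find() succeeds if the pattern matches anywhere in the value
    // (assumed to mirror how the ~= attribute selector is evaluated).
    static boolean isImageSrc(String src) {
        return IMAGE_SRC.matcher(src).find();
    }

    public static void main(String[] args) {
        // Made-up sample attribute values.
        String[] samples = {
            "/img/logo.PNG",
            "photo.jpeg?size=large",
            "/static/app.js",
            "icon.svg"
        };
        for (String src : samples) {
            System.out.println(src + " -> " + isImageSrc(src));
        }
    }
}
```

The first two samples print `true` and the last two print `false`. Because the match is not anchored, a value such as `archive.gif.txt` would also pass; appending `$` or `(\?|$)` to the pattern is one way to tighten it if that matters.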
    