JSoup: Difficulty extracting a single element

蓝咒 submitted on 2020-03-05 04:12:08

Question


For my college coding project, I am tasked with grabbing the live value of bitcoin from the internet and incorporating it into a mini "bitcoin program." The issue is that I am having difficulty extracting the value of bitcoin from certain websites. Any and all help would be greatly appreciated.

I have tried using different websites, with mixed results.

Example 1

    final String url = "https://www.coindesk.com/price/bitcoin";
    try
    {
        Document doc = Jsoup.connect(url).get();
        Element ele = doc.select("span.currency-price").first();
        final String words = ele.text();
        System.out.println(words);
    }
    catch(Exception ex)
    {
        ex.printStackTrace();
    }

Example 2

    final String url = "https://cointelegraph.com/bitcoin-price-index";
    try
    {
        Document doc = Jsoup.connect(url).get();
        Element ele = doc.select("div.price-value").first();
        final String words = ele.text();
        System.out.println(words);
    }
    catch(Exception ex)
    {
        ex.printStackTrace();
    }

Example 1 threw a java.lang.NullPointerException at com.mycompany.test.Test.main(Test.java:28).

Example 2 ran without fault.


Answer 1:


The site https://www.coindesk.com/price/bitcoin relies heavily on JavaScript to present its content. Jsoup can't execute JavaScript; it only parses the raw HTML document returned by the server. If the span.currency-price element is created by JavaScript, it is absent from that raw HTML, so doc.select(...).first() returns null and calling text() on it throws the NullPointerException.
To see what Jsoup sees, visit the page with JavaScript disabled. You'll see that the main content is missing. Alternatively, open the page and press Ctrl+U to view the page source before JavaScript modifies it.
Using Chrome's debugger (Network tab), you can see that the page makes additional AJAX requests to get the current exchange rates as JSON from this URL: https://production.api.coindesk.com/v1/exchangeRates
JavaScript then uses this data to create the HTML elements dynamically. The page also requests a few other URLs to fetch graph data.
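One way to act on this is to skip the HTML page and request the JSON URL directly. Below is a minimal sketch, not a definitive implementation. It uses the JDK's own java.net.http.HttpClient (Java 11+) instead of Jsoup, so no extra dependency is needed. The class name, the helper names, and the "rate" key are hypothetical. The answer doesn't show the response format, so print the body first and adjust the key to whatever the JSON actually contains. The endpoint may also have changed since this answer was written.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.OptionalDouble;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExchangeRateSketch {

    // Endpoint named in the answer; its response format is not documented there.
    static final String RATES_URL = "https://production.api.coindesk.com/v1/exchangeRates";

    /** Downloads the raw response body as text; no JavaScript is executed. */
    static String fetchBody(String url) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .header("Accept", "application/json")
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IllegalStateException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }

    /**
     * Returns the first numeric value stored under the given key, e.g. "rate": 123.45.
     * Regex matching is a shortcut for this sketch; a JSON library is more robust.
     */
    static OptionalDouble firstNumberForKey(String json, String key) {
        Pattern pattern = Pattern.compile(
                "\"" + Pattern.quote(key) + "\"\\s*:\\s*\"?(-?\\d+(?:\\.\\d+)?)");
        Matcher matcher = pattern.matcher(json);
        return matcher.find()
                ? OptionalDouble.of(Double.parseDouble(matcher.group(1)))
                : OptionalDouble.empty();
    }

    public static void main(String[] args) throws Exception {
        String json = fetchBody(RATES_URL);
        System.out.println(json); // inspect the real structure first

        // "rate" is an assumed key name; replace it after inspecting the output.
        OptionalDouble rate = firstNumberForKey(json, "rate");
        System.out.println(rate.isPresent()
                ? "Parsed value: " + rate.getAsDouble()
                : "Key not found");
    }
}
```

If you would rather stay with Jsoup, note that by default it rejects responses whose content type is not text or XML, so you would need to call ignoreContentType(true) on the connection and read the body from execute() instead of calling get().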




Answer 2:


Jsoup cannot parse this page because of an extra "</div>" in the div with the react-app id. You can report a bug like this.



Source: https://stackoverflow.com/questions/58020902/jsoup-difficulty-extracting-a-single-element
