How to extract the HTML source from a webpage?

Question

I am trying to extract the HTML source of this page: http://www.fxstreet.com/rates-charts/currency-rates/

I want what I get when I save the page from Chrome as an .html file.

I tried to do this in Java, first with BufferedReader and then with Jsoup. I also tried it in Python, but I keep getting the following message:

"This site requires JavaScript and Cookies to be enabled. Please change your browser settings or upgrade your browser."

The end goal is to extract the values in the main table.

Answer 1:

Try using HtmlUnit with JavaScript enabled via setJavaScriptEnabled(true).

Look also at: this and this

Jsoup is not a headless browser and does not execute JavaScript, so you need another library to fetch the rendered page; you can then use Jsoup to parse the result.
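
To check whether a given page really needs a JavaScript-capable client, one option is to fetch it with a plain HTTP client and look for the placeholder message quoted in the question. The following is a minimal sketch using only the JDK (Java 11+). The class name StaticFetchCheck and the helpers requiresJavaScript and fetchRawHtml are hypothetical names chosen for illustration, and the marker string is simply a fragment of the message from the question; other sites will word their placeholder differently.

```java
import java.io.IOException;
import java.net.CookieManager;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class StaticFetchCheck {

    // Assumption: a fragment of the placeholder message quoted in the question.
    public static final String MARKER = "requires JavaScript and Cookies";

    // Returns true if the HTML looks like the placeholder page rather than real content.
    public static boolean requiresJavaScript(String html) {
        return html != null && html.contains(MARKER);
    }

    // Fetches the HTML exactly as the server sends it; no scripts are executed.
    public static String fetchRawHtml(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .cookieHandler(new CookieManager())            // keep cookies across redirects
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: java StaticFetchCheck <url>");
            return;
        }
        String html = fetchRawHtml(args[0]);
        if (requiresJavaScript(html)) {
            System.out.println("Static fetch is not enough; use a JavaScript-capable client such as HtmlUnit.");
        } else {
            System.out.println("Static HTML received; Jsoup can parse it directly.");
        }
    }
}
```

If the check reports that a static fetch is not enough, BufferedReader and Jsoup will both see the same placeholder page, because neither of them runs the page's scripts.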

Answer 2:

If you only need the main table, Jsoup alone can extract it.

Here is a method that prints the content of the main table on the page:

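import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public void parse() {
    try {
        // Fetch and parse the page as served (Jsoup does not run JavaScript)
        Document doc = Jsoup.connect("http://www.fxstreet.com/rates-charts/currency-rates/").get();
        // Select the elements with the class that wraps the main table
        Elements table = doc.getElementsByClass("applet-content");

        System.out.print(table);
    } catch (Exception e) {
        System.out.print("error --> " + e);
    }
}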

This prints the table on the page.

Source: https://stackoverflow.com/questions/10857780/how-to-extract-source-html-from-webpage

Tags

java

python

html-parsing

jsoup
