return all the HtmlPage's HTML

与世无争的帅哥 submitted on 2020-01-25 07:27:09

Question


I want the entire HTML for a given HtmlPage object.

What property should I use?


Answer 1:


In HtmlUnit, HtmlPage implements the Page interface, so you can call Page#getWebResponse() to get the web response the HtmlPage was generated from, and then WebResponse#getContentAsString() to get its body as a string. This is the content as the server returned it, not the DOM after any scripts have run. Here's a method that does what you want:

public String getRawPageText(WebClient client, String url)
        throws FailingHttpStatusCodeException, MalformedURLException, IOException {
    // Fetch the page, then return the body of the response it was built from.
    HtmlPage page = client.getPage(url);
    return page.getWebResponse().getContentAsString();
}

Or, using an HtmlPage object that you've already fetched:

public String getRawPageText(HtmlPage page) {
    // No new request is made; this reads the response already attached to the page.
    return page.getWebResponse().getContentAsString();
}



Answer 2:


The quickest way to do this is HtmlPage.asXml(). It may not exactly match what you would see if you did "View Source" in a normal browser, because it serializes the parsed page as XML rather than returning the original response text, but I've found it to be very helpful for developing and debugging HtmlUnit code.
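
The two approaches from the answers above can be compared side by side. The following is only a minimal sketch, assuming HtmlUnit 2.x package names (com.gargoylesoftware.htmlunit; HtmlUnit 3.x and later moved them to org.htmlunit). The class name PageSourceDemo, its helper methods, and the placeholder URL are made up for illustration, and MockWebConnection is used here only so the example runs without network access.

```java
import java.io.IOException;

import com.gargoylesoftware.htmlunit.MockWebConnection;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

// Hypothetical demo class; the names below are illustrative, not part of HtmlUnit.
class PageSourceDemo {

    // Answer 1: the response body as the server sent it.
    static String rawSource(HtmlPage page) {
        return page.getWebResponse().getContentAsString();
    }

    // Answer 2: the parsed page serialized back out as XML.
    static String domSource(HtmlPage page) {
        return page.asXml();
    }

    // Serves the given HTML from an in-memory connection instead of the network.
    // The URL is a placeholder; the mock answers every request with the same body.
    static HtmlPage loadOffline(WebClient client, String html) throws IOException {
        MockWebConnection connection = new MockWebConnection();
        connection.setDefaultResponse(html);
        client.setWebConnection(connection);
        return client.getPage("http://localhost/demo.html");
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><head><title>Demo</title></head>"
                + "<body><p>Hello</p></body></html>";
        try (WebClient client = new WebClient()) {
            HtmlPage page = loadOffline(client, html);
            System.out.println("--- raw response ---");
            System.out.println(rawSource(page));
            System.out.println("--- serialized page ---");
            System.out.println(domSource(page));
        }
    }
}
```

With a real site, the loadOffline call would be replaced by client.getPage(url), as in the first answer. The raw response is the better choice when you need the server's bytes unchanged; asXml() is usually more convenient when you want to see the page as HtmlUnit parsed it.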



Source: https://stackoverflow.com/questions/2010642/return-all-the-htmlpages-html
