How to save a Jsoup Document to an HTML file?

ε祈祈猫儿з 提交于 2019-12-04 22:47:38

The fact that there are elements that are ignored, must be due to the attempt of normalization by Jsoup.

In order to get the server's exact output without any form of normalization use this.

Connection.Response html = Jsoup.connect("PUT_URL_HERE").execute();
System.out.println(html.body());

Use doc.outerHtml().

import org.apache.commons.io.FileUtils;

public void downloadPage() throws Exception {
        final Response response = Jsoup.connect("http://www.example.net").execute();
        final Document doc = response.parse();

        final File f = new File("filename.html");
        FileUtils.writeStringToFile(f, doc.outerHtml(), "UTF-8");
    }

Don't forget to catch Exceptions. Add dependency or download Apache commons-io library for easy and quick way to saving files in UTF-8 format.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!