Get all HTML as a String from HTMLDocument

Submitted by 坚强是说给别人听的谎言 on 2019-12-19 02:52:09

Question


I'm coding in Java.

Does anyone know how I can get the content of a javax.swing.text.html.HTMLDocument as a String? This is what I've got so far:

URL url = new URL("http://www.test.com");

HTMLEditorKit kit = new HTMLEditorKit();
HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
// Ignore any charset declared in a <meta> tag while parsing
doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
// Note: without an explicit charset, InputStreamReader uses the platform default
Reader htmlReader = new InputStreamReader(url.openConnection().getInputStream());
kit.read(htmlReader, doc, 0);

I need the content of the HTMLDocument as a String.

Example:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">    <html><head><meta http-equiv="X-UA-Compatible" content="IE=Edge,chrome=1">

....... etc.

Any help would be appreciated. I need to use the HTMLDocument class so that the HTML is processed correctly :)

Thanks, Daniel


Answer 1:


StringWriter writer = new StringWriter();
kit.write(writer, doc, 0, doc.getLength());
String s = writer.toString();
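
For context, here is a minimal self-contained sketch of the same idea. It parses markup from an in-memory string instead of a URL so it runs without network access. The class name HtmlDocumentDump and the method roundTrip are hypothetical names chosen for illustration. Note that the writer re-serializes the parsed document, so the result is normally reformatted HTML rather than a byte-for-byte copy of the original source.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical helper class, not part of the original answer.
final class HtmlDocumentDump {

    // Parses the markup into an HTMLDocument, then writes the document back out as HTML.
    static String roundTrip(String html) {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        try {
            kit.read(new StringReader(html), doc, 0);
            StringWriter writer = new StringWriter();
            // Same call as in the answer: write the whole document range.
            kit.write(writer, doc, 0, doc.getLength());
            return writer.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (BadLocationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("<html><body><p>Hello</p></body></html>"));
    }
}
```

If the exact original source is required (for example, including the DOCTYPE line), this approach is probably not the right fit, since the document model keeps only what the parser retained.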



Answer 2:


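You don't need the editor kit and reader at all if you only want the raw HTML source; just read the input stream. For example, with commons-io: IOUtils.toString(inputStream, charset). (The one-argument IOUtils.toString(inputStream) is deprecated in recent commons-io versions.)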
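
If adding commons-io is not an option, a JDK-only sketch of the same idea might look like the following. It assumes Java 9 or later (for InputStream.readAllBytes) and uses an in-memory stream as a stand-in for url.openConnection().getInputStream(). StreamToString and readAll are hypothetical names, and UTF-8 is an assumed encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original answer.
final class StreamToString {

    // Reads the stream to the end, decodes it with the given charset, and closes it.
    static String readAll(InputStream in, Charset charset) {
        try (InputStream stream = in) {
            return new String(stream.readAllBytes(), charset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for a network stream; UTF-8 is an assumption about the page encoding.
        byte[] bytes = "<html><body>Hello</body></html>".getBytes(StandardCharsets.UTF_8);
        System.out.println(readAll(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8));
    }
}
```

This returns the markup exactly as served, without any parsing or normalization by HTMLDocument.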

Or, if only the text content (without the markup) is needed, you can use:

// AbstractDocument.getContent() is protected, so use the public getText() instead
String str = doc.getText(0, doc.getLength());
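
As a sketch of reading a document's text through the public Document API: Document.getText(offset, length) returns the character content only, with no tags. The class name HtmlDocumentText and the method plainText are hypothetical, and the input is an in-memory string rather than a URL.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical helper class, not part of the original answer.
final class HtmlDocumentText {

    // Parses the markup and returns the document's text content without any tags.
    static String plainText(String html) {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        try {
            kit.read(new StringReader(html), doc, 0);
            return doc.getText(0, doc.getLength());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (BadLocationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(plainText("<html><body><p>Hello</p></body></html>"));
    }
}
```

The returned string may contain extra line breaks introduced by the document model, so trimming it before use can be helpful.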


Source: https://stackoverflow.com/questions/10472049/get-all-html-as-a-string-from-htmldocument
