HTML special character parsing

筅森魡賤 submitted on 2019-12-25 01:30:05

Question


I'm looking for a Java class that parses all HTML special characters. I guess it's a common problem, but I cannot find a quick solution right now.

What I want to get is:

input: th&egrave; --> output: thè
input: &raquo; --> output: »
input: &lraquo;
...

Do you know anything useful for me?


Answer 1:


Try the StringEscapeUtils utility class. Check the docs for the StringEscapeUtils.unescapeHtml() method.

Docs here:

http://commons.apache.org/lang/api-release/org/apache/commons/lang/StringEscapeUtils.html

Download here:

http://commons.apache.org/lang/
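
To show roughly what such an unescape call does, here is a minimal sketch using only the JDK. The class name MiniHtmlUnescaper and its tiny entity table are hypothetical, made up for this example; they are not part of Commons Lang, which handles the full entity list. With Commons Lang 2.x on the classpath, the equivalent call should simply be StringEscapeUtils.unescapeHtml(input).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Commons Lang): a tiny stand-in for
// StringEscapeUtils.unescapeHtml() that knows only a handful of entities.
class MiniHtmlUnescaper {
    // Sample table only; a real decoder needs the complete HTML entity list.
    private static final Map<String, String> NAMED = new HashMap<>();
    static {
        NAMED.put("amp", "&");
        NAMED.put("lt", "<");
        NAMED.put("gt", ">");
        NAMED.put("quot", "\"");
        NAMED.put("egrave", "\u00e8");
        NAMED.put("laquo", "\u00ab");
        NAMED.put("raquo", "\u00bb");
    }

    // Matches named (&egrave;), decimal (&#232;) and hex (&#xE8;) references.
    private static final Pattern ENTITY =
            Pattern.compile("&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);");

    static String unescape(String input) {
        Matcher m = ENTITY.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String decoded = decode(m.group(1));
            // Unknown or invalid references are copied through unchanged.
            String replacement = (decoded != null) ? decoded : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Returns the decoded text, or null if the reference is not recognised.
    private static String decode(String body) {
        if (body.charAt(0) != '#') {
            return NAMED.get(body);
        }
        boolean hex = body.charAt(1) == 'x' || body.charAt(1) == 'X';
        try {
            int codePoint = hex
                    ? Integer.parseInt(body.substring(2), 16)
                    : Integer.parseInt(body.substring(1));
            return Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : null;
        } catch (NumberFormatException e) {
            return null; // number too large to be a code point
        }
    }
}
```

Calling MiniHtmlUnescaper.unescape("th&egrave;") yields thè, while a name missing from the table, such as &lraquo;, is left as it is.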




Answer 2:


Have you tried searching for it? The first result for "java HTML markup entity parser" points to an HTML text extractor.

It seems to be what you need.

Also, you may want to examine the renderers of javax.swing.JLabel (and of other Swing text components).
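
As a rough illustration of the Swing route, the sketch below does not touch JLabel itself. It swaps in the HTML parser that ships with the JDK's Swing text package (javax.swing.text.html.parser.ParserDelegator) and collects the text it reports, which should arrive with entities already decoded. The class name SwingEntityText is hypothetical, and exact whitespace handling is up to the parser, so treat this as a starting point rather than a finished solution.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: extracts the text content of an HTML fragment
// using the parser bundled with Swing.
class SwingEntityText {
    static String extract(String html) {
        final StringBuilder text = new StringBuilder();
        // The callback receives each run of text found between tags.
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                text.append(data);
            }
        };
        try {
            // true = ignore any charset declared inside the document.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            // Not expected for an in-memory StringReader.
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }
}
```

Note that this approach also strips the markup, so it fits best when only the text content is needed.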



Source: https://stackoverflow.com/questions/4077892/html-speacial-character-parsing
