How can I remove non-breaking spaces from a JSoup 'Document'?

Submitted by 老子叫甜甜 on 2019-12-09 11:59:34

Question


How can I remove these:

<td>&nbsp;</td>

or

<td width="7%">&nbsp;</td>

from my JSoup 'Document'? I've tried many methods, but these non-breaking space characters do not match anything with normal JSoup expressions or Selectors.


Answer 1:


The HTML entity &nbsp; (Unicode character NO-BREAK SPACE, U+00A0) can be represented in Java by the character \u00a0. Assuming that you want to remove every element that contains that character in its own text (rather than every line, as you said in a comment), the following ought to work:

document.select(":containsOwn(\u00a0)").remove();

If you really mean to remove the entire line, then your best bet is to scan the HTML yourself, line by line.
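
A line-by-line scan could look roughly like the sketch below. It uses only the Java standard library, not JSoup. The class name NbspLineFilter and the method removeNbspOnlyLines are hypothetical names chosen for illustration. It also assumes that each cell to be dropped sits alone on its own line, as in the question's snippets.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper, not part of JSoup: filters raw HTML text line by line.
public class NbspLineFilter {

    // Matches a line that consists solely of one <td> cell (with any attributes)
    // whose content is only &nbsp; entities or literal U+00A0 characters,
    // optionally surrounded by ordinary whitespace.
    private static final Pattern NBSP_ONLY_CELL = Pattern.compile(
            "\\s*<td\\b[^>]*>(?:\\s*(?:&nbsp;|\\u00a0))+\\s*</td>\\s*",
            Pattern.CASE_INSENSITIVE);

    // Returns the input with every matching line dropped.
    // Assumption: one cell per line; this is a text filter, not an HTML parser.
    public static String removeNbspOnlyLines(String html) {
        return Arrays.stream(html.split("\\R", -1))
                .filter(line -> !NBSP_ONLY_CELL.matcher(line).matches())
                .collect(Collectors.joining("\n"));
    }
}
```

Calling NbspLineFilter.removeNbspOnlyLines(html) on the raw markup before handing it to Jsoup.parse would drop both <td>&nbsp;</td> and <td width="7%">&nbsp;</td> when each is on a line of its own. Cells that span several lines, or that share a line with other markup, are left untouched by this sketch.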



Source: https://stackoverflow.com/questions/7034775/how-can-i-remove-non-breaking-spaces-from-a-jsoup-document
