How to remove hard spaces with Jsoup?

Submitted by 我怕爱的太早我们不能终老 on 2019-12-30 08:03:01

Question


I'm trying to remove hard spaces (from &nbsp; entities in the HTML). I can't remove them with .trim() or .replace(" ", ""), etc. I don't get it.

I even found a suggestion on Stack Overflow to try \\u00a0, but that didn't work either.

I tried this (since text() returns actual hard space characters, U+00A0):

System.out.println( "'"+fields.get(6).text().replace("\\u00a0", "")+"'" ); //'94,00 '
System.out.println( "'"+fields.get(6).text().replace(" ", "")+"'" ); //'94,00 '
System.out.println( "'"+fields.get(6).text().trim()+"'"); //'94,00 '
System.out.println( "'"+fields.get(6).html().replace("&nbsp;", "")+"'"); //'94,00' works

But I can't figure out why I can't remove the white space with .text().


Answer 1:


Your first attempt was very nearly it; you're quite right that Jsoup maps &nbsp; to U+00A0. You just don't want the double backslash in your string:

System.out.println( "'"+fields.get(6).text().replace("\u00a0", "")+"'" ); //'94,00'
// Just one ------------------------------------------^

replace doesn't use regular expressions, so you aren't trying to pass a literal backslash through to the regex level. You just want to specify character U+00A0 in the string.
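The difference can be checked without Jsoup at all. The following is a minimal sketch, assuming a hard-coded string stands in for what fields.get(6).text() returns; the class name HardSpaceDemo, the constant SAMPLE, and the helper removeHardSpaces are hypothetical names used only for illustration. It also shows why trim() has no effect: trim() only strips characters up to U+0020, and U+00A0 lies above that.

```java
class HardSpaceDemo {
    // Hypothetical stand-in for fields.get(6).text(): "94,00" plus one NO-BREAK SPACE.
    static final String SAMPLE = "94,00\u00a0";

    // String.replace matches its argument literally (no regex involved),
    // so the argument must contain the real U+00A0 character.
    static String removeHardSpaces(String s) {
        return s.replace("\u00a0", "");
    }

    public static void main(String[] args) {
        // The double backslash produces a six-character string (backslash, u, 0, 0, a, 0),
        // which never occurs in SAMPLE, so nothing is replaced.
        System.out.println(SAMPLE.replace("\\u00a0", "").length()); // 6

        // trim() leaves U+00A0 alone because it is above U+0020.
        System.out.println(SAMPLE.trim().length()); // 6

        // The single backslash yields the actual character, which is removed.
        System.out.println("'" + removeHardSpaces(SAMPLE) + "'"); // '94,00'
    }
}
```

The same single-backslash literal can be passed to replace() on the string returned by text().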




Answer 2:


The question has been edited to reflect the true problem.

New answer: The hard space, i.e. the &nbsp; entity (Unicode character NO-BREAK SPACE, U+00A0), can be represented in Java by the character \u00a0. The code therefore becomes the following, where str is the string returned by the text() method:

str = str.replaceAll("\u00a0", ""); // strings are immutable, so keep the result
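Because replaceAll takes a regular expression, the double-backslash spelling also works there: Java's regex engine resolves the \uXXXX escape itself. The sketch below illustrates this with a hard-coded string in place of the text() result; the class HardSpaceRegexDemo and the helper trimWithHardSpaces are hypothetical names, and the helper is just one possible way to trim ordinary whitespace and hard spaces from both ends while leaving interior ones alone.

```java
class HardSpaceRegexDemo {
    // Hypothetical helper: strips whitespace and U+00A0 from both ends only.
    static String trimWithHardSpaces(String s) {
        return s.replaceAll("^[\\s\\u00a0]+|[\\s\\u00a0]+$", "");
    }

    public static void main(String[] args) {
        // Stand-in for the string returned by text().
        String str = "94,00\u00a0";

        // Pattern containing the actual U+00A0 character.
        String viaLiteral = str.replaceAll("\u00a0", "");
        // Pattern containing a regex escape that the regex engine resolves to U+00A0.
        String viaRegexEscape = str.replaceAll("\\u00a0", "");
        System.out.println(viaLiteral.equals(viaRegexEscape)); // true

        System.out.println("'" + trimWithHardSpaces("\u00a0 94,00 \u00a0") + "'"); // '94,00'
    }
}
```

For a plain removal of every hard space, replace("\u00a0", "") does the same job without involving the regex engine.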

Old answer: Using the Jsoup library,

import org.jsoup.parser.Parser;

String str1 = Parser.unescapeEntities("last week, Ovokerie&nbsp;Ogbeta", false);
String str2 = Parser.unescapeEntities("Entered &raquo; Here", false);
System.out.println(str1 + " " + str2);

Prints out:

last week, Ovokerie Ogbeta Entered » Here 


Source: https://stackoverflow.com/questions/21137892/how-to-remove-hard-spaces-with-jsoup
