I have the following problem: I am trying to replace German umlauts like ä, ö, ü in Java, but it simply does not work. Her…
The code works fine when I try it, so it is most likely an encoding issue.
Check your system encoding. You may want to add -encoding UTF-8 to your javac command line.
-encoding encoding
Set the source file encoding name, such as EUC-JP and UTF-8. If -encoding is not specified, the platform default converter is used.
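To see why a wrong source encoding breaks the replacement, here is a small simulation (an illustration, not what javac literally runs). It encodes ä as UTF-8 and decodes the bytes as ISO-8859-1, which is roughly what happens to a literal in a UTF-8 source file compiled under a single-byte default such as ISO-8859-1 or windows-1252. The class and method names are made up for this example, and the code uses Unicode escapes so that it compiles the same under any -encoding.

```java
import java.nio.charset.StandardCharsets;

public final class WrongSourceEncoding {
    // Simulates a UTF-8 source file being read with a single-byte charset:
    // every UTF-8 byte turns into its own char.
    static String misreadAsLatin1(String utf8Text) {
        byte[] utf8Bytes = utf8Text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // What the compiler would see for a lowercase a-umlaut literal
        String garbledLiteral = misreadAsLatin1("\u00e4");
        System.out.println(garbledLiteral.length()); // 2 instead of 1

        // Correctly decoded runtime input: the garbled literal never matches it
        String text = "M\u00e4dchen";
        System.out.println(text.replace(garbledLiteral, "ae").equals(text)); // true, nothing replaced
    }
}
```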
If you use Apache Commons (Commons Lang 3) in your project, it is most efficient to use a class like this:
import org.apache.commons.lang3.StringUtils;

public final class UmlautCleaner {
    // Umlauts and ß, mapped position by position to their two-letter transliterations
    private static final String[] UMLAUTE = {"Ä", "Ö", "Ü", "ä", "ö", "ü", "ß"};
    private static final String[] UMLAUTE_REPLACEMENT = {"AE", "OE", "UE", "ae", "oe", "ue", "ss"};

    private UmlautCleaner() {
        // utility class, no instances
    }

    public static String cleanSonderzeichen(final String s) {
        // First transliterate the umlauts, then strip the accents of any remaining letters (é -> e)
        return StringUtils.stripAccents(StringUtils.replaceEach(s, UMLAUTE, UMLAUTE_REPLACEMENT));
    }
}
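If you would rather not pull in Commons, a similar result can be sketched with the JDK's java.text.Normalizer. This is a stand-in, not how StringUtils is implemented: the class name UmlautCleanerNoDeps is hypothetical, and removing every combining mark (the regex category \p{M}) after NFD decomposition may strip slightly more than stripAccents does. The input is first normalized to NFC so that an Ä stored as A plus a combining diaeresis is still transliterated to AE.

```java
import java.text.Normalizer;
import java.util.LinkedHashMap;
import java.util.Map;

public final class UmlautCleanerNoDeps {
    // Same mapping as UmlautCleaner, written with escapes so the source stays ASCII.
    private static final Map<String, String> UMLAUTS = new LinkedHashMap<>();
    static {
        UMLAUTS.put("\u00c4", "AE");
        UMLAUTS.put("\u00d6", "OE");
        UMLAUTS.put("\u00dc", "UE");
        UMLAUTS.put("\u00e4", "ae");
        UMLAUTS.put("\u00f6", "oe");
        UMLAUTS.put("\u00fc", "ue");
        UMLAUTS.put("\u00df", "ss");
    }

    private UmlautCleanerNoDeps() {
    }

    public static String clean(String s) {
        // 1. Compose base letter + combining mark into single precomposed characters.
        String result = Normalizer.normalize(s, Normalizer.Form.NFC);
        // 2. Transliterate umlauts and sharp s before their marks could be stripped.
        for (Map.Entry<String, String> e : UMLAUTS.entrySet()) {
            result = result.replace(e.getKey(), e.getValue());
        }
        // 3. Decompose remaining accented letters and drop the combining marks.
        return Normalizer.normalize(result, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        System.out.println(clean("\u00c4pfel caf\u00e9")); // AEpfel cafe
    }
}
```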
This finally worked for me:
// note: new String("Ä") only copies the literal; it does not change how javac decoded it
private static final String[][] UMLAUT_REPLACEMENTS = {
        { new String("Ä"), "Ae" }, { new String("Ü"), "Ue" }, { new String("Ö"), "Oe" },
        { new String("ä"), "ae" }, { new String("ü"), "ue" }, { new String("ö"), "oe" },
        { new String("ß"), "ss" } };

public static String replaceUmlaute(String orig) {
    String result = orig;
    // Apply each (umlaut, replacement) pair in turn; String.replace matches literally, not as a regex
    for (String[] pair : UMLAUT_REPLACEMENTS) {
        result = result.replace(pair[0], pair[1]);
    }
    return result;
}
So thanks to all your answers and help. In the end it was a mixture of nafas (with the new String) and Joop Eggen (the correct replace statement). You got my upvote, thanks a lot!
ENCODING ENCODING ENCODING....
Different input sources can complicate string encoding: for example, one may be UTF-8 encoded while another uses an ISO encoding such as ISO-8859-1.
Several people report that the code works for them, so most likely your strings end up in different encodings while being processed. (Bytes decoded with a different encoding turn into different characters, so the pattern never matches and nothing is replaced.)
To solve the problem at its root, you must make sure that each of your sources uses exactly the same encoding.
Try this exercise; hopefully it helps you solve your problem:
1. Try this:
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

System.out.println(Arrays.toString("Ä".getBytes()));                           // 1: platform default encoding
System.out.println(Arrays.toString("Ä".getBytes(StandardCharsets.UTF_8)));     // 2: UTF-8; same as 1 if your default encoding is UTF-8
System.out.println(Arrays.toString("Ä".getBytes(Charset.forName("UTF-32"))));  // 3: UTF-32; differs from 1 and 2
System.out.println(Arrays.toString(orig.getBytes()));                          // look for the byte pattern of Ä from 1/2 (this is the hard bit, I guess)
System.out.println(Arrays.toString(orig.getBytes(Charset.forName("UTF-32")))); // the same, in UTF-32
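Byte dumps depend on which encoding you pick; a complementary check is to print the Unicode code points of the string, which do not. The helper below (hypothetical name codePoints) prints U+00C4 for a correctly decoded Ä. If orig shows something else where you expect Ä, for instance U+00C3 U+0084 (UTF-8 bytes read as ISO-8859-1) or U+0041 U+0308 (A plus a combining diaeresis), a plain "Ä" in your replacement table cannot match it.

```java
import java.util.stream.Collectors;

public final class CodePointDump {
    // One "U+XXXX" entry per Unicode code point, independent of any byte encoding.
    static String codePoints(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(codePoints("\u00c4"));  // U+00C4
        System.out.println(codePoints("A\u0308")); // U+0041 U+0308
    }
}
```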
The next step is to check how the orig string is created. For example, if you received it from the web, make sure your POST and GET requests use your preferred encoding.
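A minimal sketch of the same idea for any byte input: read it with an explicitly named charset instead of the platform default. A ByteArrayInputStream stands in for the request body here; a real servlet container has its own encoding settings, which this sketch does not show. ExplicitCharsetInput and readAll are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

public final class ExplicitCharsetInput {
    // Reads all lines from a stream with an explicit charset instead of the platform default.
    static String readAll(InputStream in, Charset charset) {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            return reader.lines().collect(Collectors.joining("\n"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Bytes as a UTF-8 client would send them
        byte[] body = "Gr\u00fc\u00dfe".getBytes(StandardCharsets.UTF_8);
        String right = readAll(new ByteArrayInputStream(body), StandardCharsets.UTF_8);
        String wrong = readAll(new ByteArrayInputStream(body), StandardCharsets.ISO_8859_1);
        System.out.println(right.replace("\u00fc", "ue").replace("\u00df", "ss")); // Gruesse
        // 7 chars of mojibake, none of which is an umlaut, so nothing is replaced
        System.out.println(wrong.replace("\u00fc", "ue").replace("\u00df", "ss").length()); // 7
    }
}
```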
EDIT 1:
Try this:
{ { new String("Ä".getBytes(), "UTF-8"), "Ae" }, ... };
If this one doesn't work, try this:
byte[] bytes = {-61, -124}; // byte representation of Ä in UTF-8 (0xC3 0x84)
String capitalAe = new String(bytes, StandardCharsets.UTF_8); // needs java.nio.charset.StandardCharsets
{ { capitalAe, "Ae" }, ... }; // and do the same for the rest
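The byte array trick works because {-61, -124} is 0xC3 0x84 written as signed Java bytes, which is the UTF-8 encoding of U+00C4 (Ä). The sketch below (class name hypothetical) checks that these bytes decode to exactly the same string as the Unicode escape for Ä, so either form can go into the replacement table without depending on the source file encoding.

```java
import java.nio.charset.StandardCharsets;

public final class Utf8BytesCheck {
    // UTF-8 encoding of U+00C4 as signed Java bytes: 0xC3 = -61, 0x84 = -124
    static final byte[] A_UMLAUT_UTF8 = {(byte) 0xC3, (byte) 0x84};

    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String capitalAe = decodeUtf8(A_UMLAUT_UTF8);
        System.out.println(capitalAe.equals("\u00c4"));            // true
        System.out.println("\u00c4pfel".replace(capitalAe, "Ae")); // Aepfel
    }
}
```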
Your code looks fine; replaceAll() should work as expected.
Try this if you also want to preserve capitalization (e.g. ÜBUNG will become UEBUNG, not UeBUNG):
private static String replaceUmlaut(String input) {
    // replace all lower umlauts
    String output = input.replace("ü", "ue")
            .replace("ö", "oe")
            .replace("ä", "ae")
            .replace("ß", "ss");

    // first replace all capital umlauts in a non-capitalized context (e.g. Übung)
    output = output.replaceAll("Ü(?=[a-zäöüß ])", "Ue")
            .replaceAll("Ö(?=[a-zäöüß ])", "Oe")
            .replaceAll("Ä(?=[a-zäöüß ])", "Ae");

    // now replace all the other capital umlauts
    output = output.replace("Ü", "UE")
            .replace("Ö", "OE")
            .replace("Ä", "AE");

    return output;
}
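For comparison, the same capitalization rule can be sketched in a single pass with Matcher.replaceAll(Function), available since Java 9. This is a swapped-in technique, not the code from the answer above, and CaseAwareUmlauts is a hypothetical name. One assumption differs slightly: "lowercase context" is tested with Character.isLowerCase, which accepts any lowercase letter rather than only a-z and the umlauts.

```java
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

public final class CaseAwareUmlauts {
    // Lowercase umlauts and sharp s always map to lowercase pairs.
    private static final Map<Character, String> LOWER = Map.of(
            '\u00e4', "ae", '\u00f6', "oe", '\u00fc', "ue", '\u00df', "ss");
    // Capital umlauts map to a title-case pair, upper-cased in an all-caps context.
    private static final Map<Character, String> UPPER = Map.of(
            '\u00c4', "Ae", '\u00d6', "Oe", '\u00dc', "Ue");
    private static final Pattern UMLAUT =
            Pattern.compile("[\u00e4\u00f6\u00fc\u00df\u00c4\u00d6\u00dc]");

    static String replaceUmlauts(String input) {
        return UMLAUT.matcher(input).replaceAll(match -> {
            char c = input.charAt(match.start());
            String lower = LOWER.get(c);
            if (lower != null) {
                return lower;
            }
            String title = UPPER.get(c);
            int next = match.end();
            // Title case when followed by a lowercase letter or a space, otherwise all caps
            boolean titleContext = next < input.length()
                    && (Character.isLowerCase(input.charAt(next)) || input.charAt(next) == ' ');
            return titleContext ? title : title.toUpperCase(Locale.ROOT);
        });
    }

    public static void main(String[] args) {
        System.out.println(replaceUmlauts("\u00dcBUNG"));  // UEBUNG
        System.out.println(replaceUmlauts("\u00dcbung"));  // Uebung
        System.out.println(replaceUmlauts("Stra\u00dfe")); // Strasse
    }
}
```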
Source
I had to modify the answer of user1438038:
private static String replaceUmlaute(String output) {
    String newString = output.replace("\u00fc", "ue")
            .replace("\u00f6", "oe")
            .replace("\u00e4", "ae")
            .replace("\u00df", "ss")
            .replaceAll("\u00dc(?=[a-z\u00e4\u00f6\u00fc\u00df ])", "Ue")
            .replaceAll("\u00d6(?=[a-z\u00e4\u00f6\u00fc\u00df ])", "Oe")
            .replaceAll("\u00c4(?=[a-z\u00e4\u00f6\u00fc\u00df ])", "Ae")
            .replace("\u00dc", "UE")
            .replace("\u00d6", "OE")
            .replace("\u00c4", "AE");
    return newString;
}
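This should work on any target platform (I had problems on a Tomcat on Windows), because the Unicode escapes keep the source file pure ASCII, so it no longer matters which encoding javac assumes.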