Rename ä, ö, ü to ae, oe, ue

橙三吉。 Submitted on 2019-12-13 00:40:39

Question


We want to rename strings so that "strange" characters such as German umlauts are translated to their official non-umlaut representation (ä to ae, ö to oe, ü to ue). Is there a function in Java that performs this conversion (that is, handles the mapping), not only for German umlauts but also for French, Czech, or Scandinavian characters? The goal is a function that renames files and directories so that Subversion can handle them without problems on different platforms.

This question is similar but without a useful answer.


Answer 1:


You can use the Unicode block property \p{InCombiningDiacriticalMarks} to remove (most) diacritical marks from Strings:

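import java.text.Normalizer;
import java.util.regex.Pattern;

public String normalize(String input) {
  // Decompose each accented character into base letter + combining mark(s)
  String output = Normalizer.normalize(input, Normalizer.Form.NFD);
  // Match runs of combining diacritical marks (U+0300 to U+036F)
  Pattern pattern = Pattern.compile("\\p{InCombiningDiacriticalMarks}+");

  // Drop the marks, keeping only the base letters
  return pattern.matcher(output).replaceAll("");
}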

This will not replace German umlauts the way you desire, though. It will turn ö into o, ä into a and so on. But maybe that's okay for you, too.
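
If the ae/oe/ue spelling is required, one possible workaround using only the JDK is to replace the German characters from a small lookup table first and strip the remaining diacritics afterwards. The following is a minimal sketch, not a complete solution: the class name UmlautRenamer, the method name toAscii, and the contents of the mapping table are assumptions made for illustration. Characters without a canonical decomposition (for example ø or ł) pass through unchanged, so they would need their own table entries.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

// Hypothetical helper class; the name and the mapping table are illustrative only.
final class UmlautRenamer {
    // Matches the combining marks left over after NFD decomposition.
    private static final Pattern MARKS =
            Pattern.compile("\\p{InCombiningDiacriticalMarks}+");

    // German replacements, written as Unicode escapes so that the
    // source compiles regardless of the file encoding.
    private static final String[][] GERMAN = {
        {"\u00C4", "Ae"}, {"\u00D6", "Oe"}, {"\u00DC", "Ue"}, // Ä Ö Ü
        {"\u00E4", "ae"}, {"\u00F6", "oe"}, {"\u00FC", "ue"}, // ä ö ü
        {"\u1E9E", "SS"}, {"\u00DF", "ss"}                    // ẞ ß
    };

    static String toAscii(String input) {
        // Compose first, so that "u" + U+0308 also matches the table entry for ü.
        String s = Normalizer.normalize(input, Normalizer.Form.NFC);
        for (String[] pair : GERMAN) {
            s = s.replace(pair[0], pair[1]);
        }
        // Decompose whatever is left (é, č, å, ...) and drop the marks.
        s = Normalizer.normalize(s, Normalizer.Form.NFD);
        return MARKS.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        // M\u00FCller Stra\u00DFe is "Müller Straße"
        System.out.println(toAscii("M\u00FCller Stra\u00DFe")); // Mueller Strasse
    }
}
```

The table maps uppercase umlauts to mixed case (Ae, Oe, Ue); for all-caps input, AE/OE/UE may be preferable, which would need an extra rule.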




Answer 2:


Use the ICU Transliterator. It is a generic class for performing these kinds of transliterations. You may need to provide your own map.




Answer 3:


The answer is: Any-Latin; De-ASCII; Latin-ASCII;

A PHP-specific answer using Transliterator (sorry for not providing Java code):

$val = 'BEGIN..Ä..Ö..Ü..ä..ö..ü..ẞ..ß..END';
echo Transliterator::create('Any-Latin; De-ASCII; Latin-ASCII;')->transliterate($val);
// output
//    BEGIN..AE..OE..UE..ae..oe..ue..SS..ss..END

The normal ASCII rule is Any-Latin; Latin-ASCII; which gives BEGIN..A..O..U..a..o..u..SS..ss..END for the same input.

These rules should work in any language with support for ICU (International Components for Unicode).



Source: https://stackoverflow.com/questions/28943843/rename-%c3%a4-%c3%b6-%c3%bc-to-ae-oe-ue
