Replacing Unicode punctuation with ASCII approximations

梦谈多话 2020-12-01 16:23

I am reading some text files in a Java program and would like to replace some Unicode characters with ASCII approximations. These files will eventually be broken into sente

6 answers
  •  执笔经年
    2020-12-01 16:31

    What I've done for similar substitutions is create a Map (usually a HashMap) with the Unicode characters as the keys and their substitutes as the values.

    A sketch in Java, assuming the input is a String named inputString and xlateMap is a Map<Character, Character>; adapt the loop if your method takes a CharSequence or some other character container.

    StringBuilder output = new StringBuilder(inputString.length());
    for (int i = 0; i < inputString.length(); i++)
    {
        char c = inputString.charAt(i);
        // get() returns null when the character has no entry in the map.
        Character replacement = xlateMap.get(c);
        output.append(replacement != null ? replacement : c);
    }
    return output.toString();
    

    Anything in the Map is replaced; anything not in the Map is copied to the output unchanged.
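
    For a fuller picture, here is a minimal self-contained sketch of the same idea. The class name AsciiPunctuation, the method toAscii, and the contents of the table are all made up for illustration; fill the table with whatever characters actually show up in your files. One deliberate variation from the snippet above: the values are Strings rather than Characters, on the assumption that some approximations need more than one character (an ellipsis becoming three periods, for example).

```java
import java.util.HashMap;
import java.util.Map;

class AsciiPunctuation {
    // Hypothetical translation table: Unicode punctuation -> ASCII approximation.
    private static final Map<Character, String> XLATE_MAP = new HashMap<>();
    static {
        XLATE_MAP.put('\u2018', "'");    // left single quotation mark
        XLATE_MAP.put('\u2019', "'");    // right single quotation mark
        XLATE_MAP.put('\u201C', "\"");   // left double quotation mark
        XLATE_MAP.put('\u201D', "\"");   // right double quotation mark
        XLATE_MAP.put('\u2013', "-");    // en dash
        XLATE_MAP.put('\u2014', "--");   // em dash
        XLATE_MAP.put('\u2026', "...");  // horizontal ellipsis
    }

    // Replaces every mapped character; unmapped characters are copied as-is.
    static String toAscii(String input) {
        StringBuilder output = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            String replacement = XLATE_MAP.get(c);
            if (replacement != null) {
                output.append(replacement);
            } else {
                output.append(c);
            }
        }
        return output.toString();
    }

    public static void main(String[] args) {
        System.out.println(toAscii("\u201CHi\u201D \u2014 it\u2019s\u2026"));
    }
}
```

    Every key in this table is a single char, so the loop can go one char at a time. Characters outside the Basic Multilingual Plane are not in the table, so their surrogate halves are simply copied through in order and stay intact.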
