Replacing unicode punctuation with ASCII approximations

梦谈多话 2020-12-01 16:23

I am reading some text files in a Java program and would like to replace some Unicode characters with ASCII approximations. These files will eventually be broken into sentences...
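The replacement the question describes can be sketched in Java with a small lookup table that maps each Unicode punctuation character to an ASCII stand-in. This is a minimal sketch, not the asker's code. The class name `PunctuationToAscii`, the method `toAscii`, and the particular characters in the table are hypothetical choices. It assumes Java 9+ (for `Map.of`) and walks the string one `char` at a time, which is enough for punctuation in the Basic Multilingual Plane.

```java
import java.util.Map;

public class PunctuationToAscii {
    // Hypothetical, hand-picked table: extend it with the characters your files actually contain.
    private static final Map<Character, String> REPLACEMENTS = Map.of(
        '\u2018', "'",   // left single quotation mark
        '\u2019', "'",   // right single quotation mark (often used as an apostrophe)
        '\u201C', "\"",  // left double quotation mark
        '\u201D', "\"",  // right double quotation mark
        '\u2013', "-",   // en dash
        '\u2014', "--",  // em dash
        '\u2026', "...", // horizontal ellipsis
        '\u00A0', " "    // no-break space
    );

    /** Replaces every character found in the table; all other characters are copied unchanged. */
    public static String toAscii(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            String replacement = REPLACEMENTS.get(c);
            if (replacement != null) {
                out.append(replacement);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: "The girl's book" -- p. 3...
        System.out.println(toAscii("\u201CThe girl\u2019s book\u201D \u2014 p. 3\u2026"));
    }
}
```

A fixed table keeps the behaviour predictable, because only characters someone has explicitly listed are rewritten. The trade-off is that any character missing from the table passes through unchanged.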

6 Answers
  •  囚心锁ツ
    2020-12-01 16:31

    Here's a Python package, Unidecode, that does a good job of transliterating Unicode text into ASCII approximations. It is based on the Perl module Text::Unidecode, and I assume it could be ported to Java.

    http://www.tablix.org/~avian/blog/archives/2009/01/unicode_transliteration_in_python/

    http://pypi.python.org/pypi/Unidecode
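    Unidecode works from large per-character transliteration tables, and the sketch below is not a port of it. It is a much smaller Java approximation, assuming only the standard `java.text.Normalizer` class. It applies NFKD decomposition and then deletes combining marks, so accented letters fall back to their base letters and compatibility characters such as U+2026 (horizontal ellipsis) become `...`. The names `StripDiacritics` and `stripDiacritics` are hypothetical. Curly quotes and dashes have no decomposition, so this step leaves them untouched and they would still need an explicit replacement table.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

public class StripDiacritics {
    // Matches runs of combining marks (Unicode general category M: Mn, Mc, Me).
    private static final Pattern COMBINING_MARKS = Pattern.compile("\\p{M}+");

    /** Decomposes with NFKD, then removes the combining marks left behind. */
    public static String stripDiacritics(String input) {
        // NFKD splits U+00E9 into "e" + U+0301 and also applies compatibility
        // mappings, e.g. U+2026 becomes three full stops.
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFKD);
        return COMBINING_MARKS.matcher(decomposed).replaceAll("");
    }

    public static void main(String[] args) {
        // prints: Cafe naive resume
        System.out.println(stripDiacritics("Caf\u00E9 na\u00EFve r\u00E9sum\u00E9"));
    }
}
```

    Using `Normalizer.Form.NFD` instead would still strip accents but skip the compatibility mappings, which may be preferable if characters such as ligatures or full-width forms should be left alone.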
