How to remove all of the diacritics from a file?

Happy的楠姐 2020-12-05 00:10

I have a file containing many vowels with diacritics. I need to make these replacements:

  • Replace ā, á, ǎ, and à with a.
  • Replace ē, é, ě, and è with
9 Answers
  • 2020-12-05 00:35

    Use man iso_8859_1 (or the man page for your character set) or od -bc to find the octal code of the accented character, then use gawk to do the replacement.

    { gsub(/\344/,"a"); print $0 }
    

    This replaces ä (octal 344 in ISO 8859-1) with a.
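
The lookup-then-replace idea above can be sketched as follows. This is a minimal sketch, not a complete solution: the function names octal_bytes, strip_a_umlaut_latin1, and strip_a_umlaut_utf8 are hypothetical, and running awk under LC_ALL=C is an assumption made so that the pattern is matched byte by byte. In UTF-8 the same letter ä is the two bytes 303 244 rather than the single byte 344.

```shell
# Print the octal value of each byte of a string, as od -b reports it.
octal_bytes() {
  printf '%s' "$1" | od -An -b | tr -s ' \n' ' ' | sed 's/^ //; s/ $//'
}

# Replace the ISO 8859-1 byte for ä (octal 344) with a; reads stdin.
strip_a_umlaut_latin1() {
  LC_ALL=C awk '{ gsub(/\344/, "a"); print }'
}

# Same idea for UTF-8 input, where ä is the byte pair 303 244.
strip_a_umlaut_utf8() {
  LC_ALL=C awk '{ gsub(/\303\244/, "a"); print }'
}

printf 'M\344dchen\n'     | strip_a_umlaut_latin1   # prints Madchen
printf 'M\303\244dchen\n' | strip_a_umlaut_utf8     # prints Madchen
```

Each further character needs its own gsub call, so this approach suits a short list of replacements better than a long one.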

  • 2020-12-05 00:39

    This is what the tr(1) command is for. For example:

    tr 'āáǎàēéěèīíǐì...' 'aaaaeeeeiiii...' <infile >outfile
    

    You may have to check/change your LANG environment variable to match the character set being used.
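
Some tr implementations, GNU tr among them, translate bytes rather than characters, so multibyte UTF-8 letters in the two sets may not map as intended. A minimal sketch of the single-byte case, assuming the input is encoded in ISO 8859-1: the function name latin1_to_ascii is hypothetical, and only the six accented letters listed in the comment are covered (ā and ǎ have no ISO 8859-1 code, so they cannot be handled this way).

```shell
# Map the ISO 8859-1 bytes for á à é è í ì to plain ASCII letters.
# Octal codes: á=341 à=340 é=351 è=350 í=355 ì=354.
latin1_to_ascii() {
  LC_ALL=C tr '\341\340\351\350\355\354' 'aaeeii'
}

printf 'caf\351 \340 l\341\n' | latin1_to_ascii   # prints: cafe a la
```

Both sets must list the same number of characters in the same order, since tr pairs them position by position.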

  • 2020-12-05 00:40

    This might work for you:

    sed -i 'y/āáǎàēéěèīíǐìōóǒòūúǔùǖǘǚǜĀÁǍÀĒÉĚÈĪÍǏÌŌÓǑÒŪÚǓÙǕǗǙǛ/aaaaeeeeiiiioooouuuuüüüüAAAAEEEEIIIIOOOOUUUUÜÜÜÜ/' file
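
The y command above typically needs sed to run in a UTF-8 locale so that each accented letter counts as one character. Where no such locale is available, a byte-wise fallback might look like the sketch below. It assumes UTF-8 input and a sed that accepts -E; the function name strip_a_e is hypothetical, and it covers only the a and e vowels from the question.

```shell
# Replace accented a and e vowels by matching their UTF-8 byte sequences.
# LC_ALL=C makes sed treat input as bytes; alternation (not a bracket
# expression) is used so each multibyte letter is matched as a whole.
strip_a_e() {
  LC_ALL=C sed -E -e 's/ā|á|ǎ|à/a/g' -e 's/ē|é|ě|è/e/g'
}

printf 'hǎo lè\n' | strip_a_e   # prints: hao le
```

This version reads standard input and writes standard output, so the original file stays untouched unless the result is redirected over it.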
    