Converting Mac Roman character to equivalent UTF-8

旧巷少年郎 2021-01-18 06:41

I have been given some HTML files that use the Mac OS Roman file encoding. The files have French text, but in an editor many of the diacritical chars look strange (i.e. non

4 answers
  •  独厮守ぢ
    2021-01-18 07:18

    If your editor isn’t showing the text correctly when you specify the encoding, you have given it the wrong encoding. You need to figure out which encoding you really have.

    You appear to have a byte with the value 0xE9 where you need the Unicode character LATIN SMALL LETTER E WITH ACUTE (é). In MacRoman, byte 0xE9 is LATIN CAPITAL LETTER E WITH GRAVE (È), which is what your editor displays because you told it the file was MacRoman. But it is not.

    However, Unicode code point U+00E9 is indeed LATIN SMALL LETTER E WITH ACUTE.

    Therefore, it is not MacRoman that you have there, but almost certainly ISO-8859-1 or ISO-8859-15.
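The reasoning above can be checked directly. The following is a minimal sketch in Python, which the original answer does not use. It assumes Python's built-in `mac_roman`, `iso-8859-1`, and `iso-8859-15` codecs; the name `decode_all` is made up for illustration. It decodes the single byte 0xE9 under each candidate encoding so the results can be compared side by side.

```python
# Decode the byte seen in the file under each candidate encoding.
# CANDIDATES and decode_all are illustrative names, not part of any library.
CANDIDATES = ("mac_roman", "iso-8859-1", "iso-8859-15")


def decode_all(data: bytes) -> dict:
    """Return the text that `data` represents under each candidate encoding."""
    return {name: data.decode(name) for name in CANDIDATES}


results = decode_all(b"\xe9")
for name, text in results.items():
    # Show the decoded character together with its Unicode code point.
    print(f"{name:>12}: {text!r} (U+{ord(text):04X})")

# For comparison: the byte MacRoman itself uses for e-acute (U+00E9).
print("\u00e9".encode("mac_roman"))
```

Only the two ISO-8859 decodings give é. A real MacRoman file would store that letter as a different byte, as the last line shows.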

    So use something like

    $ iconv -f ISO-8859-1 -t UTF-8 < input.latin1 > output.utf8
    

    to do the conversion.
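If `iconv` is not available, roughly the same conversion can be sketched in Python. This is an illustrative alternative, not part of the original answer. The function name `reencode` and the sample file contents are assumptions, and the source encoding defaults to ISO-8859-1 as diagnosed above.

```python
import tempfile
from pathlib import Path


def reencode(src: Path, dst: Path, source_encoding: str = "iso-8859-1") -> None:
    """Read `src` as `source_encoding` and write the same text to `dst` as UTF-8."""
    text = src.read_bytes().decode(source_encoding)
    # Writing bytes (not text) leaves the original line endings untouched.
    dst.write_bytes(text.encode("utf-8"))


# Demonstration on a throwaway file holding a made-up Latin-1 HTML fragment.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "input.latin1"
    dst = Path(tmp) / "output.utf8"
    src.write_bytes(b"<p>caf\xe9</p>\r\n")
    reencode(src, dst)
    converted = dst.read_bytes()

print(converted)
```

Working on raw bytes rather than text-mode file handles avoids any newline translation, so the only difference between input and output is the encoding of the non-ASCII characters.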
