Why does sed fail with international characters, and how can it be fixed?

陌清茗 2020-12-03 14:28

GNU sed version 4.1.5 seems to fail with international characters. Here is my input file:

Gras Och Stenar Trad - From Moja to Minneapolis DVD [G2007D         


        
2 answers
  •  天命终不由人
    2020-12-03 15:07

    I think the error occurs if the input encoding of the file is different from the preferred encoding of your environment.
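One way to check for such a mismatch is to compare the encoding of the current locale with the encoding of the file. The following is only a sketch: the file name sample_latin1.txt and its contents are made up for illustration, and it relies on iconv exiting with a non-zero status when the input is not valid in the stated source encoding.

```shell
# Print the character encoding of the current locale (for example UTF-8).
locale charmap

# Hypothetical sample file: "Möja" with "ö" stored as the single
# ISO-8859-1 byte 0xF6 (octal 366).
printf 'M\366ja | Y\n' > sample_latin1.txt

# A UTF-8 to UTF-8 "conversion" acts as a validity check, because iconv
# stops with an error on a byte sequence that is not valid UTF-8.
if iconv -f UTF-8 -t UTF-8 sample_latin1.txt > /dev/null 2>&1; then
    is_utf8=yes
else
    is_utf8=no
fi
echo "valid UTF-8: $is_utf8"
```

If the locale reports UTF-8 but the file does not pass the check, the situation is the problematic one described in the second example below.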

    Example: the input file (named in) is UTF-8 encoded

    $ LANG=de_DE.UTF-8 sed 's/.*| //' < in
    X
    Y
    $ LANG=de_DE.iso88591 sed 's/.*| //' < in
    X 
    Y
    

    UTF-8 can safely be interpreted as ISO-8859-1, because every byte is a valid ISO-8859-1 character: you'll get strange characters, but apart from that everything is fine.
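The "strange characters" can be reproduced without sed by decoding UTF-8 bytes as ISO-8859-1 with iconv. This is a small illustration, not part of the original test. The file name sample_utf8.txt is hypothetical.

```shell
# "Möja" in UTF-8: "ö" is the two bytes 0xC3 0xB6 (octal 303 266).
printf 'M\303\266ja\n' > sample_utf8.txt

# Read the bytes as ISO-8859-1 and write them back out as UTF-8.
# The conversion succeeds, but each byte of "ö" is treated as its own
# character, so "ö" turns into the two characters "Ã¶".
misread=$(iconv -f ISO-8859-1 -t UTF-8 sample_utf8.txt)
echo "$misread"
```

The conversion never fails, which is why sed in an ISO-8859-1 locale can still process the whole line.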

    Example: the input file (named in) is ISO-8859-1 encoded

    $ LANG=de_DE.UTF-8 sed 's/.*| //' < in
    X
    Gras Och Stenar Trad - From MöY
    $ LANG=de_DE.iso88591 sed 's/.*| //' < in
    X 
    Y
    

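    ISO-8859-1 cannot, in general, be interpreted as UTF-8: decoding the input file fails. The strange match is probably because sed tries to recover rather than fail completely. The byte for ö is not valid UTF-8, so . cannot match it, and the match can only start after that byte.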

    The answer is based on Debian Lenny/Sid and sed 4.1.5.
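Two common workarounds follow from this explanation. They are sketched here under assumptions and were not tested on the setup named above: the file name in_latin1.txt and its contents are made up, and the file is assumed to be ISO-8859-1.

```shell
# Hypothetical input: the second line stores "ö" as the ISO-8859-1 byte 0xF6.
printf 'Moja DVD | X\nM\366ja DVD | Y\n' > in_latin1.txt

# Option 1: run sed in the C locale, where every byte counts as one
# character, so "." matches the 0xF6 byte whatever the file encoding is.
bytes_result=$(LC_ALL=C sed 's/.*| //' < in_latin1.txt)

# Option 2: convert the file to valid UTF-8 first, then run sed as usual.
# The source encoding (ISO-8859-1) is an assumption about the file.
converted_result=$(iconv -f ISO-8859-1 -t UTF-8 in_latin1.txt | sed 's/.*| //')

echo "$bytes_result"
echo "$converted_result"
```

Option 1 is the simpler choice when the expression only contains ASCII characters, as it does here. Option 2 is preferable when the pattern itself has to match non-ASCII characters, since the C locale knows nothing about multibyte characters.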
