What characters do not directly map from Cp1252 to UTF-8?

执笔经年 2020-12-28 18:32

I've read in several Stack Overflow answers that some characters do not directly map (or are even "unmappable") when converting from Cp1252 (aka Windows-1252; they're the …

2 answers
  •  我在风中等你
    2020-12-28 19:22

    Can someone please shed some more light on this?

    The cp1252 decoding function is mostly an identity function: for most bytes, the decoded Unicode code point has the same numeric value as the byte itself.

    cp1252    UCP       (UCP = Unicode Code Point)
    --------  --------
    21        21 (!)    (All numbers in hex)
    31        31 (1)
    41        41 (A)
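A quick check, assuming CPython's built-in `cp1252` codec (the helper name `decodes_to_itself` is only for illustration), confirms the identity for the rows above and, more generally, for every byte in 00-7F and A0-FF:

```python
def decodes_to_itself(byte_value: int) -> bool:
    """Return True if cp1252 decodes this byte to the code point with the same value."""
    return ord(bytes([byte_value]).decode("cp1252")) == byte_value

# The sample rows from the table: 21 (!), 31 (1), 41 (A).
for b in (0x21, 0x31, 0x41):
    print(f"{b:02X} -> U+{ord(bytes([b]).decode('cp1252')):04X}")

# The identity holds for the whole ASCII range and for A0..FF.
identity_ranges = list(range(0x00, 0x80)) + list(range(0xA0, 0x100))
print(all(decodes_to_itself(b) for b in identity_ranges))  # True
```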
    

    This makes it seem like something expecting UCP (not UTF-8) will also accept cp1252. The author of the linked answer is pointing out that this is not the case:

    cp1252    UCP
    --------  --------
    80        20AC (€)
    85        2026 (…)
    99        2122 (™)
    

    The exceptions are all found between 80 and 9F, inclusive.
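To list the exceptions, one can scan all 256 byte values with Python's `cp1252` codec. Python's codec, like Microsoft's published mapping table, leaves five bytes in that range (81, 8D, 8F, 90, 9D) undefined, so decoding them with the default strict error handling raises `UnicodeDecodeError`; other decoders, such as the WHATWG Encoding Standard used by browsers, map them to the C1 control characters with the same number instead. A minimal sketch:

```python
remapped = {}   # byte -> decoded character, for bytes whose code point differs
undefined = []  # bytes the codec refuses to decode

for byte_value in range(256):
    try:
        char = bytes([byte_value]).decode("cp1252")
    except UnicodeDecodeError:
        # Python's cp1252 table marks these bytes as <undefined>.
        undefined.append(byte_value)
        continue
    if ord(char) != byte_value:
        remapped[byte_value] = char

print(len(remapped))                    # 27
print([f"{b:02X}" for b in undefined])  # ['81', '8D', '8F', '90', '9D']
# Every exception, remapped or undefined, lies in 80..9F.
print(all(0x80 <= b <= 0x9F for b in [*remapped, *undefined]))  # True
```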

    Something that accepts UCP will therefore also accept iso-8859-1 (whose 256 byte values map one-to-one onto U+0000 to U+00FF), but not cp1252.
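The contrast can be sketched in Python, where ISO-8859-1 is available as the `latin-1` codec. The second half shows the practical consequence: cp1252 text mistakenly decoded as ISO-8859-1 yields invisible C1 control characters instead of €, … and ™.

```python
all_bytes = bytes(range(256))

# ISO-8859-1 decodes byte N to code point U+00NN for all 256 bytes.
latin1_text = all_bytes.decode("latin-1")
print(all(ord(c) == b for c, b in zip(latin1_text, all_bytes)))  # True

# The same cp1252 bytes read both ways.
data = "€ … ™".encode("cp1252")    # b'\x80 \x85 \x99'
print(data.decode("cp1252"))        # € … ™
print(repr(data.decode("latin-1")))  # '\x80 \x85 \x99' (C1 controls, not the intended symbols)
```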


    Does that mean that if I batch/mass convert source code from cp1252 to utf-8 I'll get some characters that will end up as garbage?

    No. Every character in cp1252 maps to a Unicode code point, so it can be converted to UTF-8 without loss by any tool that decodes cp1252 correctly.
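A minimal sketch of such a conversion in Python, assuming each file fits in memory; `cp1252_to_utf8` and `convert_file` are hypothetical names, not part of any library. Strict decoding is kept on purpose so that a file containing one of the undefined bytes fails loudly rather than being silently altered.

```python
from pathlib import Path

def cp1252_to_utf8(data: bytes) -> bytes:
    """Re-encode cp1252 bytes as UTF-8.

    The default errors="strict" raises UnicodeDecodeError on bytes
    that cp1252 leaves undefined (81, 8D, 8F, 90, 9D).
    """
    return data.decode("cp1252").encode("utf-8")

def convert_file(src: Path, dst: Path) -> None:
    # Hypothetical helper: read raw bytes, convert, write the UTF-8 result.
    dst.write_bytes(cp1252_to_utf8(src.read_bytes()))

print(cp1252_to_utf8(b"caf\xe9 \x80"))  # b'caf\xc3\xa9 \xe2\x82\xac'

try:
    cp1252_to_utf8(b"\x81")
except UnicodeDecodeError as exc:
    print("not valid cp1252:", exc.reason)
```

Only the byte values change (é becomes two bytes, € becomes three); the characters themselves are preserved.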
