I've read in several Stack Overflow answers that some characters do not directly map (or are even "unmappable") when converting from Cp1252 (aka Windows-1252; they're the
Can someone please shed some more light on this?
The cp1252 decoding function is mostly an identity function.
cp1252 UCP (UCP = Unicode Code Point)
-------- --------
21 21 (!) (All numbers in hex)
31 31 (1)
41 41 (A)
This makes it seem like something expecting UCP (not UTF-8) will also accept cp1252. The author of the linked Answer is pointing out that this is not the case.
cp1252 UCP
-------- --------
80 20AC (€)
85 2026 (…)
99 2122 (™)
The exceptions are all found between 80 and 9F, inclusive.
Something that accepts UCP will also accept iso-8859-1, because iso-8859-1 maps every byte to the code point with the same value, but it will not accept cp1252.
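To see the difference concretely, here is a small sketch using Python's built-in `cp1252` and `latin-1` (iso-8859-1) codecs. The helper names `code_points` and `differing_bytes` are made up for this illustration, and the behaviour shown is that of Python's codecs, which I'm assuming match the tables above.

```python
def code_points(raw, encoding):
    """Decode raw bytes and return the code point of each resulting character."""
    return [ord(ch) for ch in raw.decode(encoding)]

identity_bytes = bytes([0x21, 0x31, 0x41])   # ! 1 A
exception_bytes = bytes([0x80, 0x85, 0x99])  # € … ™ in cp1252

def differing_bytes():
    """Return the byte values that cp1252 does not decode to the same-valued code point."""
    result = []
    for value in range(256):
        try:
            decoded = bytes([value]).decode("cp1252")
        except UnicodeDecodeError:
            # Python's codec leaves a few byte values undefined; skip them here.
            continue
        if ord(decoded) != value:
            result.append(value)
    return result

# Identity part of the mapping: byte value == code point.
print([hex(cp) for cp in code_points(identity_bytes, "cp1252")])
# The exceptions: cp1252 maps these bytes to code points above FF.
print([hex(cp) for cp in code_points(exception_bytes, "cp1252")])
# iso-8859-1 keeps the same bytes as code points 80, 85 and 99.
print([hex(cp) for cp in code_points(exception_bytes, "latin-1")])

diff = differing_bytes()
print(hex(min(diff)), hex(max(diff)))
```

The last line prints the lowest and highest byte values for which the two mappings disagree, which is a quick way to check the 80 to 9F range for yourself.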
Does that mean that if I batch/mass convert source code from cp1252 to utf-8 I'll get some characters that will end up as garbage?
No. Every character in cp1252 maps to a Unicode Code Point, so it can be converted to UTF-8 successfully using a proper tool.
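As a rough illustration of what "a proper tool" does, the conversion is a decode from cp1252 followed by an encode to UTF-8. This is only a sketch with Python's built-in codecs; `cp1252_to_utf8` is a hypothetical helper name, and a real batch conversion would also need to read and write the files.

```python
def cp1252_to_utf8(raw):
    """Decode bytes as cp1252, then re-encode the resulting text as UTF-8."""
    return raw.decode("cp1252").encode("utf-8")

sample = bytes([0x41, 0x80, 0x85, 0x99])  # A € … ™ in cp1252
converted = cp1252_to_utf8(sample)
print(converted)

# Round-trip check: the UTF-8 bytes decode back to the same four characters.
assert converted.decode("utf-8") == "A\u20ac\u2026\u2122"
```

What produces garbage is skipping the decode step, for example by treating the cp1252 bytes as iso-8859-1 or copying byte values straight into code points. Note that Python's codec is strict: a byte value that it leaves undefined (such as 81) raises `UnicodeDecodeError` rather than being converted silently.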