Character Set Special Characters

て烟熏妆下的殇ゞ submitted on 2019-12-03 14:59:39

Is iso-8859-1 a proper subset of utf-8?

The character repertoire of ISO-8859-1 (the first 256 characters of Unicode) is a proper subset of that of UTF-8 (every Unicode character).

However, the characters U+0080 to U+00FF are encoded differently in the two encodings.

  • ISO-8859-1 assigns each of these characters a single byte from 80 to FF.
  • UTF-8 encodes the same characters as two-byte sequences C2 80 to C3 BF.
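To see the difference for yourself, here is a minimal Python sketch. The helper names hex_bytes and compare are made up for this illustration; only the standard str.encode method is assumed. It prints the byte sequences for an ASCII letter and for a few characters in the U+0080 to U+00FF range.

```python
def hex_bytes(data):
    # Render bytes as space-separated uppercase hex pairs.
    return " ".join(f"{b:02X}" for b in data)


def compare(code_point):
    # Return (ISO-8859-1 bytes, UTF-8 bytes) for one code point, as hex strings.
    ch = chr(code_point)
    return hex_bytes(ch.encode("iso-8859-1")), hex_bytes(ch.encode("utf-8"))


# U+0041 is plain ASCII; the other three lie in the U+0080..U+00FF range.
for cp in (0x41, 0x80, 0xE9, 0xFF):
    latin1, utf8 = compare(cp)
    print(f"U+{cp:04X}: ISO-8859-1 = {latin1}, UTF-8 = {utf8}")
```

The ASCII letter comes out as the same single byte in both encodings, while the other three take one byte in ISO-8859-1 and two bytes in UTF-8.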

What about iso-8859-n?

These are 15 different encodings that contain a total of 614 distinct characters. Some of these characters occur in multiple "parts" of ISO 8859, and some don't. You'll have to be more specific.

I see that your question is tagged ISO-8859-2. The characters that are in -2 that aren't in -1 are:

ĂăĄąĆćČčĎďĐđĘęĚěĹĺĽľŁłŃńŇňŐőŔŕŘřŚśŞşŠšŢţŤťŮůŰűŹźŻżŽžˇ˘˙˛˝
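If you want to derive a list like this for another part of ISO 8859, one rough approach is to decode all 256 byte values with each codec and take the set difference. The function name only_in is hypothetical, and the sketch assumes that your Python codec defines every byte value for the encodings involved (true for "iso-8859-1" and "iso-8859-2" in CPython, but not for every part of ISO 8859).

```python
def only_in(encoding, baseline="iso-8859-1"):
    # Characters that `encoding` can represent but `baseline` cannot.
    # Assumes both codecs map all 256 byte values; otherwise decode() raises.
    all_bytes = bytes(range(256))
    chars = set(all_bytes.decode(encoding))
    base = set(all_bytes.decode(baseline))
    # Sort by code point so the result is stable and easy to compare.
    return "".join(sorted(chars - base))


extra = only_in("iso-8859-2")
print(len(extra), extra)
```

Swapping in another encoding name, such as "iso-8859-15", gives the corresponding list for that part, provided the codec has no unassigned bytes.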

What about windows-1252?

Windows-1252 is just like ISO-8859-1 except that it replaces 27 of the 32 rarely used control characters in the 0x80-0x9F range with printable characters (the other five positions are left unassigned). The characters that are in windows-1252 but not in ISO-8859-1 are:

ŒœŠšŸŽžƒˆ˜–—‘’‚“”„†‡•…‰‹›€™
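The same kind of check can be sketched for windows-1252. Python's "cp1252" codec leaves a few bytes in the 0x80-0x9F range undefined, so this version decodes one byte at a time and skips the ones that fail. The function name cp1252_extras is an invention for this example.

```python
def cp1252_extras():
    # Collect the printable characters that cp1252 puts in 0x80..0x9F.
    found = []
    for byte in range(0x80, 0xA0):
        try:
            ch = bytes([byte]).decode("cp1252")
        except UnicodeDecodeError:
            # This byte has no assignment in Python's cp1252 codec.
            continue
        found.append(ch)
    # Sort by code point for a stable, comparable result.
    return "".join(sorted(found))


extras = cp1252_extras()
print(len(extras), extras)

# None of these characters can be encoded as ISO-8859-1.
for ch in extras:
    try:
        ch.encode("iso-8859-1")
    except UnicodeEncodeError:
        pass
    else:
        print("unexpectedly encodable:", ch)
```

Outside the 0x80-0x9F range the two encodings agree byte for byte, which is why only that range needs checking here.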

Unicode is a superset of all these character sets, and of pretty much all established character sets out there. You can find a list of mappings of all these character sets to Unicode code points here: http://unicode.org/Public/MAPPINGS/.
