UTF-8 or UTF-16 or UTF-32 or UCS-2

陌清茗 2020-12-03 08:53

I am designing a new CMS and want it to fit all my future needs, such as multilingual content, so I was thinking Unicode (UTF-8) is the best solution.

But with

6 Answers
  •  醉话见心
    2020-12-03 09:24

    Quick note: essentially every character in use today can be represented in the Unicode character set. UTF-8 is just one encoding that can represent every character in this set; UTF-16 and UTF-32 can too.
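A quick way to check this is with a minimal Python sketch. The helper name `encoded_sizes` and the sample string are illustrative only. All three UTF encodings round-trip the same text, including an emoji outside the Basic Multilingual Plane, but each needs a different number of bytes. The `-le` codecs are used so Python does not prepend a byte-order mark.

```python
def encoded_sizes(text: str) -> dict[str, int]:
    """Return the encoded size in bytes of `text` under each UTF encoding (hypothetical helper)."""
    sizes = {}
    for codec in ("utf-8", "utf-16-le", "utf-32-le"):
        encoded = text.encode(codec)
        # Every UTF encoding covers all Unicode code points, so decoding restores the text exactly.
        assert encoded.decode(codec) == text
        sizes[codec] = len(encoded)
    return sizes


# Sample mixing ASCII, Cyrillic, CJK and an emoji (U+1F600, outside the BMP).
print(encoded_sizes("Hi, Привет, 你好 \U0001F600"))
# -> {'utf-8': 29, 'utf-16-le': 34, 'utf-32-le': 64}
```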

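    UCS-2 is obsolete and shouldn't be used anymore: it stores every character in exactly 16 bits, so it can't represent characters beyond U+FFFF (many emoji and rarer CJK ideographs, for example).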
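The following minimal Python sketch shows why UCS-2 falls short. It computes how UTF-16, UCS-2's successor, splits a code point above U+FFFF into a surrogate pair of two 16-bit units. UCS-2 has no such mechanism, so a single 16-bit UCS-2 value cannot hold these characters. The helper name `utf16_code_units` is hypothetical.

```python
def utf16_code_units(cp: int) -> list[int]:
    """Split a code point into UTF-16 code units (hypothetical helper)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {cp:#x}")
    if cp <= 0xFFFF:
        # Fits in a single 16-bit unit: the only range UCS-2 can handle.
        return [cp]
    v = cp - 0x10000               # 20 bits remain after removing the BMP offset
    high = 0xD800 + (v >> 10)      # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (v & 0x3FF)     # bottom 10 bits -> low (trail) surrogate
    return [high, low]


emoji = "\U0001F600"
units = utf16_code_units(ord(emoji))
print([hex(u) for u in units])  # two units, so no single UCS-2 value can represent it
# Cross-check against Python's built-in UTF-16 encoder (big-endian, no BOM).
assert emoji.encode("utf-16-be") == b"".join(u.to_bytes(2, "big") for u in units)
```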

    Which of the remaining three to choose depends on what kind of operations you want to do on the text. UTF-8 will usually (not always!) take up less space on disk for the same data; CJK-heavy text, for instance, is smaller in UTF-16. UTF-8 is also a strict superset of ASCII (every ASCII file is already valid UTF-8), so it can reduce the amount of transcoding needed. However, because each code point takes 1 to 4 bytes, you can't index into the string or find its length in code points in constant time.
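Both points can be illustrated with a minimal Python sketch, assuming well-formed UTF-8 input. The helpers `utf8_length` and `utf8_index` are hypothetical, written only to show the byte-level work that Python's own `str` type normally hides. Finding either the length or the i-th code point means walking the bytes from the start.

```python
def utf8_length(data: bytes) -> int:
    """Count code points in valid UTF-8 by scanning every byte: O(n), not O(1)."""
    # Continuation bytes look like 0b10xxxxxx; every other byte starts a new code point.
    return sum(1 for b in data if (b & 0xC0) != 0x80)


def utf8_index(data: bytes, i: int) -> str:
    """Return the i-th code point by walking from the start to find its byte offset."""
    count = -1
    for offset, b in enumerate(data):
        if (b & 0xC0) != 0x80:
            count += 1
            if count == i:
                # Extend over the continuation bytes that belong to this code point.
                end = offset + 1
                while end < len(data) and (data[end] & 0xC0) == 0x80:
                    end += 1
                return data[offset:end].decode("utf-8")
    raise IndexError(i)


# ASCII text is byte-for-byte identical in UTF-8, so no transcoding is needed.
ascii_text = "plain ASCII"
assert ascii_text.encode("ascii") == ascii_text.encode("utf-8")

data = "naïve 你好".encode("utf-8")
print(len(data), utf8_length(data), utf8_index(data, 6))  # 13 bytes, 8 code points, 你
```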

    UTF-32 does let you find the length of the string and index into it in constant time, since every code point is exactly 4 bytes. It isn't a superset of ASCII like UTF-8 is. It also costs 4 bytes per code point, but hey, disk space is cheap.
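By contrast, here is a minimal Python sketch of constant-time indexing in UTF-32. The helper `utf32_index` is hypothetical, and the sketch assumes little-endian data without a byte-order mark. The i-th code point always sits at byte offset 4*i, and the length in code points is just the byte count divided by 4.

```python
def utf32_index(data: bytes, i: int) -> str:
    """Return the i-th code point of UTF-32-LE data in O(1): it starts at byte offset 4*i."""
    if not 0 <= i < len(data) // 4:
        raise IndexError(i)
    return data[4 * i : 4 * i + 4].decode("utf-32-le")


data = "naïve 你好".encode("utf-32-le")
print(len(data) // 4, utf32_index(data, 6))  # 8 code points, 你
```

Note that this is constant time per code point, not per user-perceived character: "e" followed by U+0301 COMBINING ACUTE ACCENT displays as one character but still occupies two UTF-32 units.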
