How would you create a string of all UTF-8 characters?

北海茫月 2021-01-02 20:11

There are many ways to represent the more than 1 million characters that UTF-8 can encode. Take the Latin capital letter "A" with macron (Ā). This is Unicode code point U+0100,

5 answers
  •  渐次进展
    2021-01-02 20:31

    I'm not sure you can do this programmatically, mostly because there is a difference between a Unicode code point and a character. See http://www.unicode.org/standard/where for a few examples of characters that are represented by a combination of code points.
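
    Enumerating code points, as opposed to characters, is at least mechanical. The following is a minimal sketch, not a definitive answer: the helper name all_scalar_values is made up for illustration, and the resulting string includes unassigned code points and noncharacters, so it is not a string of "all characters" in any meaningful sense.

```python
import sys


def all_scalar_values() -> str:
    """Return one string holding every Unicode scalar value exactly once."""
    # Surrogates (U+D800..U+DFFF) are code points, but they cannot be
    # encoded in UTF-8, so they are skipped here.
    return "".join(
        chr(cp)
        for cp in range(sys.maxunicode + 1)
        if not 0xD800 <= cp <= 0xDFFF
    )


text = all_scalar_values()
encoded = text.encode("utf-8")

print(len(text))     # number of code points in the string
print(len(encoded))  # number of bytes once encoded as UTF-8
```

    Many of the entries in that string are unassigned or only meaningful next to another code point, which is the problem described next.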

    Some code points make no sense on their own and can only be used in conjunction with another character (think accents). See http://www.unicode.org/charts/charindex.html for a list of code points, and look at the section with all the "combining" code points.
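
    The Ā from the question illustrates this. A small sketch using Python's standard unicodedata module shows the same visible character written as one code point or as a base letter plus a combining code point:

```python
import unicodedata

precomposed = "\u0100"   # LATIN CAPITAL LETTER A WITH MACRON
decomposed = "A\u0304"   # "A" followed by COMBINING MACRON

# Same rendered character, different code point sequences and bytes.
print(len(precomposed), precomposed.encode("utf-8"))
print(len(decomposed), decomposed.encode("utf-8"))

# A plain comparison treats them as different strings ...
print(precomposed == decomposed)

# ... until both are normalized to the same form.
print(unicodedata.normalize("NFC", decomposed) == precomposed)

# A non-zero combining class marks a code point that attaches to a base.
print(unicodedata.combining("\u0304"))
```

    A list of code points therefore both overcounts (combining marks on their own) and undercounts (sequences with no precomposed form) what a user would call a character.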

    Also, for use in testing applications, you'd need something else besides a list of possible UTF-8 code points, namely several invalid/malformed UTF-8 sequences that your app needs to be able to recover gracefully from.

    For this, take a look at Markus Kuhn's Unicode stress test.
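
    As a rough illustration of what such inputs look like, here is a sketch with a few hand-picked malformed sequences. These samples and the helper is_valid_utf8 are assumptions chosen for this example, not excerpts from that stress test.

```python
# Byte strings that a strict UTF-8 decoder should reject.
MALFORMED = {
    "lone continuation byte": b"\x80",
    "truncated 2-byte sequence": b"\xc4",
    "overlong encoding of '/'": b"\xc0\xaf",
    "encoded surrogate U+D800": b"\xed\xa0\x80",
    "byte never valid in UTF-8": b"\xff",
}


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


for label, raw in MALFORMED.items():
    print(f"{label}: valid={is_valid_utf8(raw)}")

# One way to recover: substitute U+FFFD for the undecodable bytes.
recovered = b"ok\xffok".decode("utf-8", errors="replace")
print(recovered)
```

    Whether to reject the input outright or substitute U+FFFD depends on the application; the point is that the app should do one of the two deliberately.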
