How would you create a string of all UTF-8 characters?

北海茫月 2021-01-02 20:11

There are many ways to represent the 1 million+ UTF-8 characters. Take the Latin capital "A" with macron (Ā). This is Unicode code point U+0100,

5 answers

刺人心 (original poster)

2021-01-02 20:24

I quickly translated this from C, but it should give you the idea:

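// Encode a single code point as a UTF-8 byte string.
function encodeUTF8( $inValue ) {
    $result = "";
    // Payload bits still to be emitted as continuation bytes.
    // Initialized so an out-of-range value yields "" instead of an undefined-variable warning.
    $extra = 0;

    // Choose the lead byte from the value's range.
    if ( $inValue < 0x00000080 ) {
        // 1 byte: 0xxxxxxx
        $result .= chr( $inValue );
    } elseif ( $inValue < 0x00000800 ) {
        // 2 bytes: 110xxxxx 10xxxxxx
        $result .= chr( 0x00C0 | ( ( $inValue >> 6 ) & 0x001F ) );
        $extra = 6;
    } elseif ( $inValue < 0x00010000 ) {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        $result .= chr( 0x00E0 | ( ( $inValue >> 12 ) & 0x000F ) );
        $extra = 12;
    } elseif ( $inValue < 0x00200000 ) {
        // 4 bytes: 11110xxx followed by three continuation bytes
        $result .= chr( 0x00F0 | ( ( $inValue >> 18 ) & 0x0007 ) );
        $extra = 18;
    } elseif ( $inValue < 0x04000000 ) {
        // 5 bytes: part of the original UTF-8 design, but RFC 3629 limits
        // UTF-8 to U+10FFFF, so modern decoders reject this form.
        $result .= chr( 0x00F8 | ( ( $inValue >> 24 ) & 0x0003 ) );
        $extra = 24;
    } elseif ( $inValue < 0x80000000 ) {
        // 6 bytes: likewise not valid under RFC 3629.
        $result .= chr( 0x00FC | ( ( $inValue >> 30 ) & 0x0001 ) );
        $extra = 30;
    }

    // Emit the remaining bits six at a time, each in a 10xxxxxx continuation byte.
    while ( $extra > 0 ) {
        $result .= chr( 0x0080 | ( ( $inValue >> ( $extra -= 6 ) ) & 0x003F ) );
    }

    return $result;
}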

The logic is sound, but I am not sure about the PHP, so be sure to check it over. I have never tried to use chr() like this.
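One way to check that chr() behaves as hoped is to hand-encode the code point from the question, U+0100 (Ā), using the same arithmetic as the two-byte branch. This is only a small sketch; the variable names are mine, and the comparison against "\u{0100}" assumes PHP 7 or later.

```php
<?php
// Hand-encode U+0100 with the two-byte pattern 110xxxxx 10xxxxxx.
$codePoint = 0x0100;

// Lead byte: marker 0xC0 plus the top five payload bits.
$lead = chr( 0xC0 | ( ( $codePoint >> 6 ) & 0x1F ) );

// Continuation byte: marker 0x80 plus the low six payload bits.
$trail = chr( 0x80 | ( $codePoint & 0x3F ) );

$encoded = $lead . $trail;

// Show the raw bytes as hex.
echo bin2hex( $encoded ), "\n"; // c480

// Compare with PHP's own Unicode escape for the same code point.
var_dump( $encoded === "\u{0100}" ); // bool(true)
```

The bytes C4 80 are the UTF-8 form of Ā, so chr() is doing what the function needs: it turns an integer from 0 to 255 into a one-byte string.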

There are a lot of values that you would not want to encode, like the surrogates 0xD800-0xDFFF (which are never valid in UTF-8), the private-use area 0xE000-0xF8FF and the specials block 0xFFF0-0xFFFF, and there are several other gaps for combining characters and reserved characters.
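To make the original question concrete, here is a rough sketch of building the whole string while skipping the one range that must be skipped, the surrogates. The helpers encodeCodePoint() and buildString() are hypothetical names of my own, not part of the answer's function, and the encoder is deliberately limited to the four-byte range that ends at U+10FFFF. Whether to also drop private-use, noncharacter or unassigned values is left to you.

```php
<?php
// Hypothetical helper: encode one code point (0..0x10FFFF) as UTF-8.
function encodeCodePoint( int $cp ): string {
    if ( $cp < 0 || $cp > 0x10FFFF ) {
        throw new InvalidArgumentException( 'Code point out of range' );
    }
    if ( $cp < 0x80 ) {
        return chr( $cp );
    }
    if ( $cp < 0x800 ) {
        return chr( 0xC0 | ( $cp >> 6 ) )
            . chr( 0x80 | ( $cp & 0x3F ) );
    }
    if ( $cp < 0x10000 ) {
        return chr( 0xE0 | ( $cp >> 12 ) )
            . chr( 0x80 | ( ( $cp >> 6 ) & 0x3F ) )
            . chr( 0x80 | ( $cp & 0x3F ) );
    }
    return chr( 0xF0 | ( $cp >> 18 ) )
        . chr( 0x80 | ( ( $cp >> 12 ) & 0x3F ) )
        . chr( 0x80 | ( ( $cp >> 6 ) & 0x3F ) )
        . chr( 0x80 | ( $cp & 0x3F ) );
}

// Hypothetical helper: concatenate every code point in [$first, $last],
// leaving out the surrogate range 0xD800-0xDFFF.
function buildString( int $first, int $last ): string {
    $out = '';
    for ( $cp = $first; $cp <= $last; $cp++ ) {
        if ( $cp >= 0xD800 && $cp <= 0xDFFF ) {
            continue; // surrogates have no valid UTF-8 encoding
        }
        $out .= encodeCodePoint( $cp );
    }
    return $out;
}

// Everything from U+0000 to U+10FFFF except surrogates.
$all = buildString( 0x0000, 0x10FFFF );
echo strlen( $all ), " bytes\n";
```

Extra exclusions could go next to the surrogate check inside the loop. Note that the result includes control characters such as NUL, so it is not something you would want to print to a terminal as-is.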
