Convert International String to \u Codes in Java

离开以前 2020-11-29 02:53

How can I convert an international (e.g. Russian) String to \u escape codes (Unicode numbers),
e.g. \u041e\u041a for the Cyrillic "ОК"?

12 answers
  •  执笔经年
    2020-11-29 03:08

    There are three parts to the answer:

    1. Get the Unicode value of each character.
    2. Determine whether it is in one of the Cyrillic blocks.
    3. Convert it to hexadecimal.

    To get each character you can iterate through the String using the charAt() or toCharArray() methods.

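    for( char c : s.toCharArray() ){
        // c is one UTF-16 code unit of s
    }
    

    The numeric value of the char is its Unicode value. Strictly speaking a char is a UTF-16 code unit, but for every character in the ranges below that is the same number as the code point.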
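
As a small illustration of the point above, the sketch below prints the numeric value of each char in a string. The class and method names (CharValues, describe) are made up for this example and are not part of the original answer.

```java
class CharValues {
    // Formats each UTF-16 code unit of s as U+XXXX, separated by spaces.
    static String describe(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (out.length() > 0) {
                out.append(' ');
            }
            // Casting the char to int exposes its numeric value.
            out.append(String.format("U+%04X", (int) c));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The literal below is the Cyrillic word from the question.
        System.out.println(describe("\u041e\u041a")); // prints "U+041E U+041A"
    }
}
```

Characters outside the Basic Multilingual Plane would show up here as two surrogate values rather than one code point; none of the Cyrillic ranges discussed in this answer are affected by that.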

    The Cyrillic Unicode characters are any character in the following ranges:

    Cyrillic:            U+0400–U+04FF ( 1024 -  1279)
    Cyrillic Supplement: U+0500–U+052F ( 1280 -  1327)
    Cyrillic Extended-A: U+2DE0–U+2DFF (11744 - 11775)
    Cyrillic Extended-B: U+A640–U+A69F (42560 - 42655)
    

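    If the character falls in one of these ranges it is Cyrillic, so a simple range check is enough. For those characters, format the value as four hexadecimal digits and prepend \u (written "\\u" in a Java string literal). Integer.toHexString() on its own does not zero-pad, so U+041E would come out as \u41e, which is not a valid escape; String.format with %04x avoids that. Put together it should look something like this:

    // Returns s with every Cyrillic char replaced by its four-digit hex escape.
    static String escapeCyrillic( String s ){
        final int[][] ranges = new int[][]{ 
                {  1024,  1279 },   // Cyrillic
                {  1280,  1327 },   // Cyrillic Supplement
                { 11744, 11775 },   // Cyrillic Extended-A
                { 42560, 42655 },   // Cyrillic Extended-B
            };
        StringBuilder b = new StringBuilder();
    
        for( char c : s.toCharArray() ){
            int[] insideRange = null;
            for( int[] range : ranges ){
                if( range[0] <= c && c <= range[1] ){
                    insideRange = range;
                    break;
                }
            }
    
            if( insideRange != null ){
                // %04x zero-pads to the four hex digits the escape requires
                b.append( String.format( "\\u%04x", (int) c ) );
            }else{
                b.append( c );
            }
        }
    
        return b.toString();
    }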
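
As an alternative to hard-coding the four numeric ranges listed above, the JDK's Character.UnicodeBlock can do the lookup. This is a different technique from the one in the answer, sketched here only for comparison; the class and method names (CyrillicCheck, isCyrillic) are hypothetical. Note that Java calls the U+0500–U+052F block CYRILLIC_SUPPLEMENTARY.

```java
import java.lang.Character.UnicodeBlock;

class CyrillicCheck {
    // True if c belongs to one of the four Cyrillic blocks named in the answer.
    static boolean isCyrillic(char c) {
        // UnicodeBlock.of returns null for characters in no defined block,
        // which simply fails every comparison below.
        UnicodeBlock block = UnicodeBlock.of(c);
        return block == UnicodeBlock.CYRILLIC
            || block == UnicodeBlock.CYRILLIC_SUPPLEMENTARY
            || block == UnicodeBlock.CYRILLIC_EXTENDED_A
            || block == UnicodeBlock.CYRILLIC_EXTENDED_B;
    }

    public static void main(String[] args) {
        System.out.println(isCyrillic('\u041e')); // prints "true"
        System.out.println(isCyrillic('A'));      // prints "false"
    }
}
```

The trade-off is readability against control: the block constants document themselves, while the explicit ranges make it obvious exactly which code points are covered.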
    

    Edit: it is probably better to make the check c < 128 and swap the if and else bodies, so that everything that isn't ASCII gets escaped. I was probably too literal in my reading of your question.
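
A minimal sketch of the variant the edit describes, assuming the goal is to escape every non-ASCII char rather than only Cyrillic ones. The names UnicodeEscaper and escapeNonAscii are invented for this example. It formats with %04x so that each escape always has four hex digits.

```java
class UnicodeEscaper {
    // Returns s with every char >= 128 replaced by a four-digit hex escape.
    static String escapeNonAscii(String s) {
        StringBuilder b = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c < 128) {
                // Plain ASCII is copied through unchanged.
                b.append(c);
            } else {
                // "\\u" is a literal backslash followed by 'u'.
                b.append(String.format("\\u%04x", (int) c));
            }
        }
        return b.toString();
    }

    public static void main(String[] args) {
        // Input is the Cyrillic word from the question plus an ASCII suffix.
        System.out.println(escapeNonAscii("\u041e\u041a!"));
    }
}
```

Because the loop works on chars, a character outside the Basic Multilingual Plane is emitted as two escapes, one per surrogate, which is also how Java source and .properties files represent such characters.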
