How to convert large UTF-8 strings into ASCII?

盖世英雄少女心 2020-12-18 08:29

I need to convert large UTF-8 strings into ASCII. It should be reversible, and ideally a quick/lightweight algorithm.

How can I do this? I'm looking for source code.

9 Answers
  • 2020-12-18 08:50

    Any UTF-8 string that is reversibly convertible to ASCII is already ASCII.

    UTF-8 can represent any Unicode character; ASCII cannot.

  • 2020-12-18 08:57

    If the string is encoded as UTF-8, it's not a string any more. It's binary data, and if you want to represent the binary data as ASCII, you have to format it into a string that can be represented using the limited ASCII character set.

    One way is to use base-64 encoding (example in C#):

    string original = "asdf";
    // encode the string into UTF-8 data:
    byte[] encodedUtf8 = Encoding.UTF8.GetBytes(original);
    // format the data into base-64:
    string base64 = Convert.ToBase64String(encodedUtf8);
    

    If you want the string encoded as ASCII data:

    // encode the base-64 string into ASCII data:
    byte[] encodedAscii = Encoding.ASCII.GetBytes(base64);
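
The C# snippets above only show the encoding direction. Since the question asks for a reversible conversion and the thread is mostly about JavaScript, here is a minimal round-trip sketch of the same base-64 idea. The helper names `utf8ToBase64` and `base64ToUtf8` are hypothetical, and the sketch assumes a runtime that provides `TextEncoder`, `TextDecoder`, `btoa` and `atob` (current browsers and recent Node.js versions).

```javascript
// Hypothetical helpers, not from the original answer.
// Assumes TextEncoder, TextDecoder, btoa and atob exist as globals.

function utf8ToBase64(str) {
  // Encode the string into UTF-8 bytes.
  const bytes = new TextEncoder().encode(str);
  // btoa expects a "binary string" (one char per byte). Build it in a loop
  // rather than with String.fromCharCode(...bytes), which can exceed the
  // argument limit on large inputs.
  let binary = "";
  for (const b of bytes) {
    binary += String.fromCharCode(b);
  }
  return btoa(binary);
}

function base64ToUtf8(base64) {
  // Decode base-64 into a binary string, then back into bytes.
  const binary = atob(base64);
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  // Interpret the bytes as UTF-8.
  return new TextDecoder().decode(bytes);
}

const encoded = utf8ToBase64("asdf");
console.log(encoded);               // YXNkZg==
console.log(base64ToUtf8(encoded)); // asdf
```

Base-64 output is roughly a third larger than the UTF-8 input, which may matter for very large strings.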
    
  • 2020-12-18 08:57

    It is impossible to convert a UTF-8 string into ASCII, but it is possible to encode Unicode as an ASCII-compatible string.

    You probably want Punycode, an existing standard that encodes Unicode characters as ASCII. For JavaScript code, check this question

    Please edit your question title and description so that others don't down-vote it: use the term "encoding" rather than "conversion".

  • 2020-12-18 08:59

    Here is a function that converts accented characters (à, é, è, î, etc.) into ASCII-only text. Each accented character found in the table is replaced with % followed by a three-digit code; for example, ï (char code 239) becomes %139. On the receiving side, I parse the string, so I know where an accent was and which character code it stands for.

    I used it in a JavaScript application to send data to a microcontroller that works in ASCII.

    // Declared with var so the function is not an implicit global.
    var convertUtf8ToAscii = function (str) {
        var asciiStr = "";
        // Reference table: Unicode char code -> extended-ASCII code used by the microcontroller
        var refTable = {
            199: 128, 252: 129, 233: 130, 226: 131, 228: 132, 224: 133, 231: 135, 234: 136, 235: 137, 232: 138,
            239: 139, 238: 140, 236: 141, 196: 142, 201: 144, 244: 147, 246: 148, 242: 149, 251: 150, 249: 151
        };
        for (var i = 0; i < str.length; i++) {
            var ascii = refTable[str.charCodeAt(i)];
            if (ascii !== undefined)
                asciiStr += "%" + ascii;   // accented character: emit "%" plus its code
            else
                asciiStr += str[i];        // anything else is copied through unchanged
        }
        return asciiStr;
    };
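
The answer mentions parsing the string again on the other side but does not show it. A possible decoder is sketched below, assuming the same `refTable` as above; the name `convertAsciiToUtf8` and the inverted table are hypothetical additions, not the author's code.

```javascript
// Same table as in the answer: Unicode char code -> extended-ASCII code.
const refTable = {
  199: 128, 252: 129, 233: 130, 226: 131, 228: 132, 224: 133, 231: 135,
  234: 136, 235: 137, 232: 138, 239: 139, 238: 140, 236: 141, 196: 142,
  201: 144, 244: 147, 246: 148, 242: 149, 251: 150, 249: 151
};

// Invert the table: extended-ASCII code -> Unicode char code.
const reverseTable = {};
for (const [unicodeCode, asciiCode] of Object.entries(refTable)) {
  reverseTable[asciiCode] = Number(unicodeCode);
}

// Hypothetical decoder: turns "%130" back into "é", and so on.
function convertAsciiToUtf8(asciiStr) {
  // All codes in the table have exactly three digits.
  return asciiStr.replace(/%(\d{3})/g, (match, code) => {
    const unicodeCode = reverseTable[code];
    // Sequences that are not in the table are left untouched.
    return unicodeCode !== undefined ? String.fromCharCode(unicodeCode) : match;
  });
}

console.log(convertAsciiToUtf8("H%132gar")); // Hägar
```

Because the encoder does not escape a literal `%`, an input that already contains something like `%130` cannot be told apart from an encoded é, so the scheme is only reversible for text that never contains `%` followed by one of the table's codes.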
    
  • 2020-12-18 09:06

    Do you want to strip all non-ASCII characters (or replace them with '?', etc.), or do you want to store Unicode code points in a non-Unicode system?

    The first can be done in a loop that checks for values > 127 (ASCII covers 0 to 127) and replaces them.

    If you don't want to use "any platform/framework/library" then you will need to write your own encoder. Otherwise I'd just use jQuery's .html();
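
A minimal sketch of that first option follows. The function name `replaceNonAscii` and its `replacement` parameter are hypothetical. ASCII covers code points 0 to 127, so anything above 127 is replaced. Note that this direction is lossy and therefore not reversible.

```javascript
// Hypothetical helper: replaces every non-ASCII character with a placeholder.
function replaceNonAscii(str, replacement = "?") {
  let out = "";
  // for...of iterates by code point, so a character outside the BMP
  // (stored as a surrogate pair) produces a single replacement.
  for (const ch of str) {
    out += ch.codePointAt(0) > 127 ? replacement : ch;
  }
  return out;
}

console.log(replaceNonAscii("Hägar")); // H?gar
```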

  • 2020-12-18 09:07

    An implementation of the quote() function might do what you want. My version can be found here

    You can use eval() to reverse the encoding:

    var foo = 'Hägar';
    var quotedFoo = quote(foo);
    var unquotedFoo = eval(quotedFoo);
    alert(foo === unquotedFoo);
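
The linked `quote()` implementation is not reproduced in this thread, so the following is only a sketch of the general idea under that caveat: produce a quoted string literal in which every non-ASCII character is written as a `\uXXXX` escape. The names `quoteAscii` and `unquoteAscii` are hypothetical. It reverses the encoding with `JSON.parse` instead of `eval()`, which avoids executing arbitrary code if the data comes from an untrusted source.

```javascript
// Hypothetical helpers, not the quote() function the answer links to.

function quoteAscii(str) {
  // JSON.stringify adds the surrounding quotes and escapes quotes,
  // backslashes and control characters. The replace() then escapes every
  // remaining code unit from U+007F upward as \uXXXX.
  return JSON.stringify(str).replace(/[\u007f-\uffff]/g, (ch) =>
    "\\u" + ch.charCodeAt(0).toString(16).padStart(4, "0")
  );
}

function unquoteAscii(quoted) {
  // The quoted form is a valid JSON string literal, so JSON.parse restores it.
  return JSON.parse(quoted);
}

const foo = "Hägar";
const quotedFoo = quoteAscii(foo);
console.log(quotedFoo);                       // "H\u00e4gar"
console.log(unquoteAscii(quotedFoo) === foo); // true
```

Each non-ASCII UTF-16 code unit grows to six ASCII characters, so text that is mostly non-ASCII becomes considerably larger than with base-64.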
    