How to convert large UTF-8 strings into ASCII?

盖世英雄少女心 2020-12-18 08:29

I need to convert large UTF-8 strings into ASCII. It should be reversible, and ideally a quick/lightweight algorithm.

How can I do this? I need the source code.

  • 2020-12-18 09:11

    You could use an ASCII-only version of Douglas Crockford's json2.js quote function. It would look like this:

        var escapable = /[\\\"\x00-\x1f\x7f-\uffff]/g,
            meta = {    // table of character substitutions
                '\b': '\\b',
                '\t': '\\t',
                '\n': '\\n',
                '\f': '\\f',
                '\r': '\\r',
                '"' : '\\"',
                '\\': '\\\\'
            };
    
        function quote(string) {
    
            // If the string contains no control characters, no quote characters,
            // no backslash characters and nothing outside ASCII, then we can safely
            // slap some quotes around it. Otherwise we must also replace the
            // offending characters with safe escape sequences.
    
            escapable.lastIndex = 0;
            return escapable.test(string) ?
                '"' + string.replace(escapable, function (a) {
                    var c = meta[a];
                    return typeof c === 'string' ? c :
                        '\\u' + ('0000' + a.charCodeAt(0).toString(16)).slice(-4);
                }) + '"' :
                '"' + string + '"';
        }
    

    This produces a valid, ASCII-only, JavaScript-quoted version of the input string.

    For example, quote("Doppelgänger!") returns "Doppelg\u00e4nger!" (including the surrounding double quotes).

    To reverse the encoding, parse the result with JSON.parse. eval also works, but it is unsafe on untrusted input:

        var encoded = quote("Doppelgänger!");
        var back = JSON.parse(encoded); // or eval(encoded); back === "Doppelgänger!"
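
    Here is a shorter variant of the same idea. It assumes an ES2017+ engine, because it uses String.prototype.padStart. JSON.stringify already escapes quotes, backslashes and control characters, so only U+007F and above still need \uXXXX escapes. The name quoteAscii is hypothetical, and this is a sketch rather than json2.js itself.

```javascript
// Hypothetical helper: ASCII-only JSON string literal for any JS string.
function quoteAscii(str) {
  // JSON.stringify escapes '"', '\\' and U+0000-U+001F but leaves U+007F
  // and every non-ASCII UTF-16 code unit as-is, so escape those here.
  return JSON.stringify(str).replace(/[\u007f-\uffff]/g, function (c) {
    return "\\u" + c.charCodeAt(0).toString(16).padStart(4, "0");
  });
}

const encoded = quoteAscii("Doppelgänger!"); // '"Doppelg\\u00e4nger!"'
const back = JSON.parse(encoded);            // "Doppelgänger!"
```

    Characters outside the Basic Multilingual Plane become two \uXXXX escapes (a surrogate pair), and JSON.parse joins them back together.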
    
  • 2020-12-18 09:12

    As others have said, you can't convert UTF-8 text/plain into ASCII text/plain without dropping data.

    You could convert UTF-8 text/plain into ASCII someother/format. For instance, HTML lets any character in UTF-8 be represented in an ASCII data file using character references.

    Continuing with that example: in JavaScript, charCodeAt can help convert a string into a representation that uses HTML character references.
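
    Here is a minimal sketch of that idea. The names toCharRefs and fromCharRefs are hypothetical. It uses codePointAt rather than charCodeAt, so a character outside the Basic Multilingual Plane becomes one reference instead of two. It also escapes & itself, which keeps decoding unambiguous. fromCharRefs only understands the hexadecimal references that toCharRefs produces, so it is not a general HTML entity decoder.

```javascript
// Encode '&' and every character above U+007F as a hex character reference.
function toCharRefs(str) {
  let out = "";
  for (const ch of str) { // for...of walks code points, not UTF-16 units
    const cp = ch.codePointAt(0);
    out += (cp > 0x7f || ch === "&") ? "&#x" + cp.toString(16) + ";" : ch;
  }
  return out;
}

// Reverse toCharRefs: turn each "&#x...;" back into its character.
function fromCharRefs(str) {
  return str.replace(/&#x([0-9a-f]+);/gi, function (_, hex) {
    return String.fromCodePoint(parseInt(hex, 16));
  });
}

const refs = toCharRefs("Doppelgänger!"); // "Doppelg&#xe4;nger!"
const restored = fromCharRefs(refs);      // "Doppelgänger!"
```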

    URLs take another approach, percent-encoding, which JavaScript implements as encodeURIComponent (reversed by decodeURIComponent).
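
    For example (the \u{...} escape in the sample string assumes an ES2015+ engine), encodeURIComponent writes each non-ASCII character as its UTF-8 bytes in %XX form, and decodeURIComponent restores the original string. Note that encodeURIComponent throws a URIError on a lone surrogate. Also note that a character that takes 4 bytes in UTF-8 grows to 12 ASCII characters.

```javascript
// Every character outside A-Z a-z 0-9 - _ . ! ~ * ' ( ) is percent-encoded
// from its UTF-8 bytes, so the result is plain ASCII.
const text = "Doppelgänger! \u{1F600}";
const ascii = encodeURIComponent(text);
// decodeURIComponent reverses the encoding exactly.
const decoded = decodeURIComponent(ascii);

console.log(ascii); // Doppelg%C3%A4nger!%20%F0%9F%98%80
```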

  • 2020-12-18 09:14

    Your requirement is pretty strange.

    Converting UTF-8 into ASCII would lose all information about Unicode code points above 127 (i.e. everything that's not in ASCII).

    You could, however, encode your Unicode data (whatever the source encoding) in an ASCII-compatible encoding such as UTF-7. The output could legally be interpreted as ASCII, but it is really UTF-7.
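
    JavaScript has no built-in UTF-7 codec. TextDecoder follows the WHATWG Encoding Standard, which does not include UTF-7, so a hand-written codec is needed. The following is a minimal sketch of RFC 2152 UTF-7, and it makes three assumptions. Only the RFC's "Set D" characters plus space, tab, CR and LF are written directly. Every base64 run is closed with an explicit '-'. Malformed input is not validated. The names utf7Encode and utf7Decode are hypothetical.

```javascript
// Minimal UTF-7 (RFC 2152) sketch; hypothetical helpers, no input validation.

// UTF-7 uses the standard base64 alphabet, without '=' padding.
const B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Assumption: only "Set D" plus space, tab, CR and LF are written directly;
// the optional "Set O" punctuation is base64-encoded for simplicity.
const DIRECT = /[A-Za-z0-9'(),\-.\/:? \t\r\n]/;

// Base64-encode a run of UTF-16 code units (16 bits each, big-endian).
function encodeRun(run) {
  let bits = 0, nbits = 0, out = "";
  for (let k = 0; k < run.length; k++) {
    bits = (bits << 16) | run.charCodeAt(k);
    nbits += 16;
    while (nbits >= 6) {
      nbits -= 6;
      out += B64[(bits >> nbits) & 63];
    }
    bits &= (1 << nbits) - 1; // keep only the bits not yet emitted
  }
  if (nbits > 0) out += B64[(bits << (6 - nbits)) & 63]; // zero-pad last group
  return out;
}

function utf7Encode(str) {
  let out = "";
  let i = 0;
  while (i < str.length) {
    const ch = str[i];
    if (ch === "+") { out += "+-"; i++; continue; }    // literal '+'
    if (DIRECT.test(ch)) { out += ch; i++; continue; } // written as-is
    // Gather the longest run of characters that need base64 encoding.
    let j = i;
    while (j < str.length && str[j] !== "+" && !DIRECT.test(str[j])) j++;
    out += "+" + encodeRun(str.slice(i, j)) + "-"; // always close with '-'
    i = j;
  }
  return out;
}

function utf7Decode(ascii) {
  let out = "";
  let i = 0;
  while (i < ascii.length) {
    if (ascii[i] !== "+") { out += ascii[i]; i++; continue; }
    i++; // skip '+'
    if (ascii[i] === "-") { out += "+"; i++; continue; } // "+-" is a literal '+'
    let bits = 0, nbits = 0;
    while (i < ascii.length && B64.includes(ascii[i])) {
      bits = (bits << 6) | B64.indexOf(ascii[i]);
      nbits += 6;
      if (nbits >= 16) {
        nbits -= 16;
        out += String.fromCharCode((bits >> nbits) & 0xffff);
        bits &= (1 << nbits) - 1;
      }
      i++;
    }
    if (ascii[i] === "-") i++; // the '-' that ends a base64 run is dropped
  }
  return out;
}
```

    With these assumptions, "Doppelgänger" encodes as "Doppelg+AOQ-nger". Characters outside the Basic Multilingual Plane are handled as their two UTF-16 code units, which is what RFC 2152 specifies.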
