Convert Windows-1252 to UTF-8 with JS

被刻印的时光 ゝ submitted on 2019-12-22 11:31:04

Question


I have some strings in Dutch. I know how to convert them to UTF-8 using PHP:

$str = iconv( "Windows-1252", "UTF-8", $str );

What would be the equivalent in JavaScript?


Answer 1:


Windows-1252 is a single-byte encoding, which is pretty convenient: you can just build a lookup table.

<?php
$s = '';

for ($i = 0; $i < 256; $i++) {
    $converted = iconv('Windows-1252', 'UTF-8', chr($i));

    if ($converted === false) {
        $s .= "\xef\xbf\xbd";  # UTF-8 replacement character
    } else {
        $s .= $converted;
    }
}

echo $s;

Assuming you want a regular JavaScript string as a result (rather than UTF-8) and that the input is a string where each character’s Unicode codepoint actually represents a Windows-1252 one, the resulting table can be read as UTF-8, put in a JavaScript string literal, and voilà:

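// One character per Windows-1252 byte value (0-255); bytes the encoding leaves undefined map to U+FFFD.
// Invisible characters (DEL, no-break space, soft hyphen) are written as escapes so they survive copy and paste.
var WINDOWS_1252 = '\u0000\u0001\u0002\u0003\u0004\u0005\u0006\u0007\b\t\n\u000b\f\r\u000e\u000f\u0010\u0011\u0012\u0013\u0014\u0015\u0016\u0017\u0018\u0019\u001a\u001b\u001c\u001d\u001e\u001f !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\u007f€\ufffd‚ƒ„…†‡ˆ‰Š‹Œ\ufffdŽ\ufffd\ufffd‘’“”•–—˜™š›œ\ufffdžŸ\u00a0¡¢£¤¥¦§¨©ª«¬\u00ad®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ';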
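A table literal like this is easy to damage when it is copied between pages or editors, so it can help to generate the same 256 characters in code and compare. The following is a minimal sketch, not part of the original answer. It assumes the well-known fact that Windows-1252 matches ISO-8859-1 everywhere except bytes 0x80 to 0x9F; the names `C1_OVERRIDES` and `buildWindows1252Table` are made up for illustration.

```javascript
// Code points for bytes 0x80..0x9F. 0xFFFD (the replacement character)
// marks the five bytes that Windows-1252 leaves undefined.
var C1_OVERRIDES = [
    0x20AC, 0xFFFD, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0xFFFD, 0x017D, 0xFFFD,
    0xFFFD, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0xFFFD, 0x017E, 0x0178
];

// Hypothetical helper: returns a 256-character string whose character at
// index b is the Unicode character for Windows-1252 byte b.
function buildWindows1252Table() {
    var table = '';

    for (var b = 0; b < 256; b++) {
        var overridden = b >= 0x80 && b <= 0x9F;
        // Outside 0x80..0x9F the byte value equals the Unicode code point.
        table += String.fromCharCode(overridden ? C1_OVERRIDES[b - 0x80] : b);
    }

    return table;
}
```

Comparing the result with a pasted copy of the literal (`buildWindows1252Table() === WINDOWS_1252`) is a quick way to spot characters lost in copy and paste.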

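function fromWindows1252(binaryString) {
    var text = '';

    for (var i = 0; i < binaryString.length; i++) {
        // Each char code is treated as one Windows-1252 byte (0-255).
        // charAt() returns '' for codes above 255, so fall back to U+FFFD
        // instead of silently dropping the character.
        text += WINDOWS_1252.charAt(binaryString.charCodeAt(i)) || '\ufffd';
    }

    return text;
}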
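If the input is available as raw bytes rather than as a binary string, and the result should be actual UTF-8 bytes (closer to what the PHP `iconv` call produces), the built-in `TextDecoder` and `TextEncoder` may be enough. This is a sketch under assumptions, not part of the original answer: it assumes a browser, or a Node.js build with full ICU data, where the `windows-1252` label is supported. The function name `windows1252ToUtf8` and the sample bytes are made up for illustration.

```javascript
// Hypothetical helper: takes Windows-1252 bytes (Uint8Array) and returns
// the same text encoded as UTF-8 bytes (Uint8Array).
function windows1252ToUtf8(bytes) {
    // Decode the Windows-1252 bytes into an ordinary JavaScript string.
    var text = new TextDecoder('windows-1252').decode(bytes);

    // TextEncoder always produces UTF-8.
    return new TextEncoder().encode(text);
}

// Sample data: "één €5" in Windows-1252 (0xE9 is é, 0x80 is €).
var sample = new Uint8Array([0xE9, 0xE9, 0x6E, 0x20, 0x80, 0x35]);
var utf8 = windows1252ToUtf8(sample);

// Decoding the UTF-8 bytes again gives back the readable text.
console.log(new TextDecoder('utf-8').decode(utf8));
```

One difference from the lookup table: for the five undefined bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D), the decoder defined by the Encoding Standard yields the matching C1 control characters rather than U+FFFD.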


Source: https://stackoverflow.com/questions/42414839/convert-windows-1252-to-utf-8-with-js
