Can R read html-encoded emoji characters?

旧巷少年郎 2021-01-13 00:58

Question

My question, explained below, is:

How can R be used to read a string that includes HTML emoji codes like &#55357;&#56842;?

4 Answers
  •  予麋鹿 (original poster)
    2021-01-13 01:18

    JavaScript Solution

    I had exactly the same problem, but needed the solution in JavaScript rather than R. Using rensa's comment above (hugely helpful!), I wrote the following code, and I'm sharing it in case anyone else lands on this thread needing a JavaScript version as I did.

    str.replace(/&#(5[56]\d{3});&#(5[67]\d{3});/g, function (match, hi, lo) {
        hi = parseInt(hi, 10);
        lo = parseInt(lo, 10);
        // Leave anything that is not a valid UTF-16 surrogate pair untouched.
        if (hi < 0xd800 || hi > 0xdbff || lo < 0xdc00 || lo > 0xdfff) return match;
        // Each surrogate carries 10 bits; combine them and add the 0x10000 offset.
        var codePoint = ((hi - 0xd800) << 10) + (lo - 0xdc00) + 0x10000;
        return '&#x' + codePoint.toString(16) + ';';
    });
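
The pairs of decimal entities are UTF-16 surrogates: a code point above 0xFFFF is split into two 10-bit halves, and the replacement above reverses that split. A minimal sketch of the round trip, assuming valid input; the helper names `codePointToSurrogates` and `surrogatesToCodePoint` are hypothetical and not part of the code above:

```javascript
// Hypothetical helpers for illustration; the names are assumptions.

// Split a code point above 0xFFFF into its UTF-16 surrogate pair.
function codePointToSurrogates(codePoint) {
    var offset = codePoint - 0x10000;       // 20 bits remain after the offset
    var hi = 0xd800 + (offset >> 10);       // upper 10 bits
    var lo = 0xdc00 + (offset & 0x3ff);     // lower 10 bits
    return [hi, lo];
}

// Inverse: rebuild the code point from a surrogate pair.
function surrogatesToCodePoint(hi, lo) {
    return ((hi - 0xd800) << 10) + (lo - 0xdc00) + 0x10000;
}

var pair = codePointToSurrogates(0x1f60a);  // U+1F60A
console.log(pair.join(', '));               // 55357, 56842
console.log(surrogatesToCodePoint(pair[0], pair[1]).toString(16)); // 1f60a
```

So `&#55357;&#56842;` and `&#x1f60a;` name the same character; only the second form is a valid HTML reference to it.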
    

    And here's a full working snippet, if you'd like to run it:

    var str = '&#55357;&#56842;&#55357;&#56856;&#55357;&#56832;&#55357;&#56838;&#55357;&#56834;&#55357;&#56833;';
    
    str = str.replace(/&#(5[56]\d{3});&#(5[67]\d{3});/g, function (match, hi, lo) {
        hi = parseInt(hi, 10);
        lo = parseInt(lo, 10);
        // Leave anything that is not a valid UTF-16 surrogate pair untouched.
        if (hi < 0xd800 || hi > 0xdbff || lo < 0xdc00 || lo > 0xdfff) return match;
        // Each surrogate carries 10 bits; combine them and add the 0x10000 offset.
        var codePoint = ((hi - 0xd800) << 10) + (lo - 0xdc00) + 0x10000;
        return '&#x' + codePoint.toString(16) + ';';
    });
    
    document.getElementById('result').innerHTML = str;
    
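    //  &#55357;&#56842;&#55357;&#56856;&#55357;&#56832;&#55357;&#56838;&#55357;&#56834;&#55357;&#56833;
    //  is turned into
    //  &#x1f60a;&#x1f618;&#x1f600;&#x1f606;&#x1f602;&#x1f601;
    //  which is rendered by the browser as the emojis 😊😘😀😆😂😁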
    Original:
    &#55357;&#56842;&#55357;&#56856;&#55357;&#56832;&#55357;&#56838;&#55357;&#56834;&#55357;&#56833;

    Result:
    😊😘😀😆😂😁

    My SMS XML Parser application is working great now, but it stalls on large XML files, so I'm thinking about rewriting it in PHP. If/when I do, I'll post that code as well.
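
If the text is processed outside a browser, where nothing will resolve the `&#x…;` entities afterwards, the pairs can be decoded straight to characters. JavaScript strings are UTF-16, so `String.fromCharCode` accepts the two surrogate values as they are and no arithmetic is needed. A minimal sketch, assuming the same decimal-pair input as above; `decodeSurrogateEntities` is a hypothetical name:

```javascript
// Hypothetical helper (an assumption, not from the answer above): replace each
// pair of decimal surrogate entities with the character it encodes.
function decodeSurrogateEntities(text) {
    // The pattern narrows candidates to five-digit values in the right
    // neighbourhood; the range check confirms a real surrogate pair.
    return text.replace(/&#(5[56]\d{3});&#(5[67]\d{3});/g, function (match, hi, lo) {
        hi = Number(hi);
        lo = Number(lo);
        if (hi < 0xd800 || hi > 0xdbff || lo < 0xdc00 || lo > 0xdfff) {
            return match; // not a surrogate pair, leave it alone
        }
        return String.fromCharCode(hi, lo);
    });
}

console.log(decodeSurrogateEntities('Hi &#55357;&#56842;!')); // Hi 😊!
```

Other numeric entities, such as `&#65;`, are left untouched by this sketch and would need a general HTML entity decoder.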
