Remove ÿþ from string

本秂侑毒 submitted on 2019-12-28 04:32:30

Question


I'm trying to read ID3 data in bulk. On some of the tracks, ÿþ appears. I can remove the first 2 characters, but that hurts the tracks that don't have it.

This is what I currently have:

$trackartist=str_replace("\0", "", $trackartist1);

Any suggestions would be greatly appreciated, thanks!


Answer 1:


ÿþ is how the bytes 0xFF 0xFE look when they are read as ISO-8859-1 (Latin-1); together they form the byte order mark (BOM) of little-endian UTF-16. You can convert your string to UTF-8 with iconv() or mb_convert_encoding():

$trackartist1 = iconv('UTF-16LE', 'UTF-8', $trackartist1);

# Same as above, but with the mbstring extension (argument order: string, to, from)
$trackartist1 = mb_convert_encoding($trackartist1, 'UTF-8', 'UTF-16LE');

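# If the BOM survives the conversion, it is now the UTF-8 sequence EF BB BF,
# so that is what str_replace() has to look for
$trackartist1 = str_replace("\xEF\xBB\xBF", '', $trackartist1);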

This assumes $trackartist1 is always in UTF-16LE; check the documentation of your ID3 tag library on how to get the encoding of the tags, since this may be different for different files. You usually want to convert everything to UTF-8, since this is what PHP uses by default.
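
Because only some tracks carry the BOM, one option is to convert a value only when its leading bytes are a UTF-16 BOM and to leave every other value untouched. The following is a minimal sketch, not taken from any ID3 library: the function name id3_text_to_utf8 is hypothetical, and it assumes that values without a BOM need no conversion.

```php
<?php
// Hypothetical helper: converts a tag value to UTF-8 only if it starts
// with a UTF-16 byte order mark; other values are returned unchanged.
function id3_text_to_utf8(string $raw): string
{
    $bom = substr($raw, 0, 2);
    if ($bom === "\xFF\xFE") {
        $from = 'UTF-16LE';
    } elseif ($bom === "\xFE\xFF") {
        $from = 'UTF-16BE';
    } else {
        return $raw; // no BOM: assumed to need no conversion
    }

    // Drop the two BOM bytes first so they cannot end up in the result.
    $converted = iconv($from, 'UTF-8', substr($raw, 2));

    // iconv() returns false on failure; fall back to the raw value then.
    return $converted === false ? $raw : $converted;
}

$withBom = "\xFF\xFE" . "A\x00C\x00/\x00D\x00C\x00"; // "AC/DC" in UTF-16LE
echo id3_text_to_utf8($withBom), "\n";
echo id3_text_to_utf8('AC/DC'), "\n";
```

Both calls print the same artist name, so tracks with and without the BOM can go through the same code path.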




Answer 2:


I had a similar problem, but I could not force UTF-16LE because the input charset could vary. In the end I detect valid UTF-8 as follows (with the u modifier, preg_match() fails on input that is not valid UTF-8):

if (!preg_match('~~u', $html)) {

For the case that this fails I obtain the correct encoding through the BOM:

function detect_bom_encoding($str) {
    // Compare leading bytes with substr() so that strings shorter than
    // the BOM do not trigger "uninitialized string offset" warnings.
    if (substr($str, 0, 3) === "\xEF\xBB\xBF") {
        return 'UTF-8';
    }
    if (substr($str, 0, 4) === "\x00\x00\xFE\xFF") {
        return 'UTF-32BE';
    }
    // UTF-32LE must be tested before UTF-16LE: both start with FF FE.
    if (substr($str, 0, 4) === "\xFF\xFE\x00\x00") {
        return 'UTF-32LE';
    }
    if (substr($str, 0, 2) === "\xFF\xFE") {
        return 'UTF-16LE';
    }
    if (substr($str, 0, 2) === "\xFE\xFF") {
        return 'UTF-16BE';
    }
    return null; // no recognizable BOM
}

Now I am able to use iconv() as shown in @carpetsmoker's answer:

iconv(detect_bom_encoding($html), 'UTF-8', $html);

I did not use mb_convert_encoding() because it did not remove the BOM (and did not convert the line breaks as iconv() does).
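
The steps in this answer can be combined into a single function. This is only a sketch under stated assumptions: the name convert_to_utf8 is hypothetical, the BOM is stripped explicitly instead of relying on the converter to drop it, and input that has no BOM and is not valid UTF-8 is reported as null.

```php
<?php
// Hypothetical wrapper: detect the encoding from the BOM, strip the BOM,
// and convert the remainder to UTF-8. Returns null if that is not possible.
function convert_to_utf8(string $input): ?string
{
    // BOM byte sequences and their encodings. The 4-byte UTF-32 marks
    // come first because UTF-32LE begins with the UTF-16LE mark (FF FE).
    $boms = [
        "\x00\x00\xFE\xFF" => 'UTF-32BE',
        "\xFF\xFE\x00\x00" => 'UTF-32LE',
        "\xEF\xBB\xBF"     => 'UTF-8',
        "\xFF\xFE"         => 'UTF-16LE',
        "\xFE\xFF"         => 'UTF-16BE',
    ];

    foreach ($boms as $bom => $encoding) {
        if (strncmp($input, $bom, strlen($bom)) === 0) {
            $body = substr($input, strlen($bom)); // strip the BOM
            if ($encoding === 'UTF-8') {
                return $body; // already UTF-8, nothing to convert
            }
            $out = iconv($encoding, 'UTF-8', $body);
            return $out === false ? null : $out;
        }
    }

    // No BOM: accept the input only if it is already valid UTF-8.
    return preg_match('~~u', $input) === 1 ? $input : null;
}

echo convert_to_utf8("\xFF\xFEh\x00i\x00"), "\n";
```

A caller can treat a null result as "encoding unknown" and decide separately how to handle such values.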




Answer 3:


Use regex replacement:

$trackartist1 = preg_replace('/\x00/', '', $trackartist1);

The regex above matches every NUL byte ("\x00") in the string and replaces it with nothing. Note that it does not remove the ÿþ bytes themselves.
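
To illustrate what stripping NUL bytes does to a UTF-16LE value, here is a small sketch with a made-up sample string. It shows that the two BOM bytes remain after the NUL bytes are gone, and that removing them by hand only gives readable text when every character fits in one byte.

```php
<?php
// "Hi" in UTF-16LE with a leading BOM (FF FE), as it might appear in a tag.
$raw = "\xFF\xFE" . "H\x00i\x00";

// Removing the NUL bytes leaves the two BOM bytes in place.
$stripped = str_replace("\x00", '', $raw);
echo bin2hex($stripped), "\n"; // fffe4869

// Removing the BOM bytes as well yields plain ASCII here, but this only
// works for characters whose UTF-16 high byte is zero.
$clean = str_replace("\xFF\xFE", '', $stripped);
echo $clean, "\n"; // Hi
```

For names with characters outside that range, a real conversion with iconv() or mb_convert_encoding() is the safer route.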



Source: https://stackoverflow.com/questions/26493053/remove-%c3%bf%c3%be-from-string
