I want to delete the BOM from my imported file, but it just doesn't seem to work.
I tried preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $file);
Isn't the BOM there to give you a clue on how to reencode the input to something your script/app/database needs? Just deleting isn't gonna help.
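To see which BOM, if any, the file actually starts with, it can help to look at its first few bytes. A minimal sketch follows; `detectBom` is a hypothetical helper name, not a PHP built-in, and the sample string only stands in for the imported file:

```php
<?php
// Hypothetical helper: report which Unicode BOM (if any) a byte string starts with.
function detectBom(string $bytes): ?string
{
    // The 4-byte signatures come first, because the UTF-32LE BOM
    // begins with the same two bytes as the UTF-16LE BOM.
    $signatures = [
        "\x00\x00\xfe\xff" => 'UTF-32BE',
        "\xff\xfe\x00\x00" => 'UTF-32LE',
        "\xef\xbb\xbf"     => 'UTF-8',
        "\xfe\xff"         => 'UTF-16BE',
        "\xff\xfe"         => 'UTF-16LE',
    ];
    foreach ($signatures as $bom => $encoding) {
        // strncmp() is binary-safe and compares only the first strlen($bom) bytes.
        if (strncmp($bytes, $bom, strlen($bom)) === 0) {
            return $encoding;
        }
    }
    return null; // no BOM at the start of the string
}

// Sample data standing in for the result of file_get_contents().
$sample = "\xef\xbb\xbfid;name";
echo bin2hex(substr($sample, 0, 4)), "\n"; // efbbbf69
echo detectBom($sample) ?? 'no BOM', "\n"; // UTF-8
```

Knowing the encoding the BOM announces tells you what to convert from, which is the information lost when the bytes are simply deleted.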
This is how I force a string (read from a file with file_get_contents()) to be encoded in UTF-8 and get rid of the BOM as well:
switch (true) {
    // UTF-8 BOM: the data is already UTF-8, so just drop the three BOM bytes
    case (substr($string, 0, 3) === "\xef\xbb\xbf"):
        $string = substr($string, 3);
        break;
    // UTF-32 has to be tested before UTF-16, because the UTF-32LE BOM
    // starts with the same two bytes as the UTF-16LE BOM
    case (substr($string, 0, 4) === "\x00\x00\xfe\xff"):
        $string = mb_convert_encoding(substr($string, 4), "UTF-8", "UTF-32BE");
        break;
    case (substr($string, 0, 4) === "\xff\xfe\x00\x00"):
        $string = mb_convert_encoding(substr($string, 4), "UTF-8", "UTF-32LE");
        break;
    case (substr($string, 0, 2) === "\xfe\xff"):
        $string = mb_convert_encoding(substr($string, 2), "UTF-8", "UTF-16BE");
        break;
    case (substr($string, 0, 2) === "\xff\xfe"):
        $string = mb_convert_encoding(substr($string, 2), "UTF-8", "UTF-16LE");
        break;
    default:
        // No BOM: guess the encoding; leave the string alone if detection fails
        $detected = mb_detect_encoding($string, mb_detect_order(), true);
        if ($detected !== false) {
            $string = mb_convert_encoding($string, "UTF-8", $detected);
        }
}
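For the narrower case where the file is already known to be UTF-8 and only the BOM has to go, a pattern anchored at the start of the string is enough. The character class tried in the question removes every control byte and every byte above 0x7F anywhere in the string, so it also strips newlines and breaks all non-ASCII characters. A minimal sketch, with `stripUtf8Bom` as a hypothetical helper name and a made-up sample string:

```php
<?php
// Hypothetical helper: remove a leading UTF-8 BOM and leave everything else untouched.
function stripUtf8Bom(string $text): string
{
    // ^ anchors the match to the very start of the string. Without the /u
    // modifier PCRE works on raw bytes, so \xEF\xBB\xBF matches the three BOM bytes.
    // preg_replace() returns null on error; fall back to the input in that case.
    return preg_replace('/^\xEF\xBB\xBF/', '', $text) ?? $text;
}

// Sample data standing in for the imported file ("\xC3\xAB" is "ë" in UTF-8).
$file  = "\xEF\xBB\xBFid;name\n1;Zo\xC3\xAB\n";
$clean = stripUtf8Bom($file);
```

Here `$clean` keeps its line breaks and the "ë", and only the three leading bytes are gone.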