Why Does DOM Change Encoding?

你的背包 2020-12-08 03:16
$string = file_get_contents('http://example.com');

if ('UTF-8' === mb_detect_encoding($string)) {
    $dom = new DOMDocument();
    // hack to preserve UTF-8 ch


        
4 Answers
  • 2020-12-08 03:29

    If it is definitely the DOM that is mangling the encoding, this trick worked for me a while back in the opposite direction (accepting ISO-8859-1 data). DOMDocument should default to UTF-8 in any case, but you can still try:

        $dom = new DOMDocument('1.0', 'utf-8');
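
A minimal sketch of what the constructor's encoding argument visibly affects, assuming a document that is built in code and then serialized with saveXML(). The element name and sample text are made up for illustration and do not come from the original answer.

```php
<?php
// Hypothetical sample text; \u{00E9} is "é", written as an escape so the
// example does not depend on the encoding of this source file.
$text = "caf\u{00E9}";

// The second constructor argument sets the encoding used when serializing.
$dom = new DOMDocument('1.0', 'utf-8');
$dom->appendChild($dom->createElement('p', $text));

// With an encoding set, saveXML() declares it in the XML prolog and
// writes the non-ASCII character as raw UTF-8 bytes.
$xml = $dom->saveXML();
echo $xml;
```

This only demonstrates the serialization side; it is not a claim that the constructor argument alone fixes the loadHTML() problem in the question.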
    
  • 2020-12-08 03:39

    I had to send a UTF-8 header to get the page to display correctly:

    header('Content-Type: text/html; charset=utf-8');
    
  • 2020-12-08 03:46

    At the top of the script containing your PHP code (the code you posted here), make sure you send a UTF-8 header. I bet your encoding is some variant of Latin-1 right now. Yes, I know the remote web page is UTF-8, but this PHP script isn't.

  • 2020-12-08 03:49

    I had similar problems recently and eventually found this workaround: convert all the non-ASCII characters to HTML entities before loading the HTML.

    $string = mb_convert_encoding($string, 'HTML-ENTITIES', "UTF-8");
    $dom->loadHTML($string);
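
The workaround above can be sketched end to end as follows. This is an illustration under stated assumptions, not the answerer's exact code: it swaps in mb_encode_numericentity() for the 'HTML-ENTITIES' conversion, because that pseudo-encoding is deprecated as of PHP 8.2. The helper name loadUtf8Html and the sample markup are hypothetical.

```php
<?php
// Hypothetical helper, not part of the original answer.
function loadUtf8Html(string $html): DOMDocument
{
    // Map every code point from U+0080 to U+10FFFF to a numeric entity,
    // leaving ASCII (including the markup itself) untouched.
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    $ascii = mb_encode_numericentity($html, $map, 'UTF-8');

    $dom = new DOMDocument();
    // Collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($ascii);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $dom;
}

// Assumed sample input standing in for the fetched page; the \u{...}
// escapes keep the example independent of this file's own encoding.
$html = "<html><body><p>F\u{00FC}\u{00DF}e, caf\u{00E9}</p></body></html>";
$dom = loadUtf8Html($html);
$text = $dom->getElementsByTagName('p')->item(0)->textContent;
echo $text, PHP_EOL;
```

Because the parser only ever sees ASCII, it has no chance to guess the wrong encoding, and the entities are decoded back into the original characters inside the DOM.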
    