DOMDocument encoding problems / characters transformed

粉色の甜心 2020-12-16 23:56

I am using DOMDocument to manipulate / modify HTML before it gets output to the page. This is only an HTML fragment, not a complete page. My initial problem was that all fren

4 answers
  •  伪装坚强ぢ
    2020-12-17 00:54

    As others have pointed out, DOMDocument::loadHTML() defaults to Latin-1 (ISO-8859-1) encoding when given an HTML fragment with no declared charset. It will also wrap your HTML in something like this:

    
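    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
    <html><body>
    YOUR HTML
    </body></html>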
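
    To see both effects for yourself, here is a minimal sketch. The sample string is my own assumption, not taken from the question, and the exact way the accented character gets mangled may vary between libxml versions:

    ```php
    <?php
    // Sample UTF-8 fragment (an assumption for illustration only).
    $html = '<p>café</p>';

    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    // No charset hint is given, so the input is not read as UTF-8.
    $dom->loadHTML($html);

    // The serialized document starts with a DOCTYPE and wraps the fragment
    // in <html> and <body>; the "é" typically comes back mangled.
    $out = $dom->saveHTML();
    libxml_use_internal_errors(false);

    echo $out;
    ```

    Comparing the echoed markup with the input shows both the added wrapper and the damaged character.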
    

    As others have also pointed out, you can fix the encoding by prepending a HEAD element to your HTML, with a META element inside it that declares the correct encoding.

    However, if you're working with an HTML fragment, you probably don't want the wrapping to happen and you also don't want to keep that HEAD element you inserted.

    The following code inserts the HEAD element and then, after processing, uses a regex to remove all the wrapping elements:

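    <?php
    // Sample fragment: two article.grid-item elements
    $html = '<article class="grid-item"><p>Hello World</p></article>'
          . '<article class="grid-item"><p>Goodbye World</p></article>';

    // HEAD with a META element declaring UTF-8, so loadHTML() reads the input correctly
    $head = '<head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head>';

    libxml_use_internal_errors(true);
    $dom = new DOMDocument('1.0', 'utf-8');
    $dom->loadHTML($head . $html);
    $xpath = new DOMXPath($dom);

    // Loop through all article.grid-item elements and add the "invisible" class to them
    $nodes = $xpath->query("//article[contains(concat(' ', normalize-space(@class), ' '), ' grid-item ')]");
    foreach ($nodes as $node) {
        $class = $node->getAttribute('class');
        $class .= ' invisible';
        $node->setAttribute('class', $class);
    }

    // Strip the doctype, html, head, meta and body tags added around the fragment.
    // The \b keeps longer tag names such as <header> from being matched by "head".
    $content = preg_replace('/<\/?(!doctype|html|head|meta|body)\b[^>]*>/im', '', $dom->saveHTML());
    libxml_use_internal_errors(false);

    echo $content;
    ?>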
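
    If you would rather not clean up with a regex (it also strips any META tags that are part of your own fragment), one alternative is to serialize only the children of the BODY element. This is a rough sketch, not the approach from the answer above: transformFragment() and its $modify callback are hypothetical names I made up for illustration, and it assumes the fragment is UTF-8.

    ```php
    <?php
    // Hypothetical helper: parse a UTF-8 fragment, let a callback modify the
    // DOM, and return the fragment without the wrapper elements.
    function transformFragment(string $html, callable $modify): string
    {
        // META charset hint so that loadHTML() reads the input as UTF-8.
        $head = '<head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head>';

        $previous = libxml_use_internal_errors(true);
        $dom = new DOMDocument('1.0', 'utf-8');
        $dom->loadHTML($head . $html);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);

        $modify($dom);

        // Serialize each child of <body> on its own, so the DOCTYPE, <html>,
        // <head> and <body> wrappers never reach the output.
        $body = $dom->getElementsByTagName('body')->item(0);
        $out = '';
        if ($body !== null) {
            foreach ($body->childNodes as $child) {
                $out .= $dom->saveHTML($child);
            }
        }
        return $out;
    }

    // Usage: add the "invisible" class to every article element.
    $result = transformFragment(
        '<article class="grid-item"><p>Café</p></article>',
        function (DOMDocument $dom): void {
            foreach ((new DOMXPath($dom))->query('//article') as $node) {
                $node->setAttribute('class', $node->getAttribute('class') . ' invisible');
            }
        }
    );

    echo $result;
    ```

    The trade-off is a few more lines of code in exchange for not having to guess which tags belong to the wrapper.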
