DOMDocument encoding problems / characters transformed

粉色の甜心 2020-12-16 23:56

I am using DOMDocument to manipulate / modify HTML before it gets output to the page. This is only an HTML fragment, not a complete page. My initial problem was that all fren

4 answers
  •  挽巷 (original poster)
    2020-12-17 00:31

    loadHTML() doesn't always recognize the correct encoding as specified in the Content-type HTTP-EQUIV meta tag.

    If the DOMDocument('1.0', 'UTF-8') and loadHTML('<?xml encoding="UTF-8">' . $html) hacks don't work, as they didn't for me (PHP 5.3.13), try this:
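
    (For reference, the two hacks mentioned above are usually combined roughly as in the sketch below. The sample `$html` fragment and the variable names are assumptions made for illustration, and whether the prefix is honoured may depend on the PHP and libxml2 versions in use.)

```php
<?php
// Hypothetical input fragment; not taken from the original question.
$html = '<p>Café français</p>';

// Hack 1: declare the document encoding in the constructor.
$dom = new DOMDocument('1.0', 'UTF-8');

// Collect parser warnings instead of printing them.
libxml_use_internal_errors(true);

// Hack 2: prefix an XML declaration so libxml treats the bytes as UTF-8.
$dom->loadHTML('<?xml encoding="UTF-8">' . $html);
libxml_clear_errors();

// The prefix is kept as a processing-instruction node; remove it again.
foreach ($dom->childNodes as $node) {
    if ($node->nodeType === XML_PI_NODE) {
        $dom->removeChild($node);
        break;
    }
}

// Text of the first paragraph, decoded by the parser.
$text = $dom->getElementsByTagName('p')->item(0)->textContent;
```

    If `$text` comes back with mangled accents, the hack did not take effect on that setup, which is the situation the rest of this answer addresses.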

    Add another <head> section immediately after the opening <html> tag with the correct Content-type HTTP-EQUIV meta tag. Then call loadHTML(), then remove the extra <head> tag.

    // Ensure the entire page is encoded in UTF-8
    $encoding = mb_detect_encoding($body);
    $body = $encoding ? @iconv($encoding, 'UTF-8', $body) : $body;

    // Insert a <head> with a meta tag immediately after the opening <html> to force UTF-8 encoding
    $insertPoint = false;
    if (preg_match("/<html.*?>/is", $body, $matches, PREG_OFFSET_CAPTURE)) {
        // PREG_OFFSET_CAPTURE reports byte offsets, so use the byte-based strlen()/substr()
        $insertPoint = strlen($matches[0][0]) + $matches[0][1];
    }
    if ($insertPoint) {
        $body = substr(
            $body,
            0,
            $insertPoint
        ) . "<head><meta http-equiv='Content-type' content='text/html; charset=UTF-8' /></head>" . substr(
            $body,
            $insertPoint
        );
    }
    $dom = new DOMDocument();
    
    // Suppress warnings for loading non-standard html pages
    libxml_use_internal_errors(true);
    $dom->loadHTML($body);
    libxml_use_internal_errors(false);
    
    // Now remove the extra <head>
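
    The removal step itself is not shown above. A minimal sketch of one way to do it follows, assuming the injected meta tag is the first Content-type meta element in the parsed document. The helper name `removeInjectedHead()` and the sample `$body` are hypothetical, and the type declarations need PHP 7.1 or later (drop them on older versions such as 5.3).

```php
<?php
// Hypothetical helper: removes the first Content-type meta tag and,
// if that leaves its <head> empty, removes the <head> as well.
function removeInjectedHead(DOMDocument $dom): void
{
    foreach ($dom->getElementsByTagName('meta') as $meta) {
        if (strcasecmp($meta->getAttribute('http-equiv'), 'Content-type') !== 0) {
            continue;
        }
        $parent = $meta->parentNode;
        $parent->removeChild($meta);
        // Only drop the wrapper when it is a <head> with nothing else in it.
        if ($parent->nodeName === 'head' && !$parent->hasChildNodes()) {
            $parent->parentNode->removeChild($parent);
        }
        return; // stop after the first (assumed injected) tag
    }
}

// Sample page as it would look after the meta tag was injected (assumed input).
$body = "<html><head><meta http-equiv='Content-type' content='text/html; charset=UTF-8' /></head>"
      . "<body><p>Café français</p></body></html>";

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($body);
libxml_use_internal_errors(false);

removeInjectedHead($dom);

$output = $dom->saveHTML();
$text = $dom->getElementsByTagName('p')->item(0)->textContent;
```

    Matching on the meta element rather than on the first <head> avoids relying on how the parser handles a page that already has its own <head>. If the page carries its own Content-type meta tag, the "first match" assumption should be checked before using this.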
    

    See this article: http://devzone.zend.com/1538/php-dom-xml-extension-encoding-processing/
