DOMDocument encoding problems / characters transformed

粉色の甜心 2020-12-16 23:56

I am using DOMDocument to manipulate or modify HTML before it is output to the page. This is only an HTML fragment, not a complete page. My initial problem was that all fren

4 answers
  •  借酒劲吻你
    2020-12-17 00:43

    Don't use utf8_decode. If your text is in UTF-8, pass it as such.

    Unfortunately, DOMDocument defaults to LATIN1 (ISO-8859-1) for HTML. The behavior seems to be:

    • When fetching a remote document, it should deduce the encoding from the HTTP headers.
    • If no such header was sent, or the file is local, it looks for the corresponding meta http-equiv tag.
    • Otherwise, it defaults to LATIN1.

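    Example of it working:

    <?php
    // UTF-8 HTML that declares its own charset via a meta http-equiv tag
    $s = <<<HTML
    <html>
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    </head>
    <body>
    Sans doute parce qu’il vient d’atteindre une date déterminante
    dans son spectaculaire cheminement
    </body>
    </html>
    HTML;

    libxml_use_internal_errors(true); // suppress parser warnings
    $d = new DOMDocument;
    $d->loadHTML($s);

    echo $d->textContent;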
    

    And with XML (default is UTF-8):

    <?php
    // The wrapper element names are arbitrary; loadXML() only needs well-formed XML
    $s = '<x><z>Sans doute parce qu’il vient d’atteindre une date déterminante '.
        'dans son spectaculaire cheminement</z></x>';
    libxml_use_internal_errors(true);
    $d = new DOMDocument;
    $d->loadXML($s);

    echo $d->textContent;
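
For an HTML fragment like the one in the question, one way to apply the meta http-equiv rule is to wrap the fragment in a minimal document that declares its charset before calling loadHTML. This is only a sketch, assuming the fragment is already valid UTF-8. `loadUtf8Fragment` is a hypothetical helper name, not part of the DOM extension.

```php
<?php
// Hypothetical helper (not part of DOMDocument): parse a UTF-8 HTML fragment
// by wrapping it in a minimal document that declares its charset.
function loadUtf8Fragment(string $fragment): DOMDocument
{
    $html = '<html><head>'
          . '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
          . '</head><body>' . $fragment . '</body></html>';

    // Collect parser warnings internally instead of emitting them,
    // then restore whatever setting was active before.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $doc;
}

$doc = loadUtf8Fragment('<p>Sans doute parce qu’il vient d’atteindre une date déterminante</p>');
echo $doc->getElementsByTagName('p')->item(0)->textContent, "\n";
```

The fragment ends up as the children of the body element, so any later manipulation or serialization would start from `$doc->getElementsByTagName('body')->item(0)`.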
    
