This is my code:
$oDom = new DOMDocument();
$oDom->loadHTML(\"èàéìòù\");
echo $oDom->saveHTML();
This is the output:
This way:
/**
* @param string $text
* @return DOMDocument
*/
private function buildDocument($text)
{
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $text);
libxml_use_internal_errors(false);
return $dom;
}
Solution:
$oDom = new DOMDocument();
$oDom->encoding = 'utf-8';
$oDom->loadHTML( utf8_decode( $sString ) ); // important!
$sHtml = '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">';
$sHtml .= $oDom->saveHTML( $oDom->documentElement ); // important!
The saveHTML()
method works differently specifying a node.
You can use the main node ($oDom->documentElement
) adding the desired !DOCTYPE
manually.
Another important thing is utf8_decode()
.
All the attributes and the other methods of the DOMDocument
class, in my case, don't produce the desired result.
Looks like you just need to set substituteEntities when you create the DOMDocument object.
The issue appears to be known, according to the user comments on the manual page at php.net. Solutions suggested there include putting
<meta http-equiv="content-type" content="text/html; charset=utf-8">
in the document before you put any strings with non-ASCII chars in.
Another hack suggests putting
<?xml encoding="UTF-8">
as the first text in the document and then removing it at the end.
Nasty stuff. Smells like a bug to me.
I don't know why the marked answer didn't work for my problem. But this one did.
ref: https://www.php.net/manual/en/class.domdocument.php
<?php
// checks if the content we're receiving isn't empty, to avoid the warning
if ( empty( $content ) ) {
return false;
}
// converts all special characters to utf-8
$content = mb_convert_encoding($content, 'HTML-ENTITIES', 'UTF-8');
// creating new document
$doc = new DOMDocument('1.0', 'utf-8');
//turning off some errors
libxml_use_internal_errors(true);
// it loads the content without adding enclosing html/body tags and also the doctype declaration
$doc->LoadHTML($content, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
// do whatever you want to do with this code now
?>
$dom = new DomDocument();
$str = htmlentities($str);
$dom->loadHTML(utf8_decode($str));
$dom->encoding = 'utf-8';
.
.
.
$str = $dom->saveHTML();
$str = html_entity_decode($str);
The above code worked for me.