I'm building an XML file from scratch and need to know: does htmlentities() convert every character that could potentially break an XML file (and possibly UTF-8 data)?
DOMDocument::createTextNode() will automatically escape your content.
Example:
$dom = new DOMDocument;
$element = $dom->createElement('Element');
$element->appendChild(
    $dom->createTextNode('I am text with Ünicödé & HTML €ntities ©')
);
$dom->appendChild($element);
echo $dom->saveXml();
Output:
<?xml version="1.0"?>
<Element>I am text with &#xDC;nic&#xF6;d&#xE9; &amp; HTML &#x20AC;ntities &#xA9;</Element>
When you set the internal encoding to utf-8, e.g.
$dom->encoding = 'utf-8';
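the & is still escaped, and the non-ASCII characters are written as literal UTF-8:
<?xml version="1.0" encoding="utf-8"?>
<Element>I am text with Ünicödé &amp; HTML €ntities ©</Element>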
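To check that both serializations carry the same text, here is a minimal sketch that builds the document with and without an explicit encoding and parses it back. The helper names buildXml() and readBack() are made up for this illustration and are not part of the DOM extension; the script file itself is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical helper: put $text inside <Element> via createTextNode(),
// optionally setting the document encoding before serializing.
function buildXml(string $text, ?string $encoding = null): string
{
    $dom = new DOMDocument;
    if ($encoding !== null) {
        $dom->encoding = $encoding;
    }
    $element = $dom->createElement('Element');
    $element->appendChild($dom->createTextNode($text));
    $dom->appendChild($element);
    return $dom->saveXML();
}

// Hypothetical helper: parse the XML again and return the root element's text.
function readBack(string $xml): string
{
    $dom = new DOMDocument;
    $dom->loadXML($xml);
    return $dom->documentElement->textContent;
}

$text = 'I am text with Ünicödé & HTML €ntities ©';

// Compare the round-tripped text with the original for both variants.
var_dump(readBack(buildXml($text)) === $text);
var_dump(readBack(buildXml($text, 'utf-8')) === $text);
```

If both comparisons hold, the choice of encoding only changes how the characters are written in the file, not the text a parser reads back.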
Note that using createTextNode() is not the same as passing the text as the second argument $value to DOMDocument::createElement(). That method only makes sure your element names are valid; it does not escape the value. See the Notes on the manual page, e.g.
$dom = new DOMDocument;
$element = $dom->createElement('Element', 'I am text with Ünicödé & HTML €ntities ©');
$dom->appendChild($element);
$dom->encoding = 'utf-8';
echo $dom->saveXml();
will result in a warning:
Warning: DOMDocument::createElement(): unterminated entity reference HTML €ntities ©
and the following output:
<?xml version="1.0" encoding="utf-8"?>
<Element>I am text with Ünicödé </Element>
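If you create many elements with text, one way to avoid the createElement() pitfall is to wrap the two-step approach in a small function. This is only a sketch: createElementWithText() is a hypothetical name, not something the DOM extension provides, and the script file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical helper (not part of the DOM extension): creates an element
// and attaches the text through createTextNode(), so that special
// characters such as & are escaped when the document is saved.
function createElementWithText(DOMDocument $dom, string $name, string $text): DOMElement
{
    $element = $dom->createElement($name);               // name only, no $value
    $element->appendChild($dom->createTextNode($text));  // text goes in as a text node
    return $element;
}

$dom = new DOMDocument('1.0', 'utf-8');
$dom->appendChild(
    createElementWithText($dom, 'Element', 'I am text with Ünicödé & HTML €ntities ©')
);

$xml = $dom->saveXML();
echo $xml;
```

With this, the full string should end up in the output with the & written as &amp;, instead of being cut off at the ampersand.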