THE PROBLEM: I need a XML file \"full encoded\" by UTF8; that is, with no entity representing symbols, all symbols enconded by UTF8, except the only 3 ones that are
This question is creating, time-by-time, a "false answer" (see answers). This is perhaps because people not pay attention, and because there are NO ANSWER: there are a lack of PHP build-in solution.
... So, lets repeat my workaround (that is NOT an answer!) to not create more confusion:
Pay attention:
xml_entity_decode() below is the best (over any other) workaround. function xml_entity_decode($s) {
// illustrating how a (hypothetical) PHP-build-in-function MUST work
static $XENTITIES = array('&','>','<');
static $XSAFENTITIES = array('#_x_amp#;','#_x_gt#;','#_x_lt#;');
$s = str_replace($XENTITIES,$XSAFENTITIES,$s);
$s = html_entity_decode($s, ENT_HTML5|ENT_NOQUOTES, 'UTF-8'); // PHP 5.3+
$s = str_replace($XSAFENTITIES,$XENTITIES,$s);
return $s;
}
To test and to demonstrate that you have a better solution, please test first with this simple benckmark:
$countBchMk_MAX=1000;
$xml = file_get_contents('sample1.xml'); // BIG and complex XML string
$start_time = microtime(TRUE);
for($countBchMk=0; $countBchMk<$countBchMk_MAX; $countBchMk++){
$A = xml_entity_decode($xml); // 0.0002
/* 0.0014
$doc = new DOMDocument;
$doc->loadXML($xml, LIBXML_DTDLOAD | LIBXML_NOENT);
$doc->encoding = 'UTF-8';
$A = $doc->saveXML();
*/
}
$end_time = microtime(TRUE);
echo "\nEND $countBchMk_MAX BENCKMARKs WITH ",
($end_time - $start_time)/$countBchMk_MAX,
" seconds
";