PHP htmlentities and saving the data in xml format

╄→гoц情女王★ posted on 2020-01-15 23:37:32

Question


I'm trying to save some data into an XML file using the following PHP script:

<?php

$string = '<a href="google.com/maps">Go to google maps</a> and some special characters ë è & ä etc.';

$string = htmlentities($string, ENT_QUOTES, 'UTF-8');

$doc = new DOMDocument('1.0', 'UTF-8');
$doc->preserveWhiteSpace = false;
$doc->formatOutput = true;

$root = $doc->createElement('top');
$root = $doc->appendChild($root);

$title = $doc->createElement('title');
$title = $root->appendChild($title);

$id = $doc->createAttribute('id');
$id->value = '1';
$text = $title->appendChild($id);

$text = $doc->createTextNode($string);
$text = $title->appendChild($text);

$doc->save('data.xml');

echo 'data saved!';

?>

I'm using htmlentities() to convert the whole string to HTML entities; if I leave it out, the special characters are not converted. This is the output:

<?xml version="1.0" encoding="UTF-8"?>
<top>
  <title id="1">&amp;lt;a href=&amp;quot;google.com/maps&amp;quot;&amp;gt;Go to google maps&amp;lt;/a&amp;gt; and some special characters &amp;euml; &amp;egrave; &amp;amp; &amp;auml; etc.</title>
</top>

The angle brackets and quotes of the HTML tags end up double-encoded (for example &amp;lt;), and an ampersand becomes &amp;amp;.

Is this normal behavior? Or how can I prevent this from happening? Looks like a double encoding.


Answer 1:


Try to remove the line:

$string = htmlentities($string, ENT_QUOTES, 'UTF-8');

Because the text passed to createTextNode() is escaped anyway.
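
A minimal sketch of the question's script with the htmlentities() call removed, so that createTextNode() performs the only escaping. It uses saveXML() instead of save() so the result can be inspected as a string; the variable $xml is introduced here for illustration, and the script file is assumed to be saved as UTF-8.

```php
<?php
// Raw text, no htmlentities(): DOMDocument escapes it when serializing.
$string = '<a href="google.com/maps">Go to google maps</a> and some special characters ë è & ä etc.';

$doc = new DOMDocument('1.0', 'UTF-8');
$doc->formatOutput = true;

$root = $doc->appendChild($doc->createElement('top'));
$title = $root->appendChild($doc->createElement('title'));
$title->setAttribute('id', '1');

// createTextNode() stores the literal text; <, > and & are escaped once on output.
$title->appendChild($doc->createTextNode($string));

// Serialize to a string (use $doc->save('data.xml') to write a file instead).
$xml = $doc->saveXML();
echo $xml;
```

With a UTF-8 document encoding, characters such as ë are written as-is; they are valid in XML and need no entity.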

Update: If you want the UTF-8 characters to be escaped, you could keep that line and pass $string directly to createElement().

For example:

$title = $doc->createElement('title', $string);
$title = $root->appendChild($title);

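The PHP documentation says that the value passed to createElement() is not escaped. I haven't tried it, but it should work. Keep in mind that XML itself predefines only five entities (&lt;, &gt;, &amp;, &quot; and &apos;), so named HTML entities such as &euml; may not be accepted by a plain XML parser.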
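
If the aim is simply an ASCII-only file, one alternative (not what this answer proposes) is to give the document an ASCII encoding and let DOMDocument write non-ASCII characters as numeric character references, which any XML parser understands. A sketch, assuming the script file is saved as UTF-8; whether the references come out in decimal or hexadecimal form is up to the underlying library.

```php
<?php
// Input text must be UTF-8; only the serialized output is restricted to ASCII.
$string = 'special characters ë è & ä';

$doc = new DOMDocument('1.0', 'US-ASCII');
$title = $doc->appendChild($doc->createElement('title'));
$title->appendChild($doc->createTextNode($string));

// Characters outside ASCII are emitted as numeric character references.
$xml = $doc->saveXML();
echo $xml;

// Reading the document back yields the original characters.
$check = new DOMDocument();
$check->loadXML($xml);
echo $check->documentElement->textContent, "\n";
```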




Answer 2:


It is htmlentities() that turns & into &amp;. When working with XML data you should not use htmlentities(), because DOMDocument expects a raw & and escapes it itself; if it is given &amp;, it escapes that ampersand again.

As of PHP 5.4, htmlentities() defaults to UTF-8, so there is no need to pass the encoding explicitly.




Answer 3:


This line:

$string = htmlentities($string, ENT_QUOTES, 'UTF-8');

… encodes a string as HTML.

This line:

$text = $doc->createTextNode($string);

… encodes your string of HTML as XML.

This gives you an XML representation of an HTML string. When the XML is parsed you get the HTML back.
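
The two passes can be made visible with a small round trip. This is only an illustrative sketch: the sample string and the variable names ($original, $html, $xml, $text) are made up here, and the script file is assumed to be UTF-8.

```php
<?php
$original = '<b>ë & ä</b>';

// Pass 1: HTML encoding.
$html = htmlentities($original, ENT_QUOTES, 'UTF-8');

// Pass 2: XML encoding, applied by DOMDocument when it serializes the text node.
$doc = new DOMDocument('1.0', 'UTF-8');
$title = $doc->appendChild($doc->createElement('title'));
$title->appendChild($doc->createTextNode($html));
$xml = $doc->saveXML();
echo $xml;

// Parsing undoes only the XML pass, leaving the HTML-encoded string.
$parsed = new DOMDocument();
$parsed->loadXML($xml);
$text = $parsed->documentElement->textContent;
var_dump($text === $html);

// Decoding the HTML pass as well recovers the original text.
var_dump(html_entity_decode($text, ENT_QUOTES, 'UTF-8') === $original);
```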

"How can I prevent this from happening?"

If your goal is to store some text in an XML document, remove the line that encodes it as HTML.

"Looks like a double encoding."

Pretty much. It is encoded twice; each of the two passes just uses a different (albeit very similar) encoding method.



Source: https://stackoverflow.com/questions/12330571/php-htmlentities-and-saving-the-data-in-xml-format
