Illegal self-closing notation for empty nodes when outputting XHTML with PHP DOMDocument

Submitted by 南笙酒味 on 2019-12-14 00:46:11

Question


I am processing XML-compliant XHTML input with XPath in PHP like this:

$xml=new DOMDocument();
$xml->loadXML(utf8_encode($temp));
[...]
$temp=utf8_decode($xml->saveXML());

The problem is that elements that must not be self-closing according to the HTML5 spec, e.g.

<textarea id="something"></textarea>

or an empty div used as a hook by JavaScript

<div id="someDiv" class="whaever"></div>

come back out as

<textarea id="something" />

and

<div id="someDiv" class="whaever" />
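A minimal reproduction of this collapse, assuming a plain document without the XHTML namespace declaration: DOMDocument::saveXML() writes any element that has no child nodes in the short `<tag/>` form. Passing the root node to saveXML() is only done here to leave out the XML declaration.

```php
<?php
// Reproduce the behaviour from the question: empty elements come back self-closed.
$dom = new DOMDocument();
$dom->loadXML('<html><textarea id="something"></textarea><div id="someDiv" class="whaever"></div></html>');

// Serialize only the root element, so no <?xml ...?> declaration is emitted.
$out = $dom->saveXML($dom->documentElement);
echo $out, "\n";
```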

I currently work around this with str_replace, but that is nonsense because I have to match every case individually. How can I solve this?

At the same time, XPath insists on emitting

xmlns:default="http://www.w3.org/1999/xhtml"

and on freshly created nodes it emits prefixed names such as <default:p>. How do I stop that without resorting to stupid search and replace like this:

$temp=str_replace(' xmlns:default="http://www.w3.org/1999/xhtml" '," ",$temp);
$temp=str_replace(' xmlns:default="http://www.w3.org/1999/xhtml"'," ",$temp);
$temp=str_replace('<default:',"<",$temp);
$temp=str_replace('</default:',"</",$temp);

?

EDIT: I am running into real trouble with the stupid search and replace, and I do not intend to attack the output XHTML with regular expressions. Consider this example:

<div id="videoPlayer0" class="videoPlayerPlacement" data-xml="video/cp_IV_a_1.xml"/>

Self-closing divs are illegal (at least in one context where I cannot serve the page as application/xhtml+xml and am forced to use the MIME type text/html), and in all other cases they certainly don't validate.


Answer 1:


Sorry for the late reply, but you know... it was Christmas. :D

function export_html(DOMDocument $dom)
{
        // HTML5 void elements: the only ones that may be written as <tag/>.
        $voids = ['area',
                  'base',
                  'br',
                  'col',
                  'embed',
                  'hr',
                  'img',
                  'input',
                  'keygen',
                  'link',
                  'meta',
                  'param',
                  'source',
                  'track',
                  'wbr'];

        // Every empty node. There is no reason to match nodes with content inside.
        $query = '//*[not(node())]';
        $nodes = (new DOMXPath($dom))->query($query);

        foreach ($nodes as $n) {
                // localName ignores a namespace prefix such as "default:".
                if (! in_array($n->localName, $voids, true)) {
                        // If it is not a void/empty tag,
                        // we need to leave the tag open.
                        $n->appendChild(new DOMComment('NOT_VOID'));
                }
        }

        // Remove the placeholder from the output string.
        // Note: the NOT_VOID comments remain in $dom itself.
        return str_replace('<!--NOT_VOID-->', '', $dom->saveXML());
}

Using your example:

$dom = new DOMDocument();
$dom->loadXML(<<<XML
<html>
        <textarea id="something"></textarea>
        <div id="someDiv" class="whaever"></div>
</html>
XML
);

echo export_html($dom); produces:

<?xml version="1.0"?>
<html>
        <textarea id="something"></textarea>
        <div id="someDiv" class="whaever"></div>
</html>

Merry Christmas! ^_^




Answer 2:


In case you don't know that HTML5 can be written and served as XML, look at this: "It seems not very clear for many people. So let’s set the record straight. HTML 5 can be written in html and XML."

Next, to actually serve any PHP example as XML, set the corresponding header:

header("content-type: application/xhtml+xml; charset=UTF-8");

In actual XML documents you cannot write a void element without the closing slash: no <br> instead of <br/>, etc. With that prelude, let's go on...
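To illustrate the well-formedness point, here is a small sketch using DOMDocument::loadXML(), which returns false when its input is not well-formed XML. The two sample strings are made up for this illustration.

```php
<?php
// Collect parse errors internally instead of printing warnings.
libxml_use_internal_errors(true);

// HTML-style <br> is never closed, so this is not well-formed XML.
$htmlStyle = new DOMDocument();
$okHtmlStyle = $htmlStyle->loadXML('<p>line one<br>line two</p>');

// XML-style <br/> closes the element, so this parses.
$xmlStyle = new DOMDocument();
$okXmlStyle = $xmlStyle->loadXML('<p>line one<br/>line two</p>');

var_dump($okHtmlStyle); // bool(false)
var_dump($okXmlStyle);  // bool(true)

libxml_clear_errors();
```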

We found that using the LIBXML_NOEMPTYTAG option in

$xml=new DOMDocument();
$xml->loadXML(utf8_encode($temp));
  // do stuff with the DOM
$temp=utf8_decode($xml->saveXML(NULL, LIBXML_NOEMPTYTAG));

does not "solve" the problem but reverses it: now every empty element gets an explicit end tag, void elements included, so a line break comes out as <br></br>. The HTML5 spec names a number of "void elements". They are: area, base, br, col, embed, hr, img, input, keygen, link, meta, param, source, track, wbr, and to quote the spec on them: "Void elements can't have any contents (since there's no end tag, no content can be put between the start tag and the end tag)."

Because void elements are defined to have no content, a simple regular expression can get this right (for lack of an actual solution):

$temp = preg_replace('#></(area|base|br|col|embed|hr|img|input|keygen|link|meta|param|source|track|wbr)>#si', '/>', $temp);
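A sketch of the two steps together, on a made-up input: LIBXML_NOEMPTYTAG first expands every empty element, and the void-element regex then collapses only area, br, img and the rest back to the short form, leaving the textarea open. The `s` modifier from the answer is dropped here only because the pattern contains no `.`; it has no effect either way.

```php
<?php
$dom = new DOMDocument();
$dom->loadXML('<html><textarea id="something"></textarea><br/><img src="a.png"/></html>');

// Step 1: force an end tag on every empty element (voids included).
$expanded = $dom->saveXML($dom->documentElement, LIBXML_NOEMPTYTAG);

// Step 2: collapse only the HTML5 void elements back to <tag/>.
$voids = 'area|base|br|col|embed|hr|img|input|keygen|link|meta|param|source|track|wbr';
$fixed = preg_replace('#></(' . $voids . ')>#i', '/>', $expanded);

echo $expanded, "\n"; // textarea, br and img all have end tags
echo $fixed, "\n";    // only br and img are self-closed again
```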

After that we can go on with the other stupid fixes from the question:

$temp=str_replace(' xmlns:default="http://www.w3.org/1999/xhtml"','',$temp);
$temp=str_replace('<default:',"<",$temp);
$temp=str_replace('</default:',"</",$temp);
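One caveat on ordering, assuming the serializer emits prefixed void elements as described in the question: the void-element regex only matches unprefixed end tags such as </br>, so a <default:br></default:br> pair is not collapsed if the regex runs before the prefix is stripped. The sketch below strips the prefix first. The helper name clean_xhtml and the sample string are hypothetical; the sample only mimics the prefixed output and is not produced by DOMDocument here.

```php
<?php
// Hypothetical helper: same fixes as above, with the prefix stripped before the void regex.
function clean_xhtml(string $xml): string
{
    // Drop the generated namespace declaration and the "default:" prefixes.
    $xml = str_replace(' xmlns:default="http://www.w3.org/1999/xhtml"', '', $xml);
    $xml = str_replace(['<default:', '</default:'], ['<', '</'], $xml);

    // Now </br>, </img>, ... are unprefixed and the void regex can match them.
    $voids = 'area|base|br|col|embed|hr|img|input|keygen|link|meta|param|source|track|wbr';
    return preg_replace('#></(' . $voids . ')>#i', '/>', $xml);
}

// Hypothetical prefixed output, as produced with LIBXML_NOEMPTYTAG.
$sample = '<div xmlns:default="http://www.w3.org/1999/xhtml"><default:p>a<default:br></default:br>b</default:p></div>';
echo clean_xhtml($sample), "\n";
```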


Source: https://stackoverflow.com/questions/34034229/illegal-self-closing-node-notation-for-empty-nodes-outputting-xhtml-with-php-d
