Why does DOMDocument::saveHTML()'s behavior differ in encoding UTF-8 as entities in style & script elements?

Submitted by 不打扰是莪最后的温柔 on 2020-02-05 07:15:14

Question


Given a DOMDocument constructed with a stylesheet that contains an emoji character like so:

$dom = new DOMDocument();
$dom->loadHTML( "<!DOCTYPE html><html><head><meta charset=utf-8><style>span::before{ content: \"⚡️\"; }</style></head><body><span></span></body></html>" );

I've found some strange behavior when serializing the DOM back out to HTML.

If I do $dom->saveHTML( $dom->documentElement ) then I get (as desired):

<html><head><meta charset="utf-8">
<style>span::before{ content: "⚡️"; }</style>
</head><body><span></span></body></html>

However, if I instead do $dom->saveHTML() to save the entire document I get (erroneously):

<html><head><meta charset="utf-8">
<style>span::before{ content: "&#9889;&#65039;"; }</style>
</head><body><span></span></body></html>

Notice how the emoji “⚡️” is encoded as the HTML character references &#9889;&#65039; inside the stylesheet. Browsers do not decode character references inside a style element, so the value is treated as a literal string; in CSS, an escape such as \26A1 would have to be used instead.
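To make the CSS escape form concrete, here is a minimal sketch that rewrites non-ASCII characters in a stylesheet string as CSS escapes. The helper name css_escape_non_ascii is hypothetical and not part of DOMDocument. The sketch assumes the mbstring extension is available for mb_ord() and that the input is valid UTF-8. It only illustrates the escape syntax and does not explain the saveHTML() difference.

```php
<?php
// Hypothetical helper (not a DOMDocument API): replace every non-ASCII
// character in a CSS string with a CSS hex escape such as \26A1.
function css_escape_non_ascii(string $css): string
{
    $escaped = preg_replace_callback(
        '/[^\x00-\x7F]/u', // with the /u flag, each match is one Unicode code point
        static function (array $m): string {
            // mb_ord() returns the code point; the trailing space terminates
            // the escape so a following hex digit is not read as part of it.
            return sprintf('\\%X ', mb_ord($m[0], 'UTF-8'));
        },
        $css
    );

    // preg_replace_callback() returns null on failure (e.g. invalid UTF-8);
    // in that case fall back to the unmodified input.
    return $escaped ?? $css;
}

// U+26A1 (high voltage sign) followed by U+FE0F (variation selector-16).
echo css_escape_non_ascii("span::before{ content: \"\u{26A1}\u{FE0F}\"; }"), "\n";
// prints: span::before{ content: "\26A1 \FE0F "; }
```

The decimal references in the output above, 9889 and 65039, are the same code points as the hexadecimal 26A1 and FE0F used in the escapes.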

I tried setting $dom->substituteEntities = false, but it had no effect.

The same HTML entity conversion also happens inside script tags, which causes similar problems in browsers.

Test via online PHP shell: https://3v4l.org/jMfDd

Source: https://stackoverflow.com/questions/51660286/why-does-domdocumentsavehtmls-behavior-differ-in-encoding-utf-8-as-entities
