Why do HTML Entities get garbled in View Source?

Submitted by 时光毁灭记忆、已成空白 on 2019-12-23 04:00:19

Question


I've seen this behavior in several browsers over the years (Chrome, Firefox, and Opera, at least), but most recently it happens only in Opera and Chrome; I think Firefox fixed it at some point. When a page pushes a fairly sizeable chunk of data (several thousand lines of HTML) to the browser and that data contains HTML entities, the entities appear malformed when you view the page source.

For example, I put a "lower right pencil" entity (&#9998; or &#x270E;) throughout the contents of a page to label "Edit" links. However, when I load the page in the browser and click "View Source", I see a garbled code that often does not match what is actually hard-coded in the page HTML. Some examples include:

&x#x2#x270E;, &#x#x270E;, &#x270#x270E;

Examining a Fiddler capture of the actual source code being sent to the browser shows that the browser indeed receives the CORRECT codes. Something seems to go awry as soon as the browser tries to display it in a view-source tab.
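
One way to pin down where the view-source text departs from what was sent is to compare the two strings directly. The following is a minimal Python sketch, assuming the captured response body and the text copied from the view-source tab have both been saved as strings. `first_difference` is a hypothetical helper name, not part of Fiddler or any browser tooling.

```python
from typing import Optional


def first_difference(sent: str, shown: str) -> Optional[int]:
    """Return the index of the first character where the two strings differ.

    `sent` is the body captured on the wire; `shown` is the text copied
    from the view-source tab. Returns None when the strings are identical.
    """
    for index, (a, b) in enumerate(zip(sent, shown)):
        if a != b:
            return index
    # One string is a prefix of the other: they differ where the shorter ends.
    if len(sent) != len(shown):
        return min(len(sent), len(shown))
    return None


if __name__ == "__main__":
    sent = "<a href='#'>&#x270E; Edit</a>"
    shown = "<a href='#'>&#x270#x270E; Edit</a>"
    pos = first_difference(sent, shown)
    if pos is not None:
        start = max(pos - 10, 0)
        # Print the position and a little context from each side.
        print(pos, repr(sent[start:pos + 10]), repr(shown[start:pos + 10]))
```

If the function returns None for the capture but an index for the view-source copy, that would be consistent with the corruption happening in the view-source display rather than in transit.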

It happens with other codes too: &nbsp; becomes &nbnbsp; or &nnbsp;, and so on. Mysteriously, the garbling changes with each refresh. Once in a while the codes come through correct, though most of the time they are garbled. They appear to render correctly on the front end. Is this just a bug in every major browser, or should I be concerned about data loss when pushing somewhat large data sets over HTTP?
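
To count how often the garbling occurs in a copied view-source dump, a small scanner can flag every character reference that is not well formed. This is only a sketch in Python: the names `find_malformed` and `is_well_formed` are made up for illustration, and the check is deliberately naive, so a bare ampersand in ordinary text or in a URL query string would also be flagged.

```python
import re
from html.entities import html5
from typing import List

# Anything that starts with "&" and runs up to whitespace, "<", another "&",
# or a terminating ";" is treated as a candidate character reference.
CANDIDATE = re.compile(r"&[^\s;&<]*;?")
DECIMAL = re.compile(r"&#[0-9]+;")
HEX = re.compile(r"&#[xX][0-9A-Fa-f]+;")
NAMED = re.compile(r"&([A-Za-z][A-Za-z0-9]*;)")


def is_well_formed(token: str) -> bool:
    """True for decimal, hexadecimal, or known named references."""
    if DECIMAL.fullmatch(token) or HEX.fullmatch(token):
        return True
    named = NAMED.fullmatch(token)
    # html5 maps names such as "nbsp;" to the characters they stand for.
    return bool(named) and named.group(1) in html5


def find_malformed(source: str) -> List[str]:
    """Return every candidate reference in `source` that is not well formed."""
    return [t for t in CANDIDATE.findall(source) if not is_well_formed(t)]
```

Run against the examples above, this treats `&#x270E;` and `&nbsp;` as fine and reports tokens such as `&#x270#x270E;` and `&nbnbsp;`.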

Past Tests

I ran two tests to confirm this:

(1) Spammed a single character into the contents of a valid HTML5 page hosted on a public-facing AWS LAMP server, loaded the page in Opera, and viewed the source. Most entities were okay, but about halfway down it starts to trip up, and continues sporadically throughout:

&#x27#x270E;

(2) Spammed a single character into a valid HTML5 page's contents hosted on an intranet Windows server and served over a NetExtender VPN. Same result as the first test.

&#x270#x270E;✎

Steps to Reproduce:

I have tested this on many different systems (Ubuntu Linux, Windows 7, and Windows 10 so far) on several different networks. However, I would appreciate it if others could confirm this.

  1. Create a valid HTML page and paste a single HTML entity (either decimal or hexadecimal representation) between the body tags.
  2. Copy and paste the entity to fill several hundred lines of content (fewer may be enough, but more is most likely to reproduce the issue). For example: &nbsp; &nbsp; &nbsp; &nbsp; ... etc.
  3. Save the page on your web server.
  4. Load the page in a new Opera window.
  5. Right-click anywhere in the page and click "Page source".
  6. Copy the source code and either examine it manually or paste it into the W3C validator at https://validator.w3.org, which will help point out the incorrectly formatted HTML entities.
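
Steps 1 and 2 can be scripted instead of done by hand. The sketch below, in Python, builds such a page as a string; `build_test_page` and its parameters (`entity`, `per_line`, `lines`) are hypothetical names chosen for illustration, and the default sizes are arbitrary guesses at "several hundred lines".

```python
def build_test_page(entity: str = "&#x270E;", per_line: int = 20, lines: int = 500) -> str:
    """Return an HTML5 document whose body repeats `entity` many times."""
    row = " ".join([entity] * per_line)
    body = "\n".join([row] * lines)
    return (
        "<!DOCTYPE html>\n"
        '<html lang="en">\n'
        "<head>\n"
        '<meta charset="utf-8">\n'
        "<title>Entity test</title>\n"
        "</head>\n"
        "<body>\n"
        f"{body}\n"
        "</body>\n"
        "</html>\n"
    )
```

Writing the returned string to a file on the web server covers step 3. Because the page contains the same entity a known number of times (`per_line * lines`), counting intact occurrences in the copied view-source text gives a quick measure of how many were garbled.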

Opera 49.0 Illustration

The Code Inspector shows the correct HTML entity code. However, when you view Page Source for the same section, the code is malformed.

Source: https://stackoverflow.com/questions/47819832/why-do-html-entities-get-garbled-in-view-source
