Why must I specify charset attributes for my <script> tags?

Submitted by 旧街凉风 on 2019-12-19 11:49:15

Question


I have a bit of an odd situation:

  1. Main HTML page is served in UTF-16 character set (due to some requirements out-of-scope for this question)
  2. HTML page uses <script> tags to load external scripts (i.e. they have src attributes)
  3. Those external scripts are in US-ASCII/UTF-8
  4. The web server is serving the scripts with the content-type "application/javascript" with no character set hints
  5. The scripts have no byte-order-mark (BOM)

When loading the page described above, both Firefox and Chrome (current versions) report errors saying that the first character of each script file is invalid.

The "Network" tabs of the respective dev-tools views show that the files arrive intact (they render correctly in the previewer).

My conclusion was that the browsers are becoming confused as to what the encoding should be for "the whole page" or some similar foolishness.

So I tried adding a charset="UTF-8" attribute to the <script> tags and that seems to solve the problem.

But I really shouldn't have to do that, should I?

First of all, the server is telling the client what the document's type is. It's application/javascript and doesn't specify a character set. (Indeed, the RFC says that charset is only applicable to text/* MIME-types). Okay, I can understand why there might be some ambiguity, there.

But the document-type is javascript, and there are some obvious rules for how to handle a javascript file whose actual charset you don't know. For example, if it's got a BOM, then use it. If there isn't any BOM, it should be really easy to tell UTF-16 from UTF-8. (Note that there doesn't seem to be any problem on these same pages with loading CSS files, which are also in the same situation as the scripts.)

Lastly, the enclosing page shouldn't have to know what the encodings of its dependencies are. In fact, it might be impossible for it to know, and explicitly specifying the charset then tightly couples the page to its dependencies and vice versa.

Is there a way to get the browser to correctly detect the character set of these dependencies without specifying the charset in the page itself?


Answer 1:


Without a BOM in the file, or an explicit charset in the <script> or Content-Type for the file, the encoding of the file is ambiguous. The browser might assume UTF-8 (and should, per RFC 4329), but if the script contains any non-ASCII characters that are not actually encoded in UTF-8, the file won't process properly.

However, HTML 5 Section 4.11 dictates that a <script>'s fallback encoding is the document's encoding if the <script> does not have a charset attribute. The fallback takes effect if there is no BOM or charset to specify the file's actual encoding.
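To make the effect of that fallback concrete, here is a minimal Python sketch (not browser code) of what happens when ASCII/UTF-8 script bytes are decoded with a UTF-16 fallback. The sample script text is invented for illustration, and UTF-16LE is an assumption, since the question only says the page is "UTF-16".

```python
# The script as the server sends it: plain ASCII/UTF-8 bytes, no BOM.
# (The script text is a made-up example.)
script_bytes = "var x = 1;".encode("utf-8")

# What the script author intended: decode the bytes as UTF-8.
as_utf8 = script_bytes.decode("utf-8")

# What a decoder does when it falls back to the document's encoding.
# Assumption: the page is UTF-16LE. Every pair of ASCII bytes is fused
# into a single 16-bit code unit.
as_utf16 = script_bytes.decode("utf-16-le")

print(repr(as_utf8))
print(repr(as_utf16))
print(f"first code point after fallback: U+{ord(as_utf16[0]):04X}")
```

The ten ASCII bytes collapse into five unrelated code points, so the decoded text no longer resembles the source. That is consistent with the parse errors described in the question, even though the bytes shown in the Network tab are intact.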

So, either make sure your HTML and JS files are always using the same encoding, or else you have to be explicit about the JS file's charset, one way or the other.
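As a rough sketch of being explicit on the file's side rather than in the page, the two options mentioned above (a BOM in the file, or a charset in the Content-Type) could look like this in Python. The helper names with_utf8_bom and javascript_content_type are hypothetical, and how the header value gets attached to a response depends on the web server in use.

```python
import codecs


def with_utf8_bom(data: bytes) -> bytes:
    """Prepend a UTF-8 BOM unless the data already starts with one."""
    if data.startswith(codecs.BOM_UTF8):
        return data
    return codecs.BOM_UTF8 + data


def javascript_content_type(charset: str = "utf-8") -> str:
    """Build a Content-Type value that names the charset explicitly."""
    return f"application/javascript; charset={charset}"


# Made-up script content, for illustration only.
script_bytes = b"var x = 1;"

# Option 1: mark the file itself.
marked = with_utf8_bom(script_bytes)
print(marked[:3].hex())

# Option 2: have the server declare the charset in the response header.
print(javascript_content_type())
```

Either approach keeps the page free of charset attributes, at the cost of changing the script files or the server configuration instead.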



Source: https://stackoverflow.com/questions/52102142/why-must-i-specify-charset-attributes-for-by-script-tags
