How does Chrome establish the right character encoding?

六眼飞鱼酱① posted on 2019-12-14 03:49:59

Question


I've been working with a lot of charsets lately and have run into many issues when trying to determine the proper charset for an arbitrary web page. The charset can be set in the HTTP headers of the HTML document or inside the <head> section, it can be declared multiple times, and sometimes the declaration is omitted altogether. Despite these issues, Chrome does a great job of picking the best charset every time.

I've tried searching the sources but didn't manage to find anything as I don't know where to look.

So my question is where could I find the algorithm?

Thanks


update:

problematic example:

HTTP header of a document (based on server configurations):
Content-type: text/html; charset=utf-8
and the document looks like:

<?xml version="1.0" encoding="ISO-8859-1"?>
<html>
<head>
<meta charset="UTF-8">
<meta http-equiv="Content-type" content="text/html;charset=ISO-8859-1" />
</head>
<body>...</body>
</html>

Which encoding would be used to render the text?


Answer 1:


Chrome uses https://github.com/google/compact_enc_det

If you want to read the actual code that calls that project, the function is DetectTextEncoding in the file third_party/blink/renderer/platform/text/text_encoding_detector.cc
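
To give a feel for what "detecting an encoding from the bytes alone" means, here is a deliberately crude Python sketch. It is not the compact_enc_det algorithm, which is a far more elaborate statistical detector; it only tries a strict UTF-8 decode and otherwise falls back to a legacy encoding. The function name `guess_encoding` and the `windows-1252` default are assumptions made for this illustration.

```python
def guess_encoding(body: bytes, legacy: str = "windows-1252") -> str:
    """Guess an encoding from raw bytes when nothing declares one.

    Hypothetical helper: a much simpler stand-in for a real detector.
    """
    try:
        # Strict decoding fails on byte sequences that are not valid UTF-8.
        body.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Not valid UTF-8, so assume a single-byte legacy encoding.
        return legacy


if __name__ == "__main__":
    print(guess_encoding("café".encode("utf-8")))
    print(guess_encoding("café".encode("latin-1")))
```

Random legacy-encoded text is rarely valid UTF-8 by accident, which is why even this naive check is useful; telling legacy encodings apart from each other is the hard part that a real detector handles.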




Answer 2:


A charset in the HTTP headers always overrides a meta charset, so the example above is decoded as UTF-8.
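
The precedence can be sketched in Python as follows. This is a simplified illustration, not Chrome's code: it assumes the order byte order mark, then HTTP header, then the first meta declaration, then a fallback, loosely following the encoding sniffing order in the HTML Standard. The function name `pick_encoding`, the regular expressions, the 1024-byte scan window, and the `windows-1252` fallback are all assumptions of this sketch, and it does not look at the XML declaration at all.

```python
import codecs
import re

# Byte order marks and the encodings they imply.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]

# Rough patterns; a real parser is considerably stricter than these.
_HEADER_CHARSET = re.compile(r'charset\s*=\s*["\']?([^\s;"\']+)', re.IGNORECASE)
_META_CHARSET = re.compile(
    rb'<meta[^>]*?charset\s*=\s*["\']?([^\s;"\'/>]+)', re.IGNORECASE
)


def pick_encoding(content_type, body, fallback="windows-1252"):
    """Return (encoding, source) for an HTML response (hypothetical helper)."""
    # 1. A byte order mark wins over every declaration.
    for bom, name in _BOMS:
        if body.startswith(bom):
            return name, "bom"
    # 2. The charset parameter of the Content-Type header comes next.
    if content_type:
        match = _HEADER_CHARSET.search(content_type)
        if match:
            return match.group(1).lower(), "http-header"
    # 3. Otherwise use the first meta declaration near the start of the body.
    match = _META_CHARSET.search(body[:1024])
    if match:
        return match.group(1).decode("ascii", "replace").lower(), "meta"
    # 4. Nothing declared: fall back (a browser would run detection here).
    return fallback, "fallback"


if __name__ == "__main__":
    doc = (
        b'<?xml version="1.0" encoding="ISO-8859-1"?>\n<html>\n<head>\n'
        b'<meta charset="UTF-8">\n'
        b'<meta http-equiv="Content-type" '
        b'content="text/html;charset=ISO-8859-1" />\n'
        b"</head>\n<body>...</body>\n</html>"
    )
    print(pick_encoding("text/html; charset=utf-8", doc))
    print(pick_encoding("text/html", doc))
```

For the document in the question, the header decides the outcome; if the header carried no charset, this sketch would pick the first meta declaration, which also says UTF-8.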



Source: https://stackoverflow.com/questions/13155467/how-does-chrome-establish-the-right-character-encoding
