Some questions about tree construction [html spec]

不想你离开。 submitted on 2019-12-25 03:59:25

Question


I know that it's not customary to ask several questions in one post, but each question here builds on the previous one, so I think it would be wrong to split them into separate questions.


So the first thing I'd like to start with is:

As each token is emitted from the tokenizer, the user agent must follow the appropriate steps from the following list, known as the tree construction dispatcher:

Do I understand correctly that the tokenizer finishes its work once it has analyzed the whole source text? That is, as soon as the tokenizer creates a new token, it emits it into the tree manager and then continues making tokens. Am I right?

In words, it would look like this:

lexical analysis
created a token
placed in the tree dispatcher
lexical analysis
created a token
placed in the tree dispatcher
...
lexical analysis
no tokens created

As you can see, the tree construction dispatcher has a number of items, and I do not fully understand them.

  • If the stack of open elements is empty
  • If the adjusted current node is an element in the HTML namespace
  • If the adjusted current node is a MathML text integration point and the token is a start tag whose tag name is neither "mglyph" nor "malignmark"
  • If the adjusted current node is a MathML text integration point and the token is a character token
  • If the adjusted current node is a MathML annotation-xml element and the token is a start tag whose tag name is "svg"
  • If the adjusted current node is an HTML integration point and the token is a start tag
  • If the adjusted current node is an HTML integration point and the token is a character token
  • If the token is an end-of-file token

    Process the token according to the rules given in the section corresponding to the current insertion mode in HTML content.

  • Otherwise

    Process the token according to the rules given in the section for parsing tokens in foreign content.
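The list above can be read as one decision function. The following is a minimal Python sketch of it, not code from the spec: `Element`, `Token`, `dispatch` and the two helper predicates are hypothetical names chosen for illustration. It assumes the adjusted current node is the context element when the parser was created for fragment parsing and the stack holds a single element, and otherwise the current node, meaning the most recently opened element that is still on the stack of open elements.

```python
from dataclasses import dataclass, field

HTML_NS = "http://www.w3.org/1999/xhtml"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"
SVG_NS = "http://www.w3.org/2000/svg"


@dataclass
class Element:  # hypothetical stand-in for a DOM element
    name: str
    namespace: str = HTML_NS
    attrs: dict = field(default_factory=dict)


@dataclass
class Token:  # hypothetical token record; kind is "start", "end", "chars" or "eof"
    kind: str
    name: str = ""


def is_mathml_text_integration_point(node):
    return node.namespace == MATHML_NS and node.name in {"mi", "mo", "mn", "ms", "mtext"}


def is_html_integration_point(node):
    if node.namespace == MATHML_NS and node.name == "annotation-xml":
        # Only counts when the start tag carried one of these encoding values.
        encoding = node.attrs.get("encoding", "").lower()
        return encoding in {"text/html", "application/xhtml+xml"}
    return node.namespace == SVG_NS and node.name in {"foreignObject", "desc", "title"}


def dispatch(stack, token, context=None):
    """Return "html" (rules of the current insertion mode) or "foreign"."""
    if not stack:
        return "html"
    # Adjusted current node: the context element in the fragment case when
    # only one element is on the stack, otherwise the top of the stack.
    node = context if (context is not None and len(stack) == 1) else stack[-1]
    if node.namespace == HTML_NS:
        return "html"
    if is_mathml_text_integration_point(node):
        if token.kind == "start" and token.name not in {"mglyph", "malignmark"}:
            return "html"
        if token.kind == "chars":
            return "html"
    if (node.namespace == MATHML_NS and node.name == "annotation-xml"
            and token.kind == "start" and token.name == "svg"):
        return "html"
    if is_html_integration_point(node) and token.kind in {"start", "chars"}:
        return "html"
    if token.kind == "eof":
        return "html"
    return "foreign"


html, body = Element("html"), Element("body")
math = Element("math", MATHML_NS)
annotation = Element("annotation-xml", MATHML_NS)

# <section> is open: the adjusted current node is an HTML element.
print(dispatch([html, body, Element("section")], Token("start", "p")))
# <math><annotation-xml> is open and an <svg> start tag arrives.
print(dispatch([html, body, math, annotation], Token("start", "svg")))
# Same stack, but the start tag is <mi>: no item matches, so "Otherwise".
print(dispatch([html, body, math, annotation], Token("start", "mi")))
```

Under these assumptions the three calls print `html`, `html` and `foreign`. The stack in the sketch holds elements that have been opened and not yet closed, so a node being "adjusted current" says nothing about whether it has children.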

For example, there is the term adjusted current node. As I understand it, it can be any element, for example <section>, and it is a node with nothing in it: <section></section>. I would like to see an example for each item that mentions the adjusted current node. Take the item "If the adjusted current node is a MathML annotation-xml element and the token is a start tag whose tag name is "svg"". It says that the node is an annotation-xml element from the MathML namespace and that the token is a start tag whose name is svg. Written out as markup, I think it should look like this:

<annotation-xml> <!-- this is the adjusted current node -->
   <svg> <!-- this is the start tag token named svg -->
   <!-- some content here -->
   </svg>
</annotation-xml>

The last item in the tree construction dispatcher leads to the rules for parsing tokens in foreign content. That section also has a number of items describing how to handle tokens from the tokenizer. If I understand correctly, we get there through the last item of the tree construction dispatcher (Otherwise), so to end up in foreign content we need a structure such as <svg></svg> or <math></math>.


Next is the item "A start tag...". I am interested in its instructions.

  • A start tag whose tag name is one of: "b", "big", "blockquote", "body", "br", "center", "code", "dd", "div", "dl", "dt", "em", "embed", "h1", "h2", "h3", "h4", "h5", "h6", "head", "hr", "i", "img", "li", "listing", "menu", "meta", "nobr", "ol", "p", "pre", "ruby", "s", "small", "span", "strong", "strike", "sub", "sup", "table", "tt", "u", "ul", "var"
  • A start tag whose tag name is "font", if the token has any attributes named "color", "face", or "size"

    Parse error.

    If the parser was originally created for the HTML fragment parsing algorithm, then act as described in the "any other start tag" entry below. (fragment case)

    Otherwise:

    Pop an element from the stack of open elements, and then keep popping more elements from the stack of open elements until the current node is a MathML text integration point, an HTML integration point, or an element in the HTML namespace.

    Then, reprocess the token.
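The steps quoted above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the spec's own code: `Element`, `stops_popping` and `break_out_of_foreign_content` are hypothetical names, the sketch follows the wording quoted in this question (other revisions of the spec may word the rule differently), and "Parse error" is modeled as an entry in a list, since a parse error is something the parser reports while it keeps going, not a structure in the markup.

```python
from dataclasses import dataclass, field

HTML_NS = "http://www.w3.org/1999/xhtml"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"
SVG_NS = "http://www.w3.org/2000/svg"


@dataclass
class Element:  # hypothetical stand-in for a DOM element
    name: str
    namespace: str = HTML_NS
    attrs: dict = field(default_factory=dict)


def stops_popping(node):
    """True for an HTML element or either kind of integration point."""
    if node.namespace == HTML_NS:
        return True
    if node.namespace == MATHML_NS:
        if node.name in {"mi", "mo", "mn", "ms", "mtext"}:
            return True  # MathML text integration point
        if node.name == "annotation-xml":
            encoding = node.attrs.get("encoding", "").lower()
            return encoding in {"text/html", "application/xhtml+xml"}
        return False
    return node.namespace == SVG_NS and node.name in {"foreignObject", "desc", "title"}


def break_out_of_foreign_content(stack, fragment_case=False):
    """Handle e.g. a <p> start tag seen in foreign content.

    Returns (what to do next, reported errors) and mutates the stack.
    """
    errors = ["parse error"]  # step 1: always reported
    if fragment_case:
        # step 2: fragment parsing, the stack is left alone
        return "any other start tag", errors
    # step 3 ("Otherwise"): pop once, then keep popping until the current
    # node is an HTML element or an integration point. The html element at
    # the bottom of the stack guarantees that the loop ends.
    stack.pop()
    while not stops_popping(stack[-1]):
        stack.pop()
    return "reprocess", errors


# Markup "<svg><g><p>": when the <p> start tag arrives, the stack of open
# elements is html, body, svg, g.
stack = [Element("html"), Element("body"),
         Element("svg", SVG_NS), Element("g", SVG_NS)]
action, errors = break_out_of_foreign_content(stack)
print(action, errors, [e.name for e in stack])
```

In this sketch the `<svg><g><p>` case reports one parse error, pops `g` and `svg`, and leaves `body` as the current node, so the reprocessed `<p>` token is then handled by the ordinary HTML rules.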

As I understand it, this is not a single instruction: there are at least three of them here. I would like to understand each of the three cases through examples.

For example: what does the markup look like when there is a Parse error, what does it look like when the parser was originally created for the HTML fragment parsing algorithm, and what does it look like in the Otherwise case?


Answer 1:


as soon as the tokenizer creates a new token, it emits it into the tree manager and then continues making tokens. Am I right?

No, it must wait for the tree construction stage to handle the token, because that can affect the next thing that the tokenizer does. As the spec says in 12.2.5 Tokenization:

When a token is emitted, it must immediately be handled by the tree construction stage. The tree construction stage can affect the state of the tokenization stage, and can insert additional characters into the stream.
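A toy Python sketch can show why the order matters. Everything in it is an assumption made for illustration: `ToyTokenizer` has only two states ("data" and a simplified "rawtext") instead of the spec's state machine, and `tree_builder_handle` only does the one thing relevant here, switching the tokenizer's state when it sees a script start tag, which loosely mirrors how the real tree construction stage switches the tokenizer to the script data state.

```python
import re

TAG = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)>")


class ToyTokenizer:
    """Two-state toy tokenizer; not the spec's tokenizer."""

    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.state = "data"   # the tree builder may change this
        self.raw_end = None   # tag name that ends the raw text run

    def next_token(self):
        if self.pos >= len(self.text):
            return ("eof", None)
        if self.state == "rawtext":
            end = self.text.find("</" + self.raw_end, self.pos)
            if end == -1:
                end = len(self.text)
            if end > self.pos:
                chunk = self.text[self.pos:end]
                self.pos = end
                return ("chars", chunk)
            self.state = "data"  # sitting on the end tag: tokenize it normally
        match = TAG.match(self.text, self.pos)
        if match:
            self.pos = match.end()
            return ("end" if match.group(1) else "start", match.group(2).lower())
        nxt = self.text.find("<", self.pos + 1)
        if nxt == -1:
            nxt = len(self.text)
        chunk = self.text[self.pos:nxt]
        self.pos = nxt
        return ("chars", chunk)


def tree_builder_handle(token, tokenizer, log):
    """Record the token and, for <script>, change how the tokenizer reads on."""
    log.append(token)
    if token == ("start", "script"):
        tokenizer.state = "rawtext"
        tokenizer.raw_end = "script"


def parse(text, handle_immediately=True):
    tokenizer, log, pending = ToyTokenizer(text), [], []
    while True:
        token = tokenizer.next_token()
        if handle_immediately:
            tree_builder_handle(token, tokenizer, log)  # handled before the next token
        else:
            pending.append(token)                       # tokenizer runs ahead (wrong)
        if token[0] == "eof":
            break
    for token in pending:
        tree_builder_handle(token, tokenizer, log)
    return log


text = "<script>a<b>c</script>x"
print(parse(text))
print(parse(text, handle_immediately=False))
```

With immediate handling, the script content `a<b>c` comes out as a single run of characters. When the tokenizer runs ahead, the state switch arrives too late and `<b>` is wrongly emitted as a start tag.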

(I don't think I understand any of your other questions.)



Source: https://stackoverflow.com/questions/51778481/some-questions-about-tree-construction-html-spec
