For argument's sake, let's assume an HTML parser.
I've read that it tokenizes everything first, and then parses it.
What does tokenize mean?
Don't miss the W3C's notes on parsing HTML5.
For an interesting introduction to scanning/lexing, search the web for Efficient Generation of Table-Driven Scanners. It shows how scanning is ultimately driven by automata theory: a collection of regular expressions is transformed into a single NFA, and the NFA is then transformed into a DFA to make state transitions deterministic. The paper then describes a method to transform the DFA into a transition table.
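To make the end of that pipeline concrete, here is a minimal sketch (not taken from the paper) of a hand-built DFA transition table that recognizes two made-up token types, `IDENT` and `NUMBER`. The state numbers, token names, and the `scan` helper are all invented for illustration:

```python
# A hand-built DFA transition table for two token types:
#   IDENT  = [a-z][a-z0-9]*
#   NUMBER = [0-9]+
# States: 0 = start, 1 = inside IDENT, 2 = inside NUMBER.
# (All names here are invented for the example.)

ERROR = -1

def char_class(ch):
    """Map a character to a column of the transition table."""
    if ch.isalpha():
        return 0  # letter
    if ch.isdigit():
        return 1  # digit
    return 2      # anything else ends the token

# TRANSITIONS[state][char_class] -> next state
TRANSITIONS = [
    [1, 2, ERROR],      # state 0: start
    [1, 1, ERROR],      # state 1: letters or digits continue an IDENT
    [ERROR, 2, ERROR],  # state 2: only digits continue a NUMBER
]

# Which token an accepting state yields.
ACCEPTING = {1: "IDENT", 2: "NUMBER"}

def scan(text):
    """Yield (token_type, lexeme) pairs by walking the table."""
    i = 0
    while i < len(text):
        if text[i].isspace():   # skip whitespace between tokens
            i += 1
            continue
        state, start = 0, i
        # Follow transitions until no move is possible.
        while i < len(text):
            nxt = TRANSITIONS[state][char_class(text[i])]
            if nxt == ERROR:
                break
            state, i = nxt, i + 1
        if state not in ACCEPTING:
            raise SyntaxError(f"bad character {text[i]!r} at {i}")
        yield ACCEPTING[state], text[start:i]

print(list(scan("width 42 height 7")))
# [('IDENT', 'width'), ('NUMBER', '42'), ('IDENT', 'height'), ('NUMBER', '7')]
```

The appeal of the table form is that the scanning loop never changes; adding token types only means regenerating the table, which is exactly what scanner generators automate.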
A key point: scanners use regular expression theory but likely don't use existing regular expression libraries. For better performance, state transitions are coded as giant case statements or in transition tables.
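The other style mentioned above, hardcoding the transitions as a case statement over the current state, looks roughly like this for the same two invented token types (a sketch; requires Python 3.10+ for `match`):

```python
# The same IDENT/NUMBER scanner with transitions hardcoded as a
# case statement over the state, instead of a table. (Illustrative
# sketch; state and token names are made up.)

def scan(text):
    state, start, i = "start", 0, 0
    while i <= len(text):
        ch = text[i] if i < len(text) else "\0"  # sentinel ends the last token
        match state:
            case "start":
                start = i
                if ch.isalpha():
                    state = "ident"
                elif ch.isdigit():
                    state = "number"
                elif ch.isspace() or ch == "\0":
                    pass  # skip whitespace / end of input
                else:
                    raise SyntaxError(f"bad character {ch!r} at {i}")
                i += 1
            case "ident":
                if ch.isalnum():
                    i += 1           # stay inside the identifier
                else:
                    yield "IDENT", text[start:i]
                    state = "start"  # re-examine ch from the start state
            case "number":
                if ch.isdigit():
                    i += 1
                else:
                    yield "NUMBER", text[start:i]
                    state = "start"
```

In C, this would be the classic `switch (state)` inside a loop; either way, no regex engine runs at scan time.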
Scanners guarantee that correct words (tokens) are used. Parsers guarantee that the words are used in the correct combination and order. Scanners use regular expression and automata theory. Parsers use grammar theory, especially context-free grammars.
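Applied to the original HTML question, "tokenize" means turning the flat string into a stream of start-tag / text / end-tag tokens before any tree building happens. Here is a toy sketch of that idea; the real HTML5 tokenizer defined in the W3C/WHATWG spec has dozens of states and handles attributes, comments, entities, and error recovery, none of which this does:

```python
# Toy HTML tokenizer: string in, token stream out.
# Naive on purpose: assumes well-formed tags with no attributes.

def tokenize(html):
    i = 0
    while i < len(html):
        if html[i] == "<":
            end = html.index(">", i)   # naive: assumes a closing '>'
            tag = html[i + 1:end]
            if tag.startswith("/"):
                yield "END_TAG", tag[1:]
            else:
                yield "START_TAG", tag
            i = end + 1
        else:
            start = i
            while i < len(html) and html[i] != "<":
                i += 1
            yield "TEXT", html[start:i]

print(list(tokenize("<p>Hello <b>world</b></p>")))
# [('START_TAG', 'p'), ('TEXT', 'Hello '), ('START_TAG', 'b'),
#  ('TEXT', 'world'), ('END_TAG', 'b'), ('END_TAG', 'p')]
```

The parser's job then starts where this leaves off: consuming that token stream and checking that the tags nest correctly while building the document tree.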
A couple of parsing resources: