How to use Finite Automaton to implement a scanner

时光怂恿深爱的人放手 提交于 2019-12-02 05:39:17

The most common rule is "maximal munch", which always selects the longest possible token.

In general, the algorithm to scan a single token using a DFA is as follows: (To tokenise an input, this algorithm is repeated until the end of the input is reached, with each scan starting at the input cursor left by the previous scan.)

Set the DFA state to the start state. Then, for each input character in sequence:

  • if the DFA has a transition on the character, move to the iindicated state. If that state is an accepting state, record the current input cursor and the state number.

  • otherwise, rewind the input to the last recorded accept position and return the recorded state number.

Here, the accepting state number is used to indicate which token was encountered. In practice, it is common to associate each accepting state with a token code, because some token types will have more than one accepting state.

It is not always necessary to use the backtracking algorithm described above. In some cases, analysis of the DFA will reveal that the last accepting state was always at the immediately preceding input position. However, many languages require backtracking.

For example, a language in which . and ... are both tokens but not .. (such as C) must backtrack on input ..1, which should be tokenised as ., .1. In the tokenisation of this input, the first scan will accept the first ., continue with the second . (which is not an accepting state) and then fail to find a transition on the input 1. It will then report that it found a . (the recorded accepted token) and reset the input cursor to the second character position, so that the next scan will see the token .1.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!