Are “>>”s in type parameters tokenized using a special rule?

Submitted by 浪子不回头ぞ on 2019-12-08 15:11:22

Question


I'm confused by the Java spec about how this code should be tokenized:

ArrayList<ArrayList<Integer>> i;

The spec says:

The longest possible translation is used at each step, even if the result does not ultimately make a correct program while another lexical translation would.

As I understand it, applying the "longest match" rule would result in the tokens:

  • ArrayList
  • <
  • ArrayList
  • <
  • Integer
  • >>
  • i
  • ;

which would not parse. But of course this code is parsed just fine.

What is the correct specification for this case?

Does this mean that a correct lexer must be context-free? It doesn't seem possible with a purely regular lexer.
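To make the premise concrete, here is a minimal sketch (not javac's actual lexer) of a maximal-munch tokenizer: it always takes the longest operator match, so the two closing brackets in ArrayList<ArrayList<Integer>> come out as a single >> token. The class name, the operator table, and the identifier handling are all illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only, not javac's lexer: a maximal-munch tokenizer
// that always takes the longest operator match, so ">>" is one token.
public class MaximalMunch {
    // Ordered longest-first so the longest possible match wins.
    private static final String[] OPS = { ">>>=", ">>=", ">>>", ">>", ">=", ">", "<", ";" };

    public static List<String> tokenize(String src) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            if (Character.isJavaIdentifierStart(c)) {
                int j = i;
                while (j < src.length() && Character.isJavaIdentifierPart(src.charAt(j))) j++;
                out.add(src.substring(i, j));
                i = j;
                continue;
            }
            // Try operators longest-first ("longest possible translation").
            String matched = null;
            for (String op : OPS) {
                if (src.startsWith(op, i)) { matched = op; break; }
            }
            if (matched == null) throw new IllegalArgumentException("unexpected char: " + c);
            out.add(matched);
            i += matched.length();
        }
        return out;
    }
}
```

Running tokenize("ArrayList<ArrayList<Integer>> i;") yields exactly the 8-token list above, with >> as one token.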


Answer 1:


Based on reading the code linked by @sm4, it looks like the strategy is:

  • tokenize the input normally. So A<B<C>> i; would be tokenized as A, <, B, <, C, >>, i, ; -- 8 tokens, not 9.

  • during hierarchical parsing, when the parser needs a > to close a type argument list and the next token merely starts with > (one of >>, >>>, >=, >>=, or >>>=), it knocks the leading > off and pushes the shortened token back onto the token stream. Example: when the parser reaches >>, i, ; while working on the typeArguments rule, it consumes one > to finish typeArguments, and the remaining stream becomes the slightly different >, i, ;.

So tokenization itself proceeds normally, but the hierarchical parsing phase re-tokenizes these operator tokens when necessary.
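The splitting step can be sketched as follows. This is a hypothetical illustration of the strategy described above, not the actual javac code: the class name, the deque-based token stream, and the expectGt method are all assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch of the ">>"-splitting trick described above
// (not the actual javac implementation).
public class AngleBracketSplitter {
    // A simple token stream: strings in a deque, front = next token.
    private final Deque<String> tokens;

    public AngleBracketSplitter(List<String> toks) {
        this.tokens = new ArrayDeque<>(toks);
    }

    /** Consume one '>' while parsing typeArguments. */
    public void expectGt() {
        String t = tokens.poll();
        if (t == null) throw new IllegalStateException("unexpected end of input");
        if (t.equals(">")) return; // exact match, nothing to split
        if (t.startsWith(">")) {
            // ">>", ">>>", ">=", ">>=", ">>>=": take one '>' and push
            // the shortened remainder back onto the stream.
            tokens.push(t.substring(1));
            return;
        }
        throw new IllegalStateException("expected '>' but found " + t);
    }

    public String peek() { return tokens.peek(); }
}
```

With the stream >>, i, ; the first expectGt() leaves >, i, ; and the second leaves i, ; — exactly the re-tokenization the answer describes.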



Source: https://stackoverflow.com/questions/16803185/are-s-in-type-parameters-tokenized-using-a-special-rule
