antlr 4: Should all of these tokens be showing up in the AST?

大城市里の小女人 submitted on 2019-12-11 08:58:46

Question


My ultimate goal is to parse a structured file as a tree of in-memory objects that I can then manipulate. The file format that I'm using is fairly sophisticated with about 200 keywords/tags, and this seemed like a good reason to learn about parser/lexer frameworks.

Unfortunately, there are so many concepts (and hundreds of tutorials and guides) that the learning process so far feels like trying to drink from a fire hose. So I'm taking some very meager baby steps, starting with this example.

I modified the grammar to create the following test, Nano.g4:

grammar Nano;

r  : root ;
root : START ROOT ID END ROOT;
START : 'StartBlock' ;
END : 'EndBlock' ;
ROOT : 'RootItem' ;
ID : [a-z]+ ;             // match lower-case identifiers
WS : [ \t\r\n]+ -> skip ; // skip spaces, tabs, newlines
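To make the lexer's job concrete, here is a rough, hand-written Java approximation of what the generated NanoLexer does with the rules above. It turns the characters of nano.txt into typed tokens and drops whitespace before the parser ever sees it. This is an illustrative sketch under simplifying assumptions, not ANTLR code: the NanoLexerSketch class, the Tok record, and the regex-based tokenize method are all hypothetical, and it assumes Java 16 or later for records. The real generated lexer uses longest-match semantics (ties go to the rule listed first) and reports a token recognition error on unmatched input instead of throwing. For well-formed input like nano.txt, the simple regex alternation below should produce the same token sequence.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NanoLexerSketch {
    // Token types named after the lexer rules in Nano.g4 (hypothetical stand-ins).
    enum Type { START, END, ROOT, ID }

    record Tok(Type type, String text) {}

    // One named alternative per rule, in grammar order. WS is matched but never
    // emitted, which imitates "-> skip".
    private static final Pattern RULES = Pattern.compile(
            "(?<START>StartBlock)|(?<END>EndBlock)|(?<ROOT>RootItem)"
            + "|(?<ID>[a-z]+)|(?<WS>[ \\t\\r\\n]+)");

    static List<Tok> tokenize(String input) {
        List<Tok> tokens = new ArrayList<>();
        Matcher m = RULES.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            // region + lookingAt forces the match to start exactly at pos.
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                // Simplification: ANTLR would report an error and keep going.
                throw new IllegalArgumentException("no rule matches at offset " + pos);
            }
            if (m.group("WS") == null) {
                // Find which named group (i.e. which rule) produced the match.
                for (Type t : Type.values()) {
                    if (m.group(t.name()) != null) {
                        tokens.add(new Tok(t, m.group()));
                        break;
                    }
                }
            }
            pos = m.end(); // end() is an absolute offset into input
        }
        return tokens;
    }

    public static void main(String[] args) {
        String nanoTxt = "StartBlock RootItem\n   foo\nEndBlock RootItem\n";
        for (Tok t : tokenize(nanoTxt)) {
            System.out.println(t.type() + " '" + t.text() + "'");
        }
    }
}
```

Running main should print the five tokens START, ROOT, ID, END, ROOT with their text. That is roughly the token list TestRig prints with its -tokens option, although the real output also shows token indices, positions, and a trailing EOF token.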

Next, I created a simple input file, nano.txt:

StartBlock RootItem
   foo
EndBlock RootItem

I then generated the parser, compiled it, and ran it with the following commands:

del *.class
del *.java
java org.antlr.v4.Tool Nano.g4
javac Nano*.java
java org.antlr.v4.runtime.misc.TestRig Nano r -gui < nano.txt

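That gives me the following parse tree, which TestRig's -tree option would print in LISP-style notation as (r (root StartBlock RootItem foo EndBlock RootItem)).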

The tree above is my first conceptual hangup about what to expect from a lexer and parser. The "StartBlock RootItem" and "EndBlock RootItem" tokens are necessary to make the input file legal, but conceptually I don't need them once I've verified that the file is properly formatted. From that point on, the only thing I care about is that there's a RootItem that contains "foo", i.e. a tree that is just a RootItem node with a single child, foo.

Again, I'm painfully new to parser/lexer concepts. Should I (or is it even possible to) write the grammar so that the output tree has that simplified shape? Or should I handle that in a subsequent step that traverses the AST and extracts only the relevant data fields?


Answer 1:


ANTLR 4 produces parse trees, not ASTs. This is an important distinction from the behavior of ANTLR 3, and was chosen to help with long-term maintenance of grammars. In particular, situations may arise where users do want access to the tokens, e.g. as part of a semantic highlighting component in an IDE. Rather than force users to write application-specific modified grammars in such a scenario, we chose to always include all tokens in the parse tree.
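In practice this means the "subsequent step" from the question is the usual ANTLR 4 approach: keep the full parse tree and walk it (with a listener or a visitor) to build your own in-memory objects. With the real tool you could generate a visitor (java org.antlr.v4.Tool -visitor Nano.g4) and override visitRoot(NanoParser.RootContext ctx), where ctx.ID().getText() returns "foo". The self-contained sketch below only imitates that idea: the Node record is a hypothetical stand-in for ANTLR's ParseTree, and RootItem, toModel, and keywords are illustrative names, so it runs without the ANTLR runtime (it assumes Java 16 or later for records). It also shows the answer's point about keeping every token. One consumer builds a RootItem model and ignores the keywords, while a second consumer, a toy highlighter, needs exactly those keyword tokens.

```java
import java.util.ArrayList;
import java.util.List;

public class NanoTreeSketch {
    // Hypothetical stand-in for an ANTLR parse tree node (NOT the ANTLR API):
    // rule nodes have a rule name and children; token leaves have a type and text.
    record Node(String rule, String tokenType, String text, List<Node> children) {
        static Node ruleNode(String name, Node... kids) {
            return new Node(name, null, null, List.of(kids));
        }

        static Node tokenNode(String type, String text) {
            return new Node(null, type, text, List.of());
        }

        boolean isToken() {
            return tokenType != null;
        }
    }

    // The in-memory object the question is really after: a RootItem named "foo".
    record RootItem(String name) {}

    // Consumer 1, analogous to overriding visitRoot in a generated visitor:
    // keep the ID token's text and ignore the keyword tokens.
    static RootItem toModel(Node root) {
        for (Node child : root.children()) {
            if (child.isToken() && child.tokenType().equals("ID")) {
                return new RootItem(child.text());
            }
        }
        throw new IllegalStateException("root node has no ID token");
    }

    // Consumer 2, a toy "highlighter": collects exactly the keyword tokens
    // that consumer 1 threw away, in source order.
    static List<String> keywords(Node node, List<String> out) {
        if (node.isToken()) {
            if (!node.tokenType().equals("ID")) {
                out.add(node.text());
            }
        } else {
            for (Node child : node.children()) {
                keywords(child, out);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Mirrors the TestRig tree (r (root StartBlock RootItem foo EndBlock RootItem)).
        Node root = Node.ruleNode("root",
                Node.tokenNode("START", "StartBlock"),
                Node.tokenNode("ROOT", "RootItem"),
                Node.tokenNode("ID", "foo"),
                Node.tokenNode("END", "EndBlock"),
                Node.tokenNode("ROOT", "RootItem"));
        Node r = Node.ruleNode("r", root);

        System.out.println(toModel(root));                  // RootItem[name=foo]
        System.out.println(keywords(r, new ArrayList<>())); // [StartBlock, RootItem, EndBlock, RootItem]
    }
}
```

Running main should print RootItem[name=foo] followed by [StartBlock, RootItem, EndBlock, RootItem]. The grammar stays unchanged, and each application decides which parts of the tree it cares about.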



Source: https://stackoverflow.com/questions/18768905/antlr-4-should-all-of-these-tokens-be-showing-up-in-the-ast
