Can an element contain an attribute when parsed by an ANTLR-generated parser? If so, how?


野性不改 2021-01-26 17:42

I am following this tutorial and have successfully replicated its behavior, except that I am using ANTLR 4.7 instead of the 4.5 used in the tutorial.

I am trying to bui

3 answers

忘了有多久 (original poster)

2021-01-26 18:19

I am guessing I need to change the todo.g4 and then regenerate the parser.

Of course, regenerate after each change. For me it is:

$ a4 Question.g4
$ javac Q*.java
$ grun Question elements -tokens -diagnostics t.text

where

$ alias
alias a4='java -jar /usr/local/lib/antlr-4.6-complete.jar'
alias grun='java org.antlr.v4.gui.TestRig'

The more specifically you describe the content, the more likely you are to run into ambiguities. For example, suppose you have two rules:

payment   : 'pay' [payee] [amount]
free_text : ... any character ...

Consider the following content:

* pay Federico Tomassetti 10 € for the tutorial

The prefix * pay Federico Tomassetti 10 is ambiguous: it can be matched by both rules. The line is nevertheless parsed as free text, because the rest of it (€ for the tutorial) does not satisfy payment.

If you later change the payment rule to accept more information after the amount:

payment   : 'pay' [payee] [amount] payment_info

the content above will be matched by payment (when an input is truly ambiguous, ANTLR picks the alternative listed first). The good news is that ANTLR 4 is very good at disambiguating: it looks ahead as far as necessary, up to the end of the file.

For ambiguous tokens and precedence rules, read the posts from the last three weeks; a lot has been said.

Combining Raven's grammar with yours, here is one possible solution:
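To make the payment / free_text discussion concrete, here is a small hand-written Java sketch. It is not ANTLR-generated code and it does not reproduce ANTLR's prediction algorithm; it only mimics the outcome: try the payment shape first and fall back to free text when the whole line does not fit. The names LineClassifier, classify and isPayment are made up for this illustration, tokens are simply split on whitespace, and payment_info (which the rule above leaves undefined) is assumed to mean "any remaining tokens".

```java
import java.util.Arrays;
import java.util.List;

// Hand-written illustration only; the parser ANTLR generates looks very different.
class LineClassifier {

    // Digits with an optional ",ddd" part and an optional ".ddd" part.
    static boolean isNumber(String s) {
        return s.matches("[0-9]+(,[0-9]+)?(\\.[0-9]+)?");
    }

    // A word starts with a letter; the keyword "pay" is not an ordinary word.
    static boolean isWord(String s) {
        return !s.equals("pay") && s.matches("[a-zA-Z][a-zA-Z0-9_]*");
    }

    // payment : 'pay' word word? number [payment_info]
    // allowTrailingInfo = true models the extended rule with payment_info (assumption).
    static boolean isPayment(List<String> tokens, boolean allowTrailingInfo) {
        if (tokens.isEmpty() || !tokens.get(0).equals("pay")) {
            return false;
        }
        int i = 1;
        int words = 0;
        // receiver: one or two words
        while (i < tokens.size() && words < 2 && isWord(tokens.get(i))) {
            i++;
            words++;
        }
        if (words == 0 || i >= tokens.size() || !isNumber(tokens.get(i))) {
            return false;
        }
        i++;
        // Without payment_info, nothing may follow the amount.
        return allowTrailingInfo || i == tokens.size();
    }

    // The first alternative (payment) wins whenever it fits the whole line.
    static String classify(String content, boolean allowTrailingInfo) {
        List<String> tokens = Arrays.asList(content.trim().split("\\s+"));
        return isPayment(tokens, allowTrailingInfo) ? "payment" : "free_text";
    }

    public static void main(String[] args) {
        String line = "pay Federico Tomassetti 10 for the tutorial";
        System.out.println(classify(line, false));
        System.out.println(classify(line, true));
        System.out.println(classify("pay Banana Inc 700", false));
    }
}
```

With the original rule the Federico line falls through to free text; once trailing information is allowed, the same line is taken as a payment.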

File Question.g4

grammar Question;

elements
@init {System.out.println("Question last update 1432");}
    : ( element | emptyLine )* EOF
    ;

element
    : '*' content NL
    ;

content
    : payment   //{System.out.println("Payement found " + $payment.text);}
    | free_text {System.out.println("Free text found " + $free_text.text);}
    ;

payment
    : PAY receiver amount=NUMBER
      {System.out.println("Payement found " + $amount.text + " to " + $receiver.text);}
    ;

receiver
    : firstname=WORD ( lastname=WORD )?
    ;

free_text
    : ( WORD | PAY | NUMBER )+
    ;

emptyLine
    : NL
    ;

PAY    : 'pay' ;
WORD   : LETTER ( LETTER | DIGIT | '_' )* ;
NUMBER : DIGIT+ ( ',' DIGIT+ )? ( '.' DIGIT+ )? ;  

NL  : [\r\n]
    | '\r\n' 
    ;
//WS  : [ \t]+ -> skip ; // $payment.text => payAcmeCorp123,789.45
WS  : [ \t]+ -> channel(HIDDEN) ; // spaces are needed to nicely display $payment.text

fragment DIGIT  : [0-9] ;
fragment LETTER : [a-zA-Z] ;
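If it helps to see why play ends up as WORD while pay ends up as PAY, the following plain-Java sketch imitates the two tie-breaking rules an ANTLR lexer applies: the longest match wins, and on equal length the rule listed first wins. This is only an approximation built on java.util.regex, not the generated QuestionLexer; MiniLexer and tokenize are hypothetical names, and whitespace is simply dropped here instead of being sent to the hidden channel.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Regex-based imitation of the lexer rules in Question.g4 (illustration only).
class MiniLexer {

    // Insertion order matters: earlier rules win ties, as in an ANTLR grammar.
    static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("'*'", Pattern.compile("\\*"));
        RULES.put("PAY", Pattern.compile("pay"));
        RULES.put("WORD", Pattern.compile("[a-zA-Z][a-zA-Z0-9_]*"));
        RULES.put("NUMBER", Pattern.compile("[0-9]+(,[0-9]+)?(\\.[0-9]+)?"));
        RULES.put("NL", Pattern.compile("\\r\\n|[\\r\\n]"));
        RULES.put("WS", Pattern.compile("[ \\t]+"));
    }

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestType = null;
            int bestEnd = pos;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(input);
                m.region(pos, input.length());
                // Strictly longer matches replace the current best,
                // so on equal length the earlier rule is kept.
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestType = rule.getKey();
                    bestEnd = m.end();
                }
            }
            if (bestType == null) {
                // No rule matches this character (the situation of '€' in t.text).
                tokens.add("ERROR(" + input.charAt(pos) + ")");
                pos++;
            } else {
                if (!bestType.equals("WS")) { // simplification: drop whitespace
                    tokens.add(bestType + "(" + input.substring(pos, bestEnd) + ")");
                }
                pos = bestEnd;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("* play with ANTLR 4"));
        System.out.println(tokenize("pay payee 10 \u20ac"));
    }
}
```

Note that pay matches both PAY and WORD with the same length, so the earlier rule (PAY) is chosen, whereas payee is longer as a WORD and therefore never becomes PAY.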

File t.text

* play with ANTLR 4
* write a tutorial
* pay Acme Corp 123,789.45
* pay Banana Inc 700
* pay Federico Tomassetti 10 € for the tutorial

Execution :

$ grun Question elements -tokens -diagnostics t.text
line 5:29 token recognition error at: '€'
[@0,0:0='*',<'*'>,1:0]
[@1,1:1=' ',<WS>,channel=1,1:1]
[@2,2:5='play',<WORD>,1:2]
[@3,6:6=' ',<WS>,channel=1,1:6]
[@4,7:10='with',<WORD>,1:7]
[@5,11:11=' ',<WS>,channel=1,1:11]
[@6,12:16='ANTLR',<WORD>,1:12]
[@7,17:17=' ',<WS>,channel=1,1:17]
[@8,18:18='4',<NUMBER>,1:18]
[@9,19:19='\n',<NL>,1:19]
[@10,20:20='*',<'*'>,2:0]
[@11,21:21=' ',<WS>,channel=1,2:1]
[@12,22:26='write',<WORD>,2:2]
[@13,27:27=' ',<WS>,channel=1,2:7]
[@14,28:28='a',<WORD>,2:8]
[@15,29:29=' ',<WS>,channel=1,2:9]
[@16,30:37='tutorial',<WORD>,2:10]
[@17,38:38='\n',<NL>,2:18]
...
[@56,136:135='<EOF>',<EOF>,7:0]
Question last update 1432
Free text found play with ANTLR 4
Free text found write a tutorial
line 3:26 reportAttemptingFullContext d=2 (content), input='pay Acme Corp 123,789.45
'
...
Payement found 700 to Banana Inc
Free text found pay Federico Tomassetti 10  for the tutorial

As you can see, the € symbol is not recognized. You may need a CONTENT rule similar to FIELDTEXT here, and then you get into trouble ...

Federico's Mega tutorial is a good start. For nitty-gritty details, see The Definitive ANTLR 4 Reference or the online doc from www.antlr.org.
