Question
ANTLR: Is it possible to make a grammar with an embedded grammar (with its own lexer) inside?
For example, my language allows embedded SQL:
var Query = [select * from table];
with Query do something ....;
Is it possible with ANTLR?
Answer 1:
Is it possible to make a grammar with an embedded grammar (with its own lexer) inside?
If you mean whether it is possible to define two languages in a single grammar (using separate lexers), then the answer is: no, that's not possible.
However, if the question is whether it is possible to parse two languages into a single AST, then the answer is: yes, it is possible.
You simply need to:
- define both languages in their own grammar;
- create a lexer rule in your main grammar that captures the entire input of the embedded language;
- use a rewrite rule that calls a custom method, via {...}, which parses the embedded source and inserts the resulting AST into the main AST (see the expr rule in the main grammar, MyLanguage.g).
MyLanguage.g
grammar MyLanguage;
options {
  output=AST;
  ASTLabelType=CommonTree;
}
tokens {
  ROOT;
}
@members {
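  // Parses the embedded SQL source with its own lexer and parser
  // and returns the resulting AST so it can be spliced into the main tree.
  private CommonTree parseSQL(String sqlSrc) {
    try {
      MiniSQLLexer lexer = new MiniSQLLexer(new ANTLRStringStream(sqlSrc));
      MiniSQLParser parser = new MiniSQLParser(new CommonTokenStream(lexer));
      return (CommonTree)parser.parse().getTree();
    } catch(Exception e) {
      // On failure, return a single node carrying the error message.
      return new CommonTree(new CommonToken(-1, e.getMessage()));
    }
  }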
}
parse
  :  assignment+ EOF -> ^(ROOT assignment+)
  ;
assignment
  :  Var Id '=' expr ';' -> ^('=' Id expr)
  ;
expr
  :  Num
  |  SQL -> {parseSQL($SQL.text)}
  ;
Var   : 'var';
Id    : ('a'..'z' | 'A'..'Z')+;
Num   : '0'..'9'+;
SQL   : '[' ~']'* ']';
Space : ' ' {skip();};
MiniSQL.g
grammar MiniSQL;
options {
  output=AST;
  ASTLabelType=CommonTree;
}
parse
  :  '[' statement ']' EOF -> statement
  ;
statement
  :  select
  ;
select
  :  Select '*' From ID -> ^(Select '*' From ID)
  ;
Select : 'select';
From   : 'from';
ID     : ('a'..'z' | 'A'..'Z')+;
Space  : ' ' {skip();};
Main.java
import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import org.antlr.stringtemplate.*;
public class Main {
  public static void main(String[] args) throws Exception {
    String src = "var Query = [select * from table]; var x = 42;";
    MyLanguageLexer lexer = new MyLanguageLexer(new ANTLRStringStream(src));
    MyLanguageParser parser = new MyLanguageParser(new CommonTokenStream(lexer));
    CommonTree tree = (CommonTree)parser.parse().getTree();
    DOTTreeGenerator gen = new DOTTreeGenerator();
    StringTemplate st = gen.toDOT(tree);
    System.out.println(st);
  }
}
Run the demo
java -cp antlr-3.3.jar org.antlr.Tool MiniSQL.g 
java -cp antlr-3.3.jar org.antlr.Tool MyLanguage.g 
javac -cp antlr-3.3.jar *.java
java -cp .:antlr-3.3.jar Main
Given the input:
var Query = [select * from table]; var x = 42;
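the Main class prints DOT source for the following AST (image not reproduced): a ROOT node with two '=' children, one holding Query and the select subtree produced by MiniSQLParser, the other holding x and 42.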
And if you want to allow string literals inside your SQL (which could contain ]), and comments (which could contain ' and ]), then you could use the following SQL rule inside your main grammar:
SQL
  :  '[' ( ~(']' | '\'' | '-')
         | '-' ~'-' 
         | COMMENT 
         | STR
         )* 
     ']'
  ;
fragment STR 
  :  '\'' (~('\'' | '\r' | '\n') | '\'\'')+ '\'' 
  |  '\'\''
  ;
fragment COMMENT
  :  '--' ~('\r' | '\n')*
  ;
which would properly parse the following input in a single token:
[
  select a,b,c 
  from table 
  where a='A''B]C' 
  and b='' -- some ] comment ] here'
]
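To make it easier to see what that SQL rule has to skip over, here is a small hand-written sketch in plain Java that finds the closing ] of an embedded block while ignoring any ] inside string literals and -- comments. It is only an approximation of the lexer rule, not the code ANTLR generates (for instance, it treats a lone - as an ordinary character), and the names EmbeddedSqlScanner, endOfSqlBlock, skipString and skipLineComment are made up for this illustration.

```java
final class EmbeddedSqlScanner {

    /**
     * Returns the index just past the ']' that closes the block opening at
     * src[start], or -1 if the block is malformed or never closed.
     */
    static int endOfSqlBlock(String src, int start) {
        if (start >= src.length() || src.charAt(start) != '[') {
            return -1;
        }
        int i = start + 1;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == ']') {
                return i + 1;                // a ']' outside strings and comments ends the block
            } else if (c == '\'') {
                i = skipString(src, i);      // string literals may contain ']'
                if (i < 0) {
                    return -1;
                }
            } else if (c == '-' && i + 1 < src.length() && src.charAt(i + 1) == '-') {
                i = skipLineComment(src, i); // comments may contain ']' and quotes
            } else {
                i++;                         // any other character, including a lone '-'
            }
        }
        return -1;                           // ran out of input before the closing ']'
    }

    /** Skips a single-quoted literal starting at src[open]; '' is an escaped quote. */
    static int skipString(String src, int open) {
        int i = open + 1;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == '\'') {
                if (i + 1 < src.length() && src.charAt(i + 1) == '\'') {
                    i += 2;                  // doubled quote inside the literal
                } else {
                    return i + 1;            // closing quote
                }
            } else if (c == '\r' || c == '\n') {
                return -1;                   // like the STR rule, no line breaks in strings
            } else {
                i++;
            }
        }
        return -1;                           // unterminated string
    }

    /** Skips a "--" comment starting at src[start], stopping at the line break. */
    static int skipLineComment(String src, int start) {
        int i = start + 2;
        while (i < src.length() && src.charAt(i) != '\r' && src.charAt(i) != '\n') {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        String src = "var Query = [select a from t where a='A''B]C']; var x = 42;";
        int open = src.indexOf('[');
        int end = endOfSqlBlock(src, open);
        // Prints the embedded block, brackets included.
        System.out.println(src.substring(open, end));
    }
}
```

In the ANTLR version the same job is done declaratively by the SQL, STR and COMMENT rules; the sketch only illustrates why strings and comments need their own alternatives inside the bracketed block.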
Just beware that trying to create a grammar for an entire SQL dialect (or even a large subset) is no trivial task! You may want to search for existing SQL parsers, or look at the ANTLR wiki for example grammars.
Answer 2:
Yes, in ANTLR this is called an island grammar. You can find a working example in the v3 examples, inside the island-grammar folder: it shows how to use a grammar to parse Javadoc comments inside Java code.
You can also find some clues in the doc "Island Grammars Under Parser Control" and in another linked page.
Source: https://stackoverflow.com/questions/7750995/antlr-is-it-possible-to-make-grammar-with-embed-grammar-inside