Parsing string interpolation in ANTLR

别来无恙 提交于 2019-11-29 03:59:53

问题


I'm working on a simple string manipulation DSL for internal purposes, and I would like the language to support string interpolation as it is used in Ruby.

For example:

name = "Bob"
msg = "Hello ${name}!"
print(msg)   # prints "Hello Bob!"

I'm attempting to implement my parser in ANTLRv3, but I'm pretty inexperienced with using ANTLR so I'm unsure how to implement this feature. So far, I've specified my string literals in the lexer, but in this case I'll obviously need to handle the interpolation content in the parser.

My current string literal grammar looks like this:

STRINGLITERAL : '"' ( StringEscapeSeq | ~( '\\' | '"' | '\r' | '\n' ) )* '"' ;
fragment StringEscapeSeq : '\\' ( 't' | 'n' | 'r' | '"' | '\\' | '$' | ('0'..'9')) ;

Moving the string literal handling into the parser seems to make everything else stop working as it should. Cursory web searches didn't yield any information. Any suggestions as to how to get started on this?


回答1:


I'm no ANTLR expert, but here's a possible grammar:

grammar Str;

parse
    :    ((Space)* statement (Space)* ';')+ (Space)* EOF
    ;

statement
    :    print | assignment
    ;

print
    :    'print' '(' (Identifier | stringLiteral) ')' 
    ;

assignment
    :    Identifier (Space)* '=' (Space)* stringLiteral
    ;

stringLiteral
    :    '"' (Identifier | EscapeSequence | NormalChar | Space | Interpolation)* '"'
    ;

Interpolation
    :    '${' Identifier '}'
    ;

Identifier
    :    ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | '0'..'9')*
    ;

EscapeSequence
    :    '\\' SpecialChar
    ;

SpecialChar
    :     '"' | '\\' | '$'
    ;

Space
    :    (' ' | '\t' | '\r' | '\n')
    ;

NormalChar
    :    ~SpecialChar
    ;

As you notice, there are a couple of (Space)*-es inside the example grammar. This is because the stringLiteral is a parser-rule instead of a lexer-rule. Therefor, when tokenizing the source file, the lexer cannot know if a white space is part of a string literal, or is just a space inside the source file that can be ignored.

I tested the example with a little Java class and all worked as expected:

/* the same grammar, but now with a bit of Java code in it */
grammar Str;

@parser::header {
    package antlrdemo;
    import java.util.HashMap;
}

@lexer::header {
    package antlrdemo;
}

@parser::members {
    HashMap<String, String> vars = new HashMap<String, String>();
}

parse
    :    ((Space)* statement (Space)* ';')+ (Space)* EOF
    ;

statement
    :    print | assignment
    ;

print
    :    'print' '(' 
         (    id=Identifier    {System.out.println("> "+vars.get($id.text));} 
         |    st=stringLiteral {System.out.println("> "+$st.value);}
         ) 
         ')' 
    ;

assignment
    :    id=Identifier (Space)* '=' (Space)* st=stringLiteral {vars.put($id.text, $st.value);}
    ;

stringLiteral returns [String value]
    :    '"'
        {StringBuilder b = new StringBuilder();} 
        (    id=Identifier           {b.append($id.text);}
        |    es=EscapeSequence       {b.append($es.text);}
        |    ch=(NormalChar | Space) {b.append($ch.text);}
        |    in=Interpolation        {b.append(vars.get($in.text.substring(2, $in.text.length()-1)));}
        )* 
        '"'
        {$value = b.toString();}
    ;

Interpolation
    :    '${' i=Identifier '}'
    ;

Identifier
    :    ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | '0'..'9')*
    ;

EscapeSequence
    :    '\\' SpecialChar
    ;

SpecialChar
    :     '"' | '\\' | '$'
    ;

Space
    :    (' ' | '\t' | '\r' | '\n')
    ;

NormalChar
    :    ~SpecialChar
    ;

And a class with a main method to test it all:

package antlrdemo;

import org.antlr.runtime.*;

public class ANTLRDemo {
    public static void main(String[] args) throws RecognitionException {
        String source = "name = \"Bob\";        \n"+
                "msg = \"Hello ${name}\";       \n"+
                "print(msg);                    \n"+
                "print(\"Bye \\${for} now!\");    ";
        ANTLRStringStream in = new ANTLRStringStream(source);
        StrLexer lexer = new StrLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        StrParser parser = new StrParser(tokens);
        parser.parse();
    }
}

which produces the following output:

> Hello Bob
> Bye \${for} now!

Again, I am no expert, but this (at least) gives you a way to solve it.

HTH.



来源:https://stackoverflow.com/questions/1850468/parsing-string-interpolation-in-antlr

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!