ANTLR API issue; example + workaround provided; explanation required

Question

I created the following lexer using ANTLRWorks (see also http://bkiers.blogspot.com/2011/03/2-introduction-to-antlr.html#intro).

 // CSVLexer.g
 lexer grammar CSVLexer;

 @lexer::header {
   package graphica.parsers;
 }

 Comma   
   :  ','  
   ;  

 LineBreak
   :  '\r'? '\n'
   |  '\r'
   ;

 SimpleValue  
   :  ~(',' | '\r' | '\n' | '"')+  
   ;  

 QuotedValue  
   :  '"' ('""' | ~'"')* '"'  
   ;

I used the following Java class to test the Lexer.

 import org.antlr.runtime.*;

 /**
  * Prints every token that CSVLexer produces for a small CSV input.
  *
  * @author Nilo
  */
 public class CSVLexerTest {

 public static void main(String[] args) throws Exception {
    // the input source  
    String source =
            "val1, value2, value3, value3.2" + "\n"
            + "\"line\nbreak\",ABAbb,end";

    // create an instance of the lexer  
    CSVLexer lexer = new CSVLexer(new ANTLRStringStream(source));
    // wrap a token-stream around the lexer  
    CommonTokenStream tokens = new CommonTokenStream(lexer);


    // traverse the tokens and print them to see if the correct tokens are created
    // tokens.toString();
    int n = 1;
    for (Object o : tokens.getTokens()) {
        CommonToken token = (CommonToken) o;
        System.out.println("token(" + n + ") = " + token.getText().replace("\n", "\\n"));
        n++;
    }
 }
 }

The class above (from the same tutorial) does not produce any output. However, if I insert a call to tokens.toString() before the token loop, the output is printed as expected.

Note: I use ANTLRWorks 1.4.3 and ANTLR 3.4 on Windows 7 with JDK 1.7 (64-bit).

QUESTION: Why does this happen? There should be a way to get this working without calling tokens.toString().

Answer 1:

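CommonTokenStream extends BufferedTokenStream, which holds a List<Token> tokens that is returned when one calls getTokens(). This list only gets filled at certain times, though. In 3.2, getTokens() fills the list first if that has not happened yet; in 3.3 and 3.4, getTokens() returns the list as-is without filling it. The tokens.toString() call in the question evidently fills the list as a side effect, which is why the loop prints output after it.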

ANTLR 3.2 (and before)

public List getTokens() {
    if ( p == -1 ) {
        fillBuffer();
    }
    return tokens;
}

protected void fillBuffer() {
    // fill `tokens`
}

ANTLR 3.3 (and after)

public List getTokens() { 
    return tokens; 
}

public void fill() {
    // fill `tokens`
}

Note that the fill method is protected in 3.2 (fillBuffer()) but public in 3.3+ (fill()), so in 3.3+ you can call it yourself before asking for the tokens. The following works:

import org.antlr.runtime.*;

public class CSVLexerTest {

  public static void main(String[] args) throws Exception {

    // the input source  
    String source =
        "val1, value2, value3, value3.2" + "\n" + 
        "\"line\nbreak\",ABAbb,end";

    // create an instance of the lexer  
    CSVLexer lexer = new CSVLexer(new ANTLRStringStream(source));

    // wrap a token-stream around the lexer and fill the tokens-list 
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    tokens.fill();

    // traverse the tokens and print them to see if the correct tokens are created
    int n = 1;
    for (Object o : tokens.getTokens()) {
      CommonToken token = (CommonToken) o;
      System.out.println("token(" + n + ") = " + token.getText().replace("\n", "\\n"));
      n++;
    }
  }
}

It produces the following output:

token(1) = val1
token(2) = ,
token(3) =  value2
token(4) = ,
token(5) =  value3
token(6) = ,
token(7) =  value3.2
token(8) = \n
token(9) = "line\nbreak"
token(10) = ,
token(11) = ABAbb
token(12) = ,
token(13) = end
token(14) = <EOF>
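
To see the behaviour in isolation, here is a minimal sketch that needs no ANTLR on the classpath. LazyBufferDemo and LazyBuffer are hypothetical names made up for this illustration and are not ANTLR classes. The sketch only mimics the pattern described above: getTokens() hands back the internal list untouched, fill() drains the source into it, and toString() is assumed to fill the list as a side effect, which would match what the question observed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class LazyBufferDemo {

    // Hypothetical stand-in for a lazily buffered token stream; NOT ANTLR code.
    static class LazyBuffer {
        private final Iterator<String> source;
        private final List<String> tokens = new ArrayList<String>();

        LazyBuffer(List<String> input) {
            this.source = input.iterator();
        }

        // Like getTokens() in 3.3+ as quoted above: returns the list as-is.
        List<String> getTokens() {
            return tokens;
        }

        // Drains the source into the buffer; calling it again adds nothing.
        void fill() {
            while (source.hasNext()) {
                tokens.add(source.next());
            }
        }

        // Assumption: printing the stream fills the buffer as a side effect.
        @Override
        public String toString() {
            fill();
            return tokens.toString();
        }
    }

    public static void main(String[] args) {
        LazyBuffer buffer = new LazyBuffer(Arrays.asList("val1", ",", "value2"));
        System.out.println("before fill(): " + buffer.getTokens().size() + " tokens");
        buffer.fill();
        System.out.println("after fill(): " + buffer.getTokens().size() + " tokens");
    }
}
```

Run as-is, the first line reports 0 tokens and the second reports 3. This is the same reason the loop in the question iterates over an empty list unless something fills the buffer first.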

Source: https://stackoverflow.com/questions/8560556/antlr-api-issue-example-workaround-provided-explanation-required

Tags

java

antlr