ANTLR best practice for finding and catching parse errors

若如初见. Submitted on 2021-02-11 15:05:38

Question


This question concerns how to get error messages out of an ANTLR4 parser in C# in Visual Studio. I feed the ANTLR parser a known bad input string, but I am not seeing any errors or parse exceptions thrown during the (bad) parse. Thus, my exception handler does not get a chance to create and store any error messages during the parse.

I am working with an ANTLR4 grammar that I know to be correct because I can see correct parse operation outputs in graphical form with an ANTLR extension to Visual Studio Code. I know the generated parser code is correct because I can compile it without errors, override the base visitor class, and print out various bits of information from the parse tree with my overridden VisitXXX methods.

At this point, I am running a very simple test case that feeds in a bad input string and looks for a nonzero count on my list of stored parse errors. I am confident of the error-handling code because it works in a similar situation on another grammar. But the error-handling code must catch a parse exception to generate an error message. (Maybe that's not the right way to catch/detect parse errors such as unexpected tokens or other errors in the input stream.)

Here is the code that I used to replace the default lexer and parser error listeners.

 // install the custom ErrorListener into the parser object
 sendLexer.RemoveErrorListeners();
 sendLexer.AddErrorListener(MyErrorListener.Instance);
 Parser.RemoveErrorListeners();
 Parser.AddErrorListener(MyErrorListener.Instance);

I have attached a screenshot of the graphical output showing the presence of unexpected tokens in the input string.

Q1. Why don't the unexpected tokens cause parse exceptions that I can catch with my exception handler? Are all parse errors supposed to throw exceptions?

Q2. If catching parse exceptions is not the right way, could someone please suggest a strategy for me to follow to detect the unexpected token errors (or other errors that do not throw parse exceptions)?

Q3. Is there a best practice way of catching or finding parse errors, such as generating errors from walking the parse tree, rather than hoping that ANTLR will throw a parse exception for every unexpected token? (I am wondering whether unexpected tokens are supposed to generate parse exceptions, as opposed to producing a legitimate parse tree that happens to contain unexpected tokens. If so, do they just show up as unexpected children in the parse tree?)

Thank you.

Screenshot showing unexpected tokens in the (deliberate) bad input string to trigger errors:

UPDATE:

The parser and unit tests are working now. If I feed a bad input string into the parser, the default parser error listener prints a suitable error message. However, when I install a custom error listener, it never gets called: 1) a breakpoint placed in the error listener never gets hit, and 2) (as a consequence) no error message is collected or printed. I don't know why it is not called, given that I see an error message when the custom error listener is not installed.

Here is my C# code for the unit test call to ParseText:

// the unit test
public void ModkeyComboThreeTest() {
  SendKeysHelper.ParseText("this input causes a parse error");
  Assert.AreEqual(0, SendKeysHelper.ParseErrorList.Count);
}

// the helper class that installs the custom error listener
public static class SendKeysHelper {
  public static List<string> ParseErrorList = new List<string>();
  public static MyErrorListener MyErrorListener;

  public static SendKeysParser ParseText(string text) {
    ParseErrorList.Clear();
    try {
      var inputStream = new AntlrInputStream(text);
      var sendLexer = new SendKeysLexer(inputStream);
      var commonTokenStream = new CommonTokenStream(sendLexer);
      var sendKeysParser = new SendKeysParser(commonTokenStream);
      Parser = sendKeysParser;

      MyErrorListener = new MyErrorListener(ParseErrorList);
      Parser.RemoveErrorListeners();
      Parser.AddErrorListener(MyErrorListener);

      // parse the input from the starting rule
      var ctx = Parser.toprule();
      if (ParseErrorList.Count > 0) {
        Dprint($"Parse error count: {ParseErrorList.Count}");
      }
 ...
}


// the custom error listener class
public class MyErrorListener : BaseErrorListener, IAntlrErrorListener<int>{
  public List<string> ErrorList { get; private set; }

  // pass in the helper class error list to this constructor
  public MyErrorListener(List<string> errorList) {
    ErrorList = errorList; 
  }

  public void SyntaxError(IRecognizer recognizer, int offendingSymbol, 
    int line, int offset, string msg, RecognitionException e) {
    var errmsg = "Line " + line + ", 0-offset " + offset + ": " + msg;              

    ErrorList.Add(errmsg);
  }
}

So, I'm still trying to answer my original question on how to get error information out of the failed parse. The code that installs the custom listener builds without syntax errors. With it installed, 1) the default error message goes away (suggesting my custom error listener was installed), but 2) the SyntaxError method of my custom error listener does not get called to register an error.

Or, alternatively, I leave the default error listener in place and add my custom error listener as well. In the debugger, I can see both of them registered in the parser data structure. On an error, the default listener gets called, but my custom error listener does not get called (meaning that a breakpoint in the custom listener does not get hit). The unit tests show no syntax errors or operational errors, other than that my custom error listener does not appear to get called.

Maybe the reference to the custom listener is somehow corrupt or not working, even though I can see it in the parser data structure. Or maybe a base class version of my custom listener is being called instead. Very strange.
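The base-class hypothesis can be checked in isolation, without ANTLR. The following is a minimal sketch under stated assumptions, not the ANTLR runtime itself: `Token`, `IListener<TSymbol>`, `BaseListener`, `IntOnlyListener`, `BothListener`, and `ListenerDemo` are all hypothetical stand-ins. It assumes (unverified here) that the runtime's base listener implements the listener interface for token symbols with a do-nothing virtual method, while the listener above only supplies the `int` overload.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins, NOT the ANTLR runtime types. They only mirror the
// shape of the listener in this post: a generic listener interface, and a base
// class that implements it for token symbols with a do-nothing virtual method.
public sealed class Token { public string Text = ""; }

public interface IListener<TSymbol> {
  void SyntaxError(TSymbol offendingSymbol, int line, int offset, string msg);
}

public class BaseListener : IListener<Token> {
  public virtual void SyntaxError(Token offendingSymbol, int line, int offset, string msg) { }
}

// Same shape as MyErrorListener: derives from the base class and adds the
// int-symbol interface, but never overrides the Token overload.
public class IntOnlyListener : BaseListener, IListener<int> {
  public List<string> ErrorList { get; } = new List<string>();

  public void SyntaxError(int offendingSymbol, int line, int offset, string msg) {
    ErrorList.Add("Line " + line + ", 0-offset " + offset + ": " + msg);
  }
}

// Variant that also overrides the Token overload inherited from the base class.
public class BothListener : IntOnlyListener {
  public override void SyntaxError(Token offendingSymbol, int line, int offset, string msg) {
    ErrorList.Add("Line " + line + ", 0-offset " + offset + ": " + msg);
  }
}

public static class ListenerDemo {
  // Stand-in for a parser reporting an error through the token-symbol interface.
  public static void ReportAsParser(IListener<Token> listener) {
    listener.SyntaxError(new Token { Text = "?" }, 1, 5, "unexpected token");
  }

  // Reports one error the way the stand-in parser would and returns how many
  // messages the listener collected.
  public static int CountAfterParserError(IntOnlyListener listener) {
    ReportAsParser(listener);
    return listener.ErrorList.Count;
  }
}
```

In this sketch, `ListenerDemo.CountAfterParserError(new IntOnlyListener())` returns 0 because the call lands in the empty base-class method, while passing `new BothListener()` returns 1. Whether the real runtime dispatches the same way depends on the ANTLR version and package in use, so this only shows that the symptom is consistent with the hypothesis.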

UPDATE:

The helpful discussion/answer for this thread was deleted for some reason. It provided much useful information on writing custom error listeners and error strategies for ANTLR4.

I have opened a second question, "ANTLR4 errors not being reported to custom lexer / parser error listeners", that suggests an underlying cause for why I can't get error messages out of ANTLR4. But the second question does not address the main question of this post, which is about best practices. I hope the admin who deleted this thread undeletes it to make the best practice information visible again.

Source: https://stackoverflow.com/questions/64627548/antlr-best-practice-for-finding-and-catching-parse-errors
