OutOfMemoryError when parsing incorrect input in ANTLR

Posted by 故事扮演 on 2019-12-08 01:54:33

Question


This issue is related to my previous question, Catching ANTLR's NoViableAltException in Java and ANTLRWorks Debugger, but I decided to split them because the symptoms differ.

The issue concerns feeding ANTLR input text that contains unknown tokens. Suppose, for example, that our grammar doesn't know anything about tokens that start with the @ symbol. If we feed such text to the ANTLRWorks interpreter, we get a NoViableAltException in the result graph.

But if we take the generated and compiled grammar in Java and try to parse the same invalid text with it, we get one of the following results (depending on where the unknown token is placed, i.e. how "deeply" it sits in the text):

1) no errors, and a null value in the children field of the top-level CommonTree object (the previous question mentioned above is about exactly this case);

2) java.lang.OutOfMemoryError: Java heap space error.

This question is about the second case. How can we prevent this behaviour of the ANTLR parser? For example, in production, clients could accidentally crash the system by providing an incorrect character sequence to the DSL parser.


Answer 1:


This generally happens when a lexer contains a rule that can match the empty string. For example, consider the following rule:

WS : (' ' | '\t')*;

This rule can create a WS token containing zero space or tab characters, which means there can be an infinite number of such tokens between any other tokens in your input. In some situations involving invalid input, error recovery can be forced into an infinite loop that buffers tokens until Java runs out of memory.
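
The mechanism can be illustrated with a toy tokenizer. This is a minimal sketch, not ANTLR's generated code: the class EmptyMatchDemo, its methods, and the maxTokens cap are hypothetical names introduced only for illustration. It applies the rule (' ' | '\t')* repeatedly and shows that, once it reaches a character the rule does not cover, every match has length 0 and the position never advances.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (NOT ANTLR) of a lexer whose only rule can match the empty string. */
class EmptyMatchDemo {

    /** Returns the length matched by (' ' | '\t')* at pos; the result may be 0. */
    static int matchWsStar(String input, int pos) {
        int end = pos;
        while (end < input.length()
                && (input.charAt(end) == ' ' || input.charAt(end) == '\t')) {
            end++;
        }
        return end - pos;
    }

    /**
     * Buffers WS tokens until the input is consumed or maxTokens is reached.
     * The cap is an assumption of this demo; it stands in for the heap limit
     * that would otherwise end the loop with an OutOfMemoryError.
     */
    static List<String> tokenize(String input, int maxTokens) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length() && tokens.size() < maxTokens) {
            int len = matchWsStar(input, pos);
            tokens.add(input.substring(pos, pos + len)); // may be the empty string
            pos += len; // len == 0 means no progress is made
        }
        return tokens;
    }
}
```

Under these assumptions, EmptyMatchDemo.tokenize("  @x", 1000) returns 1000 tokens: one two-space token followed by 999 empty ones. Without the cap, the list would keep growing until the heap is exhausted.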

The first step in solving this problem is to examine every lexer rule to make sure this can't happen. The WS rule should instead be written like this to ensure that at least one space or tab character is consumed:

WS : (' ' | '\t')+;
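
The effect of requiring at least one character can be sketched with another toy tokenizer. Again, this is not ANTLR's generated code; the class NonEmptyWsDemo and the policy of skipping one unknown character at a time are assumptions made for illustration. Because every iteration either consumes a non-empty WS token or skips one character, the loop is guaranteed to terminate.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (NOT ANTLR) of a lexer whose WS rule needs at least one character. */
class NonEmptyWsDemo {

    /** Returns the length matched by (' ' | '\t')+ at pos, or 0 if the rule fails. */
    static int matchWsPlus(String input, int pos) {
        int end = pos;
        while (end < input.length()
                && (input.charAt(end) == ' ' || input.charAt(end) == '\t')) {
            end++;
        }
        return end - pos;
    }

    /**
     * Collects WS tokens; characters that no rule matches are recorded in
     * 'skipped' and consumed one at a time (a hypothetical recovery policy).
     */
    static List<String> tokenize(String input, List<Character> skipped) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            int len = matchWsPlus(input, pos);
            if (len > 0) {
                tokens.add(input.substring(pos, pos + len));
                pos += len;      // progress: a non-empty token was consumed
            } else {
                skipped.add(input.charAt(pos));
                pos++;           // progress: one unknown character was skipped
            }
        }
        return tokens;
    }
}
```

For the input "  @x \t", this sketch yields two WS tokens and reports '@' and 'x' as skipped characters, after which it stops.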

PS: ANTLR 4 performs a static check on the grammar and reports an error (4.0) or a warning (4.0.1+) if this occurs.
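
For readers stuck on an older ANTLR version, the idea behind such a check can be approximated by hand. The sketch below is only an analogy: it uses java.util.regex patterns as stand-ins for lexer rules and is not how ANTLR 4 implements its check. The class EmptyRuleCheck and the rule names in the usage note are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Flags token patterns that accept the empty string (regex stand-in, NOT ANTLR). */
class EmptyRuleCheck {

    /** True if the given regular expression matches the empty string. */
    static boolean canMatchEmpty(String regex) {
        // matches() requires the whole input to match; the whole input is "".
        return Pattern.compile(regex).matcher("").matches();
    }

    /** Returns the names of all rules whose pattern can match the empty string. */
    static List<String> findEmptyMatchingRules(Map<String, String> rules) {
        List<String> offenders = new ArrayList<>();
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            if (canMatchEmpty(rule.getValue())) {
                offenders.add(rule.getKey());
            }
        }
        return offenders;
    }
}
```

With a map containing a rule named WS_STAR with pattern "[ \\t]*" and a rule named WS_PLUS with pattern "[ \\t]+", findEmptyMatchingRules reports only WS_STAR.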



Source: https://stackoverflow.com/questions/15385650/outofmemoryerror-when-parsing-incorrect-input-in-antlr
