antlr | 易学教程

Antlr Lexer exclude a certain pattern

Question: In an ANTLR lexer, how can I lex a token like this: a word that contains any non-space character but not '.{' inside it? The best I can come up with is a semantic predicate: WORD: WL+ {!getText().contains(".{")}?; WL: ~[ \n\r\t]; I'm a bit wary of using a semantic predicate, though: WORD will be lexed millions of times, and I suspect the predicate will hurt performance. This comes from the requirement that I need to parse something like: TOKEN_ONE.{TOKEN
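To make the intended token shape concrete, the condition the predicate checks can be restated character by character: a WORD is a run of non-whitespace characters in which a '.' is accepted only when the next character is not '{'. The sketch below expresses that with a java.util.regex pattern; it is plain Java, not ANTLR code, and the class and method names are hypothetical. ANTLR 4 lexer rules have no (?!...) lookahead, so inside a grammar the same constraint would have to be spelled out with character-set alternatives or kept as a predicate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordSplitter {
    // Alternatives tried at each position (they never compete for the same input):
    //   dot:  the ".{" separator itself
    //   word: a non-whitespace run where '.' is accepted only if the next char is not '{'
    //   ws:   whitespace, dropped from the output
    private static final Pattern TOKEN = Pattern.compile(
            "(?<dot>\\.\\{)|(?<word>(?:[^\\s.]|\\.(?!\\{))+)|(?<ws>\\s+)");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            // Match only at the current position, the way a lexer would.
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("no token at offset " + pos);
            }
            if (m.group("ws") == null) {
                tokens.add(m.group());
            }
            pos = m.end();
        }
        return tokens;
    }
}
```

For example, tokenize("TOKEN_ONE.{TOKEN") yields TOKEN_ONE, .{ and TOKEN, while "a.b" stays a single word because its '.' is not followed by '{'.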

Handling scope for single and double quote strings in ANTLR4

Question: I am working with ANTLR4 and am writing a grammar to handle single- and double-quoted strings. I am trying to use lexer modes to scope the strings, but that is not working for me; my grammar is listed below. Is this the right approach, or how can I properly lex these as tokens instead of parser rules with context? Any insight? An example: 'single quote that contain "a double quote 'that has another single quote'"' Lexer grammar: lexer grammar StringLexer; fragment SQUOTE: '\'';

How to split input according to the grammar

Question: We are trying to build a parser for the log files generated by a router. We have built it successfully and can write the input that is valid according to the grammar to a particular file. If the input is not valid according to the grammar, we want to write it to a different file instead. We tried something, but it's not working properly. Can you please suggest a way to do it and, if possible, give a working example? This is what we have tried. We are not using any specific IDE, just a text editor.

Why does parser generated by ANTLR reuse context objects?

Question: I'm trying to create an interpreter for a simple programming language using ANTLR, and I would like to add support for recursion. So far I have implemented function definitions and calls, with support for multiple return statements and local variables. To get local variables, I extended the partial parser class FunctionCallContext with a dictionary that holds them. I can use them successfully once. Also, when I call the same function again from itself (recursively),
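The title points at how ANTLR parse trees work: the parser creates one context object per matched piece of source text, so an interpreter walking the tree reaches the same FunctionCallContext for the recursive call every time that call runs, and a dictionary stored on that object is shared by all active invocations. A common alternative is to keep locals in a per-call frame on a stack. The sketch below is plain Java, not ANTLR code (the question's partial class suggests C#); CallSite is a hypothetical stand-in for the context object, and factorial is only an illustrative function.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

class FrameStackDemo {
    // Hypothetical stand-in for a parse-tree node such as FunctionCallContext:
    // the parser creates it once, and the interpreter revisits it on every call.
    static final class CallSite {
        final Map<String, Integer> locals = new HashMap<>();
    }

    // The one node for the recursive call "fact(n - 1)" inside the function body.
    static final CallSite RECURSIVE_CALL = new CallSite();

    // Locals kept on the node: every active invocation shares the same map.
    static int factOnNode(int n) {
        RECURSIVE_CALL.locals.put("n", n);
        if (n <= 1) {
            return 1;
        }
        int rest = factOnNode(n - 1);
        // The inner calls have overwritten "n", so this reads their value, not ours.
        return RECURSIVE_CALL.locals.get("n") * rest;
    }

    // Locals kept in a fresh frame per invocation, looked up through the top of the stack.
    static final Deque<Map<String, Integer>> frames = new ArrayDeque<>();

    static int factWithFrames(int n) {
        Map<String, Integer> frame = new HashMap<>();
        frame.put("n", n);
        frames.push(frame);
        try {
            if (frames.peek().get("n") <= 1) {
                return 1;
            }
            int rest = factWithFrames(frames.peek().get("n") - 1);
            // The inner frame has been popped, so the top of the stack is ours again.
            return frames.peek().get("n") * rest;
        } finally {
            frames.pop();
        }
    }
}
```

Here factOnNode(4) returns 1 because every level reads the n stored by the innermost call, while factWithFrames(4) returns 24.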

Generated Antlr Parser in Java: Not all inputs are read

Question: I am working on an ANTLR grammar to parse polynomial functions in multiple variables using Java. Examples of legal input are 42; X; +42X; Y^42; 1337HelloWorld; 13,37X^42; The following grammar compiles without warnings or errors: grammar Function; parseFunction returns [java.util.List<java.util.List<Object>> list] : { list = new java.util.ArrayList(); } ( f=functionPart { list.add($f.list); } )+ | { list = new java.util.ArrayList(); } ( fb=functionBegin ) { list.add($fb.list); } ( f

Mediawiki parsing in ANTLR: processing ' tokens

Question: I'm trying to write a grammar to parse MediaWiki's wiki syntax, and after that the Creole syntax too (unfortunately, an existing Creole grammar doesn't work in ANTLR 3). My issue right now is capturing a bold rule when I'm already inside an italic rule, or vice versa. For example: '' this text is bold '''now it's italic''' and just bold again'' I've gotten a lot of help from this question, but I'm stuck. The goal is to produce HTML inside the grammar using actions, or possibly an AST -

Implementing free form query language in Java

Question: I am working on an API that takes a search string like "(A < 5) & (B = xyz*)" and responds with the correct result. This custom query language is predefined and I can't change it. I know that I can specify a CFG for this expression and then use ANTLR to construct the parse tree and implement this feature. Is there an easier and more efficient way to do this in Java? Is there any library that can handle this kind of generic query language? Answer 1: Parsers generated by ANTLR are very efficient.

ANTLR parse strings (keep whitespaces) and parse normal identifiers

Question: I am trying to use ANTLR4 to parse source files. One thing I need to handle is that a string literal can contain all kinds of characters, including whitespace, while normal identifiers contain only English letters and digits (whitespace is thrown away). I use the following ANTLR grammar rules (a minimal example), but they don't work as expected. grammar parseString; rules : stringRule+ ; stringRule : formatString | idString ; formatString : STRING_DOUBLEQUOTE STRING STRING_DOUBLEQUOTE ;

How to use similar lexers

Question: I have the following grammar: cmds : cmd+ ; cmd : include_cmd | other_cmd ; include_cmd : INCLUDE DOUBLE_QUOTE FILE_NAME DOUBLE_QUOTE ; other_cmd : CMD_NAME ARG+ ; INCLUDE : '#include' ; DOUBLE_QUOTE : '"' ; CMD_NAME : ('a'..'z')* ; ARG : ('a'..'z' | 'A'..'Z' | '0'..'9' | '_')+ ; FILE_NAME : ('a'..'z' | 'A'..'Z' | '0'..'9' | '_' | '.')+ ; The differences between CMD_NAME, ARG and FILE_NAME are small: CMD_NAME must be lowercase letters, ARG can also have uppercase letters and "_", and FILE
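The behaviour behind this kind of question can be reproduced outside ANTLR: the lexer assigns a token type without knowing which parser rule is active, choosing the longest match and, on a tie, the rule defined first. With the question's rules, a lowercase word such as "abc" therefore always becomes CMD_NAME, even where other_cmd expects an ARG or include_cmd expects a FILE_NAME. The sketch below is plain Java that imitates that matching rather than using the ANTLR runtime; the WS rule is a hypothetical addition (the excerpt shows none) so the demo can skip spaces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SimilarRulesDemo {
    // Same order as the question's lexer rules; WS is a hypothetical extra rule.
    private static final String[] NAMES = {"INCLUDE", "DOUBLE_QUOTE", "CMD_NAME", "ARG", "FILE_NAME", "WS"};
    private static final Pattern[] RULES = {
        Pattern.compile("#include"),
        Pattern.compile("\""),
        Pattern.compile("[a-z]*"),
        Pattern.compile("[a-zA-Z0-9_]+"),
        Pattern.compile("[a-zA-Z0-9_.]+"),
        Pattern.compile("[ \\t\\r\\n]+"),
    };

    /** Imitates the lexer's choice: longest match first, then the earliest rule on a tie. */
    static List<String> tokenTypes(String input) {
        List<String> types = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            int best = -1;
            int bestEnd = pos; // empty matches (CMD_NAME can match "") never win
            for (int r = 0; r < RULES.length; r++) {
                Matcher m = RULES[r].matcher(input);
                m.region(pos, input.length());
                // Strict '>' keeps the earlier rule when two rules match the same length.
                if (m.lookingAt() && m.end() > bestEnd) {
                    best = r;
                    bestEnd = m.end();
                }
            }
            if (best < 0) {
                throw new IllegalArgumentException("no rule matches at offset " + pos);
            }
            if (!NAMES[best].equals("WS")) {
                types.add(NAMES[best]);
            }
            pos = bestEnd;
        }
        return types;
    }
}
```

For example, "run abc" yields CMD_NAME CMD_NAME, so other_cmd : CMD_NAME ARG+ cannot match it, and in #include "abc" the file name is also lexed as CMD_NAME; only a name such as "a.h", which the other rules cannot match in full, becomes FILE_NAME.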

How to create a lexical analyzer in ANTLR 4 that can catch different types of lexical errors

Question: I am using ANTLR 4 to create my lexer, but I don't know how to make it catch different types of lexical errors. For example, if I have an unrecognized symbol like ^, the lexical analyzer should report an error like "Unrecognized symbol "^""; if I have an invalid identifier like 2n, it should report an error like "identifier "2n" must begin with a letter". Can you please help me? Answer 1: Create an error token rule for each known error and an
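The answer's idea, an extra lexer rule per known error plus a catch-all, works because the lexer picks the longest match and breaks ties by rule order: without an error rule, "2n" would lex quietly as INT "2" followed by ID "n", but an ERR_ID rule matching "2n" is longer and wins, and a one-character catch-all placed last only wins when nothing else matches. The plain-Java sketch below imitates that matching rather than using the ANTLR runtime; the rule names (ID, INT, WS, ERR_ID, ERR_CHAR) and their patterns are hypothetical, chosen to fit the two examples in the question. In an actual grammar these would be ordinary lexer rules with ERR_CHAR : . ; defined last, and one option would be to report the messages wherever the error tokens are consumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ErrorTokenLexer {
    // Hypothetical rules in priority order, mirroring a lexer grammar like:
    //   ID : [a-zA-Z] [a-zA-Z0-9]* ;   INT : [0-9]+ ;   WS : [ \t\r\n]+ -> skip ;
    //   ERR_ID : [0-9]+ [a-zA-Z] [a-zA-Z0-9]* ;   ERR_CHAR : . ;
    private static final String[] NAMES = {"ID", "INT", "WS", "ERR_ID", "ERR_CHAR"};
    private static final Pattern[] RULES = {
        Pattern.compile("[a-zA-Z][a-zA-Z0-9]*"),
        Pattern.compile("[0-9]+"),
        Pattern.compile("[ \\t\\r\\n]+"),
        Pattern.compile("[0-9]+[a-zA-Z][a-zA-Z0-9]*"),
        Pattern.compile("(?s)."), // catch-all: always matches one character
    };

    /** Returns one message per error token found in the input. */
    static List<String> lexErrors(String input) {
        List<String> errors = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            int best = -1;
            int bestEnd = pos;
            for (int r = 0; r < RULES.length; r++) {
                Matcher m = RULES[r].matcher(input);
                m.region(pos, input.length());
                // Longest match wins; on a tie the earlier rule keeps priority.
                if (m.lookingAt() && m.end() > bestEnd) {
                    best = r;
                    bestEnd = m.end();
                }
            }
            String text = input.substring(pos, bestEnd);
            switch (NAMES[best]) {
                case "ERR_ID":
                    errors.add("identifier \"" + text + "\" must begin with a letter");
                    break;
                case "ERR_CHAR":
                    errors.add("Unrecognized symbol \"" + text + "\"");
                    break;
                default:
                    break; // a valid token, or skipped whitespace
            }
            pos = bestEnd;
        }
        return errors;
    }
}
```

For the input "abc 2n ^ 42", this reports the "2n" identifier error followed by the unrecognized "^" symbol, while "abc" and "42" lex as ordinary ID and INT tokens.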