ANTLR4: skip empty lines only

冷暖自知 submitted on 2021-02-08 03:33:17

Question


I am new to ANTLR4 and am using it to parse a text file. Here is part of the file:

abcdef
//emptyline
abcdef

As a raw character stream, it looks like this:

abcdef\r\n\r\nabcdef\r\n

ANTLR4 offers the "skip" command for discarding things like whitespace, tabs, and newline characters during lexing, using a regular-expression-like rule, e.g.:

WS : [ \t\r\n]+ -> skip ; // skip spaces, tabs, newlines

My problem is that I want to skip empty lines only. I don't want to skip every single "\r\n". In other words, when two or more "\r\n" sequences appear in a row, I only want to skip the second one and any that follow. How should I write the regular expression? Thank you.

grammar INIGrammar_1;
init: (section|NEWLINE)+ ;

section:  '[' phase_name ':' v ']' (contents)+ 
            | '[' phase_name ']' (contents)+ ; 
//
//
phase_name : STRING
            |MTT
            |MPI_GET
            |MPI_INSTALL
            |MPI_DETAILS
            |TEST_GET
            |TEST_BUILD
            |TEST_RUN
            |REPORTER
            ; 
v  : STRING ;      

contents: kvpairs 
          | include_section_pairs
          | if_statement
          | NEWLINE
          | EOT
          ;

keylhs : STRING
        ;
valuerhs : STRING 
          |multiline_valuerhs
          |kvpairs
          |url
          ;
kvpairs: keylhs '=' valuerhs NEWLINE
        ;
include_section_pairs: INCLUDE_SECTION '=' STRING
                    ;
if_statement: IF if_statement_condition THEN NEWLINE (ELSEIF if_statement_condition THEN NEWLINE)*? STRING NEWLINE IFEND NEWLINE
            ;
if_statement_condition:STRING '=' STRING ';'// problem here: the semicolon is not recognized, whether I use ';' or SEMICOLON
                        ;
multiline_valuerhs:STRING (',' (' ')*? ( '\\' (' ')*? NEWLINE)? STRING)+ 
                    ;
url:(' ')*?'http'':''//''www.';//ignore this, not finished.
IF: 'if';
ELSEIF:'elif';
IFEND:'fi';
THEN: 'then';
SEMICOLON: ';';
STRING : [a-z|A-Z|0-9|''| |.|\-|_|(|)|#|&|""|/|@|<|>|$]+ ;

//Keywords
MTT: 'MTT';
MPI_GET: 'MPI get';
MPI_INSTALL:'MPI install';
MPI_DETAILS:'MPI Details';
TEST_GET:'Test get';
TEST_BUILD: 'Test build';
TEST_RUN: 'Test run';
REPORTER: 'Reporter';
INCLUDE_SECTION: 'include_section';
//INCLUDE_SECTION_VALUE:STRING;
EOT:'EOT';

NEWLINE: ('\r' ? '\n')+ ;
WS : [\t]+ -> skip ; // skips tabs only; spaces are matched by STRING
COMMENT: '#' .*? '\r'?'\n' -> skip;
EMPTYLINE: '\r\n' -> skip;
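One way to drop only the extra line breaks is to let a single token swallow the whole run, which is what the NEWLINE: ('\r' ? '\n')+ ; rule above already does. The sketch below is plain Python with the re module, not ANTLR: a simplified, hypothetical emulation that assumes ANTLR's lexing strategy (longest match wins; on a tie, the rule declared first wins) and reduces STRING to letters and digits. It also suggests why EMPTYLINE never fires: wherever it could match, NEWLINE matches at least as much text and is declared earlier.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical, simplified stand-ins for some of the question's lexer rules,
# listed in declaration order. STRING is reduced to letters and digits.
RULES = [
    ("STRING", re.compile(r"[a-zA-Z0-9]+"), False),
    ("NEWLINE", re.compile(r"(?:\r?\n)+"), False),  # NEWLINE: ('\r' ? '\n')+ ;
    ("WS", re.compile(r"[ \t]+"), True),            # simplified WS ... -> skip ;
    ("EMPTYLINE", re.compile(r"\r\n"), True),       # EMPTYLINE: '\r\n' -> skip;
]


def tokenize(text: str) -> List[Tuple[str, str]]:
    """Longest match wins; on a tie the rule declared first wins (as in ANTLR)."""
    tokens: List[Tuple[str, str]] = []
    pos = 0
    while pos < len(text):
        best: Optional[Tuple[int, str, bool]] = None
        for name, regex, skip in RULES:
            m = regex.match(text, pos)
            length = m.end() - pos if m else 0
            # Strict '>' keeps the earlier rule when two matches are equally long.
            if length > 0 and (best is None or length > best[0]):
                best = (length, name, skip)
        if best is None:
            raise ValueError(f"no rule matches {text[pos]!r} at offset {pos}")
        length, name, skip = best
        if not skip:
            tokens.append((name, text[pos:pos + length]))
        pos += length
    return tokens


print(tokenize("abcdef\r\n\r\nabcdef\r\n"))
# [('STRING', 'abcdef'), ('NEWLINE', '\r\n\r\n'), ('STRING', 'abcdef'), ('NEWLINE', '\r\n')]
```

For the sample input, the two consecutive line breaks come out as a single NEWLINE token, so the parser never sees a separate empty line; and tokenize("\r\n") yields a NEWLINE token rather than being skipped by EMPTYLINE.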

Part of the INI file:

#======================================================================
# MPI run details
#======================================================================

[MPI Details: Open MPI]

# MPI tests
#exec = mpirun @hosts@ -np &test_np() @mca@ --prefix &test_prefix() &test_executable() &test_argv()
exec = mpirun @hosts@ -np &test_np() --prefix &test_prefix() &test_executable() &test_argv()

hosts = &if(&have_hostfile(), "--hostfile " . &hostfile(), \
            &if(&have_hostlist(), "--host " . &hostlist(), ""))

One more small thing: ";" does not seem to be recognized as itself. ANTLR4 keeps reporting that it expects something else and treats the semicolon as an unknown symbol.


Answer 1:


The short answer to your question is that whitespace is not significant to your parser, so skip it all in the lexer.

The longer answer is to recognize that skipping whitespace (or any other character sequence) does not mean that it is not significant in the lexer. All it means is that no corresponding token is produced for consumption by the parser. Skipped whitespace will therefore still operate as a delimiter for generated tokens.
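To illustrate the delimiter point, here is a minimal Python sketch (not ANTLR) of a lexer that skips all whitespace, line breaks included, as the short answer suggests. The token names STRING and EQ and the simplified patterns are assumptions for illustration. The skipped whitespace never reaches the token list, yet it still keeps "key" and "value" apart, and keeps "value" and "other" apart across the blank line.

```python
import re
from typing import List, Tuple

# One master pattern with named groups (hypothetical, simplified rules).
# The alternatives match disjoint characters, so Python's first-alternative
# matching gives the same result as ANTLR-style longest match here.
TOKEN_RE = re.compile(r"(?P<WS>[ \t\r\n]+)|(?P<STRING>[A-Za-z0-9]+)|(?P<EQ>=)")


def tokenize(text: str) -> List[Tuple[str, str]]:
    tokens: List[Tuple[str, str]] = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        # WS : [ \t\r\n]+ -> skip ;  consumed, but no token is emitted
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


print(tokenize("key = value\r\n\r\nother = thing\r\n"))
# [('STRING', 'key'), ('EQ', '='), ('STRING', 'value'), ('STRING', 'other'), ('EQ', '='), ('STRING', 'thing')]
```

With line breaks skipped, the parser has to rely on its own rules (for example key '=' value) to tell where one pair ends and the next begins.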

A couple of additional observations:

  1. ANTLR does not do regexes; thinking along those lines will lead to further conceptual difficulties.

  2. Don't ignore warning and error messages produced when generating the lexer/parser; they almost always require correction before the generated code will function correctly.

  3. It really helps to verify that the lexer is producing your intended token stream before trying to debug parser rules. See this answer that shows how to dump the token stream.
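A minimal sketch of that kind of check, again in plain Python rather than with the ANTLR runtime: dump_tokens is a hypothetical helper that lists each non-skipped token with its index, character offsets, type, and line:column, in a format loosely modeled on the token listing ANTLR's tooling prints. The rules are simplified assumptions, with consecutive line breaks folded into one NEWLINE token.

```python
import re
from typing import List

# Hypothetical, simplified rules; WS is skipped like '-> skip'.
TOKEN_RE = re.compile(r"(?P<NEWLINE>(?:\r?\n)+)|(?P<WS>[ \t]+)|(?P<STRING>[A-Za-z0-9]+)")


def dump_tokens(text: str) -> List[str]:
    """One entry per emitted token: [@index,start:stop='text',<TYPE>,line:col]."""
    entries: List[str] = []
    pos, line, col = 0, 1, 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"no token matches {text[pos]!r} at {line}:{col}")
        kind, value = m.lastgroup, m.group()
        if kind != "WS":
            # Escape CR/LF so line breaks stay visible in the dump.
            shown = value.replace("\r", "\\r").replace("\n", "\\n")
            entries.append(f"[@{len(entries)},{pos}:{m.end() - 1}='{shown}',<{kind}>,{line}:{col}]")
        # Update line/column bookkeeping over the consumed text.
        newlines = value.count("\n")
        if newlines:
            line += newlines
            col = len(value) - value.rfind("\n") - 1
        else:
            col += len(value)
        pos = m.end()
    return entries


for entry in dump_tokens("abcdef\r\n\r\nabcdef\r\n"):
    print(entry)
```

For the question's sample input this lists four tokens; the empty line shows up only as the longer '\r\n\r\n' NEWLINE on line 1.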




Answer 2:


I ran into the same issue while writing a language that does not require ";" as a statement delimiter. What resolved it for me was adding the newline as a valid parser rule that does nothing. I am no expert on this, but it worked:

nl : NEWLINE{};

The NEWLINE lexer rule looks like this (no skipping):

NEWLINE : '\r'? '\n' ; // optional carriage return followed by a line feed
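This approach can be sketched as a hand-written Python parser over an already-lexed token list (an illustration, not ANTLR-generated code). It assumes one NEWLINE token per line break and the hypothetical rules program : (statement | nl)* ; and statement : STRING NEWLINE ; alongside nl : NEWLINE {} ;. Blank lines arrive as extra NEWLINE tokens, and nl consumes them without doing anything.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (token type, matched text)


def parse_program(tokens: List[Token]) -> List[str]:
    """Hand-written equivalent of the hypothetical rules
         program   : (statement | nl)* ;
         statement : STRING NEWLINE ;
         nl        : NEWLINE {} ;
    Returns the text of each statement; NEWLINEs matched by nl are dropped."""
    statements: List[str] = []
    i = 0
    while i < len(tokens):
        kind, text = tokens[i]
        if kind == "NEWLINE":
            # nl : NEWLINE {} ;  matched, empty action
            i += 1
        elif kind == "STRING":
            # statement : STRING NEWLINE ;  the newline terminates the statement
            if i + 1 >= len(tokens) or tokens[i + 1][0] != "NEWLINE":
                raise SyntaxError(f"expected NEWLINE after {text!r}")
            statements.append(text)
            i += 2
        else:
            raise SyntaxError(f"unexpected {kind} token {text!r}")
    return statements


# Tokens for "abcdef\r\n\r\nabcdef\r\n" with one NEWLINE per line break.
tokens = [("STRING", "abcdef"), ("NEWLINE", "\r\n"), ("NEWLINE", "\r\n"),
          ("STRING", "abcdef"), ("NEWLINE", "\r\n")]
print(parse_program(tokens))  # ['abcdef', 'abcdef']
```

Here the newline stays significant as a statement terminator, while the empty line in between is absorbed by nl instead of causing a syntax error.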


Source: https://stackoverflow.com/questions/29519842/antlr4-skips-empty-line-only
