How to split input according to the grammar

Question

We are trying to build a parser for log files generated by a router. We have built it successfully and can write the valid lines to a particular file.

But if a line is not valid according to the grammar, we want to write it to a different file. We tried something, but it is not working properly. Can you please suggest a way to do this? If possible, please provide a working example.

This is what we have tried.

We are not using any specific IDE, just a text editor. ANTLR version: 4.5.

Our input (input.txt):

Dec 24 15:38:13 103.199.144.14 firewall,info NAT: src-nat2 srcnat: in:(none) out:ether1-WAN, proto TCP (SYN), 10.20.114.212:59559->86.96.88.147:6882, len 52
Dec 24 15:38:13 103.199.144.14 firewall,info src-nat2: forward: in:<pppoe-PDR242> out:ether1-WAN, proto TCP (SYN), 10.20.124.8:50055->111.111.111.111:80, len 52

The first line is invalid: it should not be accepted by the grammar and should therefore be written to failure.txt, but it is partially written to success.txt instead.

The second line is valid and is written correctly to success.txt, as shown in the output below.

Output we are getting (success.txt):

Dec 24 15:38:13, 103.199.144.14, .20.114.212, len, 52, , null
Dec 24 15:38:13, 103.199.144.14, pppoe-PDR242, TCP, 10.20.124.8:50055, 111.111.111.111:80, null

Grammar we are using (sys.g):

grammar sys;

r: IDENT NUM time ip x+ user xout proto xuser ipfull xtra ipfull1 xtra1 (xipfull xtra ipfull2 xtra2 xipfull xtra3)*; 
time: NUM COLN NUM COLN NUM;
ip: NUM DOT NUM DOT NUM DOT NUM ;
ipfull: NUM DOT NUM DOT NUM DOT NUM COLN NUM ;
ipfull1: NUM DOT NUM DOT NUM DOT NUM COLN NUM ;
ipfull2: NUM DOT NUM DOT NUM DOT NUM COLN NUM ;
xipfull: NUM DOT NUM DOT NUM DOT NUM COLN NUM ;

x: (IDENT | COMMA | COLN | BRAC | HYPHN | NUM)+ LTHAN;
user: (IDENT | HYPHN | DOT | NUM)+ ;
xout: GTHAN IDENT+ COLN IDENT+ HYPHN IDENT+ (DOT IDENT)* COMMA IDENT;
proto: IDENT ;
xuser: (IDENT | BRAC | COMMA)+ ;
xtra: HYPHN GTHAN ;
xtra1: COMMA IDENT (BRAC | NUM);
xtra2: BRAC xtra;
xtra3: COMMA IDENT NUM;

IDENT: ('a'..'z' | 'A'..'Z')('a'..'z' | 'A'..'Z' | '0'..'9')* ;
NUM: ('0'..'9')+ ;
LTHAN: '<' ;
GTHAN: '>' ;
COLN: ':';
COMMA: ',';
BRAC: '(' | ')' ;
HYPHN: '-';
DOT: '.';
WS : (' ' | '\t' | '\r' | '\n')+ -> skip ;

Our main class, which uses the parser and lexer generated from the grammar:

import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.tree.ParseTree;
import java.io.*;

public class SysLogCheck {
    public static void main(String[] args) throws Exception {

        long startTime = System.currentTimeMillis();

        // try-with-resources flushes and closes the reader and both writers.
        try (BufferedReader br = new BufferedReader(new FileReader("test123.txt"));
             PrintWriter success = new PrintWriter(new FileWriter("success.csv"));
             PrintWriter failure = new PrintWriter(new FileWriter("failure.csv"))) {
            String s;
            while ((s = br.readLine()) != null) {
                ANTLRInputStream input = new ANTLRInputStream(s);
                sysLexer lexer = new sysLexer(input);
                CommonTokenStream tokens = new CommonTokenStream(lexer);
                sysParser parser = new sysParser(tokens);
                ParseTree tree = parser.r();
                EvalVisitor visitor = new EvalVisitor();
                // Visit once and reuse the result instead of visiting the tree twice.
                Object result = visitor.visit(tree);
                // If EvalVisitor returns "failure", the line should be written to the
                // failure file, otherwise to the success file.
                // This is the part that does not work: the invalid first line still
                // ends up (partially) in the success file.
                if ("failure".equals(result)) {
                    failure.println(s);
                } else {
                    success.println(result);
                }
            }
        }

        long stopTime = System.currentTimeMillis();
        long elapsedTime = stopTime - startTime;

        System.out.println(elapsedTime);
    }
}

Our EvalVisitor (main visitor class) code:

import org.antlr.v4.runtime.tree.ParseTree;

public class EvalVisitor extends sysBaseVisitor<Object>
{
      class LogEntry {
        String ident1;
        String dayNum;
        String time;
        String ip;
        String ipfull;
        String user;
        String proto;
        String ipfull1;
        String ipfull2;
        String x;
      }

      // Fields collected while visiting one line; reset at the start of every visit(tree) call.
      private LogEntry logEntry;

      @Override
      public Object visit(ParseTree tree) {
        /* Set up the log entry used by all visit methods (in this case there is only a single visitor). */
        logEntry = new LogEntry();

        super.visit(tree);

        // Our check for whether the input contains "<" (x stays null if no x rule was visited).
        if (logEntry.x != null && logEntry.x.contains("<"))
        {
            return logEntry.ident1 + " " + logEntry.dayNum + " " + logEntry.time + ", " + logEntry.ip + ", " + logEntry.user + ", " + logEntry.proto + ", " + logEntry.ipfull + ", " + logEntry.ipfull1 + ", " + logEntry.ipfull2;
        }
        return "failure"; // otherwise report failure
      }

      @Override
      public Object visitR(sysParser.RContext ctx) {
        logEntry.ident1 = ctx.IDENT().getText();
        logEntry.dayNum = ctx.NUM().getText();
        return super.visitR(ctx);
      }

      @Override
      public Object visitTime(sysParser.TimeContext ctx) {
        logEntry.time = ctx.getText();
        return super.visitTime(ctx);
      }

      @Override
      public Object visitIp(sysParser.IpContext ctx) {
        logEntry.ip = ctx.getText();
        return super.visitIp(ctx);
      }

      @Override
      public Object visitIpfull(sysParser.IpfullContext ctx) {
        logEntry.ipfull = ctx.getText();
        return super.visitIpfull(ctx);
      }

      @Override
      public Object visitIpfull1(sysParser.Ipfull1Context ctx) {
        logEntry.ipfull1 = ctx.getText();
        return super.visitIpfull1(ctx);
      }

      @Override
      public Object visitIpfull2(sysParser.Ipfull2Context ctx) {
        logEntry.ipfull2 = ctx.getText();
        return super.visitIpfull2(ctx);
      }

      @Override
      public Object visitXipfull(sysParser.XipfullContext ctx) {
        return super.visitXipfull(ctx);
      }

      @Override
      public Object visitX(sysParser.XContext ctx) {
        logEntry.x = ctx.getText();
        return super.visitX(ctx);
      }

      @Override
      public Object visitUser(sysParser.UserContext ctx) {
        logEntry.user = ctx.getText();
        return super.visitUser(ctx);
      }

      @Override
      public Object visitXuser(sysParser.XuserContext ctx) {
        return super.visitXuser(ctx);
      }

      @Override
      public Object visitXout(sysParser.XoutContext ctx) {
        return super.visitXout(ctx);
      }

      @Override
      public Object visitProto(sysParser.ProtoContext ctx) {
        logEntry.proto = ctx.getText();
        return super.visitProto(ctx);
      }

      @Override
      public Object visitXtra(sysParser.XtraContext ctx) {
        return super.visitXtra(ctx);
      }

      @Override
      public Object visitXtra1(sysParser.Xtra1Context ctx) {
        return super.visitXtra1(ctx);
      }

      @Override
      public Object visitXtra2(sysParser.Xtra2Context ctx) {
        return super.visitXtra2(ctx);
      }

      @Override
      public Object visitXtra3(sysParser.Xtra3Context ctx) {
        return super.visitXtra3(ctx);
      }   

 }

Answer 1:

If all you're trying to do is create a file with data from the lines you consider valid, then ANTLR is probably overkill (I mentioned this in the mailing list thread). I'll assume here that you may want to do more with the parsed results (or that you really just want to use ANTLR for this).

I see that you're already parsing each input line individually.

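It appears that your 'r' parser rule recognizes valid as well as "invalid" lines. I'd suggest tightening up the grammar to define exactly what you consider to be a valid line. If your grammar only accepts (i.e. "recognizes") valid lines, then any invalid line will produce a syntax error. Note that by default ANTLR 4 reports the RecognitionException to the parser's error listeners and then recovers (for example by conjuring a missing token or skipping an extra one), so parser.r() still returns a tree. To detect the error, check parser.getNumberOfSyntaxErrors() after parsing, or install a BailErrorStrategy with parser.setErrorHandler(...) so that the first error aborts the parse with a ParseCancellationException.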
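A minimal sketch of routing each line by whether it was accepted, written without ANTLR so it runs on its own: the validity check is evaluated exactly once per line and the line goes to an accepted or a rejected list. The class LineRouter, its Result type and the route method are hypothetical. The commented-out predicate shows how the check might look with your generated sysLexer/sysParser under the ANTLR 4.5 runtime; the stand-in check in main only looks for "in:<" and is not a real grammar check.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class LineRouter {

    /** Lines that passed and failed the validity check, in input order. */
    public static final class Result {
        public final List<String> accepted = new ArrayList<>();
        public final List<String> rejected = new ArrayList<>();
    }

    // Evaluates the check exactly once per line and routes the line accordingly.
    public static Result route(List<String> lines, Predicate<String> isValid) {
        Result result = new Result();
        for (String line : lines) {
            if (isValid.test(line)) {
                result.accepted.add(line);
            } else {
                result.rejected.add(line);
            }
        }
        return result;
    }

    // Hypothetical ANTLR-backed check (needs the generated sysLexer/sysParser):
    //
    // Predicate<String> parsesCleanly = line -> {
    //     sysParser parser = new sysParser(
    //         new CommonTokenStream(new sysLexer(new ANTLRInputStream(line))));
    //     parser.removeErrorListeners();   // no console output for recovered errors
    //     parser.r();                      // recovers from errors, but still counts them
    //     return parser.getNumberOfSyntaxErrors() == 0;
    // };

    public static void main(String[] args) {
        // Shortened sample lines; only the user field differs in a way we check.
        List<String> lines = Arrays.asList(
            "Dec 24 15:38:13 ... in:(none) out:ether1-WAN ...",
            "Dec 24 15:38:13 ... in:<pppoe-PDR242> out:ether1-WAN ...");
        // Stand-in check so the sketch runs without ANTLR.
        Result r = route(lines, line -> line.contains("in:<"));
        System.out.println("success: " + r.accepted);
        System.out.println("failure: " + r.rejected);
    }
}
```

In the real program the two lists would be the success and failure writers. Keep in mind that getNumberOfSyntaxErrors() counts parser errors only; token recognition errors are reported to the lexer's own listeners.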

You don't mention what makes line 2 valid and line 1 invalid, so I can't really make a recommendation on how to correct your 'r' rule.

(There's a lot to critique about your grammar, and it indicates that you're trying to learn "just enough" ANTLR to get by. I don't think you're asking for a full critique of your grammar, so I'll skip the details.)

After examining your code, it appears that you just want to identify log lines of a particular type and capture data from those lines. If that's what you're trying to accomplish, look into Java regular expressions and capture groups. It'll be a lot simpler than using ANTLR (and I'm a pretty big fan of ANTLR).
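A minimal sketch of the regular-expression approach using java.util.regex capture groups. The pattern is an assumption inferred from the single valid sample only: a valid line must contain an in:<user> field followed by out:, proto, src->dst and len. The class LogLineParser and the method toCsv are hypothetical names. The output mirrors the columns in the success.txt example (timestamp, router IP, user, protocol, source, destination).

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogLineParser {

    // Dotted IPv4 address; the inner group is non-capturing so group numbers stay stable.
    private static final String IP = "\\d{1,3}(?:\\.\\d{1,3}){3}";

    // Assumed shape of a "valid" line, based on the one valid sample.
    private static final Pattern VALID = Pattern.compile(
        "^(\\w{3} \\d{1,2} \\d{2}:\\d{2}:\\d{2}) "   // 1: timestamp
        + "(" + IP + ") "                             // 2: router IP
        + "\\S+ .*?"                                  // topics and prefix (ignored)
        + "in:<([^>]+)> "                             // 3: user, e.g. pppoe-PDR242
        + "out:(\\S+), "                              // 4: outgoing interface
        + "proto (\\w+)(?: \\([^)]*\\))?, "           // 5: protocol, optional flags
        + "(" + IP + ":\\d+)->"                       // 6: source ip:port
        + "(" + IP + ":\\d+), "                       // 7: destination ip:port
        + "len (\\d+)$");                             // 8: packet length

    /** Returns a comma-separated summary for a valid line, or empty for an invalid one. */
    public static Optional<String> toCsv(String line) {
        Matcher m = VALID.matcher(line);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(String.join(", ",
            m.group(1), m.group(2), m.group(3), m.group(5), m.group(6), m.group(7)));
    }

    public static void main(String[] args) {
        String[] lines = {
            "Dec 24 15:38:13 103.199.144.14 firewall,info NAT: src-nat2 srcnat: in:(none) out:ether1-WAN, proto TCP (SYN), 10.20.114.212:59559->86.96.88.147:6882, len 52",
            "Dec 24 15:38:13 103.199.144.14 firewall,info src-nat2: forward: in:<pppoe-PDR242> out:ether1-WAN, proto TCP (SYN), 10.20.124.8:50055->111.111.111.111:80, len 52"
        };
        for (String line : lines) {
            // Route each line: a match goes to "success", anything else to "failure".
            System.out.println(toCsv(line).map(csv -> "success: " + csv)
                                          .orElse("failure: " + line));
        }
    }
}
```

Because matches() requires the whole line to fit the pattern, a line is either fully captured or rejected, which avoids the partial output seen in success.txt.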

Source: https://stackoverflow.com/questions/34453783/how-to-split-input-according-to-the-grammar

Tags

java

parsing

antlr

antlr4