resolving logical operations - AND, OR, looping conditions dynamically

问题

I have an incoming records filter stored with the logical clause as given below.

Acct1 = 'Y' AND Acct2 = 'N' AND Acct3 = 'N' AND Acct4 = 'N' AND Acct5 = 'N' AND ((Acct6 = 'N' OR Acct7 = 'N' AND Acct1 = 'Y') AND Formatted= 'N' AND Acct9 = 'N' AND (Acct10 = 'N' AND Acct11 = 'N') AND EditableField= 'N' )

My data input to this clause will be from Csv file as below.

Country,Type,Usage,Acct1,Acct2,Acct3,Acct4,Acct5,Acct6,Acct7,Formatted,Acct9,Acct10,Acct11,EditableField
USA,Premium,Corporate,Y,N,Y,N,N,N,Y,N,Y,N,Y,N,
Mexico,Premium,Corporate,Y,N,Y,N,Y,N,Y,N,Y,N,Y,N,
USA,Premium,Corporate,Y,N,Y,N,N,N,N,Y,Y,N,Y,N,
USA,Premium,Corporate,Y,N,Y,N,Y,N,Y,Y,Y,N,Y,N,

I will have to filter out the records in the file based on the conditions defined in the clause. This is a example of one simple clause but there will be more inner conditions than this and the clause can be changed whenever the user want and there will be 10 such clauses the records has to pass through sequentially.

So I am looking for a way to dynamically interpret the clause and apply it on the incoming records. Please provide me your suggestions about how to design/ any example if available.

回答1:

Here's the complete solution which does not include third-party libraries like ANTLR or JavaCC. Note that while it's extensible, its capabilities are still limited. If you want to create much more complex expressions, you'd better use grammar generator.

First, let's write a tokenizer which splits the input string to the tokens. Here's the token types:

private static enum TokenType {
    WHITESPACE, AND, OR, EQUALS, LEFT_PAREN, RIGHT_PAREN, IDENTIFIER, LITERAL, EOF
}

The token class itself:

private static class Token {
    final TokenType type;
    final int start; // start position in input (for error reporting)
    final String data; // payload

    public Token(TokenType type, int start, String data) {
        this.type = type;
        this.start = start;
        this.data = data;
    }

    @Override
    public String toString() {
        return type + "[" + data + "]";
    }
}

To simplify the tokenization let's create a regexp which reads the next token from the input string:

private static final Pattern TOKENS = 
        Pattern.compile("(\\s+)|(AND)|(OR)|(=)|(\\()|(\\))|(\\w+)|\'([^\']+)\'");

Note that it has many groups, one group per TokenType in the same order (first comes WHITESPACE, then AND and so on). Finally the tokenizer method:

private static TokenStream tokenize(String input) throws ParseException {
    Matcher matcher = TOKENS.matcher(input);
    List<Token> tokens = new ArrayList<>();
    int offset = 0;
    TokenType[] types = TokenType.values();
    while (offset != input.length()) {
        if (!matcher.find() || matcher.start() != offset) {
            throw new ParseException("Unexpected token at " + offset, offset);
        }
        for (int i = 0; i < types.length; i++) {
            if (matcher.group(i + 1) != null) {
                if (types[i] != TokenType.WHITESPACE)
                    tokens.add(new Token(types[i], offset, matcher.group(i + 1)));
                break;
            }
        }
        offset = matcher.end();
    }
    tokens.add(new Token(TokenType.EOF, input.length(), ""));
    return new TokenStream(tokens);
}

I'm using java.text.ParseException. Here we apply the regex Matcher till the end of the input. If it doesn't match at the current position, we throw an exception. Otherwise we look for found matching group and create a token from it ignoring the WHITESPACE tokens. Finally we add a EOF token which indicates the end of the input. The result is returned as special TokenStream object. Here's the TokenStream class which will help us to do the parsing:

private static class TokenStream {
    final List<Token> tokens;
    int offset = 0;

    public TokenStream(List<Token> tokens) {
        this.tokens = tokens;
    }

    // consume next token of given type (throw exception if type differs)
    public Token consume(TokenType type) throws ParseException {
        Token token = tokens.get(offset++);
        if (token.type != type) {
            throw new ParseException("Unexpected token at " + token.start
                    + ": " + token + " (was looking for " + type + ")",
                    token.start);
        }
        return token;
    }

    // consume token of given type (return null and don't advance if type differs)
    public Token consumeIf(TokenType type) {
        Token token = tokens.get(offset);
        if (token.type == type) {
            offset++;
            return token;
        }
        return null;
    }

    @Override
    public String toString() {
        return tokens.toString();
    }
}

So we have a tokenizer, hoorah. You can test it right now using System.out.println(tokenize("Acct1 = 'Y' AND (Acct2 = 'N' OR Acct3 = 'N')"));

Now let's write the parser which will create the tree-like representation of our expression. First the interface Expr for all the tree nodes:

public interface Expr {
    public boolean evaluate(Map<String, String> data);
}

Its only method used to evaluate the expression for given data set and return true if data set matches.

The most basic expression is the EqualsExpr which is like Acct1 = 'Y' or 'Y' = Acct1:

private static class EqualsExpr implements Expr {
    private final String identifier, literal;

    public EqualsExpr(TokenStream stream) throws ParseException {
        Token token = stream.consumeIf(TokenType.IDENTIFIER);
        if(token != null) {
            this.identifier = token.data;
            stream.consume(TokenType.EQUALS);
            this.literal = stream.consume(TokenType.LITERAL).data;
        } else {
            this.literal = stream.consume(TokenType.LITERAL).data;
            stream.consume(TokenType.EQUALS);
            this.identifier = stream.consume(TokenType.IDENTIFIER).data;
        }
    }

    @Override
    public String toString() {
        return identifier+"='"+literal+"'";
    }

    @Override
    public boolean evaluate(Map<String, String> data) {
        return literal.equals(data.get(identifier));
    }
}

The toString() method is just for information, you can remove it.

Next we will define the SubExpr class which is either EqualsExpr or something more complex in parentheses (if we see the parenthesis):

private static class SubExpr implements Expr {
    private final Expr child;

    public SubExpr(TokenStream stream) throws ParseException {
        if(stream.consumeIf(TokenType.LEFT_PAREN) != null) {
            child = new OrExpr(stream);
            stream.consume(TokenType.RIGHT_PAREN);
        } else {
            child = new EqualsExpr(stream);
        }
    }

    @Override
    public String toString() {
        return "("+child+")";
    }

    @Override
    public boolean evaluate(Map<String, String> data) {
        return child.evaluate(data);
    }
}

Next is AndExpr which is a set of SubExpr expressions joined by AND operator:

private static class AndExpr implements Expr {
    private final List<Expr> children = new ArrayList<>();

    public AndExpr(TokenStream stream) throws ParseException {
        do {
            children.add(new SubExpr(stream));
        } while(stream.consumeIf(TokenType.AND) != null);
    }

    @Override
    public String toString() {
        return children.stream().map(Object::toString).collect(Collectors.joining(" AND "));
    }

    @Override
    public boolean evaluate(Map<String, String> data) {
        for(Expr child : children) {
            if(!child.evaluate(data))
                return false;
        }
        return true;
    }
}

I use Java-8 Stream API in the toString for brevity. If you cannot use Java-8, you may rewrite it with the for loop or remove toString completely.

Finally we define OrExpr which is a set of AndExpr joined by OR (usually OR has lower priority than AND). It's very similar to AndExpr:

private static class OrExpr implements Expr {
    private final List<Expr> children = new ArrayList<>();

    public OrExpr(TokenStream stream) throws ParseException {
        do {
            children.add(new AndExpr(stream));
        } while(stream.consumeIf(TokenType.OR) != null);
    }

    @Override
    public String toString() {
        return children.stream().map(Object::toString).collect(Collectors.joining(" OR "));
    }

    @Override
    public boolean evaluate(Map<String, String> data) {
        for(Expr child : children) {
            if(child.evaluate(data))
                return true;
        }
        return false;
    }
}

And the final parse method:

public static Expr parse(TokenStream stream) throws ParseException {
    OrExpr expr = new OrExpr(stream);
    stream.consume(TokenType.EOF); // ensure that we parsed the whole input
    return expr;
}

So you can parse your expressions to get the Expr objects, then evaluate them against the rows of your CSV file. I assume that you're capable to parse the CSV row into the Map<String, String>. Here's usage example:

Map<String, String> data = new HashMap<>();
data.put("Acct1", "Y");
data.put("Acct2", "N");
data.put("Acct3", "Y");
data.put("Acct4", "N");

Expr expr = parse(tokenize("Acct1 = 'Y' AND (Acct2 = 'Y' OR Acct3 = 'Y')"));
System.out.println(expr.evaluate(data)); // true
expr = parse(tokenize("Acct1 = 'N' OR 'Y' = Acct2 AND Acct3 = 'Y'"));
System.out.println(expr.evaluate(data)); // false

回答2:

I don't know how efficient this will be in Java, but basic string-replace operations could be a simple solution for this.

You start with a query string:

Acct1 = 'Y' AND Acct2 = 'N' AND Acct3 = 'Y' AND Acct4 = 'N' AND Acct5 = 'N' OR ((Acct6 = 'N' OR Acct7 = 'N') AND Acct8 = 'N' AND Acct9 = 'Y' AND (Acct10 = 'N' OR Acct11 = 'N') AND Acct12 = 'N')

For each line in the csv, e.g. Y,N,Y,N,Y,N,Y,N,Y,N,Y,N string-replace the column headers in the query by the values; that gives you:

Y = 'Y' AND N = 'N' AND Y = 'Y' AND N = 'N' AND Y = 'N' OR ((N = 'N' OR Y = 'N') AND N = 'N' AND Y = 'Y' AND (N = 'N' OR Y = 'N') AND N = 'N')

Then replace the comparisons by their boolean value:
- replace N = 'N' and Y = 'Y' by Y
- replace N = 'Y' and Y = 'N' by N

This will result in:

Y AND Y AND Y AND Y AND N OR ((Y OR N) AND Y AND Y AND (Y OR N) AND Y)

Then loop through a number of string-replace operations which replace truthy values by Y and falsey values by N:
- replace Y AND Y by Y
- replace N AND N, N AND Y and Y AND N, by N
- replace Y OR Y, N OR Y and Y OR N, by Y
- replace N OR N by N
- replace (N) by N
- replace (Y) by Y

This will gradually reduce the boolean statement:

Y AND Y AND Y AND Y AND N OR ((Y OR N) AND Y AND Y AND (Y OR N) AND Y)
Y AND Y AND N OR ((Y) AND Y AND (Y) AND Y)
Y AND N OR ( Y AND Y AND Y AND Y)
N OR ( Y AND Y )
N OR ( Y )
Y

If the queries include implicit precedences without brackets, like N AND N OR Y AND Y where you want AND to have precedence over OR, always exhaust the possibilities to replace AND and brackets before replacing OR:

while (string length decreases) {
    while (string length decreases) {
        replace every "(Z)" by "Z"
        replace every "X AND Y" by "Z"
    }
    replace one "X OR Y" by "Z"
}

During this reduction, make sure to check whether the string length has decreased after every iteration, to avoid infinite loops caused by malformed queries.

回答3:

Hint:

A possible solution is to store your Boolean condition values in a single string attribute, like "YNYNNNYNYNYN", or , better, packed as a binary integer. Then, for a given clause, generate a table of all accepted strings. A join operation will return all desired records.

You can even process several clauses in a single go by adjoining the clause number to the accepted strings when generating the table.

Even though the table size can be exponential in the number of conditions, this can remain quite manageable for a moderate number of conditions.

回答4:

What you have is an expression written in some language that seems compliant with the grammar of the WHERE clause of SQL. So you need:

a parser for this language that can build an AST and then a tree of expressions
an engine to evaluate the expressions tree against your context (ie the environment against the names Acct1, Acct2 etc are resolved against)

This is a simple language so you can build your parser by hand, or otherwise look at ANTLR or JavaCC - and in this case I suggest you take a look at some sample (ANTLR or JavaCC) - of course, you don't need a full SQL parser! Just extract the bits you need.

An easier approach is to write the filter expression in some language that can be invoked via the Java scripting interface, like Javascript or Groovy (or Ruby, Python...). I don't suggest running a find/replace on the input text to transform the SQL-like language to the target language (for example Python has and and or operators - lowercase) as that would break easily depending on the content of the input string.

来源：https://stackoverflow.com/questions/32193036/resolving-logical-operations-and-or-looping-conditions-dynamically

标签

java

algorithm

parsing

recursive-descent