Apply distributive law on AST (or RPN) => disjunctive normal form

Submitted by 可紊 on 2021-01-27 20:32:52

Question


I have expressions like the following:

{1000} AND ({1001} OR {1002} OR {1003})

Allowed operators are OR and AND, and expressions can be nested using parentheses. I have already managed to tokenize this string and convert it to an abstract syntax tree (AST) using the Shunting Yard algorithm, implemented in PHP 5.3. The expression above results in the following RPN sequence and tree:

1000 1001 1002 | 1003 | &


    &
  /   \
1000   |
      / \
     |   1003
    / \
1001  1002
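
For readers who want to reproduce this starting point, here is a minimal Python sketch of the Shunting Yard step (not the PHP code mentioned above). The function names `tokenize`, `to_rpn` and `rpn_to_ast`, and the choice of nested tuples `(operator, left, right)` for the tree, are assumptions made purely for illustration.

```python
import re

PRECEDENCE = {"AND": 2, "OR": 1}  # AND binds tighter than OR

def tokenize(text):
    """Return ints for {1000}-style operands, strings for operators and parentheses."""
    tokens = []
    for m in re.finditer(r"\{(\d+)\}|(AND|OR)|([()])", text):
        if m.group(1):
            tokens.append(int(m.group(1)))
        else:
            tokens.append(m.group(2) or m.group(3))
    return tokens

def to_rpn(tokens):
    """Shunting Yard for left-associative binary operators."""
    output, stack = [], []
    for tok in tokens:
        if isinstance(tok, int):
            output.append(tok)
        elif tok in PRECEDENCE:
            while (stack and stack[-1] in PRECEDENCE
                   and PRECEDENCE[stack[-1]] >= PRECEDENCE[tok]):
                output.append(stack.pop())
            stack.append(tok)
        elif tok == "(":
            stack.append(tok)
        else:  # ")": unwind to the matching "("
            while stack[-1] != "(":
                output.append(stack.pop())
            stack.pop()
    while stack:
        output.append(stack.pop())
    return output

def rpn_to_ast(rpn):
    """Build nested (operator, left, right) tuples from an RPN sequence."""
    stack = []
    for tok in rpn:
        if isinstance(tok, int):
            stack.append(tok)
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append((tok, left, right))
    return stack[0]

rpn = to_rpn(tokenize("{1000} AND ({1001} OR {1002} OR {1003})"))
print(rpn)              # [1000, 1001, 1002, 'OR', 1003, 'OR', 'AND']
print(rpn_to_ast(rpn))  # ('AND', 1000, ('OR', ('OR', 1001, 1002), 1003))
```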

When traversing this tree, I want to output the final combinations of numbers a user can choose from. That is not possible with the given representation. What I actually need is the form obtained after applying the distributive law:

(1000 & 1001) | (1000 & 1002) | (1000 & 1003)

1000 1001 & 1000 1002 & | 1000 1003 & |

               _______________|_____________
              /                             \
      _______|____                           &
     /            \                         / \
    &              &                    1000   1003
  /   \           / \
1000   1001    1000  1002

I concluded that the only nodes allowed to be &-operator nodes are the last ones, which carry the leaves. All others have to be |-operator nodes.

How do I convert an arbitrary AST with the grammar described above into one that represents all final combinations? Is it better to apply the distributive law to the tokens of the infix representation? Is it easier to work with the RPN representation instead of the tree?

Please also note that more difficult examples are possible, such as:

(1000 & 1008) & (1001 | 1002 | 1003)
1000 1008 & 1001 1002 | 1003 | &
       ______ & ___
      /            \
     &             |
    / \           / \
1000   1008      |   1003
                / \
            1001  1002

Which I'd like to result in:

(1000 & 1008 & 1001) | (1000 & 1008 & 1002) | (1000 & 1008 & 1003)
1000 1008 & 1001 & 1000 1008 & 1002 & | 1000 1008 & 1003 & |

                        __________________|_________
                       /                            \
         _____________|_________                     &
        /                       \                   / \
       &                        &                  &   1003
      /  \                     / \                / \
     &    1001                &   1002        1000   1008
    / \                      / \
1000   1008              1000   1008

For another (more complicated) example, just switch the left and right subtrees, or add another &-node in place of 1003 => 1003 1009 &

What I have already tried: a lot of Googling, and traversing the tree in pre-order and post-order while trying to find an algorithm, without success.

I am grateful for any hints and pointers in the right direction.


Answer 1:


What you seem to want to produce is disjunctive normal form. This is harder to do than it looks because there are lots of interesting cases to handle.

What you want to do is implement the following rewrite rule, exhaustively, everywhere in your tree (actually, leaf upwards is probably good enough):

 rule distribute_and_over_or(a: term, b: term, c: term): term->term
    "  \a and (\b or \c) " ->  " \a and \b or \a and \c ";

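Coded by hand, the distribution rule above might look like the following Python sketch. It assumes the tree is stored as nested tuples `("and", left, right)` / `("or", left, right)` with plain values as leaves; that representation and the function name `distribute` are illustrative assumptions, not part of the rule notation. It also covers the mirrored case `(b or c) and a`, which the single rule only reaches with help from the commutative law.

```python
def distribute(node):
    """Return an equivalent tree in which no 'and' node has an 'or' below it."""
    if not isinstance(node, tuple):          # leaf
        return node
    op, left, right = node
    # Work leaf upwards: normalise both children first.
    left, right = distribute(left), distribute(right)
    if op == "or":
        return ("or", left, right)
    # op == "and": lift an 'or' child above the 'and'.
    if isinstance(left, tuple) and left[0] == "or":
        return ("or",
                distribute(("and", left[1], right)),
                distribute(("and", left[2], right)))
    if isinstance(right, tuple) and right[0] == "or":
        return ("or",
                distribute(("and", left, right[1])),
                distribute(("and", left, right[2])))
    return ("and", left, right)

tree = ("and", 1000, ("or", ("or", 1001, 1002), 1003))
print(distribute(tree))
# ('or', ('or', ('and', 1000, 1001), ('and', 1000, 1002)), ('and', 1000, 1003))
```

The re-application of `distribute` on the freshly built `and` nodes is what makes the rewrite exhaustive when both children of an `and` contain an `or`.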
In complex terms, you'll get redundant subterms, so you'll likely need these rules:

 rule subsumption_identical_or_terms(a: term): term->term
    "  \a or \a " ->  " \a ";

 rule subsumption_identical_and_terms(a: term): term->term
    "  \a and \a " ->  " \a ";

The way you expressed your problem, you didn't use "not" but it will likely show up, so you need the following additional rules:

 rule cancel_nots(x: term): term->term
    " not (not \x) " ->  " \x ";

 rule distribute_not_over_or(a: term, b: term): term->term
    " not( \a or \b ) " ->  " not \a  and not \b ";

 rule distribute_not_over_and(a: term, b: term): term->term
    " not( \a and \b ) " ->  " not \a  or not \b ";

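The three "not" rules above can be applied in a single top-down pass that carries a "currently negated" flag. This is a hedged Python sketch under the assumption that trees are nested tuples `("not", x)`, `("and", l, r)`, `("or", l, r)` with plain values as leaves; the function name `push_not` is hypothetical.

```python
def push_not(node, negate=False):
    """Push negations down to the leaves (De Morgan plus double-negation removal)."""
    if not isinstance(node, tuple):              # leaf
        return ("not", node) if negate else node
    if node[0] == "not":
        # not (not x) cancels: just flip the flag and keep descending.
        return push_not(node[1], not negate)
    op, left, right = node
    if negate:
        # not (a or b) -> not a and not b ; not (a and b) -> not a or not b
        op = "or" if op == "and" else "and"
    return (op, push_not(left, negate), push_not(right, negate))

print(push_not(("not", ("or", "a", ("not", "b")))))
# ('and', ('not', 'a'), 'b')
```

After this pass every `not` sits directly on a leaf, so the and-over-or distribution can treat negated leaves like any other leaf.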
You may also encounter self-cancelling terms, so you need to handle those:

 rule self_cancel_and(a: term): term->term
     "  \a and not \a " -> "false";

 rule self_cancel_or(a: term): term->term
     "  \a or not \a " -> "true";

and ways to get rid of true and false:

 rule and_true(a: term): term->term
     " \a and true " -> " \a ";

 rule and_false(a: term): term->term
     " \a and false " -> " false ";

 rule or_true(a: term): term->term
     " \a or true " -> " true ";

 rule or_false(a: term): term->term
     " \a or false " -> " \a ";

 rule not_false(a: term): term->term
     " not false " -> " true ";

 rule not_true(a: term): term->term
     " not true " -> " false ";

(I've assumed expression precedence with "not" binding tighter than "and" binding tighter than "or").
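
As an illustration of the true/false rules, here is a small bottom-up constant folder in Python. It is only a sketch: it assumes nested tuples `("not", x)`, `("and", l, r)`, `("or", l, r)`, uses Python's `True`/`False` as the constants, and the name `simplify` is made up. Unlike the rules as written, it checks the constant on either side, which is where the commutative law would otherwise be needed.

```python
def simplify(node):
    """Fold the constants True/False out of an and/or/not tree, leaf upwards."""
    if not isinstance(node, tuple):
        return node
    if node[0] == "not":
        arg = simplify(node[1])
        if isinstance(arg, bool):        # not true -> false, not false -> true
            return not arg
        return ("not", arg)
    op, left, right = node[0], simplify(node[1]), simplify(node[2])
    if op == "and":
        if left is False or right is False:   # a and false -> false
            return False
        if left is True:                      # true and a -> a
            return right
        if right is True:                     # a and true -> a
            return left
    else:  # op == "or"
        if left is True or right is True:     # a or true -> true
            return True
        if left is False:                     # false or a -> a
            return right
        if right is False:                    # a or false -> a
            return left
    return (op, left, right)

print(simplify(("or", ("and", "a", True), ("not", True))))   # a
print(simplify(("and", "a", ("not", True))))                 # False
```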

The rules shown assume the various subtrees are at best "binary", but they may have many actual children, as you show in your examples. In effect, you have to worry about the associative law, too. You'll also have to take into account the commutative law if you want the subsumption and cancellation laws to really work.
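
One way to get associativity, commutativity and the identical-term rules for free in hand-written code is to stop using a tree for the result and represent the DNF as a set of sets: each inner set is one conjunction, the outer set is the disjunction. The Python sketch below assumes and/or-only trees stored as nested tuples `("and", l, r)` / `("or", l, r)`; the names `dnf` and `absorb` are hypothetical.

```python
from itertools import product

def dnf(node):
    """Return the DNF as a set of frozensets; each frozenset is one conjunction."""
    if not isinstance(node, tuple):                 # leaf
        return {frozenset([node])}
    op, left, right = node
    l, r = dnf(left), dnf(right)
    if op == "or":
        return l | r                                # set union: a or a -> a
    # 'and': every conjunction on the left combined with every one on the right
    return {a | b for a, b in product(l, r)}        # frozenset union: a and a -> a

def absorb(terms):
    """Drop any conjunction that strictly contains another one (a or (a and b) -> a)."""
    return {t for t in terms if not any(other < t for other in terms)}

tree = ("and", ("and", 1000, 1008), ("or", ("or", 1001, 1002), 1003))
for combo in sorted(sorted(t) for t in dnf(tree)):
    print(combo)
# [1000, 1001, 1008]
# [1000, 1002, 1008]
# [1000, 1003, 1008]
```

Each printed list is one combination a user can choose from, which is the output the question asks for. Because sets ignore order and nesting, `(a and b) and c` and `c and (b and a)` end up as the same conjunction.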

You'll probably discover some implicit "not" propagation if your sub-expressions contain relational operators, e.g.,

    " not ( x > y ) " -->  " x <= y "

You may also wish to normalize your relational compares:

    "  x < y " -->  " not (x >= y )"

Since you've implemented your trees in PHP, you'll have to manually code the equivalent of these by climbing up and down the trees procedurally. This is possible but pretty inconvenient. (You can do this on both tokens-as-RPN and on ASTs, but I think you'll find it much easier on ASTs because you don't have to shuffle strings-of-tokens).

It is easier, when manipulating symbolic formulas, to apply an engine, typically a program transformation system, that will accept the rewrites directly and apply them for you. The notation I used here is taken from our DMS Software Reengineering Toolkit, which takes these rules directly and handles associativity and commutativity automatically. This is probably not a workable choice inside PHP.

One last issue: if your terms have any complexity, the final disjunctive normal form can get pretty big, pretty fast. We had a client that wanted exactly this, until we gave it to him on a big starting term, which happened to produce hundreds of leaf conjunctions. (So far, we've not found a pretty way to present arbitrary boolean terms.)
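
To put a number on that growth: an AND of n clauses, each an OR of k alternatives, expands to k^n conjunctions. The short Python sketch below just counts them for k = 2; the helper name `dnf_terms` and the `a0`/`b0` leaf names are made up for the illustration.

```python
from itertools import product

def dnf_terms(clauses):
    """Expand an AND of OR-clauses into its list of conjunctions."""
    return [frozenset(choice) for choice in product(*clauses)]

for n in (2, 5, 10):
    # (a0 or b0) and (a1 or b1) and ... with n clauses
    clauses = [(f"a{i}", f"b{i}") for i in range(n)]
    print(n, len(dnf_terms(clauses)))
# 2 4
# 5 32
# 10 1024
```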




Answer 2:


Thanks for mentioning the keyword that helped me the most: disjunctive normal form. I was not aware that this was the transformation I was actually looking for.

I could not find a detailed description of the algorithm on the internet, so I tried to work it out myself. This is how I did it, in pseudocode. Please tell me if it is not comprehensible.

- Traverse the AST recursively in post-order
- If an &-node is found, check whether one of its child nodes is a |-node
- Set orChild and andChild accordingly
- Traverse the orChild subtree iteratively in pre-order, and for each OR-leaf push a new &-node combining andChild and the OR-leaf value onto the stack
- If you meet another &-node, push a new &-node combining andChild and the whole &-node you found onto the stack
- After the traversal is done, combine the nodes on the stack using |-nodes
- The new subtree, which has a |-node as its root, replaces the &-node you started to traverse from
- As the outer traversal is post-order, the newly created nodes are not traversed again and have no effect on further changes
- Repeat the whole process until the resulting tree does not change anymore
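
The steps above can be sketched roughly as follows in Python. This is one reading of the pseudocode rather than the original PHP: trees are assumed to be nested tuples `("&", left, right)` / `("|", left, right)`, a plain list stands in for "the stack" of new &-nodes, and the names `collect_or_operands`, `one_pass` and `to_dnf` are invented for the example.

```python
def collect_or_operands(node):
    """Iterative pre-order walk of a |-subtree; returns every non-| node, left to right."""
    found, stack = [], [node]
    while stack:
        cur = stack.pop()
        if isinstance(cur, tuple) and cur[0] == "|":
            stack.append(cur[2])   # push right first so the left child is visited first
            stack.append(cur[1])
        else:
            found.append(cur)      # an OR-leaf or a whole &-node
    return found

def one_pass(node):
    """One recursive post-order pass that rewrites each &-node having a |-child."""
    if not isinstance(node, tuple):
        return node
    op, left, right = node[0], one_pass(node[1]), one_pass(node[2])
    if op == "&":
        if isinstance(right, tuple) and right[0] == "|":
            and_child, or_child, and_first = left, right, True
        elif isinstance(left, tuple) and left[0] == "|":
            and_child, or_child, and_first = right, left, False
        else:
            return (op, left, right)
        parts = []
        for operand in collect_or_operands(or_child):
            pair = (and_child, operand) if and_first else (operand, and_child)
            parts.append(("&",) + pair)
        result = parts[0]
        for part in parts[1:]:     # combine the collected &-nodes with |-nodes
            result = ("|", result, part)
        return result
    return (op, left, right)

def to_dnf(tree):
    """Repeat the pass until the tree no longer changes."""
    while True:
        new_tree = one_pass(tree)
        if new_tree == tree:
            return tree
        tree = new_tree

print(to_dnf(("&", 1000, ("|", ("|", 1001, 1002), 1003))))
# ('|', ('|', ('&', 1000, 1001), ('&', 1000, 1002)), ('&', 1000, 1003))
```

The repetition matters when both children of an &-node are |-nodes: the first pass distributes over only one of them, and the next pass handles the &-nodes it created.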


Source: https://stackoverflow.com/questions/24746046/apply-distributive-law-on-ast-or-rpn-disjunctive-normal-form
