How to enumerate combinations using DCGs with CLP(FD) and multiple constraints

Submitted by 醉酒当歌 on 2019-11-28 01:33:42
lurker

Basic tree expression parser with counters

Assuming a compound term representation for binary-unary trees (e.g., b(t,u(b(t,t)))), here is a basic parser. CLP(FD) is generally recommended for reasoning over integers.

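% In SWI-Prolog, load CLP(FD) first: :- use_module(library(clpfd)).
% (GNU Prolog has these FD constraints built in.)
expression(U, B, E) :-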
    terminal(U, B, E).
expression(U, B, E) :-
    unary(U, B, E).
expression(U, B, E) :-
    binary(U, B, E).

terminal(0, 0, t).

unary(U, B, u(E)) :-
    U1 #>= 0,
    U #= U1 + 1,
    expression(U1, B, E).

binary(U, B, b(E1,E2)) :-
    U1 #>= 0, U2 #>= 0,
    U #= U1 + U2,
    B1 #>= 0, B2 #>= 0,
    B #= B1 + B2 + 1,
    expression(U1, B1, E1),
    expression(U2, B2, E2).

There are a couple of things I've done intentionally here. One is to use CLP(FD) to get more relational reasoning over the counts of unary and binary terms. The other is to put the simplest expression/3 clause, the non-recursive one, first. That way, Prolog tries terminals first while exploring possible solutions.

Example executions:

| ?- expression(1,2,E).

E = u(b(t,b(t,t))) ? a

E = u(b(b(t,t),t))

E = b(t,u(b(t,t)))

E = b(t,b(t,u(t)))

E = b(t,b(u(t),t))

E = b(u(t),b(t,t))

E = b(u(b(t,t)),t)

E = b(b(t,t),u(t))

E = b(b(t,u(t)),t)

E = b(b(u(t),t),t)

(1 ms) no


| ?- expression(U, B, E).

B = 0
E = t
U = 0 ? ;

B = 0
E = u(t)
U = 1 ? ;

B = 0
E = u(u(t))
U = 2 ? ;
...
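Because the counts are handled by CLP(FD) constraints rather than (is)/2, expression/3 also works when the tree is known and the counts are not. The following sketch assumes SWI-Prolog with library(clpfd) (in GNU Prolog the FD constraints are built in and the use_module/1 directive can be dropped). It repeats the definitions above with comments and shows them used in the "counting" direction.

```prolog
:- use_module(library(clpfd)).

% expression(U, B, E): E is a unary-binary tree with U unary and B binary nodes.
expression(U, B, E) :- terminal(U, B, E).
expression(U, B, E) :- unary(U, B, E).
expression(U, B, E) :- binary(U, B, E).

% A leaf contributes no unary and no binary nodes.
terminal(0, 0, t).

% A unary node adds one to the unary count of its child.
unary(U, B, u(E)) :-
    U1 #>= 0,
    U #= U1 + 1,
    expression(U1, B, E).

% A binary node splits both counts between its children and adds one binary node.
binary(U, B, b(E1, E2)) :-
    U1 #>= 0, U2 #>= 0,
    U #= U1 + U2,
    B1 #>= 0, B2 #>= 0,
    B #= B1 + B2 + 1,
    expression(U1, B1, E1),
    expression(U2, B2, E2).

% Counting direction: the tree is given, the counts are computed.
% ?- expression(U, B, b(t, u(t))).                       % U = 1, B = 1
% Generating direction, counting all answers:
% ?- findall(E, expression(1, 2, E), Es), length(Es, N). % N = 10
```

The ten answers counted by the last query are exactly the ten listed in the session above.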

Using a DCG for sequential representation

DCGs describe sequences (lists). A compound term can be represented as a sequence of tokens or characters, and a DCG can relate that sequence to the compound term itself. We might, for example, represent the compound tree term b(t,u(b(t,t))) as [b, '(', t, u, '(', b, '(', t, t, ')', ')', ')']. Here's a DCG that mirrors the above implementation, using this sequence format:

expression(U, B, E) -->
    terminal(U, B, E) |
    unary(U, B, E) |
    binary(U, B, E).

terminal(0, 0, t) --> [t].

unary(U, B, u(E)) -->
    [u, '('],
    { U1 #>= 0, U #= U1 + 1 },
    expression(U1, B, E),
    [')'].

binary(U, B, b(E1, E2)) -->
    [b, '('],
    { U1 #>= 0, U2 #>= 0, U #= U1 + U2, B1 #>= 0, B2 #>= 0, B #= B1 + B2 + 1 },
    expression(U1, B1, E1),
    expression(U2, B2, E2),
    [')'].

Again, I put terminal//3 first among the alternatives of expression//3. You can see the parallel between this and the non-DCG version. Here are example executions:

| ?-  phrase(expression(1,2,E), S).

E = u(b(t,b(t,t)))
S = [u,'(',b,'(',t,b,'(',t,t,')',')',')'] ? a

E = u(b(b(t,t),t))
S = [u,'(',b,'(',b,'(',t,t,')',t,')',')']

E = b(t,u(b(t,t)))
S = [b,'(',t,u,'(',b,'(',t,t,')',')',')']

E = b(t,b(t,u(t)))
S = [b,'(',t,b,'(',t,u,'(',t,')',')',')']

E = b(t,b(u(t),t))
S = [b,'(',t,b,'(',u,'(',t,')',t,')',')']

E = b(u(t),b(t,t))
S = [b,'(',u,'(',t,')',b,'(',t,t,')',')']

E = b(u(b(t,t)),t)
S = [b,'(',u,'(',b,'(',t,t,')',')',t,')']

E = b(b(t,t),u(t))
S = [b,'(',b,'(',t,t,')',u,'(',t,')',')']

E = b(b(t,u(t)),t)
S = [b,'(',b,'(',t,u,'(',t,')',')',t,')']

E = b(b(u(t),t),t)
S = [b,'(',b,'(',u,'(',t,')',t,')',t,')']

no

| ?-  phrase(expression(U,B,E), S).

B = 0
E = t
S = [t]
U = 0 ? ;

B = 0
E = u(t)
S = [u,'(',t,')']
U = 1 ? ;

B = 0
E = u(u(t))
S = [u,'(',u,'(',t,')',')']
U = 2 ?
...
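Since the DCG describes a relation between a tree and its token list, the same nonterminal can also be used in the parsing direction: given the token sequence from the example above, it recovers the tree and both counts. This is a sketch assuming SWI-Prolog with library(clpfd). The alternatives of expression//3 are written as separate clauses instead of with |, which is equivalent.

```prolog
:- use_module(library(clpfd)).

expression(U, B, E) --> terminal(U, B, E).
expression(U, B, E) --> unary(U, B, E).
expression(U, B, E) --> binary(U, B, E).

terminal(0, 0, t) --> [t].

% Tokens: u ( E )
unary(U, B, u(E)) -->
    [u, '('],
    { U1 #>= 0, U #= U1 + 1 },
    expression(U1, B, E),
    [')'].

% Tokens: b ( E1 E2 )
binary(U, B, b(E1, E2)) -->
    [b, '('],
    { U1 #>= 0, U2 #>= 0, U #= U1 + U2,
      B1 #>= 0, B2 #>= 0, B #= B1 + B2 + 1 },
    expression(U1, B1, E1),
    expression(U2, B2, E2),
    [')'].

% Parsing direction: the token list is known, the tree and the counts are not.
% ?- phrase(expression(U, B, E), [b,'(',t,u,'(',b,'(',t,t,')',')',')']).
% E = b(t,u(b(t,t))), U = 1, B = 2
```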

Hopefully this answers question #1, and perhaps #4 by example. The general problem of converting an arbitrary set of predicates to a DCG, though, is more difficult. As I mentioned above, DCGs are really for describing sequences.

Using length/2 to control solution order

In answer to #2: now that we have a DCG that generates solutions properly, we can control the order in which solutions are reported by using length/2, which yields solutions in order of increasing length rather than depth-first. You can constrain the length once, right at the beginning, which is more effective and efficient than redundantly constraining it at each step of the recursion:

?- length(S, _), phrase(expression(U,B,E), S).

B = 0
E = t
S = [t]
U = 0 ? ;

B = 0
E = u(t)
S = [u,'(',t,')']
U = 1 ? ;

B = 1
E = b(t,t)
S = [b,'(',t,t,')']
U = 0 ? ;

B = 0
E = u(u(t))
S = [u,'(',u,'(',t,')',')']
U = 2 ? ;

B = 1
E = u(b(t,t))
S = [u,'(',b,'(',t,t,')',')']
U = 1 ? ;

B = 1
E = b(t,u(t))
S = [b,'(',t,u,'(',t,')',')']
U = 1 ? ;

B = 1
E = b(u(t),t)
S = [b,'(',u,'(',t,')',t,')']
U = 1 ? 
...

If I were using the sequential representation of the unary-binary tree for constraining solutions, not for parsing, I would get rid of the parentheses. They aren't necessary: every symbol has a fixed arity (t has no subterms, u one, b two), so the prefix sequence alone determines the tree:

unary(U, B, u(E)) -->
    [u],
    { U1 #>= 0, U #= U1 + 1 },
    expression(U1, B, E).

binary(U, B, b(E1, E2)) -->
    [b],
    { U1 #>= 0, U2 #>= 0, U #= U1 + U2, B1 #>= 0, B2 #>= 0, B #= B1 + B2 + 1 },
    expression(U1, B1, E1),
    expression(U2, B2, E2).

It's probably a little more efficient, since fewer list lengths correspond to no valid sequence at all (with parentheses, lengths such as 2, 3 and 6 describe no tree). This results in:

| ?- length(S, _), phrase(expression(U, B, E), S).

B = 0
E = t
S = [t]
U = 0 ? ;

B = 0
E = u(t)
S = [u,t]
U = 1 ? ;

B = 0
E = u(u(t))
S = [u,u,t]
U = 2 ? ;

B = 1
E = b(t,t)
S = [b,t,t]
U = 0 ? ;

B = 0
E = u(u(u(t)))
S = [u,u,u,t]
U = 3 ? ;

B = 1
E = u(b(t,t))
S = [u,b,t,t]
U = 1 ? ;

B = 1
E = b(t,u(t))
S = [b,t,u,t]
U = 1 ? ;

B = 1
E = b(u(t),t)
S = [b,u,t,t]
U = 1 ? ;

B = 0
E = u(u(u(u(t))))
S = [u,u,u,u,t]
U = 4 ? ;

B = 1
E = u(u(b(t,t)))
S = [u,u,b,t,t]
U = 2 ? ;
...

So, if you have a recursive definition of a general term, Term, that can be expressed as a sequence (and thus described by a DCG), then length/2 can be used in this way to constrain the solutions and order them by sequence length, which corresponds to some ordering of the original terms. Indeed, introducing length/2 may keep your DCG from recursing infinitely without presenting any solutions, but I would still prefer to make the DCG better behaved to begin with, by organizing the logic so that terminals are tried first.
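The point about infinite recursion can be checked with the parenthesis-free DCG. In the sketch below (assuming SWI-Prolog with library(clpfd)), asking for a tree with exactly one binary node without bounding the list, as in phrase(expression(U, 1, E), S), keeps choosing the unary alternative and descends without ever producing an answer (until it exhausts the stack), because nothing bounds U. Fixing the list length first makes each attempt finite, and the first answer is b(t,t).

```prolog
:- use_module(library(clpfd)).

% Prefix (parenthesis-free) encoding of unary-binary trees.
expression(U, B, E) --> terminal(U, B, E).
expression(U, B, E) --> unary(U, B, E).
expression(U, B, E) --> binary(U, B, E).

terminal(0, 0, t) --> [t].

unary(U, B, u(E)) -->
    [u],
    { U1 #>= 0, U #= U1 + 1 },
    expression(U1, B, E).

binary(U, B, b(E1, E2)) -->
    [b],
    { U1 #>= 0, U2 #>= 0, U #= U1 + U2,
      B1 #>= 0, B2 #>= 0, B #= B1 + B2 + 1 },
    expression(U1, B1, E1),
    expression(U2, B2, E2).

% Recurses through unary/3 forever without an answer:
% ?- phrase(expression(U, 1, E), S).
%
% Each length is finite; the first answer is S = [b,t,t], E = b(t,t), U = 0:
% ?- length(S, _), phrase(expression(U, 1, E), S).
```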

@lurker has already given an excellent answer, and I would like to make a few complementary observations.

First, it would help tremendously if you posted new questions whenever you need to discuss a particular topic in more detail. I can see that the issues you have raised are all thematically related, and I would like to give an overall description that I hope addresses the core aspects. However, each of these topics can be discussed in much more detail, and filing new questions would leave room for more elaborate descriptions.

I start with the version that I shall call your initial version:

e_b(t, B, B, U, U).
e_b(u(E), B0, B1, U0, U2) :-
        U1 #= U0 + 1,
        e_b(E, B0, B1, U1, U2).
e_b(b(E0, E1), B0, B3, U0, U2) :-
        B1 #= B0 + 1,
        e_b(E0, B1, B2, U0, U1),
        e_b(E1, B2, B3, U1, U2).

e_b(U, B, Es) :-
        U #=< Us,
        B #=< Bs,
        e_b(Es, 0, Bs, 0, Us).

This settles the first question:

Can CLP(FD) be used with this solution and if so how?

Yes, CLP(FD) can obviously be used: we are already doing so. Note that I consciously call this version the "initial" one, because I simply ignore all attempts that use (is)/2 or (=:=)/2. Simply use (#=)/2 when reasoning over integers, and benefit from its increased generality over low-level arithmetic.
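A minimal illustration of this generality, using two hypothetical predicates plus1_is/2 and plus1_fd/2 and assuming SWI-Prolog with library(clpfd): the (is)/2 version only works when its first argument is already an integer, while the (#=)/2 version also works in the other direction.

```prolog
:- use_module(library(clpfd)).

% Low-level arithmetic: N0 must already be an integer.
plus1_is(N0, N) :- N is N0 + 1.

% CLP(FD): a true relation between two integers.
plus1_fd(N0, N) :- N #= N0 + 1.

% ?- plus1_is(N0, 5).   % raises an instantiation error
% ?- plus1_fd(N0, 5).   % N0 = 4
% ?- plus1_fd(N0, N).   % succeeds, leaving the constraint pending
```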

The major problem with this version is that queries that ought to terminate don't terminate:

?- e_b(1, 2, Es), false.
nontermination

Why is this the case? To find a reason, I use failure slices, reducing the whole program to fragments that I can more easily understand. For this, I simply insert calls of false/0 at some points in the program.

You can try arbitrary points. For example, let us keep e_b/3 unchanged, and change e_b/5 to:

e_b(t, B, B, U, U).
e_b(u(E), B0, B1, U0, U2) :-
        U1 #= U0 + 1,
        false,
        e_b(E, B0, B1, U1, U2).
e_b(b(E0, E1), B0, B3, U0, U2) :-
        B1 #= B0 + 1,
        e_b(E0, B1, B2, U0, U1),
        false,
        e_b(E1, B2, B3, U1, U2).

Everything after false/0 in a clause body can never be reached, so those goals cannot cause nontermination; read them as struck out. Even with this modified version, we get:

?- e_b(1, 2, Es), false.
nontermination

This means the following simplified version of the program still exhibits the nontermination!

e_b(t, B, B, U, U).
e_b(u(E), B0, B1, U0, U2) :-
        U1 #= U0 + 1.
e_b(b(E0, E1), B0, B3, U0, U2) :-
        B1 #= B0 + 1,
        e_b(E0, B1, B2, U0, U1).

I'm doing all this only to break the problem down into more manageable parts. That this is possible at all is a major attraction of logic programming. Good luck applying such a technique to other programming languages, or even finding such an approach!

Now, the simplified version does report answers:

?- e_b(1, 2, Es).
Es = u(_1064) ;
Es = b(t, _1066) ;
Es = b(u(_1070), _1066) ;
Es = b(b(t, _1072), _1066) .

But, as I said, our query also does not terminate universally even with this simplified program:

?- e_b(1, 2, Es), false.
nontermination

To correct the problem in the initial version, we must correct it also in this fragment. There's no way around it! Put differently, as long as this termination problem exists in the simplified version, the initial version will not terminate either.

Let us therefore focus on the simplified version, and first adjust the variables so that no singleton variables remain. The singletons arose because we removed some of the goals; we now simply link the pairs properly again:

e_b(t, B, B, U, U).
e_b(u(_), B, B, U0, U1) :-
        U1 #= U0 + 1.
e_b(b(E0, _), B0, B, U0, U) :-
        B1 #= B0 + 1,
        e_b(E0, B1, B, U0, U).

Here's the query again:

?- e_b(1, 2, Es), false.
nontermination

In fact, the following even simpler version, in which the second clause is disabled by placing false/0 at the start of its body, still exhibits the nontermination:

e_b(t, B, B, U, U).
e_b(u(_), B, B, U0, U1) :-
        false,
        U1 #= U0 + 1.
e_b(b(E0, _), B0, B, U0, U) :-
        B1 #= B0 + 1,
        e_b(E0, B1, B, U0, U).

Here, I have simply removed an entire clause, making this equivalent to:

e_b(t, B, B, U, U).
e_b(b(E0, _), B0, B, U0, U) :-
        B1 #= B0 + 1,
        e_b(E0, B1, B, U0, U).

So, why does this simplified version not terminate?

For reference, here is the entire program we are talking about at this point:

e_b(t, B, B, U, U).
e_b(b(E0, _), B0, B, U0, U) :-
        B1 #= B0 + 1,
        e_b(E0, B1, B, U0, U).

e_b(U, B, Es) :-
        U #=< Us,
        B #=< Bs,
        e_b(Es, 0, Bs, 0, Us).

With the problematic query still being:

?- e_b(1, 2, Es).
nontermination

Now, in this case, we are again getting no solution, even though we expect one. How can we go about debugging this? Let us do the most obvious thing (if you think about it): Let us ask Prolog what solutions there are at all. We do this by posing the most general query, where all arguments are fresh variables:

?- e_b(A, B, Es).
Es = t,
A in inf..0,
B in inf..0 ;
Es = b(t, _3532),
A in inf..0,
B in inf..1 ;
Es = b(b(t, _3550), _3544),
A in inf..0,
B in inf..2 .

Now there are already first signs of problems in these answers. For example, who has ever heard of trees with negative depth?

Let us enforce more reasonable requirements by posting:

?- A #>= 0, B #>= 0, e_b(A, B, Es).
A = B, B = 0,
Es = t ;
A = 0,
Es = b(t, _8094),
B in 0..1 ;
A = 0,
Es = b(b(t, _8112), _8106),
B in 0..2 .

This looks a lot better already, but it does not fix the core issue. To obtain terminating behaviour for more specific queries, you need to find a way to keep the search bounded. I leave this as an exercise for now, and encourage you to file a new question if you want more information about this. Focus on this simple fragment for now!

Now to the second question:

When is the use of length/2 required to constrain the size of DCG results and when can CLP(FD) be used?

length/2 can be used to generate lists of increasing length. DCGs always describe lists. In Prolog, it is natural to use the length of a list as a measure for something. For example, we can use the length of a list as a measure for the depth of terms you are trying to find. This is suitable because Prolog provides nice syntax for lists, and it can be more convenient to reason symbolically than to reason numerically.

When reasoning over integers, use CLP(FD) constraints. Thus, if you decide to use an integer as the measure of something, use CLP(FD) constraints.

This brings us to the third question:

What other means are available to cause iterative deepening with DCG?

Using length/2 to describe lists of increasing length is by far the most natural way, provided that the list the DCG describes reflects this measure. However, you can also use other means, such as a dedicated argument or argument pair that passes the state of the measure around.
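As a sketch of the argument-pair option, here is one way (an illustration, not the only one) to thread a node budget N0/N through the tree description, assuming SWI-Prolog with library(clpfd). The names tree/3, nat/1 and trees_by_size/2 are hypothetical. Every node consumes one unit of the budget, so each call of tree/3 with a known budget terminates, and raising the budget step by step gives iterative deepening by tree size.

```prolog
:- use_module(library(clpfd)).

% tree(T, N0, N): T is a unary-binary tree that uses N0 - N nodes of budget.
tree(t, N0, N) :-
    N0 #> 0, N #= N0 - 1.
tree(u(T), N0, N) :-
    N0 #> 0, N1 #= N0 - 1,
    tree(T, N1, N).
tree(b(L, R), N0, N) :-
    N0 #> 0, N1 #= N0 - 1,
    tree(L, N1, N2),
    tree(R, N2, N).

% nat(N): enumerates 0, 1, 2, ... (intended to be called with N unbound).
nat(0).
nat(N) :- nat(N0), N #= N0 + 1.

% trees_by_size(T, N): all trees T, in order of their number of nodes N.
trees_by_size(T, N) :-
    nat(N),
    tree(T, N, 0).
```

For example, tree(T, 3, 0) yields u(u(t)) and then b(t,t), the same order in which the parenthesis-free DCG with length/2 reports trees of three symbols.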

The last two questions are related:

How would I convert the non DCG solution back into a DCG version? As my DCG get more complex I will be needing more constraint variables. Is there a standard practice on how to handle this, or just follow the rinse and repeat methodology?

Every time you see a pattern of the form V0 → V1 → V2 → … → V, that is, a variable that is simply passed along in a clause body, you can use DCG semicontext notation to pass it around implicitly. Your code exhibits this pattern, and so DCGs are applicable.

For one variable, use a list with a single element that contains just that variable. If you want to pass around more than one variable, use a list that contains a single term of the form f(...), capturing all variables you want to pass around. This is also well worth its own question.
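As a sketch of the multi-variable case, here is the initial e_b/5 rewritten as a DCG, assuming SWI-Prolog with library(clpfd) and its support for pushback. The helper state//2 is the usual semicontext idiom; e_b//1, state//2 and the term s(B, U) are illustrative names. The counter pairs B0/B and U0/U of e_b/5 are carried implicitly as the single list element s(B, U). This only changes the notation, so it keeps the termination properties of e_b/5.

```prolog
:- use_module(library(clpfd)).

% state(S0, S): replace the current state S0 by S (semicontext via pushback).
state(S0, S), [S] --> [S0].

% e_b(E): E is a tree; the state s(B, U) counts binary and unary nodes.
e_b(t) --> [].
e_b(u(E)) -->
    state(s(B, U0), s(B, U)),
    { U #= U0 + 1 },
    e_b(E).
e_b(b(E0, E1)) -->
    state(s(B0, U), s(B, U)),
    { B #= B0 + 1 },
    e_b(E0),
    e_b(E1).

% Corresponds to e_b(Es, 0, Bs, 0, Us) in the non-DCG version:
% ?- phrase(e_b(b(t, u(t))), [s(0, 0)], [s(Bs, Us)]).   % Bs = 1, Us = 1
```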


I have one final note on the choice of representation. Please try out the following, using for example GNU Prolog or any other conforming Prolog system:

| ?- write_canonical([op(add),[Left,Right]]). 
'.'(op(add),'.'('.'(_18,'.'(_19,[])),[]))

This shows that this is a rather wasteful representation, and at the same time prevents uniform treatment of all expressions you generate, combining several disadvantages.

You can make this more compact for example using Left+Right, or make all terms uniformly available using for example op_arguments(add, [Left,Right]), op_arguments(number, [1]) etc.
