Constraint programming suitable for extracting OneToMany relationships from records

Maybe someone can help me solve a problem with Prolog or any other constraint programming language. Imagine a table of projects (school projects where pupils do something with their mothers). Each project has one or more children participating. For each child we store its name and the name of its mother. However, each project has only one cell that contains all mothers and one cell that contains all children, and the two cells are not necessarily ordered in the same way.

Example:

+---------+----------------------+------------------------+
| Project | Parents              | Children               |
+---------+----------------------+------------------------+
| 1       | Jane; Claire         | Brian; Stephen         |
+---------+----------------------+------------------------+
| 2       | Claire; Jane         | Emma; William          |
+---------+----------------------+------------------------+
| 3       | Jane; Claire         | William; James         |
+---------+----------------------+------------------------+
| 4       | Jane; Sophia; Claire | Brian; James; Isabella |
+---------+----------------------+------------------------+
| 4       | Claire               | Brian                  |
+---------+----------------------+------------------------+
| 5       | Jane                 | Emma                   |
+---------+----------------------+------------------------+

I hope this example illustrates the problem. As I said, each cell only contains the names separated by a delimiter, and the two cells are not necessarily in the same order. So for technical applications you would transform the data into this:

+-------------+-----------+----------+
|   Project   |   Name    |   Role   |
+-------------+-----------+----------+
|   1         |   Jane    |   Mother |
+-------------+-----------+----------+
|   1         |   Claire  |   Mother |
+-------------+-----------+----------+
|   1         |   Brian   |   Child  |
+-------------+-----------+----------+
|   1         |   Stephen |   Child  |
+-------------+-----------+----------+
|   2         |   Jane    |   Mother |
+-------------+-----------+----------+
|   2         |   Claire  |   Mother |
+-------------+-----------+----------+
|   2         |   Emma    |   Child  |
+-------------+-----------+----------+
|   2         |   William |   Child  |
+-------------+-----------+----------+
|             |           |          |
|                                    |
|              And so on             |
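
The reshaping step itself is mechanical. A minimal Python sketch, assuming the cells use `;` as the delimiter as in the example (the function name `to_long_format` and the tuple layout are only illustrative):

```python
from typing import List, Tuple

Row = Tuple[str, str, str]

def to_long_format(rows: List[Row], sep: str = ";") -> List[Row]:
    """Split the delimited cells into one (project, name, role) row per person."""
    long_rows: List[Row] = []
    for project, parents_cell, children_cell in rows:
        for role, cell in (("Mother", parents_cell), ("Child", children_cell)):
            for name in cell.split(sep):
                name = name.strip()
                if name:  # ignore empty fragments such as a trailing delimiter
                    long_rows.append((project, name, role))
    return long_rows

# The first two projects of the example table (assumed input layout).
wide = [
    ("1", "Jane; Claire", "Brian; Stephen"),
    ("2", "Claire; Jane", "Emma; William"),
]
for row in to_long_format(wide):
    print(row)
```

Note that this keeps the order found in each cell, so it does not yet say which mother belongs to which child.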

The number of parents and children is equal for each project. So for each project we have n mothers and n children, and within a project each mother belongs to exactly one child. With these constraints it is possible to assign each mother to all of her children by logical inference, starting with the projects that involve only one child (i.e. the last two rows, the second project 4 and project 5).

Results:

Jane has Emma, Stephen and James;

Claire has Brian and William;

Sophia has Isabella
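
The manual inference described above can be sketched in Python as repeated elimination. This is only an illustration of the reasoning, not a constraint programming model; the function name `propagate` and the list-of-pairs input format are assumptions, and the two rows numbered 4 are simply treated as separate projects.

```python
def propagate(projects):
    """Resolve pairs by elimination.

    projects: list of (mothers, children) pairs of equal-length lists.
    Returns (mother_of, unresolved): the child -> mother pairs that could be
    inferred, plus the leftover (mothers, children) sets that stay ambiguous.
    """
    remaining = [(set(ms), set(cs)) for ms, cs in projects]
    mother_of = {}
    changed = True
    while changed:
        changed = False
        for ms, cs in remaining:
            # Drop every pair of this project that is already known.
            for child in list(cs):
                mother = mother_of.get(child)
                if mother in ms:
                    ms.discard(mother)
                    cs.discard(child)
                    changed = True
            # One mother and one child left: they must belong together.
            if len(ms) == 1 and len(cs) == 1:
                mother_of[next(iter(cs))] = next(iter(ms))
                ms.clear()
                cs.clear()
                changed = True
    unresolved = [(ms, cs) for ms, cs in remaining if cs]
    return mother_of, unresolved

example = [
    (["jane", "claire"], ["brian", "stephen"]),
    (["claire", "jane"], ["emma", "william"]),
    (["jane", "claire"], ["william", "james"]),
    (["jane", "sophia", "claire"], ["brian", "james", "isabella"]),
    (["claire"], ["brian"]),
    (["jane"], ["emma"]),
]
mother_of, unresolved = propagate(example)
print(mother_of)
print(unresolved)
```

Elimination of this kind only makes progress when some project shrinks to a single pair, so it can stall on inputs that are still logically determined.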

I am wondering how this can be solved using constraint programming. Additionally, the data set might be underdetermined and I am wondering if it is possible to isolate records that, when solved manually (i.e. when the mother-child assignments are done manually), would break the underdetermination.
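
One way to look at the underdetermination question, sketched in Python under the assumption that the data set is small enough to enumerate: generate every assignment that is consistent with all projects and collect, per child, the set of mothers that occur. A child with more than one candidate is undetermined, and the projects containing such children are the records worth resolving manually. The names `all_solutions`, `candidate_mothers` and `ambiguous_projects` are hypothetical, and the search is exponential in the worst case.

```python
from itertools import permutations

def all_solutions(projects, index=0, assigned=None):
    """Yield every child -> mother mapping consistent with all projects."""
    assigned = {} if assigned is None else assigned
    if index == len(projects):
        yield dict(assigned)
        return
    mothers, children = projects[index]
    for ordering in permutations(mothers):
        trial = dict(assigned)
        # setdefault keeps an earlier pairing, so a mismatch means a conflict.
        if all(trial.setdefault(c, m) == m for c, m in zip(children, ordering)):
            yield from all_solutions(projects, index + 1, trial)

def candidate_mothers(projects):
    """Map each child to the set of mothers it has in at least one solution."""
    candidates = {}
    for solution in all_solutions(projects):
        for child, mother in solution.items():
            candidates.setdefault(child, set()).add(mother)
    return candidates

def ambiguous_projects(projects):
    """Return the 0-based indices of projects with an undetermined child."""
    candidates = candidate_mothers(projects)
    return [
        i
        for i, (_, children) in enumerate(projects)
        if any(len(candidates.get(c, ())) > 1 for c in children)
    ]

# The example data plus one made-up project that cannot be resolved.
projects = [
    (["jane", "claire"], ["brian", "stephen"]),
    (["claire", "jane"], ["emma", "william"]),
    (["jane", "claire"], ["william", "james"]),
    (["jane", "sophia", "claire"], ["brian", "james", "isabella"]),
    (["claire"], ["brian"]),
    (["jane"], ["emma"]),
    (["sally", "sandy"], ["grace", "miriam"]),
]
print(candidate_mothers(projects))
print(ambiguous_projects(projects))
```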

I'm not sure if I understand all the requirements of the problem, but here is a constraint programming model in MiniZinc (http://www.minizinc.org/). The full model is here: http://hakank.org/minizinc/one_to_many.mzn .

LATER NOTE: The first version of the project constraints was not correct. I have removed the incorrect code. See the edit history for the original answer.

enum mothers = {jane,claire,sophia};
enum children = {brian,stephen,emma,william,james,isabella};      

% decision variables

% who is the mother of this child?
array[children] of var mothers: x;


solve satisfy;

constraint
  % Every mother has at least one child
  forall(m in mothers) (
    exists(c in children) (
      x[c] = m
    )
  )
;

constraint
% NOTE: This is a more correct version of the project constraints.
% project 1
(
  ( x[brian] = jane /\ x[stephen] = claire) \/
  ( x[stephen] = jane /\ x[brian] = claire)
) 
/\
% project 2
(
  ( x[emma] = claire /\ x[william] = jane) \/
  ( x[william] = claire /\ x[emma] = jane) 
)
/\
% project 3
(
  ( x[william] = claire /\ x[james] = jane) \/
  ( x[james] = claire /\ x[william] = jane) 
)
/\
% project 4
( 
  ( x[brian] = jane /\ x[james] = sophia /\ x[isabella] = claire) \/
  ( x[james] = jane /\ x[brian] = sophia /\ x[isabella] = claire) \/
  ( x[james] = jane /\ x[isabella] = sophia /\ x[brian] = claire) \/
  ( x[brian] = jane /\ x[isabella] = sophia /\ x[james] = claire) \/
  ( x[isabella] = jane /\ x[brian] = sophia /\ x[james] = claire) \/
  ( x[isabella] = jane /\ x[james] = sophia /\ x[brian] = claire) 
)
/\

% project 4(sic!)
( x[brian] = claire) /\

% project 5
( x[emma] = jane)
;


output [
  "\(c): \(x[c])\n"
  | c in children
];

The unique solution is

brian: claire
stephen: jane
emma: jane
william: claire
james: jane
isabella: sophia

Edit2: Here is a more general solution. See http://hakank.org/minizinc/one_to_many.mzn for the complete model.

include "globals.mzn"; 

enum mothers = {jane,claire,sophia};
enum children = {brian,stephen,emma,william,james,isabella};      

% decision variables
% who is the mother of this child?
array[children] of var mothers: x;

% combine all the combinations of mothers and children in a project
predicate check(array[int] of mothers: mm, array[int] of children: cc) =
  let {
    int: n = length(mm);
    array[1..n] of var 1..n: y;
  } in
  all_different(y) /\
  forall(i in 1..n) (
     x[cc[i]] = mm[y[i]]
  )
;    

solve satisfy;

constraint
% Every mother has at least one child.
forall(m in mothers) (
  exists(c in children) (
    x[c] = m
  )
)
;


constraint
% project 1    
check([jane,claire], [brian,stephen]) /\
% project 2
check([claire,jane],[emma,william]) /\
% project 3
check([claire,jane],[william,james]) /\
% project 4
check([claire,sophia,jane],[brian,james,isabella]) /\
% project 4(sic!)
check([claire],[brian]) /\
% project 5
check([jane],[emma])
;

output [
 "\(c): \(x[c])\n"
 | c in children
];

This model uses the following predicate to ensure that all the combinations of mothers vs children are considered:

predicate check(array[int] of mothers: mm, array[int] of children: cc) =
   let {
     int: n = length(mm);
     array[1..n] of var 1..n: y;
  } in
  all_different(y) /\
  forall(i in 1..n) (
    x[cc[i]] = mm[y[i]]
  )
;

It uses the global constraint all_different(y) to make y a permutation of 1..n, so that each mother in mm is used exactly once; the i'th child cc[i] is then assigned the mother mm[y[i]].
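
To make the meaning of the predicate concrete, here is a small Python sketch that tests the same condition for a fixed assignment x by trying every permutation y. It is only an illustration of the semantics (a solver does not enumerate like this), and it uses 0-based indices where MiniZinc uses 1-based ones.

```python
from itertools import permutations

def check(x, mm, cc):
    """True if some permutation y gives x[cc[i]] == mm[y[i]] for every i."""
    n = len(mm)
    return any(
        all(x[cc[i]] == mm[y[i]] for i in range(n))
        for y in permutations(range(n))  # every y with all-different entries
    )

# The unique solution reported above.
x = {
    "brian": "claire", "stephen": "jane", "emma": "jane",
    "william": "claire", "james": "jane", "isabella": "sophia",
}
print(check(x, ["claire", "sophia", "jane"], ["brian", "james", "isabella"]))
print(check(x, ["jane", "claire"], ["brian", "william"]))
```

The second call fails because brian and william both have claire as mother, so no permutation of [jane, claire] can match them.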

A bit off topic, but the SWI-Prolog manual says:

Plain Prolog can be regarded as CLP(H), where H stands for Herbrand terms. Over this domain, =/2 and dif/2 are the most important constraints that express, respectively, equality and disequality of terms.

So I feel authorized to suggest a plain Prolog solution, more general than the algorithm you suggested (progressively reducing the relations, starting from the one-to-one relations):

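% solve2(+Projects, -ParentsChildren)
% Projects is a list of Id-Parents-Children terms; ParentsChildren is a
% list of Parent-Children pairs consistent with every project.
solve2(Projects,ParentsChildren) :-
    foldl([_-Ps-Cs,L,L1]>>try_links(Ps,Cs,L,L1),Projects,[],ChildrenParent),
    transpose_pairs(ChildrenParent,ParentsChildrenFlat),
    group_pairs_by_key(ParentsChildrenFlat,ParentsChildren).

% Pair every parent of a project with one of its children, backtracking
% over the possible pairings.
try_links([],[],Linked,Linked).
try_links(Ps,Cs,Linked,Linked2) :-
    select(P,Ps,Ps1),
    select(C,Cs,Cs1),
    link(C,P,Linked,Linked1),
    try_links(Ps1,Cs1,Linked1,Linked2).

% Record the pair C-P, or check it against a pairing made earlier.
link(C,P,Assigned,Assigned1) :-
    (   memberchk(C-Q,Assigned)
    ->  P==Q,
        Assigned1=Assigned
    ;   Assigned1=[C-P|Assigned]
    ).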

This accepts data in a natural format, like

data(1,
    [1-[jane,claire]-[brian,stephen]
    ,2-[claire,jane]-[emma,william]
    ,3-[jane,claire]-[william,james]
    ,4-[jane,sophia,claire]-[brian,james,isabella]
    ,5-[claire]-[brian]
    ,6-[jane]-[emma]
    ]).
data(2,
    [1-[jane,claire]-[brian,stephen]
    ,2-[claire,jane]-[emma,william]
    ,3-[jane,claire]-[william,james]
    ,4-[jane,sophia,claire]-[brian,james,isabella]
    ,5-[claire]-[brian]
    ,6-[jane]-[emma]
    ,7-[sally,sandy]-[grace,miriam]
    ]).

?- data(2,Ps),solve2(Ps,S).
Ps = [1-[jane, claire]-[brian, stephen], 2-[claire, jane]-[emma, william], 3-[jane, claire]-[william, james], 4-[jane, sophia, claire]-[brian, james, isabella], 5-[claire]-[brian], 6-[jane]-[emma], 7-[...|...]-[grace|...]],
S = [claire-[william, brian], jane-[james, emma, stephen], sally-[grace], sandy-[miriam], sophia-[isabella]].

This is my first CHR program, so I hope that someone will come and give me some advice on how to improve it.

My thinking is that you need to expand all the lists into facts. From there, if you know that a project has just one parent and one child, you can establish the parent relationship from that. Also, once you have a parent-child relationship, you can remove that set from the other facts in the other projects and reduce the cardinality of the problem by one. Eventually you will have figured out everything you can. The only difference between a completely determined dataset and an incompletely determined one is in how far that reduction can go. If it doesn't quite get there, it will leave around some facts so you can see which projects/parents/children are still creating ambiguity.

:- use_module(library(chr)).

:- chr_constraint project/3, project_parent/2, project_child/2, 
   project_parents/2, project_children/2, project_size/2, parent/2.

%% turn a project into a fact about its size plus 
%% facts for each parent and child in this project
project(N, Parents, Children) <=>
    length(Parents, Len),
    project_size(N, Len),
    project_parents(N, Parents),
    project_children(N, Children).

%% expand the list of parents for this project into a fact per parent per project
project_parents(_, []) <=> true.
project_parents(N, [Parent|Parents]) <=>
    project_parent(N, Parent),
    project_parents(N, Parents).

%% same for the children
project_children(_, []) <=> true.
project_children(N, [Child|Children]) <=>
    project_child(N, Child),
    project_children(N, Children).

%% a single parent-child combo on a project is exactly what we need
one_parent @ project_size(Project, 1), 
             project_parent(Project, Parent), 
             project_child(Project, Child) <=>
    parent(Parent, Child).

%% if I have a parent relationship for project of size N,
%% remove this parent and child from the project and decrease
%% the number of parents and children by one
parent_det @ parent(Parent, Child) \ project_size(Project, N), 
                                     project_parent(Project, Parent), 
                                     project_child(Project, Child) <=>
    succ(N0, N),
    project_size(Project, N0).

I ran this with your example by making a main/0 predicate to do it:

main :-
    project(1, [jane, claire], [brian, stephen]),
    project(2, [claire, jane], [emma, william]),
    project(3, [jane, claire], [william, james]),
    project(4, [jane, sophia, claire], [brian, james, isabella]),
    project(5, [claire], [brian]),
    project(6, [jane], [emma]).

This outputs:

parent(sophia, isabella),
parent(jane, james),
parent(claire, william),
parent(jane, emma),
parent(jane, stephen),
parent(claire, brian).

To demonstrate incomplete determination, I added a seventh project:

project(7, [sally,sandy], [grace,miriam]).

The program then outputs this:

project_parent(7, sandy),
project_parent(7, sally),
project_child(7, miriam),
project_child(7, grace),
project_size(7, 2),
parent(sophia, isabella),
parent(jane, james),
parent(claire, william),
parent(jane, emma),
parent(jane, stephen),
parent(claire, brian).

As you can see, any project_size/2 that remains tells you the cardinality of what remains to be solved (project seven has two parent/children relationships still remaining to be determined) and you get back exactly the parents/children that remain to be handled, as well as all of the parent/2 relations which could be determined.

I'm pretty happy with this outcome but hopefully others can come and improve my code!

Edit: my code has a shortcoming, which was identified on the mailing list: certain inputs fail to converge even though the solution can be computed. For instance:

project(1,[jane,claire],[brian, stephan]),
project(2,[jane,emma],[stephan, jones]).

For more information, see Ian's solution, which uses set intersection to determine the mapping.
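
Ian's code is not reproduced here. As a rough idea of what a set-intersection approach might look like (a Python sketch under my own assumptions, not his actual solution): a child's mother must appear in every project the child takes part in, so the candidates for a child are the intersection of the parent sets of its projects. Once a child is down to one candidate, that mother can be ruled out for the other children of the same project.

```python
def solve_by_intersection(projects):
    """Return a dict mapping each child to its remaining candidate mothers."""
    candidates = {}
    # Step 1: intersect the parent sets of all projects a child appears in.
    for mothers, children in projects:
        for child in children:
            candidates[child] = candidates.get(child, set(mothers)) & set(mothers)
    # Step 2: a mother fixed for one child cannot belong to another child
    # of the same project; repeat until nothing changes.
    changed = True
    while changed:
        changed = False
        for mothers, children in projects:
            for child in children:
                if len(candidates[child]) == 1:
                    (mother,) = candidates[child]
                    for other in children:
                        if other != child and mother in candidates[other]:
                            candidates[other].discard(mother)
                            changed = True
    return candidates

# The input from the edit above, on which pure elimination stalls.
stuck = [
    (["jane", "claire"], ["brian", "stephan"]),
    (["jane", "emma"], ["stephan", "jones"]),
]
print(solve_by_intersection(stuck))
```

Here stephan appears in both projects and jane is the only parent common to both, which fixes stephan and then the other two children.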

Source: https://stackoverflow.com/questions/56843065/constraint-programming-suitable-for-extracting-onetomany-relationships-from-reco
