Algorithm for finding smallest collection of components

让人想犯罪 __ submitted on 2019-12-21 04:21:33

Question


I'm looking for an algorithm to solve the following problem. I have a number of subsets (1-n) of a given set (a-h). I want to find the smallest collection of subsets that will allow me to construct, by combination, all of the given subsets. This collection can contain subsets that do not exist in 1-n yet.

  a b c d e f g h
1 1
2 1   1
3   1     1   1
4 1       1
5   1         1
6 1     1   1   1
7 1       1 1   1
8 1   1       1
9 1         1   1
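The sketch below reads the table above into Python sets. It assumes that every 1 sits in exactly the same character column as its header letter. The helper name `parse_table` is hypothetical.

```python
# The first example table from the question, copied as plain text.
TABLE = """\
  a b c d e f g h
1 1
2 1   1
3   1     1   1
4 1       1
5   1         1
6 1     1   1   1
7 1       1 1   1
8 1   1       1
9 1         1   1"""


def parse_table(text: str) -> dict[str, set[str]]:
    """Map each row label to the set of column letters marked with a 1."""
    header, *rows = text.splitlines()
    # Character position -> column letter, e.g. 2 -> 'a', 4 -> 'b'.
    columns = {pos: ch for pos, ch in enumerate(header) if ch.isalpha()}
    subsets = {}
    for row in rows:
        label = row.split()[0]
        # Skip the row label itself; every other '1' is a membership mark.
        subsets[label] = {
            columns[pos]
            for pos, ch in enumerate(row)
            if pos >= len(label) and ch == "1"
        }
    return subsets


if __name__ == "__main__":
    for label, members in parse_table(TABLE).items():
        print(label, "".join(sorted(members)))
```

Position-based parsing breaks if the spacing is off. Explicit lists such as `set("adfh")` are sturdier once the data is known.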

Below are two possible collections, the smaller of which contains seven subsets. I have marked new subsets with an x.

1 1
x   1
x     1
x       1
x         1
x           1
x             1
x               1

1 1
x   1         
x     1
x       1        
x         1    
x           1   1
x             1
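One way to get a collection like the seven-subset one is to group the elements by exactly which input rows contain them. This is an observation, not necessarily a minimum.

Each group is either fully inside or fully outside every input subset. So every input subset is the union of the groups it contains. The result is always a valid collection, but only an upper bound on the smallest one.

The Python sketch below uses the hypothetical helper names `atoms` and `rebuilds`.

```python
from collections import defaultdict

# Input subsets from the first table (rows 1-9), written out by hand.
SUBSETS = {
    1: set("a"), 2: set("ac"), 3: set("beg"), 4: set("ae"), 5: set("bg"),
    6: set("adfh"), 7: set("aefh"), 8: set("acg"), 9: set("afh"),
}


def atoms(subsets: dict[int, set[str]]) -> list[set[str]]:
    """Group elements by the exact set of rows that contain them."""
    groups: dict[frozenset, set[str]] = defaultdict(set)
    for element in set().union(*subsets.values()):
        signature = frozenset(
            row for row, members in subsets.items() if element in members
        )
        groups[signature].add(element)
    return list(groups.values())


def rebuilds(subsets, collection) -> bool:
    """True if every input subset is the union of collection members inside it."""
    return all(
        set().union(*(part for part in collection if part <= members)) == members
        for members in subsets.values()
    )


if __name__ == "__main__":
    parts = atoms(SUBSETS)
    print(sorted("".join(sorted(p)) for p in parts))
    # ['a', 'b', 'c', 'd', 'e', 'fh', 'g']
    print(len(parts), rebuilds(SUBSETS, parts))  # 7 True
```

On this table, f and h always occur together. The grouping therefore reproduces the second seven-subset collection above.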

I believe this must be a known problem, but I'm not very familiar with algorithms. Any help is very much appreciated, as is a suggestion for a better topic title.

Thanks!

Update

Graph coloring gets me a long way, thanks. However, in my case subsets are allowed to overlap. For example:

  a b c d
1 1 1 1  
2 1 1 1 
3 1 1 1
4     1 1
5 1 1 1 1

Graph coloring gives me this solution:

x 1 1
x     1
x       1     

But this one is valid as well, and is smaller:

1 1 1 1  
4     1 1
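With overlaps allowed, a collection is valid when each input subset equals the union of the collection members contained in it. Members that stick out of a subset can never be part of its union.

Below is a minimal Python sketch of that check, applied to both solutions above. The helper `rebuilds` is a hypothetical name.

```python
# Subsets from the update (columns a-d), written out by hand.
SUBSETS = [set("abc"), set("abc"), set("abc"), set("cd"), set("abcd")]


def rebuilds(subsets, collection):
    """True if each subset equals the union of the collection members inside it.

    Members may overlap, as {a, b, c} and {c, d} do here.
    """
    for members in subsets:
        inside = [part for part in collection if part <= members]
        if set().union(*inside) != members:
            return False
    return True


coloring_solution = [set("ab"), set("c"), set("d")]  # three disjoint subsets
smaller_solution = [set("abc"), set("cd")]            # two overlapping subsets

if __name__ == "__main__":
    print(rebuilds(SUBSETS, coloring_solution))  # True
    print(rebuilds(SUBSETS, smaller_solution))   # True
```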

Answer 1:


This problem is known as Set Basis, and it is NP-complete (Larry J. Stockmeyer: The set basis problem is NP-complete. Technical Report RC-5431, IBM, 1975). Its formulation as a graph problem is Bipartite Dimension. Since it is very hard to solve in general, it might be useful to look if there are any helpful properties of your data (e.g., are the sets small? Is the solution small? Can all sets occur?)

I cannot think of an easy ILP formulation. Instead, you could try to reduce the problem to Clique Cover, which is better studied, using either the reduction from Kou & Wong or the one from Nor et al. I have coauthored a paper discussing algorithms for Clique Cover, and written a Clique Cover solver with both an exact solver and two heuristics.
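To make the Set Basis definition concrete, here is a brute-force exact search in Python. The function `smallest_basis` is a hypothetical name; it is not from the paper or the solver mentioned above.

It tries collections of increasing size, drawn from all non-empty subsets of the elements. It is therefore exponential and only usable on tiny inputs, such as the four-column example from the update.

```python
from itertools import combinations


def smallest_basis(subsets):
    """Exhaustively find a smallest collection whose unions rebuild every subset."""
    universe = sorted(set().union(*subsets))
    targets = [frozenset(s) for s in subsets]
    # New subsets are allowed, but only those inside some target can ever be used.
    candidates = [
        frozenset(c)
        for size in range(1, len(universe) + 1)
        for c in combinations(universe, size)
        if any(set(c) <= t for t in targets)
    ]

    def rebuilds(collection):
        # Each target must equal the union of the members contained in it.
        return all(
            frozenset().union(*(p for p in collection if p <= t)) == t
            for t in targets
        )

    distinct = set(targets)
    # The distinct targets themselves always work, so k never exceeds their count.
    for k in range(1, len(distinct) + 1):
        for collection in combinations(candidates, k):
            if rebuilds(collection):
                return [set(p) for p in collection]
    return [set(t) for t in distinct]


if __name__ == "__main__":
    update = [set("abc"), set("abc"), set("abc"), set("cd"), set("abcd")]
    print(sorted("".join(sorted(p)) for p in smallest_basis(update)))
    # ['abc', 'cd']
```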




Answer 2:


This problem was shown in one of the videos of Coursera's Discrete Optimization lectures. IIRC, it's called the set cover problem.

IIRC, it's NP-complete or NP-hard, so look into the typical algorithms (exact algorithms for small datasets, metaheuristics for medium/big datasets) and typical frameworks (OptaPlanner, ...).
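A much simpler heuristic works in the same spirit. It is a plain greedy pass, not a metaheuristic and nothing from OptaPlanner.

Start from the distinct input subsets, which always form a valid collection. Then drop each one that the remaining members can still rebuild.

A sketch in Python follows. The helper `greedy_basis` is a hypothetical name.

```python
def greedy_basis(subsets):
    """Drop input subsets one at a time while every subset stays rebuildable.

    Returns a valid collection (a list of frozensets), not necessarily a smallest one.
    """
    targets = [frozenset(s) for s in subsets]
    basis = list(dict.fromkeys(targets))  # distinct subsets, input order kept

    def rebuilds(collection):
        # Each target must equal the union of the members contained in it.
        return all(
            frozenset().union(*(p for p in collection if p <= t)) == t
            for t in targets
        )

    for candidate in list(basis):
        trial = [p for p in basis if p != candidate]
        if rebuilds(trial):
            basis = trial
    return basis


FIRST_TABLE = [set("a"), set("ac"), set("beg"), set("ae"), set("bg"),
               set("adfh"), set("aefh"), set("acg"), set("afh")]
UPDATE = [set("abc"), set("abc"), set("abc"), set("cd"), set("abcd")]

if __name__ == "__main__":
    print(len(greedy_basis(FIRST_TABLE)))  # 8
    print(len(greedy_basis(UPDATE)))       # 2
```

Because the pass never introduces new subsets, it can miss better answers. On the first table it keeps eight subsets (only {a, e, f, h} is dropped), one more than the seven-subset collection in the question. On the update's example it finds the two-subset solution.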




Answer 3:


For this variant of the Set Cover problem, here is an Integer Programming formulation that uses row generation.

Let's denote the components a, b, c, d, ... by their column number: a=1, b=2, etc.

The rows are 'subsets.' Let's say that the EXISTING subsets are S1, ..., Sm. (These are the ones that HAVE to be covered.)

Notation for NEW subsets

This is the step where we introduce NEW subsets. Let's call the 'atomic' subsets a_x. Each a subset has exactly one component.

   a1 is the subset {1,0,0,0}
   a2 is the subset {0,1,0,0}
   a3 is the subset {0,0,1,0}
   ...

Let bxy be subsets with two components.

So `b13` is the subset with components 1 and 3 present.
b13 = {1, 0, 1, 0}
b34 = {0, 0, 1, 1} etc.

cxyz are subsets with three components.
For example, c124 = { 1, 1, 0, 1} etc.

d subsets will have 4 components
e subsets will have 5 components 
and so on.
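The naming scheme can be written down directly. Below is a small Python sketch; the helpers `subset_name` and `indicator` are hypothetical, and indices are assumed to be single digits as in the examples.

```python
from string import ascii_lowercase


def subset_name(components: set[int]) -> str:
    """Size letter (1 -> 'a', 2 -> 'b', ...) followed by the sorted column numbers."""
    letter = ascii_lowercase[len(components) - 1]
    return letter + "".join(str(i) for i in sorted(components))


def indicator(components: set[int], num_components: int) -> list[int]:
    """0/1 vector over columns 1..num_components, e.g. {1, 3} -> [1, 0, 1, 0]."""
    return [1 if i in components else 0 for i in range(1, num_components + 1)]


if __name__ == "__main__":
    print(subset_name({1, 3}), indicator({1, 3}, 4))        # b13 [1, 0, 1, 0]
    print(subset_name({1, 2, 4}), indicator({1, 2, 4}, 4))  # c124 [1, 1, 0, 1]
```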

Row Generation Step

For each EXISTING set, we generate only the NEW a, b, c, ... subsets it needs.

For example, let's take the subset S1 = {1, 0, 1, 1}.
The sets that can help create S1 are:
a1, a3, a4 (a2 is not needed, since component b is not in S1)
b13, b14, b34
c134

PREPROCESSING STEP: For each Sj in the EXISTING sets, generate new subsets using the procedure above. We create only as many ax, bxy, cxyz, dxyzw, ... as needed. This step has to happen before the formulation step.

In the worst case, 2^num_components - 1 subsets are needed per Sj, but they are easy to generate.
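The preprocessing step amounts to listing every non-empty subset of the components present in Sj. Below is a minimal Python sketch using the naming scheme above. The helper `needed_subsets` is hypothetical, and component numbers are assumed to be single digits.

```python
from itertools import combinations
from string import ascii_lowercase


def needed_subsets(row: list[int]) -> list[str]:
    """Names of all non-empty subsets of the components present in `row`.

    `row` is a 0/1 vector such as [1, 0, 1, 1]; k present components
    yield 2**k - 1 names.
    """
    present = [i for i, bit in enumerate(row, start=1) if bit]
    names = []
    for size in range(1, len(present) + 1):
        for combo in combinations(present, size):
            names.append(ascii_lowercase[size - 1] + "".join(map(str, combo)))
    return names


if __name__ == "__main__":
    print(needed_subsets([1, 0, 1, 1]))
    # ['a1', 'a3', 'a4', 'b13', 'b14', 'b34', 'c134']
```

For S1 = {1, 0, 1, 1}, the function yields a1, a3, a4, b13, b14, b34 and c134. That is 2^3 - 1 = 7 subsets, and a full row of four components gives the worst case of 15.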

Example Problem

Now the formulation for the following problem:

  a b c d
1 1 1 1  
2 1 1 1 
3 1 1 1
4     1 1
5 1 1 1 1

We will have one constraint for each ROW: each set has to be "covered".

For the problem above, here's the formulation

Formulation

Objective: minimize the sum of all subset variables.
 Min sum (a_x) + sum (b_xy) + sum (c_xyz) + sum (d_xyzw)

Subject to:

   a1 + a2 + a3 + b12 + b13 + b23 + c123  >= 1 \\ Set 1 has to be formed
   a1 + a2 + a3 + b12 + b13 + b23 + c123  >= 1 \\ Set 2 has to be formed
   a1 + a2 + a3 + b12 + b13 + b23 + c123  >= 1 \\ Set 3 has to be formed
   a3 + a4            + b34               >= 1 \\ Set 4 has to be formed
   a1 + a2 + a3 + a4 + b12 + b13 + ... + b34 + c123 + ... + d1234  >= 1 \\ Set 5 has to be formed

 All a, b, c, d variables are binary.

Upper bound: since you need at most m subsets (the number of existing subsets), you can add a cut: the objective value has to be m or lower.
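The rows of this model can also be generated mechanically. The Python sketch below uses the hypothetical helpers `needed_subsets` and `covering_model`. It prints, as plain text:

- the objective,
- one `>= 1` row per existing set,
- the `<= m` cut,
- the binary declaration.

The layout is loosely LP-like, not any particular solver's file format. The sketch does not call a solver, and it does not check that the selected subsets union back to each Sj.

```python
from itertools import combinations
from string import ascii_lowercase

# Rows of the example problem, columns a-d numbered 1-4.
ROWS = [
    [1, 1, 1, 0],  # set 1
    [1, 1, 1, 0],  # set 2
    [1, 1, 1, 0],  # set 3
    [0, 0, 1, 1],  # set 4
    [1, 1, 1, 1],  # set 5
]


def needed_subsets(row):
    """Names of all non-empty subsets of the components present in `row`."""
    present = [i for i, bit in enumerate(row, start=1) if bit]
    return [
        ascii_lowercase[size - 1] + "".join(map(str, combo))
        for size in range(1, len(present) + 1)
        for combo in combinations(present, size)
    ]


def covering_model(rows):
    """Objective, one '>= 1' covering row per set, the '<= m' cut, binaries."""
    per_row = [needed_subsets(row) for row in rows]
    # Order variables by size letter first, then by component indices.
    variables = sorted(set().union(*per_row), key=lambda name: (len(name), name))
    lines = ["min: " + " + ".join(variables)]
    for names in per_row:
        lines.append(" + ".join(names) + " >= 1")
    lines.append(" + ".join(variables) + f" <= {len(rows)}")
    lines.append("binary: " + ", ".join(variables))
    return lines


if __name__ == "__main__":
    print("\n".join(covering_model(ROWS)))
```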

Hope that helps.



Source: https://stackoverflow.com/questions/20981490/algorithm-for-finding-smallest-collection-of-components
