Algorithm to find subset within two sets of integers whose sums match

给你一囗甜甜゛ 提交于 2019-12-29 04:01:06

问题


I'm looking for an algorithm which can take two sets of integers (both positive and negative) and find subsets within each that have the same sum.

The problem is similar to the subset sum problem except that I'm looking for subsets on both sides.

Here's an example:

List A {4, 5, 9, 10, 1}

List B {21, 7, -4, 180}

So the only match here is: {10, 1, 4, 9} <=> {21, 7, -4}

Does anyone know if there are existing algorithms for this kinda problems?

So far, the only solution I have is a brute force approach which tries every combination but it performs in Exponential time and I've had to put a hard limit on the number of elements to consider to avoid it from taking too long.

The only other solution I can think of is to run a factorial on both lists and look for equalities there but that is still not very efficient and takes exponentially longer as the lists get bigger.


回答1:


What others have said is true:

  1. This problem is NP-complete. An easy reduction is to usual subset-sum. You can show this by noting that a subset of A sums to a subset of B (not both empty) only if a non-empty subset of A union (-B) sums to zero.

  2. This problem is only weakly NP-complete, in that it's polynomial in the size of the numbers involved, but is conjectured to be exponential in their logarithms. This means that the problem is easier than the moniker "NP-complete" might suggest.

  3. You should use dynamic programming.

So what am I contributing to this discussion? Well, code (in Perl):

@a = qw(4 5 9 10 1);
@b = qw(21 7 -4 180);
%a = sums( @a );
%b = sums( @b );
for $m ( keys %a ) {
    next unless exists $b{$m};
    next if $m == 0 and (@{$a{0}} == 0 or @{$b{0}} == 0);
    print "sum(@{$a{$m}}) = sum(@{$b{$m}})\n";
}

sub sums {
    my( @a ) = @_;
    my( $a, %a, %b );
    %a = ( 0 => [] );
    while( @a ) {
        %b = %a;
        $a = shift @a;
        for my $m ( keys %a ) {
            $b{$m+$a} = [@{$a{$m}},$a];
        }
    %a = %b;
    }
    return %a;
}

It prints

sum(4 5 9 10) = sum(21 7)
sum(4 9 10 1) = sum(21 7 -4)

so, notably, there is more than one solution that works in your original problem!

Edit: User itzy correctly pointed out that this solution was wrong, and worse, in multiple ways!! I'm very sorry about that and I've hopefully addressed these concerns in the new code above. Nonetheless, there is still one problem, namely that for any particular subset-sum, it only prints one of the possible solutions. Unlike the previous problems, which were straight-up errors, I would classify this as an intentional limitation. Best of luck and beware of bugs!




回答2:


Like the subset sum problem, this problem is weakly NP-complete, so it has a solution that runs in time polynomial(M), where M is the sum of all numbers appearing in the problem instance. You can achieve that with dynamic programming. For each set you can generate all possible sums by filling a 2-dimensional binary table, where "true" at (k,m) means that a subset sum m can be achieved by picking some elements from the first k elements of the set.

You fill it iteratively - you set (k,m) to "true" if (k-1,m) is set to "true" (obviously, if you can get m from k-1 elements, you can get it from k elements by not picking the k-th) or if (k-1,m-d) is set to "true" where d is the value of k-th element in the set (the case where you pick the k-th element).

Filling the table gets you all the possible sums in the last column (the one representing the whole set). Do this for both sets and find common sums. You can backtrack the actual subsets representing the solutions by reversing the process which you used to fill the tables.




回答3:


Thanks a lot for all the quick responses!

The dynamic programming solution is not really different to the exhaustive approch we have right now and I guess if we need the optimal solution we do need to consider every possible combination but the time it takes to generate this exhaustive list of sums are too long.. Did a quick test and the time it takes to generate all possible sums for x number of elements very quickly go over 1 min:

11 elements took - 0.015625 seconds
12 elements took - 0.015625 seconds
13 elements took - 0.046875 seconds
14 elements took - 0.109375 seconds
15 elements took - 0.171875 seconds
16 elements took - 0.359375 seconds
17 elements took - 0.765625 seconds
18 elements took - 1.609375 seconds
19 elements took - 3.40625 seconds
20 elements took - 7.15625 seconds
21 elements took - 14.96875 seconds
22 elements took - 31.40625 seconds
23 elements took - 65.875 seconds
24 elements took - 135.953125 seconds
25 elements took - 282.015625 seconds
26 elements took - 586.140625 seconds
27 elements took - 1250.421875 seconds
28 elements took - 2552.53125 seconds
29 elements took - 5264.34375 seconds

which for the business problem we're trying to solve is not really acceptable.. I'm gonna go back to the drawing board and see whether we do indeed need to know all the solutions or can we just do with one (the smallest/largest subset, e.g.) instead and hopefully that can help simply the problem and make my algorithm perform to expectaion.

Thanks all the same!




回答4:


subset sum is Np-complete and you can polynomially reduce your problem to it, so your problem is NP-complete too.



来源:https://stackoverflow.com/questions/443712/algorithm-to-find-subset-within-two-sets-of-integers-whose-sums-match

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!