What is an efficient way to find the minimum sum of multiple dictionary values, given keys of mutually exclusive integers?

Submitted by 依然范特西╮ on 2019-12-24 08:10:46

Question


I have a dictionary, the keys of which consist of all len(4) combinations of the integers 0 to n, with all values being floats (representing a cost that was computed by another function).

e.g.:

cost_dict = {(0,1,2,3): 6.23, (0,1,2,4): 7.89,
             ...
             (14,15,16,17): 2.57}

I would like to efficiently find m mutually exclusive keys (that is, where the keys do not share any of their integers) whose values sum to the lowest number (thus, finding the lowest overall cost). That is, I don't just want the m minimum values of the dictionary, I want m mutually exclusive values that sum to the lowest value. (Or failing the absolute minimum, I wouldn't mind something efficient that comes pretty close).

So in the above example, for m = 3, maybe:

cost_dict[(0,3,5,11)]
>1.1 
cost_dict[(2,6,7,13)]
>0.24
cost_dict[(4,10,14,15)]
>3.91

... could be the keys whose values sum to the lowest possible value, among all sets of mutually exclusive keys in this dictionary.
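For small inputs, the requirement can be checked directly by brute force over all choices of m entries, keeping only those whose keys are pairwise disjoint. A sketch (the toy dictionary and costs below are made up for illustration):

```python
from itertools import chain, combinations

def min_disjoint_sum(cost_dict, m):
    """Exhaustively pick the m entries with pairwise-disjoint keys whose
    values sum to the minimum. Exponential; for illustration only."""
    best_keys, best_val = None, float("inf")
    for entries in combinations(cost_dict.items(), m):
        keys = [k for k, _ in entries]
        # m disjoint 4-tuples must cover exactly 4*m distinct integers
        if len(set(chain.from_iterable(keys))) == 4 * m:
            total = sum(v for _, v in entries)
            if total < best_val:
                best_keys, best_val = keys, total
    return best_keys, best_val

# hypothetical toy costs
toy = {(0, 1, 2, 3): 6.23, (4, 5, 6, 7): 7.89, (0, 4, 8, 9): 1.0,
       (8, 9, 10, 11): 2.57, (1, 2, 5, 10): 0.1}
print(min_disjoint_sum(toy, 2))
```

Note that the two cheapest individual entries here share integers, so the brute force has to fall back to a different disjoint pair, which is exactly what makes this problem harder than taking the m smallest values.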

It may be possible that the smallest three values in the dict were something like:

cost_dict[(0,3,7,13)]
>0.5
cost_dict[(2,6,7,13)]
>0.24
cost_dict[(4,6,14,15)]
>0.8

But given the integers in these keys are not mutually exclusive, this would not be correct.


Is it possible to do better than O(n**m) time? That is, I could sum every item against every other item whose key is disjoint from the first (this would require the keys to be frozensets instead of tuples), repeated for m levels. This is rather slow, given that the dictionary's length can be up to 10,000.

Something that seems to have helped me with an earlier version of this problem is creating a list of all possible combinations of keys, which is time-intensive, but potentially more efficient given that I will need to be finding the minimum cost numerous times.


Answer 1:


I tried solving this problem three different ways- an optimized brute force, a dynamic programming approach, and a greedy algorithm. The first two could not handle inputs for n > 17, but generated optimal solutions, so I could use them to verify the average performance of the greedy method. I'll start first with the dynamic programming approach, and then describe the greedy one.

Dynamic Programming

First, note that if we determine that (1, 2, 3, 4) and (5, 6, 7, 8) sum to a smaller value than (3, 4, 5, 6) and (1, 2, 7, 8), then the optimal solution absolutely cannot contain both (3, 4, 5, 6) and (1, 2, 7, 8): you could swap them out for the former pair and get a smaller sum. Extending this logic, for any eight integers x0, x1, ..., x7 there is a single split into (a, b, c, d) and (e, f, g, h) that yields the minimal sum over all such splits, and we can rule out all of the others.

Using this knowledge, we can build a dictionary mapping every 8-integer combination x0, x1, ..., x7 drawn from the set [0, n) to its minimal sum, by brute-forcing the sums over all splits into two 4-integer combinations. Then we can use these mappings to repeat the process for 12-integer combinations x0, x1, ..., x11, built from pairs of 8-integer and 4-integer combinations. We repeat this process until we obtain all minimal sums for x0, x1, ..., x_(4*m-1), which we then iterate over to find the overall minimal sum.
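The key trick in the implementation below exploits a documented property of itertools.combinations: combinations are emitted in lexicographic order, so reversing the list of (size - 4)-combinations lines each one up with its complementary 4-combination. A quick self-contained check of that property:

```python
from itertools import combinations

key_total = (0, 1, 3, 5, 6, 8, 9, 11)  # an arbitrary sorted 8-tuple
size = len(key_total)

# zip each 4-combination with the reversed (size - 4)-combinations:
# every pair is a disjoint split of key_total into two halves
for four, rest in zip(combinations(key_total, 4),
                      reversed(list(combinations(key_total, size - 4)))):
    assert set(four) | set(rest) == set(key_total)
    assert not set(four) & set(rest)
print("all splits are complementary")
```

This lets every split of a key be enumerated with no explicit set subtraction per pair.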

from itertools import combinations

def dp_solve(const_dict, n, m):
    # Note: this mutates const_dict, adding entries for the combined keys.
    # lookup maps each combined key back to the 4-tuples it was built from.
    lookup = {comb: (comb,) for comb in const_dict.keys()}

    keys = range(n)
    for size in range(8, 4 * m + 1, 4):
        for key_total in combinations(keys, size):
            # Reversing the (size - 4)-combinations pairs each 4-combination
            # with its complement, so this zip enumerates every disjoint
            # split of key_total; pick the split with the smallest sum.
            key1, key2 = min(
                zip(combinations(key_total, 4),
                    reversed(list(combinations(key_total, size - 4)))),
                key=lambda x: const_dict[x[0]] + const_dict[x[1]])

            k = tuple(sorted(key1 + key2))
            const_dict[k] = const_dict[key1] + const_dict[key2]
            lookup[k] = lookup[key1] + lookup[key2]

    key, val = min(((key, val) for key, val in const_dict.items()
                    if len(key) == 4 * m),
                   key=lambda x: x[1])
    return lookup[key], val

Admittedly this implementation is pretty gnarly, because I kept micro-optimizing piece after piece hoping to make it fast enough without having to switch to a greedy approach.

Greedy

This is probably the one you care about, since it handles fairly large inputs quickly, and is quite accurate.

Start by constructing a list of partial sums, and iterate over the elements of your dictionary in increasing order of value. For each element, find all partial sums that don't collide with its key, "combine" each of them with the element into a new partial sum, and append the results to the list. In doing so, you build a list of the minimal partial sums that can be created from the smallest k values in your dictionary. To speed this up, I use one hash set per integer so I can quickly find the partial sums that already use any integer of the current key.
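As a rough illustration of this idea (not the answer's actual implementation, which follows below), the "fast" variant can be sketched with a plain list scan instead of the per-integer hash sets; the toy costs are made up:

```python
def greedy_min_sum(cost_dict, m):
    """Greedy sketch: scan entries by increasing cost, extend every
    compatible partial sum, and stop at the first one reaching m disjoint
    keys. A heuristic -- not guaranteed optimal."""
    partials = []  # list of (keys_tuple, used_integer_set, total_cost)
    for key, val in sorted(cost_dict.items(), key=lambda kv: kv[1]):
        kset = set(key)
        # extend every partial sum whose integers are disjoint from key
        new = [(ks + (key,), used | kset, tot + val)
               for ks, used, tot in partials if not used & kset]
        new.append(((key,), kset, val))  # key alone is also a partial sum
        for keys, used, tot in new:
            if len(keys) == m:  # first completed combination wins
                return keys, tot
        partials.extend(new)
    return None, None

# hypothetical toy costs
toy = {(0, 1, 2, 3): 6.23, (4, 5, 6, 7): 7.89, (0, 4, 8, 9): 1.0,
       (8, 9, 10, 11): 2.57, (1, 2, 5, 10): 0.1}
print(greedy_min_sum(toy, 2))
```

The real implementation below replaces the linear `partials` scan with per-integer index sets, which is what keeps it fast on dictionaries with thousands of entries.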

In the "fast" greedy approach, you would abort the moment you find a partial sum with a key length of 4 * m (or equivalently, one made of m 4-tuples). This usually nets fairly good results in my experience, but I wanted to add some logic to make it more accurate if need be. To do so, I added two factors:

  • extra_runs - which dictates how many extra iterations to search for better solutions before breaking
  • check_factor - which specifies a multiple of current search "depth" to scan forward for a single new integer that creates a better solution with the current state. This is different from the above in that it does not "preserve" each new integer checked- it only does a quick sum to see if it creates a new min. This makes it significantly faster, at the cost that the other m - 1 4-tuples must already exist in one of the partial sums.

Combined, these checks seem to always find the true minimal sum, at the cost of about 5x longer runtime (still quite fast though). To disable them, just pass 0 for both factors.

def greedy_solve(const_dict, n, m, extra_runs=10, check_factor=2):
    # Entries sorted by increasing cost, so cheaper 4-tuples are tried first.
    pairs = sorted(const_dict.items(), key=lambda x: x[1])

    # lookup[x]: indices of partial sums whose keys already use integer x.
    # nset: indices of all partial sums built so far.
    lookup = [set() for _ in range(n)]
    nset = set()

    min_sums = []
    min_key, min_val = None, None
    for i, (pkey, pval) in enumerate(pairs):
        # Partial sums sharing no integer with pkey can be extended by it.
        valid = set(nset)
        for x in pkey:
            valid -= lookup[x]
            lookup[x].add(len(min_sums))

        # Record pkey by itself as a new single-tuple partial sum.
        nset.add(len(min_sums))
        min_sums.append(((pkey,), pval))

        # Pre-register the combined sums about to be appended below.
        for x in pkey:
            lookup[x].update(range(len(min_sums), len(min_sums) + len(valid)))
        for idx in valid:
            comb, val = min_sums[idx]
            for key in comb:
                for x in key:
                    lookup[x].add(len(min_sums))
            nset.add(len(min_sums))
            min_sums.append((comb + (pkey,), val + pval))
            # A partial sum of m 4-tuples is a full candidate solution.
            if len(comb) == m - 1 and (not min_key or min_val > val + pval):
                min_key, min_val = min_sums[-1]

        if min_key:
            if not extra_runs: break
            extra_runs -= 1

    # check_factor pass: scan further ahead, but only try completing existing
    # partial sums with one more 4-tuple (no new partial sums are stored).
    for pkey, pval in pairs[:int(check_factor * i)]:
        valid = set(nset)
        for x in pkey:
            valid -= lookup[x]

        for idx in valid:
            comb, val = min_sums[idx]
            if len(comb) < m - 1:
                # Too short to complete in one step; drop from consideration.
                nset.remove(idx)
            elif min_val is None or min_val > val + pval:
                min_key, min_val = comb + (pkey,), val + pval
    return min_key, min_val

I tested this for n < 36 and m < 9, and it seemed to run fairly fast (a couple of seconds at worst). I'd imagine it should work for your case of 12 <= n <= 24 pretty quickly.



Source: https://stackoverflow.com/questions/54509436/what-is-an-efficient-way-to-find-the-minimum-sum-of-multiple-dictionary-values
