Computing unique intersections of subsets

问题

Given a set S = {s_i : {z_j : z ∈ N} }, what is a time-efficient algorithm for computing the unique sets of intersections of the subsets of S?

For background, I am dealing with several versions of this problem, some larger than others. In the smallest one |S| ≈ 1,000, |s_i| ≈ 10,000 and the values are zip codes.

Tiny example for clarity:

Input: S = {{},{1},{2,3},{3,4,5,6,7,8,9,10}}
Output: {{},{1},{2,3},{3,4,5,6,7,8,9,10},{3}}

|S| = 4 and there are 2⁴ = 16 subsets of S. However, there are only five unique sets of subset intersections. The first four are the members of S themselves. The fifth is {3}. The empty set is already a member of S. All other 10 subset intersections produce empty sets also.

回答1:

Here's a fast preprocessing step that might be worthwhile.

Call elements x and y equivalent if, for every set s_i, either both or neither of x and y belong to s_i. Simplify the problem by deleting all elements except one representative of each equivalence class. At the end, expand each representative to its class.

To simplify, scan sets one by one while maintaining a map from each element to a label for its equivalence class, where equivalence is determined with respect to the sets processed so far. Initially, all elements are in the same class, so this map sends each element to the same label. To process a set, initialize an empty map of new labels. For each element x in the set, let k be x's current label. If the key k exists in the new label map, then the value corresponding to k becomes x's new label. Otherwise, we allocate a new label and assign it to x and add a mapping from k to the new label.

Here's the execution on your input.

Initially label = {1: a, 2: a, 3: a, 4: a, 5: a, 6: a, 7: a, 8: a, 9: a, 10: a}.
Scan {}. Nothing happens.
Scan {1}. Change label[1] to b.
Scan {2, 3}. Change label[2] and label[3] to c.
Scan {3, 4, 5, 6, 7, 8, 9, 10}. Change label[3] to d and label[4..10] to e.
At the end, label = {1: b, 2: c, 3: d, 4: e, 5: e, 6: e, 7: e, 8: e, 9: e, 10: e}. Select 1 (b) and 2 (c) and 3 (d) and 4 (e) to represent their classes. All others are deleted.

回答2:

Asymptotic Time Complexity:

n: number of sets, which will change during execution

m: average set size

Time: T(Search-Matching-Sets) x T(Intersection) x SUM { k } for k: 1..n

Time: 2m x 2m x 1/2 n(n+1)

Time: O(4 m^2 x 1/2 n x (n+1)) ~ O(n^2 m^2)

Proposed Algorithm:

FOR i: 1..n
{

    FOR j: i..n
    {

        IF i = j THEN SKIP

        INTERSECTION := FIND-INTERSECTION ( SETS[i], SETS[j] )

        IF INTERSECTION ⊄ SETS[] THEN ADD INTERSECTION TO SETS[]

    }

}

来源：https://stackoverflow.com/questions/15803808/computing-unique-intersections-of-subsets

标签

performance

algorithm

set

intersection

discrete-mathematics