I have a collection of unique sets (represented as bit masks) and would like to eliminate all elements that are proper subsets of another element. For example:
Pre-process assumptions:
Same assumptions. Can uniqueness be assumed (i.e. there is no duplicate pair such as {1,4,6},{1,4,6})? Otherwise, you would need to check for distinct sets at some point, probably once the buckets are created.
Semi-pseudocode:
List<Set> Sets;//input
List<Set> Output;
List<List<Set>> Buckets;
int length = Sets[0].length;//"by descending lengths"
List<Set> Bucket = new List<Set>();//current bucket
//Place each set with shared length in its own bucket
for( Set set in Sets )
{
    if( set.length == length )//current Bucket
    {
        Bucket.Add(set);
    }else//new Bucket
    {
        length = set.length;
        Buckets.Add(Bucket);
        Bucket = new List<Set>();
        Bucket.Add(set);
    }
}
Buckets.Add(Bucket);
//Based on the assumption of uniqueness, everything in the first bucket is
//at least as large as every other set and, since each set is unique, none of them is a proper subset
Output.AddRange(Buckets[0]);
//Iterate through the buckets
for( int i = 1; i < Buckets.length; i++ )
{
    List<Set> currentBucket = Buckets[i];
    //Iterate through the sets in the current bucket
    for( int a = 0; a < currentBucket.length; a++ )
    {
        Set currentSet = currentBucket[a];
        bool addSet = true;
        //Iterate through buckets with greater length
        for( int b = 0; b < i; b++ )
        {
            List<Set> testBucket = Buckets[b];
            //Iterate through the sets in testBucket
            for( int c = 0; c < testBucket.length; c++ )
            {
                Set testSet = testBucket[c];
                int testMatches = 0;
                bool setClear = false;//true once testSet has been ruled in or out
                //Iterate through the values in the current set
                for( int d = 0; d < currentSet.length; d++ )
                {
                    int testIndex = 0;
                    //Iterate through the values in the test set (values sorted ascending)
                    for( ; testIndex < testSet.length; testIndex++ )
                    {
                        if( currentSet[d] < testSet[testIndex] )
                        {
                            //value is missing from testSet, so currentSet is not a subset
                            setClear = true;
                            break;
                        }
                        if( currentSet[d] == testSet[testIndex] )
                        {
                            testMatches++;
                            if( testMatches == currentSet.length )
                            {
                                addSet = false;
                                setClear = true;
                            }
                            break;//matched, move on to the next value
                        }
                    }//testIndex
                    if( setClear ) break;
                }//d
                if( !addSet ) break;
            }//c
            if( !addSet ) break;
        }//b
        if( addSet ) Output.Add( currentSet );
    }//a
}//i
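Since the question represents the sets as bit masks, the bucket idea above can be sketched more compactly in Python. This is only an illustration under stated assumptions: the function name `remove_proper_subsets` is made up for this example, each set is a non-negative integer mask, and duplicates are removed up front rather than assumed away. Unlike the pseudocode, it compares each mask only against masks already kept, which gives the same result because the subset relation is transitive.

```python
from collections import defaultdict


def remove_proper_subsets(masks):
    """Return the masks that are not proper subsets of any other mask.

    Hypothetical helper: each set is an int whose set bits are its members.
    """
    # Deduplicate first, so two equal masks never have to be compared.
    unique = set(masks)

    # Bucket the masks by the number of set bits (the "length" of the set).
    buckets = defaultdict(list)
    for m in unique:
        buckets[bin(m).count("1")].append(m)

    kept = []
    # Visit buckets from the largest length to the smallest.
    for size in sorted(buckets, reverse=True):
        survivors = []
        for m in buckets[size]:
            # m is a subset of k exactly when m & k == m. Every k in `kept`
            # is strictly longer than m, so any such subset is a proper one.
            if not any(m & k == m for k in kept):
                survivors.append(m)
        # Masks of equal length are distinct, so they cannot contain each
        # other; add them only after the whole bucket has been tested.
        kept.extend(survivors)
    return kept


if __name__ == "__main__":
    # {1,4,6}, {1,4}, {2,3} and a duplicate of {1,4,6}
    print(remove_proper_subsets([0b1010010, 0b0010010, 0b0001100, 0b1010010]))
```

The worst case is still quadratic in the number of sets, but each subset test becomes a single AND and compare instead of a nested loop over the values.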
O( n(n+1)/2 ) set comparisons, i.e. O(n^2) ... not efficient enough.
Semi-pseudocode:
//input Sets
List<Set> results;
for( int current = 0; current < Sets.length; current++ )
{
    bool addCurrent = true;
    Set currentSet = Sets[current];
    for( int other = 0; other < Sets.length; other++ )
    {
        if( other == current ) continue;//never test a set against itself
        Set otherSet = Sets[other];
        //is current a proper subset of other?
        if( currentSet.total > otherSet.total
            || currentSet.length >= otherSet.length ) continue;
        int max = currentSet.length;
        int matches = 0;
        int otherIndex = 0, len = otherSet.length;
        for( int i = 0; i < max; i++ )
        {
            //values are sorted, so resume where the last match left off
            for( ; otherIndex < len; otherIndex++ )
            {
                if( currentSet[i] == otherSet[otherIndex] )
                {
                    matches++;
                    otherIndex++;
                    break;
                }
            }
        }
        if( matches == max )
        {
            //every value was found in otherSet, so exclude currentSet
            addCurrent = false;
            break;
        }
    }
    if( addCurrent ) results.Add(currentSet);
}
This takes the collection of sets and iterates through each one (the outer set). For each outer set, it iterates through the collection again (the nested set). During the nested iteration it makes three checks. First, is the outer set the same as the nested set? If so, no further checking is done. Second, does the outer set have a greater total than the nested set? If so, the outer set cannot be a proper subset (assuming the values are non-negative). Third, does the outer set have fewer items than the nested set? If not, it cannot be a proper subset either.
Once those checks are complete, it begins with the first item of the outer set and compares it with the first item of the nested set. If they are not equal, it checks the next item of the nested set. If they are equal, it adds one to a counter and then compares the next item of the outer set, resuming from where it left off in the nested set. This relies on the items of each set being in sorted order.
If the number of matches reaches the number of items in the outer set, then the outer set has been found to be a proper subset of the nested set. It is flagged to be excluded, and the comparisons are halted.
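The walk described above can be sketched in Python as follows. The names `is_proper_subset` and `drop_proper_subsets` are invented for this illustration, and the sketch assumes each set is a list of unique, non-negative values sorted in ascending order. It is one small variation on the description: because the lists are sorted, it gives up as soon as the nested set has moved past the value being looked for, rather than scanning to the end.

```python
def is_proper_subset(inner, outer):
    """True if sorted list `inner` is a proper subset of sorted list `outer`.

    Assumes both lists hold unique, non-negative values in ascending order.
    """
    # Cheap rejections: a proper subset has strictly fewer items and,
    # with non-negative values, cannot have a larger total.
    if len(inner) >= len(outer) or sum(inner) > sum(outer):
        return False

    j = 0  # position in `outer`; never moves backwards
    for value in inner:
        # Resume scanning `outer` where the previous match left off.
        while j < len(outer) and outer[j] < value:
            j += 1
        if j == len(outer) or outer[j] != value:
            return False  # `value` is missing from `outer`
        j += 1  # matched; continue after this position
    return True


def drop_proper_subsets(sets):
    """Keep only the sets that are not a proper subset of any other set."""
    return [
        s
        for i, s in enumerate(sets)
        if not any(i != k and is_proper_subset(s, t) for k, t in enumerate(sets))
    ]


if __name__ == "__main__":
    print(drop_proper_subsets([[1, 4, 6], [1, 4], [2, 3], [6]]))
```

Here `sum` is recomputed on every call for brevity; caching each set's total and length, as the `total` and `length` properties in the pseudocode suggest, would avoid that repeated work.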