Efficient algorithm for finding all maximal subsets


面向向阳花 2020-12-01 04:22

I have a collection of unique sets (represented as bit masks) and would like to eliminate all elements that are proper subsets of another element. For example:

4 answers

天命终不由人 (original poster)

2020-12-01 05:15

Pre-process assumptions:
The input collection is sorted by descending set length
Each set is sorted ascending by value
A total (sum of values) and a length are available for each set
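To make those assumptions concrete, here is a minimal Python sketch of the pre-processing step. The function name `preprocess` and the `(elements, total, length)` record layout are my own choices for illustration, not part of the pseudocode in this answer.

```python
def preprocess(sets):
    """Return (elements, total, length) records, longest sets first."""
    records = []
    for s in sets:
        elements = tuple(sorted(s))  # each set sorted ascending by value
        records.append((elements, sum(elements), len(elements)))
    # Sort by descending length; Python's sort is stable, so sets of
    # equal length keep their original relative order.
    records.sort(key=lambda r: r[2], reverse=True)
    return records
```

Caching the total and length once up front is what makes the cheap rejection checks later on effectively free.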

Approach #2 - Use a bucket approach

The same assumptions apply. Can uniqueness be assumed (i.e. the input never contains {1,4,6} twice)? If not, you would need to remove duplicates at some point, probably once the buckets are created.

Semi-pseudocode:

List<Set> Sets;//input
List<Set> Output = new List<Set>();
List<List<Set>> Buckets = new List<List<Set>>();
int length = Sets[0].length;//"by descending lengths"
List<Set> Bucket = new List<Set>();//current bucket

//Place each set with shared length in its own bucket
for( Set set in Sets )
{
 if( set.length == length )//current Bucket
 {
  Bucket.Add(set);
 }else//new Bucket
 {
  length = set.length;
  Buckets.Add(Bucket);//store the finished bucket
  Bucket = new List<Set>();//start a bucket for the next (shorter) length
  Bucket.Add(set);
 }
}
Buckets.Add(Bucket);//store the last bucket



//Based on the assumption of uniqueness, everything in the first bucket is
//larger than every other set and since it is unique, they are not proper subsets
Output.AddRange(Buckets[0]);

//Iterate through the buckets
for( int i = 1; i < Buckets.length; i++ )
{
 List<Set> currentBucket = Buckets[i];

 //Iterate through the sets in the current bucket
 for( int a = 0; a < currentBucket.length; a++ )
 {
  Set currentSet = currentBucket[a];
  bool addSet = true;
  //Iterate through buckets with greater length
  for( int b = 0; b < i; b++ )
  {
   List<Set> testBucket = Buckets[b];

   //Iterate through the sets in testBucket
   for( int c = 0; c < testBucket.length; c++ )
   {
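    Set testSet = testBucket[c];
    int testMatches = 0;
    int testIndex = 0;//both sets are sorted ascending, so never rewind
    bool setClear = false;//true once currentSet is known not to be a subset of testSet

    //Iterate through the values in the current set
    for( int d = 0; d < currentSet.length; d++ )
    {
     //Advance through the test set until currentSet[d] is found or passed
     for( ; testIndex < testSet.length; testIndex++ )
     {
      if( currentSet[d] < testSet[testIndex] )
      {
       setClear = true;//value is missing from testSet
       break;
      }
      if( currentSet[d] == testSet[testIndex] )
      {
       testMatches++;
       testIndex++;//resume after the match for the next value
       break;
      }
     }//testIndex
     if( testMatches == currentSet.length ) addSet = false;//proper subset found
     if( setClear || !addSet || testIndex >= testSet.length ) break;
    }//d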
    if( !addSet ) break;
   }//c
   if( !addSet ) break;
  }//b
  if( addSet ) Output.Add( currentSet );
 }//a
}//i
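For comparison, here is a compact Python sketch of the same bucket idea. It is not a translation of the pseudocode above: it uses `frozenset` and the built-in proper-subset operator `<` instead of the hand-written sorted scan, and it removes duplicates up front rather than assuming uniqueness. The name `maximal_sets` is hypothetical.

```python
from itertools import groupby


def maximal_sets(sets):
    """Keep only the sets that are not proper subsets of another set."""
    unique = {frozenset(s) for s in sets}             # drop exact duplicates
    ordered = sorted(unique, key=len, reverse=True)   # longest first
    # One bucket per distinct length, in descending order of length
    buckets = [list(group) for _, group in groupby(ordered, key=len)]

    output = []
    for i, bucket in enumerate(buckets):
        for current in bucket:
            # Only sets in strictly longer buckets can contain `current`
            dominated = any(
                current < other          # proper-subset test on frozensets
                for longer in buckets[:i]
                for other in longer
            )
            if not dominated:
                output.append(current)
    return output
```

With bit masks, as in the question, the test `current < other` would become `current & other == current`, since sets from different buckets can never be equal.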

Approach #1 (`O( n(n+1)/2 )`) ... not efficient enough

Semi-pseudocode:

//input Sets
List<Set> results = new List<Set>();
for( int current = 0; current < Sets.length; current++ )
{
 bool addCurrent = true;
 Set currentSet = Sets[current];
 for( int other = 0; other < current; other++)
 {
  Set otherSet = Sets[other];
  //is current a subset of other?
  if( currentSet.total > otherSet.total 
   || currentSet.length >= otherSet.length) continue;
  int max = currentSet.length;
  int matches = 0;
  int otherIndex = 0, len = otherSet.length;
  for( int i = 0; i < max; i++ )
  {
   for( ; otherIndex < len; otherIndex++ )
   {
     if( currentSet[i] == otherSet[otherIndex] )
     {
      matches++;
      break;
     }
   }
   if( matches == max )
   {
    addCurrent = false;
    break;
   }
  }
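  if( !addCurrent ) break;//already known to be a proper subset
 }
 if( addCurrent ) results.Add(currentSet);//add once, after checking every earlier set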
}

This takes the collection of sets and iterates through each one. For each set (the outer set), it iterates through the collection again (the nested set). During the nested iteration it makes three cheap checks before comparing any elements. First, whether the outer set is the same set as the nested set; if so, nothing is checked (in the code above this is covered by only visiting earlier indices). Second, whether the outer set has a greater total than the nested set; if so, the outer set cannot be a proper subset (assuming the values are non-negative). Third, whether the outer set has fewer items than the nested set; if it does not, it cannot be a proper subset either.
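The three quick checks could be sketched in Python as follows. The helper name `may_be_proper_subset` is made up for this example, and it recomputes the total with `sum` for brevity; per the assumptions, a real implementation would read a cached total instead.

```python
def may_be_proper_subset(current, other):
    """Cheap pre-checks; False means `current` cannot be a proper subset."""
    if current is other:              # same set: nothing to check
        return False
    if sum(current) > sum(other):     # only valid for non-negative values
        return False
    if len(current) >= len(other):    # a proper subset must be shorter
        return False
    return True
```

A `True` result only means the element-by-element comparison still has to run; it does not prove the subset relation.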

Once those checks are complete it begins with the first item of the outer set, and compares it with the first item of the nested set. If they are not equal, it will check the next item of the nested set. If they are equal, then it adds one to a counter, and will then compare the next item of the outer set with where it left off in the inner set.

If the number of matches reaches the number of items in the outer set, then the outer set has been found to be a proper subset of the inner set. It is flagged for exclusion, and the comparisons stop.
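The element comparison described above is a two-pointer scan over two ascending sequences. Here is a minimal Python sketch, assuming both inputs are sorted and contain no repeated values. The name `is_subset_sorted` is hypothetical, and the early exit when a value is missing is my addition rather than something the description spells out.

```python
def is_subset_sorted(current, other):
    """True if every value of sorted `current` occurs in sorted `other`."""
    matches = 0
    j = 0  # position in `other`; it never moves backwards
    for value in current:
        # Skip values in `other` that are too small to match
        while j < len(other) and other[j] < value:
            j += 1
        if j == len(other) or other[j] != value:
            return False  # `value` is missing from `other`
        matches += 1
        j += 1  # continue after the matched position
    return matches == len(current)
```

Because neither index is ever reset, one comparison costs at most `len(current) + len(other)` steps.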
