Efficient algorithm for finding all maximal subsets

面向向阳花 2020-12-01 04:22

I have a collection of unique sets (represented as bit masks) and would like to eliminate all elements that are proper subsets of another element. For example:



        
4 Answers
  •  天命终不由人
    2020-12-01 05:15

    Pre-process assumptions:

  • Input set is sorted by descending lengths
  • Each set is sorted ascending by value
  • There is access to a total and length for each set
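
A minimal Python sketch of this pre-processing, assuming the input arrives as integer bit masks (as in the question) and that "total" means the sum of a set's values. The names `mask_to_sorted_values` and `preprocess` are illustrative and do not come from the original answer.

```python
def mask_to_sorted_values(mask):
    """Return the positions of the set bits of `mask`, in ascending order."""
    values = []
    bit = 0
    while mask:
        if mask & 1:
            values.append(bit)
        mask >>= 1
        bit += 1
    return tuple(values)


def preprocess(masks):
    """Turn bit masks into sorted tuples, ordered by descending length."""
    sets = [mask_to_sorted_values(m) for m in masks]
    # sorted() is stable, so sets of equal length keep their input order.
    return sorted(sets, key=len, reverse=True)


sets = preprocess([0b0011, 0b1011, 0b0100])
for s in sets:
    # Length and total (assumed here to be the sum) are cheap to derive.
    print(s, "length:", len(s), "total:", sum(s))
```

Tuples are used so that each set is hashable and already in ascending order, which the approaches here rely on.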

    Approach #2 - Use a bucket approach

    Same assumptions. Can uniqueness be assumed (i.e., the input never contains the same set twice, such as {1,4,6},{1,4,6})? If not, you would need to check for distinct sets at some point, probably once the buckets are created.
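
If uniqueness cannot be assumed, duplicates could be dropped in a separate pass. A minimal Python sketch, assuming each set is stored as a tuple (so it is hashable); `drop_duplicates` is a hypothetical helper name:

```python
def drop_duplicates(sets):
    """Remove repeated sets while keeping the first occurrence of each."""
    # dict preserves insertion order, so a descending-length order survives.
    return list(dict.fromkeys(sets))


print(drop_duplicates([(1, 4, 6), (1, 4, 6), (2, 3)]))
# [(1, 4, 6), (2, 3)]
```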

    Semi-pseudocode:

    List<Set> Sets;//input
    List<Set> Output;
    List<List<Set>> Buckets;
    int length = Sets[0].length;//"by descending lengths"
    List<Set> Bucket = new List<Set>();//current bucket
    
    //Place each set with shared length in its own bucket
    for( Set set in Sets )
    {
     if( set.length == length )//current Bucket
     {
      Bucket.Add(set);
     }else//new Bucket
     {
      length = set.length;
      Buckets.Add(Bucket);
      Bucket = new List<Set>();
      Bucket.Add(set);
     }
    }
    Buckets.Add(Bucket);
    
    
    
    //Based on the assumption of uniqueness, everything in the first bucket is
    //larger than every other set and since it is unique, they are not proper subsets
    Output.AddRange(Buckets[0]);
    
    //Iterate through the buckets
    for( int i = 1; i < Buckets.length; i++ )
    {
     List<Set> currentBucket = Buckets[i];
    
     //Iterate through the sets in the current bucket
     for( int a = 0; a < currentBucket.length; a++ )
     {
      Set currentSet = currentBucket[a];
      bool addSet = true;
      //Iterate through buckets with greater length
      for( int b = 0; b < i; b++ )
      {
       List<Set> testBucket = Buckets[b];
    
       //Iterate through the sets in testBucket
       for( int c = 0; c < testBucket.length; c++ )
       {
        Set testSet = testBucket[c];
        int testMatches = 0;
        int testIndex = 0;//carried across d, since both sets are sorted ascending
    
        //Iterate through the values in the current set
        for( int d = 0; d < currentSet.length; d++ )
        {
         //Skip the values in the test set that are smaller than currentSet[d]
         while( testIndex < testSet.length && testSet[testIndex] < currentSet[d] )
         {
          testIndex++;
         }
         //currentSet[d] is missing from testSet, so currentSet is not a subset
         if( testIndex == testSet.length || testSet[testIndex] != currentSet[d] ) break;
         testMatches++;
         testIndex++;
        }//d
        //Every value matched: currentSet is a proper subset of the longer testSet
        if( testMatches == currentSet.length ) addSet = false;
        if( !addSet ) break;
       }//c
       if( !addSet ) break;
      }//b
      if( addSet ) Output.Add( currentSet );
     }//a
    }//i
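
The bucket approach above can be sketched as runnable Python. This is an illustration under the stated pre-process assumptions (unique sets, each an ascending tuple, input ordered by descending length), not the answer's own code; `is_subset_sorted` and `maximal_sets_bucketed` are hypothetical names.

```python
from itertools import groupby


def is_subset_sorted(small, big):
    """True if every value of sorted `small` occurs in sorted `big`."""
    j = 0
    for value in small:
        # Skip values of `big` that are too small to match.
        while j < len(big) and big[j] < value:
            j += 1
        if j == len(big) or big[j] != value:
            return False
        j += 1
    return True


def maximal_sets_bucketed(sets):
    """`sets`: unique ascending tuples, ordered by descending length."""
    # One bucket per length; groupby suffices because equal lengths are adjacent.
    buckets = [list(group) for _, group in groupby(sets, key=len)]
    if not buckets:
        return []
    # Sets in the longest bucket cannot be proper subsets of anything.
    output = list(buckets[0])
    for i in range(1, len(buckets)):
        for current in buckets[i]:
            # Only sets in strictly longer buckets can contain `current`.
            contained = any(
                is_subset_sorted(current, test)
                for longer in buckets[:i]
                for test in longer
            )
            if not contained:
                output.append(current)
    return output


print(maximal_sets_bucketed([(1, 4, 6), (2, 3, 5), (1, 4), (2, 7), (6,)]))
# [(1, 4, 6), (2, 3, 5), (2, 7)]
```

Sets of equal length are never compared with each other, which is safe only because the input is assumed to contain no duplicates.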
    

    Approach #1 (O(n^2) set comparisons, about n(n-1)/2) ... not efficient enough

    Semi-pseudocode:

    //input Sets
    List<Set> results;
    for( int current = 0; current < Sets.length; current++ )
    {
     bool addCurrent = true;
     Set currentSet = Sets[current];
     for( int other = 0; other < current; other++)
     {
      Set otherSet = Sets[other];
      //is current a subset of other?
      if( currentSet.total > otherSet.total 
       || currentSet.length >= otherSet.length) continue;
      int max = currentSet.length;
      int matches = 0;
      int otherIndex = 0, len = otherSet.length;
      for( int i = 0; i < max; i++ )
      {
       for( ; otherIndex < len; otherIndex++ )
       {
         if( currentSet[i] == otherSet[otherIndex] )
         {
          matches++;
          break;
         }
       }
      }
      //all items matched: current is a proper subset of other
      if( matches == max )
      {
       addCurrent = false;
       break;
      }
     }
     if( addCurrent ) results.Add(currentSet);
    }
    

    This takes the set of sets and iterates through each one (the outer set). For each outer set, it iterates through the sets again (the nested set). Before comparing any items, it runs three cheap checks: if the outer set and the nested set are the same set, no checking is done; if the outer set has a greater total than the nested set, the outer set cannot be a proper subset; and if the outer set does not have fewer items than the nested set, it cannot be a proper subset either.

    Once those checks are complete, it starts with the first item of the outer set and compares it with the first item of the nested set. If they are not equal, it checks the next item of the nested set. If they are equal, it adds one to a counter and then compares the next item of the outer set, resuming from where it left off in the nested set.

    If the number of matches reaches the number of items in the outer set, the outer set has been found to be a proper subset of the nested set. It is flagged for exclusion, and the comparisons are halted.
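
The pairwise procedure described above can be sketched in Python as follows. It is a hedged illustration, assuming unique sets stored as ascending tuples of non-negative integers, ordered by descending length, with "total" taken to mean the sum of the values; `maximal_sets_pairwise` is a hypothetical name.

```python
def maximal_sets_pairwise(sets):
    """`sets`: unique ascending tuples of non-negative ints, longest first."""
    results = []
    for current in range(len(sets)):
        outer = sets[current]
        is_proper_subset = False
        # Longer sets all come earlier, so only those need checking.
        for other in range(current):
            nested = sets[other]
            # Cheap rejections: a proper subset has fewer items and,
            # for non-negative values, a total that is not greater.
            if len(outer) >= len(nested) or sum(outer) > sum(nested):
                continue
            matches = 0
            nested_index = 0
            for item in outer:
                # Resume scanning the nested set where the last match left off.
                while nested_index < len(nested) and nested[nested_index] != item:
                    nested_index += 1
                if nested_index == len(nested):
                    break  # item not found, so outer is not a subset
                matches += 1
                nested_index += 1
            if matches == len(outer):
                is_proper_subset = True
                break
        if not is_proper_subset:
            results.append(outer)
    return results


print(maximal_sets_pairwise([(1, 2, 3), (4, 5), (1, 3), (5,)]))
# [(1, 2, 3), (4, 5)]
```

With the bit-mask representation mentioned in the question, the item-by-item matching could be replaced by the single test `outer & nested == outer`, which holds exactly when every bit of `outer` is also set in `nested`.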
