How to keep only duplicates efficiently?

闹比i 2021-01-04 06:14

Given an STL vector, output only the duplicates in sorted order, e.g.,

INPUT : { 4, 4, 1, 2, 3, 2, 3 }
OUTPUT: { 2, 3, 4 }

The algorithm is

10 answers
  •  难免孤独
    2021-01-04 07:08

    What is meant by "as efficient as std::unique"? Efficient in terms of runtime, development time, memory usage, or what?

    As others pointed out, std::unique only removes consecutive duplicates, so it needs sorted input to catch all of them. Your input isn't sorted, so it's not a fair comparison to begin with.
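    To make the point about sorted input concrete, here is a small sketch assuming C++11. The helper names unique_copy_of and sorted_unique_copy_of are hypothetical and used only for this illustration. On the question's input, std::unique alone leaves { 4, 1, 2, 3, 2, 3 }, because only the leading pair of 4s is adjacent. Sorting first yields { 1, 2, 3, 4 }.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: applies std::unique + erase to a copy of v.
// std::unique only collapses runs of *adjacent* equal elements.
std::vector<int> unique_copy_of(std::vector<int> v) {
    v.erase(std::unique(v.begin(), v.end()), v.end());
    return v;
}

// Hypothetical helper: sorting first places equal elements next to each
// other, so std::unique can then remove every repeated value.
std::vector<int> sorted_unique_copy_of(std::vector<int> v) {
    std::sort(v.begin(), v.end());
    v.erase(std::unique(v.begin(), v.end()), v.end());
    return v;
}
```

    Note that std::unique keeps one copy of every value. It does not by itself select the values that were duplicated.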

    Personally, I would just have a std::map do all of the work for me. It has a couple of properties we can use for maximal elegance and brevity: it keeps its elements sorted, and operator[] inserts a value-initialized entry (zero for int) if the key doesn't already exist. By leveraging those properties, we can get this done in two or three lines of code and still achieve O(n log n) running time.
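    A minimal sketch of the operator[] property relied on here, assuming C++11 for the range-based for loop. count_occurrences is a hypothetical helper name, not part of the code below. Because a missing key is inserted with a value-initialized int, ++counts[x] starts counting from 0 without any "have I seen this before?" check.

```cpp
#include <map>
#include <vector>

// Hypothetical helper: builds an element -> frequency table.
// For a value x not seen yet, counts[x] first inserts x with a
// value-initialized int (0); the increment then makes it 1.
std::map<int, int> count_occurrences(const std::vector<int>& v) {
    std::map<int, int> counts;
    for (int x : v)
        ++counts[x];
    return counts;  // keys come out in ascending order when iterated
}
```

    One caveat: merely reading counts[k] for an absent key also inserts it, so use find() or count() when you only want to look a key up.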

    Basically, my algorithm is this: For each element in the vector, increment by one the map entry keyed by the value of that element. Afterwards, simply walk the map, outputting any key whose value is more than 1. Couldn't be simpler.

    #include <iostream>
    #include <map>
    #include <vector>
    
    void
    output_sorted_duplicates(const std::vector<int>& v)
    {
       std::map<int, int> m;
    
       // count how many of each element there are, putting results into map
       // map keys are elements in the vector,
       // map values are the frequency of that element
       for (std::vector<int>::const_iterator vb = v.begin(); vb != v.end(); ++vb)
          ++m[*vb];
    
       // output keys whose values are 2 or more
       // the keys are already sorted by the map
       for (std::map<int, int>::const_iterator mb = m.begin(); mb != m.end(); ++mb)
          if (mb->second >= 2)
             std::cout << mb->first << " ";
       std::cout << std::endl;
    }
    
    int main()
    {
       int initializer[] = { 4, 4, 1, 2, 3, 2, 3 };
       std::vector<int> data(initializer, initializer + 7);
       output_sorted_duplicates(data);
    }
    
    janks@phoenix:/tmp$ g++ test.cc && ./a.out
    2 3 4
    

    So we visit each element in your vector once, doing one O(log m) map lookup per element, and then each element in my map once, where m, the number of elements in my map, is at worst the size of your vector. That is O(n log n) overall. The drawback of my solution is that it uses a lot more storage space than the solutions that rearrange your vector in place. The advantages, however, are clear: it's incredibly short and simple, it's obviously correct without the need for much testing or code review, and it has reasonable performance properties.
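    For comparison, here is one way the in-place approach could look. This is a sketch assuming C++11, not code from the question or another answer, and keep_sorted_duplicates is a hypothetical name. It sorts the vector, then walks it run by run, writing one copy of each run of length two or more toward the front (much as std::unique compacts its survivors), and finally erases the leftover tail.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: rearranges v so it holds one copy of each value
// that occurred at least twice, in ascending order. No auxiliary
// container is allocated; the sort dominates at O(n log n).
void keep_sorted_duplicates(std::vector<int>& v) {
    std::sort(v.begin(), v.end());
    auto out = v.begin();  // next slot for a kept value; never passes 'it'
    auto it = v.begin();
    while (it != v.end()) {
        const int value = *it;
        // Find the first element after the run of values equal to 'value'.
        auto run_end = std::find_if(it, v.end(),
                                    [value](int x) { return x != value; });
        if (run_end - it >= 2)  // the value appears at least twice
            *out++ = value;
        it = run_end;
    }
    v.erase(out, v.end());
}
```

    The trade-off is the mirror image of the map version: no per-node allocations, but the caller's original ordering and contents are destroyed.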

    Making my function a template, and making it operate on STL-style ranges instead of just vectors of ints, is left as an exercise.
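    One possible take on that exercise, assuming C++11. The name copy_sorted_duplicates is hypothetical. It keeps the same std::map counting idea but accepts any input-iterator range and writes results through an output iterator, in the style of the standard algorithms. The element type must be copyable and comparable with operator<.

```cpp
#include <cstddef>
#include <iterator>
#include <map>

// Hypothetical helper: writes each value occurring at least twice in
// [first, last) to 'out', in ascending order, and returns the advanced
// output iterator.
template <typename InputIt, typename OutputIt>
OutputIt copy_sorted_duplicates(InputIt first, InputIt last, OutputIt out) {
    using Value = typename std::iterator_traits<InputIt>::value_type;
    std::map<Value, std::size_t> counts;  // value -> frequency, kept sorted
    for (; first != last; ++first)
        ++counts[*first];
    for (const auto& entry : counts)
        if (entry.second >= 2)
            *out++ = entry.first;
    return out;
}
```

    With a std::vector<int> named data, for instance, results could be collected into another vector via copy_sorted_duplicates(data.begin(), data.end(), std::back_inserter(result)).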
