What is the right approach when using STL container for median calculation?

再見小時候 2020-12-02 10:17

Let's say I need to retrieve the median from a sequence of 1000000 random numeric values.

If using anything but std::list, I have no (

10 answers
  • 2020-12-02 10:57

    Any random-access container (like std::vector) can be sorted with the standard std::sort algorithm, available in the <algorithm> header.

    For finding the median, it would be quicker to use std::nth_element; this does enough of a sort to put one chosen element in the correct position, but doesn't completely sort the container. So you could find the median like this:

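    #include <algorithm>
    #include <vector>

    // Returns the upper median and reorders v. Precondition: v is not empty.
    int median(std::vector<int> &v)
    {
        std::size_t n = v.size() / 2;
        std::nth_element(v.begin(), v.begin() + n, v.end());
        return v[n];
    }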
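
    The question mentions std::list, which has no random-access iterators, so std::nth_element cannot be applied to it directly. One possible workaround, shown here as a minimal sketch, is to copy the values into a std::vector first. The name median_of_list is hypothetical, and the function assumes a non-empty list of int.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Hypothetical helper (not part of the original answer): copies the list
// into a vector so that std::nth_element, which needs random-access
// iterators, can be used. Returns the upper median; the list is unchanged.
int median_of_list(const std::list<int>& values)
{
    assert(!values.empty());  // the median of an empty sequence is undefined
    std::vector<int> copy(values.begin(), values.end());
    std::size_t n = copy.size() / 2;
    std::nth_element(copy.begin(), copy.begin() + n, copy.end());
    return copy[n];
}
```

    The copy costs O(N) extra memory, but it leaves the caller's container in its original order.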
    
  • 2020-12-02 11:00

    There exists a linear-time selection algorithm. The code below only works when the container has random-access iterators, but it can be modified to work without them; you'll just have to be a bit more careful to avoid shortcuts like end - begin and iter + n.

    #include <algorithm>
    #include <cstdlib>
    #include <functional>
    #include <iostream>
    #include <sstream>
    #include <vector>
    
    template<class A, class C = std::less<typename A::value_type> >
    class LinearTimeSelect {
    public:
        LinearTimeSelect(const A &things) : things(things) {}
        typename A::value_type nth(int n) {
            return nth(n, things.begin(), things.end());
        }
    private:
        static typename A::value_type nth(int n,
                typename A::iterator begin, typename A::iterator end) {
            int size = end - begin;
            if (size <= 5) {
                std::sort(begin, end, C());
                return begin[n];
            }
            typename A::iterator walk(begin), skip(begin);
    #ifdef RANDOM // randomized algorithm, average linear-time
            typename A::value_type pivot = begin[std::rand() % size];
    #else // guaranteed linear-time, but usually slower in practice
            while (end - skip >= 5) {
                std::sort(skip, skip + 5, C());
                std::iter_swap(walk++, skip + 2);
                skip += 5;
            }
            while (skip != end) std::iter_swap(walk++, skip++);
            typename A::value_type pivot = nth((walk - begin) / 2, begin, walk);
    #endif
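            // Partition: move elements that compare less than the pivot to the front.
            // Caveat: if no element compares less than the pivot (e.g. many equal
            // values), size stays 0 and the recursion below does not shrink the range.
            for (walk = skip = begin, size = 0; skip != end; ++skip)
                if (C()(*skip, pivot)) std::iter_swap(walk++, skip), ++size;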
            if (size <= n) return nth(n - size, walk, end);
            else return nth(n, begin, walk);
        }
        A things;
    };
    
    int main(int argc, char **argv) {
        std::vector<int> seq;
        {
            int i = 32;
            std::istringstream(argc > 1 ? argv[1] : "") >> i;
            while (i--) seq.push_back(i);
        }
        // Note: std::random_shuffle was removed in C++17; use std::shuffle there.
        std::random_shuffle(seq.begin(), seq.end());
        std::cout << "unordered: ";
        for (std::vector<int>::iterator i = seq.begin(); i != seq.end(); ++i)
            std::cout << *i << " ";
        LinearTimeSelect<std::vector<int> > alg(seq);
        std::cout << std::endl << "linear-time medians: "
            << alg.nth((seq.size()-1) / 2) << ", " << alg.nth(seq.size() / 2);
        std::sort(seq.begin(), seq.end());
        std::cout << std::endl << "medians by sorting: "
            << seq[(seq.size()-1) / 2] << ", " << seq[seq.size() / 2] << std::endl;
        return 0;
    }
    
  • 2020-12-02 11:02

    Computing the median is more involved than Mike Seymour's answer suggests. The result depends on whether the sample contains an even or an odd number of items. If the number of items is even, the median is the average of the middle two items, which means that the median of a list of integers can be a fraction. Finally, the median of an empty list is undefined. Here is code that passes my basic test cases:

    #include <algorithm>
    #include <cstddef>
    #include <exception>

    ///Represents the exception for taking the median of an empty list
    class median_of_empty_list_exception : public std::exception {
    public:
      const char* what() const noexcept override {
        return "Attempt to take the median of an empty list of numbers.  "
          "The median of an empty list is undefined.";
      }
    };
    
    ///Return the median of a sequence of numbers defined by the random
    ///access iterators begin and end.  The sequence may be reordered.
    ///Throws median_of_empty_list_exception if the sequence is empty
    ///(the median is undefined for an empty set).
    ///
    ///The numbers must be convertible to double.
    template<class RandAccessIter>
    double median(RandAccessIter begin, RandAccessIter end){
      if(begin == end){ throw median_of_empty_list_exception(); }
      std::size_t size = end - begin;
      std::size_t middleIdx = size/2;
      RandAccessIter target = begin + middleIdx;
      std::nth_element(begin, target, end);
    
      if(size % 2 != 0){ //Odd number of elements
        return *target;
      }else{            //Even number of elements
        double a = *target;
        RandAccessIter targetNeighbor= target-1;
        std::nth_element(begin, targetNeighbor, end);
        return (a+*targetNeighbor)/2.0;
      }
    }
    
  • 2020-12-02 11:03

    Here's a more complete version of Mike Seymour's answer:

    // Could pass by value instead to avoid changing the caller's vector.
    // Precondition: v is not empty.
    double median(std::vector<int> &v)
    {
      std::size_t n = v.size() / 2;
      std::nth_element(v.begin(), v.begin()+n, v.end());
      int vn = v[n];
      if(v.size()%2 == 1)
      {
        return vn;
      }else
      {
        std::nth_element(v.begin(), v.begin()+n-1, v.end());
        // Convert to double before adding to avoid int overflow.
        return 0.5*(static_cast<double>(vn)+v[n-1]);
      }
    }
    

    It handles odd- or even-length input.

  • 2020-12-02 11:04

    You can sort an std::vector using the library function std::sort.

    std::vector<int> vec;
    // ... fill vector with stuff
    std::sort(vec.begin(), vec.end());
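
    Once the vector is sorted, the median can be read from the middle position or positions. The following is a minimal sketch of that idea, not part of the original answer: the name median_by_sorting is hypothetical, the input is assumed to be non-empty, and the vector is taken by value so the caller's data keeps its order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: sorts a private copy (O(N log N)) and returns the
// middle element for odd sizes, or the mean of the two middle elements
// for even sizes.
double median_by_sorting(std::vector<int> vec)
{
    assert(!vec.empty());  // the median of an empty sequence is undefined
    std::sort(vec.begin(), vec.end());
    std::size_t n = vec.size() / 2;
    if (vec.size() % 2 == 1) {
        return vec[n];
    }
    // Convert to double before adding to avoid int overflow.
    return (static_cast<double>(vec[n - 1]) + vec[n]) / 2.0;
}
```

    A full sort does more work than a selection with std::nth_element, but it is simple and may be fast enough for a million values.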
    
  • 2020-12-02 11:07

    This algorithm handles both even- and odd-sized inputs efficiently using the STL nth_element algorithm (O(N) on average) and the max_element algorithm (O(N)). Note that nth_element has another guaranteed effect: all of the elements before position n are guaranteed to be no greater than v[n], just not necessarily sorted.

    #include <algorithm>
    #include <vector>

    // Post-condition: after returning, the elements in v may be reordered
    // and the resulting order is unspecified. Returns 0.0 for an empty vector.
    double median(std::vector<double> &v)
    {
      if(v.empty()) {
        return 0.0;
      }
      auto n = v.size() / 2;
      std::nth_element(v.begin(), v.begin()+n, v.end());
      auto med = v[n];
      if(!(v.size() & 1)) { // If the set size is even
        // The lower middle element is the largest of the elements before n.
        auto max_it = std::max_element(v.begin(), v.begin()+n);
        med = (*max_it + med) / 2.0;
      }
      return med;
    }
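
    The partition guarantee that this answer relies on can be checked directly. The sketch below is only an illustration: the names partitioned_around and check_nth_element are hypothetical, and the caller is assumed to pass an n smaller than the vector's size.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical check: true if nothing before position n is greater than
// v[n] and nothing after position n is smaller than v[n].
bool partitioned_around(const std::vector<double>& v, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        if (v[i] > v[n]) return false;
    for (std::size_t i = n + 1; i < v.size(); ++i)
        if (v[i] < v[n]) return false;
    return true;
}

// Rearranges a copy of v around position n with std::nth_element and
// reports whether the guarantee holds afterwards. Assumes n < v.size().
bool check_nth_element(std::vector<double> v, std::size_t n)
{
    std::nth_element(v.begin(), v.begin() + n, v.end());
    return partitioned_around(v, n);
}
```

    Because everything before position n is no greater than v[n], the largest of those elements is the lower middle value, which is why a single max_element pass is enough in the even-sized case.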
    