Is std::vector so much slower than plain arrays?

南方客 2020-11-22 12:00

I've always thought it's the general wisdom that std::vector is "implemented as an array," blah blah blah. Today I went down and tested it, and it seems to

22 answers
  •  星月不相逢
    2020-11-22 12:27

    When I first looked at your code, it was hardly a fair comparison: you weren't comparing apples with apples. So I made sure constructors and destructors are called in every test, and then compared.

    const size_t dimension = 1000;
    
    void UseArray() {
        TestTimer t("UseArray");
        for(size_t j = 0; j < dimension; ++j) {
            Pixel* pixels = new Pixel[dimension * dimension];
            for(size_t i = 0 ; i < dimension * dimension; ++i) {
                pixels[i].r = 255;
                pixels[i].g = 0;
                pixels[i].b = (unsigned char) (i % 255);
            }
            delete[] pixels;
        }
    }
    
    void UseVector() {
        TestTimer t("UseVector");
        for(size_t j = 0; j < dimension; ++j) {
            std::vector<Pixel> pixels(dimension * dimension);
            for(size_t i = 0; i < dimension * dimension; ++i) {
                pixels[i].r = 255;
                pixels[i].g = 0;
                pixels[i].b = (unsigned char) (i % 255);
            }
        }
    }
    
    int main() {
        TestTimer t1("The whole thing");
    
        UseArray();
        UseVector();
    
        return 0;
    }
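
    The listing above depends on the Pixel and TestTimer types from the question, which are not shown in this answer. To compile it on its own, something like the following could stand in for them. Both definitions are assumptions made for illustration: only the names, the r/g/b members and the "completed in ... seconds" output format are taken from the text, and the elapsed_seconds() helper is hypothetical.

```cpp
#include <chrono>
#include <iostream>
#include <string>
#include <utility>

// Hypothetical stand-in for the question's Pixel: three colour channels and
// a default constructor that deliberately leaves them uninitialised.
struct Pixel {
    Pixel() {}
    Pixel(unsigned char r, unsigned char g, unsigned char b) : r(r), g(g), b(b) {}
    unsigned char r, g, b;
};

// Hypothetical stand-in for TestTimer: prints the elapsed wall-clock time
// when the object goes out of scope.
class TestTimer {
public:
    explicit TestTimer(std::string name)
        : name_(std::move(name)), start_(std::chrono::steady_clock::now()) {}

    // Seconds elapsed since construction (helper added for this sketch).
    double elapsed_seconds() const {
        const std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - start_;
        return elapsed.count();
    }

    ~TestTimer() {
        std::cout << name_ << " completed in " << elapsed_seconds()
                  << " seconds\n";
    }

private:
    std::string name_;
    std::chrono::steady_clock::time_point start_;
};
```

    steady_clock is used here because it is monotonic; the original timer may well have measured time differently.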
    

    My expectation was that, with this setup, the two would take exactly the same time. It turns out I was wrong.

    UseArray completed in 3.06 seconds
    UseVector completed in 4.087 seconds
    The whole thing completed in 10.14 seconds
    

    So why did this 30% performance loss even occur? The STL has everything in headers, so it should have been possible for the compiler to understand everything that was required.

    My guess was that the cost lies in how the vector initialises all of its elements when it is constructed. So I performed a test:

    #include <cstdio>
    #include <vector>
    
    class Tester {
    public:
        static int count;   // number of default constructions
        static int count2;  // number of copy constructions
        Tester() { count++; }
        Tester(const Tester&) { count2++; }
    };
    int Tester::count = 0;
    int Tester::count2 = 0;
    
    int main() {
        std::vector<Tester> myvec(300);
        std::printf("Default Constructed: %i\nCopy Constructed: %i\n", Tester::count, Tester::count2);
    
        return 0;
    }
    

    The results were as I suspected:

    Default Constructed: 1
    Copy Constructed: 300
    

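    This is clearly the source of the slowdown: the vector default-constructs a single object and then copy-constructs every element from it. (This is how the pre-C++11 constructor vector(n, value = T()) behaves. From C++11 on, vector(n) default-constructs each element in place, so a newer compiler reports 300 and 0 here.)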
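
    To check which pattern a given toolchain produces, the two constructor forms can be counted side by side. The sketch below is an illustration rather than part of the original measurements: Counted, count_fill_from_prototype and count_sized are hypothetical names, and it assumes a C++11 or later compiler.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical counterpart of Tester with resettable counters.
struct Counted {
    static int defaults;  // default constructions
    static int copies;    // copy constructions
    Counted() { ++defaults; }
    Counted(const Counted&) { ++copies; }
};
int Counted::defaults = 0;
int Counted::copies = 0;

// vector(n, value): one prototype is built, then copied into each element.
// Returns {default constructions, copy constructions}.
std::pair<int, int> count_fill_from_prototype(std::size_t n) {
    Counted::defaults = 0;
    Counted::copies = 0;
    std::vector<Counted> v(n, Counted());
    return std::make_pair(Counted::defaults, Counted::copies);
}

// vector(n): assuming C++11 or later, each element is default-constructed
// in place and no copy is made.
std::pair<int, int> count_sized(std::size_t n) {
    Counted::defaults = 0;
    Counted::copies = 0;
    std::vector<Counted> v(n);
    return std::make_pair(Counted::defaults, Counted::copies);
}
```

    Under that assumption, count_fill_from_prototype(300) yields (1, 300), the same pattern as the output above, while count_sized(300) yields (300, 0).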

    This means that constructing the vector effectively performs the following operations, in this order (pseudocode):

    Pixel pixel;
    for (auto i = 0; i < N; ++i) vector[i] = pixel;
    

    Which, due to the implicit copy constructor made by the compiler, is expanded to the following:

    Pixel pixel;
    for (auto i = 0; i < N; ++i) {
        vector[i].r = pixel.r;
        vector[i].g = pixel.g;
        vector[i].b = pixel.b;
    }
    

    So the default Pixel remains un-initialised, while the rest are initialised with the default Pixel's un-initialised values.

    Compare that with the new[]/delete[] version:

    int main() {
        Tester* myvec = new Tester[300];
    
        printf("Default Constructed: %i\nCopy Constructed:%i\n", Tester::count, Tester::count2);
    
        delete[] myvec;
    
        return 0;
    }
    
    Default Constructed: 300
    Copy Constructed: 0
    

    Here every element is default-constructed in place and left with its un-initialised values, without a second pass over the sequence.

    Armed with this information, how can we test it? Let's try replacing the implicit copy constructor with a user-defined one that does nothing.

    Pixel(const Pixel&) {}
    

    And the results?

    UseArray completed in 2.617 seconds
    UseVector completed in 2.682 seconds
    The whole thing completed in 5.301 seconds
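
    For context, this is roughly where that one-line constructor sits. The struct below is a sketch: the answer only shows the copy constructor, so the rest of Pixel (an empty default constructor and three unsigned char channels) and the sum_blue helper are assumptions. A copy constructor that copies nothing is only safe in a benchmark like this one, where every element is overwritten before it is read; any real copy of such a Pixel would lose its data.

```cpp
#include <cstddef>
#include <vector>

// Assumed layout of Pixel; only the no-op copy constructor comes from the answer.
struct Pixel {
    Pixel() {}              // leaves r, g, b uninitialised
    Pixel(const Pixel&) {}  // copies nothing, so the copy is uninitialised too
    unsigned char r, g, b;
};

// Hypothetical helper: builds a vector of n pixels, overwrites every element
// before reading it, and returns the sum of the blue channels.
std::size_t sum_blue(std::size_t n) {
    std::vector<Pixel> pixels(n);
    for (std::size_t i = 0; i < n; ++i) {
        pixels[i].r = 255;
        pixels[i].g = 0;
        pixels[i].b = static_cast<unsigned char>(i % 255);
    }
    std::size_t total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += pixels[i].b;
    }
    return total;
}
```

    The values written by the loop are unaffected by the empty copy constructor, because each element is assigned after the vector has been built.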
    

    So in summary, if you're making hundreds of vectors very often: re-think your algorithm.
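
    One way to read "re-think your algorithm" is to stop constructing a fresh vector on every iteration and reuse a single buffer instead. The following is a sketch under that assumption, not the benchmark from the question: fill_frames_reusing_buffer and its checksum return value are hypothetical, and Pixel is redeclared so the block stands alone.

```cpp
#include <cstddef>
#include <vector>

// Assumed minimal Pixel, redeclared so this sketch is self-contained.
struct Pixel {
    Pixel() {}
    unsigned char r, g, b;
};

// Fills `frames` frames of side * side pixels using one buffer that is
// allocated and constructed once, outside the frame loop. Returns the sum of
// all blue values written, so the work has an observable result.
std::size_t fill_frames_reusing_buffer(std::size_t side, std::size_t frames) {
    std::vector<Pixel> pixels(side * side);  // constructed once, reused below
    std::size_t checksum = 0;
    for (std::size_t j = 0; j < frames; ++j) {
        for (std::size_t i = 0; i < pixels.size(); ++i) {
            pixels[i].r = 255;
            pixels[i].g = 0;
            pixels[i].b = static_cast<unsigned char>(i % 255);
            checksum += pixels[i].b;
        }
    }
    return checksum;
}
```

    With the buffer hoisted out of the loop, the per-element construction cost is paid once instead of once per frame, whichever constructor the library happens to use.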

    In any case, the STL implementation isn't slower for some unknown reason; it does exactly what you ask, hoping you know better.
