Is std::vector so much slower than plain arrays?

南方客 2020-11-22 12:00

I've always thought it's the general wisdom that std::vector is "implemented as an array," blah blah blah. Today I went down and tested it, and it seems to

22 answers
  •  没有蜡笔的小新
    2020-11-22 12:31

    I ran some extensive tests that I had been meaning to do for a while, so I might as well share them.

    My test machine is a dual-boot i7-3770 with 16 GB RAM (x86_64), running Windows 8.1 and Ubuntu 16.04. More information, conclusions, and remarks are below. I tested both MSVS 2017 and g++ (on Windows and on Linux).

    Test Program

    #include <iostream>
    #include <chrono>
    //#include <algorithm>
    #include <array>
    #include <locale>
    #include <vector>
    #include <queue>
    #include <deque>
    
    // Note: total size of array must not exceed 0x7fffffff B = 2,147,483,647B
    //  which means that largest int array size is 536,870,911
    // Also image size cannot be larger than 80,000,000B
    constexpr int long g_size = 100000;
    int g_A[g_size];
    
    
    int main()
    {
        std::locale loc("");
        std::cout.imbue(loc);
        constexpr int long size = 100000;  // largest array stack size
    
        // stack allocated c array
        std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
        int A[size];
        for (int i = 0; i < size; i++)
            A[i] = i;
    
        auto duration = std::chrono::duration_cast<std::chrono::microseconds>(std::chrono::steady_clock::now() - start).count();
        std::cout << "c-style stack array duration=" << duration / 1000.0 << "ms\n";
        std::cout << "c-style stack array size=" << sizeof(A) << "B\n\n";
    
        // global c array (static storage, not on the stack)
        start = std::chrono::steady_clock::now();
        for (int i = 0; i < g_size; i++)
            g_A[i] = i;
    
        duration = std::chrono::duration_cast<std::chrono::microseconds>(std::chrono::steady_clock::now() - start).count();
        std::cout << "global c-style stack array duration=" << duration / 1000.0 << "ms\n";
        std::cout << "global c-style stack array size=" << sizeof(g_A) << "B\n\n";
    
        // raw c array heap array
        start = std::chrono::steady_clock::now();
        int* AA = new int[size];    // bad_alloc() if it goes higher than 1,000,000,000
        for (int i = 0; i < size; i++)
            AA[i] = i;
    
        duration = std::chrono::duration_cast<std::chrono::microseconds>(std::chrono::steady_clock::now() - start).count();
        std::cout << "c-style heap array duration=" << duration / 1000.0 << "ms\n";
        // note: sizeof(AA) is the size of the pointer, not of the allocated array
        std::cout << "c-style heap array size=" << sizeof(AA) << "B\n\n";
        delete[] AA;
    
        // std::array<>
        start = std::chrono::steady_clock::now();
        std::array<int, size> AAA;
        for (int i = 0; i < size; i++)
            AAA[i] = i;
        //std::sort(AAA.begin(), AAA.end());
    
        duration = std::chrono::duration_cast<std::chrono::microseconds>(std::chrono::steady_clock::now() - start).count();
        std::cout << "std::array duration=" << duration / 1000.0 << "ms\n";
        std::cout << "std::array size=" << sizeof(AAA) << "B\n\n";
    
        // std::vector<>
        start = std::chrono::steady_clock::now();
        std::vector<int> v;
        for (int i = 0; i < size; i++)
            v.push_back(i);
        //std::sort(v.begin(), v.end());
    
        duration = std::chrono::duration_cast<std::chrono::microseconds>(std::chrono::steady_clock::now() - start).count();
        std::cout << "std::vector duration=" << duration / 1000.0 << "ms\n";
        std::cout << "std::vector size=" << v.size() * sizeof(v.back()) << "B\n\n";
    
        // std::deque<>
        start = std::chrono::steady_clock::now();
        std::deque<int> dq;
        for (int i = 0; i < size; i++)
            dq.push_back(i);
        //std::sort(dq.begin(), dq.end());
    
        duration = std::chrono::duration_cast<std::chrono::microseconds>(std::chrono::steady_clock::now() - start).count();
        std::cout << "std::deque duration=" << duration / 1000.0 << "ms\n";
        std::cout << "std::deque size=" << dq.size() * sizeof(dq.back()) << "B\n\n";
    
        // std::queue<>
        start = std::chrono::steady_clock::now();
        std::queue<int> q;
        for (int i = 0; i < size; i++)
            q.push(i);
    
        duration = std::chrono::duration_cast<std::chrono::microseconds>(std::chrono::steady_clock::now() - start).count();
        std::cout << "std::queue duration=" << duration / 1000.0 << "ms\n";
        std::cout << "std::queue size=" << q.size() * sizeof(q.front()) << "B\n\n";
    }
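
    The std::vector case in the test program grows the vector from empty with push_back, so its time includes any reallocations along the way, while the array cases write into storage that already exists. A minimal sketch of two variants that could be timed alongside it is shown here. The function names are hypothetical, these variants were not part of the measurements reported in this answer, and no claim is made about how they would rank.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Baseline, as in the test program: grow from empty with push_back.
inline std::vector<int> fill_push_back(int n)
{
    std::vector<int> v;
    for (int i = 0; i < n; ++i)
        v.push_back(i);
    return v;
}

// Hypothetical variant 1: request the capacity once, then push_back.
inline std::vector<int> fill_reserved(int n)
{
    assert(n >= 0);
    std::vector<int> v;
    v.reserve(static_cast<std::size_t>(n));
    for (int i = 0; i < n; ++i)
        v.push_back(i);
    return v;
}

// Hypothetical variant 2: construct with n value-initialized elements,
// then assign by index, which mirrors the array loops most closely.
inline std::vector<int> fill_presized(int n)
{
    assert(n >= 0);
    std::vector<int> v(static_cast<std::size_t>(n));
    for (int i = 0; i < n; ++i)
        v[static_cast<std::size_t>(i)] = i;
    return v;
}
```

    All three produce the same contents, so each can be dropped into the same timing pattern (steady_clock before and after) used in the test program.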
    

    Results

    //////////////////////////////////////////////////////////////////////////////////////////
    // with MSVS 2017:
    // >> cl /std:c++14 /Wall -O2 array_bench.cpp
    //
    // c-style stack array duration=0.15ms
    // c-style stack array size=400,000B
    //
    // global c-style stack array duration=0.130ms
    // global c-style stack array size=400,000B
    //
    // c-style heap array duration=0.90ms
    // c-style heap array size=4B
    //
    // std::array duration=0.20ms
    // std::array size=400,000B
    //
    // std::vector duration=0.544ms
    // std::vector size=400,000B
    //
    // std::deque duration=1.375ms
    // std::deque size=400,000B
    //
    // std::queue duration=1.491ms
    // std::queue size=400,000B
    //
    //////////////////////////////////////////////////////////////////////////////////////////
    //
    // with g++ version:
    //      - (tdm64-1) 5.1.0 on Windows
    //      - (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609 on Ubuntu 16.04
    // >> g++ -std=c++14 -Wall -march=native -O2 array_bench.cpp -o array_bench
    //
    // c-style stack array duration=0ms
    // c-style stack array size=400,000B
    //
    // global c-style stack array duration=0.124ms
    // global c-style stack array size=400,000B
    //
    // c-style heap array duration=0.648ms
    // c-style heap array size=8B
    //
    // std::array duration=1ms
    // std::array size=400,000B
    //
    // std::vector duration=0.402ms
    // std::vector size=400,000B
    //
    // std::deque duration=0.234ms
    // std::deque size=400,000B
    //
    // std::queue duration=0.304ms
    // std::queue size=400,000B
    //
    //////////////////////////////////////////////////////////////////////////////////////////
    

    Notes

    • Each figure is the average of 10 runs.
    • I initially ran the tests with std::sort() too (you can see it commented out) but removed it later because there were no significant relative differences.
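
    The averaging over runs can be automated with a small helper. This is only a sketch of one way to do it, not the procedure used for the numbers above; average_ms and fill_vector are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: calls fn() `runs` times and returns the mean
// wall-clock time per call in milliseconds.
template <typename Fn>
double average_ms(int runs, Fn fn)
{
    assert(runs > 0);
    using clock = std::chrono::steady_clock;
    double total_ms = 0.0;
    for (int r = 0; r < runs; ++r) {
        const auto start = clock::now();
        fn();
        const auto stop = clock::now();
        total_ms += std::chrono::duration<double, std::milli>(stop - start).count();
    }
    return total_ms / runs;
}

// Hypothetical workload mirroring the std::vector case of the test program.
inline std::size_t fill_vector(int n)
{
    std::vector<int> v;
    for (int i = 0; i < n; ++i)
        v.push_back(i);
    return v.size();
}
```

    A call such as average_ms(10, [] { fill_vector(100000); }) would give the mean over 10 runs. In an optimized build it is worth using the workload's result somewhere (for example, printing it) so the compiler cannot discard the work being timed.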

    My Conclusions and Remarks

    • Notice how the global C-style array takes almost as much time as the heap C-style array.
    • Across all tests, std::array's timings were remarkably stable between consecutive runs, while the others, especially the std:: data structures, varied wildly in comparison.
    • -O3 optimization didn't show any noteworthy time differences.
    • Removing optimization on Windows cl (no -O2) and on g++ (Windows/Linux, no -O2, no -march=native) increases times SIGNIFICANTLY, particularly for the std:: data structures. Times are higher overall on MSVS than on g++, but std::array and C-style arrays are faster on Windows without optimization.
    • g++ produces faster code than Microsoft's compiler (apparently it runs faster even on Windows).
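
    The remark about run-to-run stability can be made measurable by recording each run's time and summarizing the spread. The following is a sketch under assumptions: Spread and summarize are hypothetical names, the standard deviation is the population form (dividing by the number of samples), and nothing like this was used to produce the figures in this answer.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical summary of per-run timings, all in milliseconds.
struct Spread {
    double min;
    double max;
    double mean;
    double stddev;  // population standard deviation
};

inline Spread summarize(const std::vector<double>& samples_ms)
{
    assert(!samples_ms.empty());
    const auto mm = std::minmax_element(samples_ms.begin(), samples_ms.end());

    double sum = 0.0;
    for (double s : samples_ms)
        sum += s;
    const double mean = sum / samples_ms.size();

    double sq_dev = 0.0;
    for (double s : samples_ms)
        sq_dev += (s - mean) * (s - mean);

    return Spread{*mm.first, *mm.second, mean,
                  std::sqrt(sq_dev / samples_ms.size())};
}
```

    Comparing stddev relative to mean for each container would show which timings vary most between consecutive runs.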

    Verdict

    Of course, this is code for an optimized build. And since the question was about std::vector: yes, it is !much! slower than plain arrays (optimized or unoptimized). But when you're doing a benchmark, you naturally want to produce optimized code.

    The star of the show for me, though, has been std::array.
