Is using a vector of boolean values slower than a dynamic bitset?

后端 未结 4 1193
爱一瞬间的悲伤
爱一瞬间的悲伤 2020-12-09 03:34

Is using a vector of boolean values slower than a dynamic bitset?

I just heard about boost\'s dynamic bitset, and I was wondering is it worth the trouble. Can I just

4条回答
  •  余生分开走
    2020-12-09 04:06

    A great deal here depends on how many Boolean values you're working with.

    Both bitset and vector normally use a packed representation where a Boolean is stored as only a single bit.

    On one hand, that imposes some overhead in the form of bit manipulation to access a single value.

    On the other hand, that also means many more of your Booleans will fit in your cache.

    If you're using a lot of Booleans (e.g., implementing a sieve of Eratosthenes) fitting more of them in the cache will almost always end up a net gain. The reduction in memory use will gain you a lot more than the bit manipulation loses.

    Most of the arguments against std::vector come back to the fact that it is not a standard container (i.e., it does not meet the requirements for a container). IMO, this is mostly a question of expectations -- since it says vector, many people expect it to be a container (other types of vectors are), and they often react negatively to the fact that vector isn't a container.

    If you're using the vector in a way that really requires it to be a container, then you probably want to use some other combination -- either deque or vector can work fine. Think before you do that though -- there's a lot of (lousy, IMO) advice that vector should be avoided in general, with little or no explanation of why it should be avoided at all, or under what circumstances it makes a real difference to you.

    Yes, there are situations where something else will work better. If you're in one of those situations, using something else is clearly a good idea. But, be sure you're really in one of those situations first. Anybody who tells you (for example) that "Herb says you should use vector" without a lot of explanation about the tradeoffs involved should not be trusted.

    Let's give a real example. Since it was mentioned in the comments, let's consider the Sieve of Eratosthenes:

    #include 
    #include 
    #include 
    #include 
    
    unsigned long primes = 0;
    
    template 
    unsigned long sieve(unsigned max) {
        std::vector sieve(max, false);
        sieve[0] = sieve[1] = true;
    
        for (int i = 2; i < max; i++) {
            if (!sieve[i]) {
                ++primes;
                for (int temp = 2 * i; temp < max; temp += i)
                    sieve[temp] = true;
            }
        }
        return primes;
    }
    
    // Warning: auto return type will fail with older compilers
    // Fine with g++ 5.1 and VC++ 2015 though.
    //
    template 
    auto timer(F f, int max) {
        auto start = std::chrono::high_resolution_clock::now();
        primes += f(max);
        auto stop = std::chrono::high_resolution_clock::now();
    
        return stop - start;
    }
    
    int main() {
        using namespace std::chrono;
    
        unsigned number = 100000000;
    
        auto using_bool = timer(sieve, number);
        auto using_char = timer(sieve, number);
    
        std::cout << "ignore: " << primes << "\n";
        std::cout << "Time using bool: " << duration_cast(using_bool).count() << "\n";
        std::cout << "Time using char: " << duration_cast(using_char).count() << "\n";
    }
    

    We've used a large enough array that we can expect a large portion of it to occupy main memory. I've also gone to a little pain to ensure that the only thing that changes between one invocation and the other is the use of a vector vs. vector. Here are some results. First with VC++ 2015:

    ignore: 34568730
    Time using bool: 2623
    Time using char: 3108
    

    ...then the time using g++ 5.1:

    ignore: 34568730
    Time using bool: 2359
    Time using char: 3116
    

    Obviously, the vector wins in both cases--by around 15% with VC++, and over 30% with gcc. Also note that in this case, I've chosen the size to show vector in quite favorable light. If, for example, I reduce number from 100000000 to 10000000, the time differential becomes much larger:

    ignore: 3987474
    Time using bool: 72
    Time using char: 249
    

    Although I haven't done a lot of work to confirm, I'd guess that in this case, the version using vector is saving enough space that the array fits entirely in the cache, while the vector is large enough to overflow the cache, and involve a great deal of main memory access.

提交回复
热议问题