I'm designing a Bloom filter and I'm wondering what the most performant bit array implementation is in Python.
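For context, here is a minimal sketch of the kind of structure the question describes: a Bloom filter whose bit array is a plain `bytearray`. The class name `BloomFilter`, the SHA-256 double-hashing scheme, and the sizes are illustrative assumptions. The sketch makes no claim about which bit array backend is fastest.

```python
import hashlib


class BloomFilter:
    """Hypothetical minimal Bloom filter backed by a bytearray bit array."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One bit per slot, packed 8 per byte; all bits start cleared.
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive k bit positions from one SHA-256 digest using
        # double hashing: (h1 + i * h2) mod num_bits.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1  # force h2 odd
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            # Byte index is pos // 8, bit within the byte is pos % 8.
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, item: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos >> 3] & (1 << (pos & 7))
                   for pos in self._positions(item))


bf = BloomFilter(num_bits=1024, num_hashes=3)
for word in ("apple", "pear", "plum"):
    bf.add(word)
print("apple" in bf)  # True: a Bloom filter has no false negatives
```

Swapping the `bytearray` for a native `int` or an intbitset only changes the `add` and `__contains__` bodies, which is what makes the benchmarks below relevant.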
The nice thing about Python is that it can handle ar
Disclaimer: I am the main developer of intbitset :-), which was mentioned above in one of the comments. I just wanted to let you know that, as of a few weeks ago, intbitset is compatible with Python 3.3 and 3.4. Additionally, in my measurements it is almost twice as fast as the equivalent bitwise operations on native ints:
```
import random
from intbitset import intbitset

x = random.sample(range(1000000), 10000)
y = random.sample(range(1000000), 10000)

# Native int bitsets: set bit i for each sampled index
m = 0
for i in x:
    m |= 1 << i
n = 0
for i in y:
    n |= 1 << i

# The same two sets as intbitsets
mi = intbitset(x)
ni = intbitset(y)

%timeit m & n  ## native int
10000 loops, best of 3: 27.3 µs per loop
%timeit mi & ni  ## intbitset
100000 loops, best of 3: 13.9 µs per loop
%timeit m | n  ## native int
10000 loops, best of 3: 26.8 µs per loop
%timeit mi | ni  ## intbitset
100000 loops, best of 3: 15.8 µs per loop
## note: the above was tested on Python 2.7, Ubuntu 14.04.
```
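The native-int side of this comparison works because a Python `int` has arbitrary precision, so bit `i` of the integer can record whether `i` is in the set. The following is a stdlib-only sketch of the usual operations on such a bitmask. The helper names (`to_bitmask`, `has_bit`, `popcount`, `iter_bits`) are hypothetical and have nothing to do with intbitset's API. Building the mask with `|=` rather than `+=` is safe even if an index repeats.

```python
import random


def to_bitmask(indices):
    """Pack non-negative indices into one arbitrary-precision int."""
    mask = 0
    for i in indices:
        mask |= 1 << i  # |= is idempotent, so repeated indices are harmless
    return mask


def has_bit(mask, i):
    """Membership test: is bit i set?"""
    return bool((mask >> i) & 1)


def popcount(mask):
    """Number of set bits (int.bit_count() does this on Python 3.10+)."""
    return bin(mask).count("1")


def iter_bits(mask):
    """Yield the indices of the set bits in increasing order."""
    while mask:
        low = mask & -mask          # isolate the lowest set bit
        yield low.bit_length() - 1  # its index
        mask ^= low                 # clear it and continue


x = random.sample(range(1000000), 10000)
y = random.sample(range(1000000), 10000)
m, n = to_bitmask(x), to_bitmask(y)

both = m & n    # intersection
either = m | n  # union
print(popcount(both), popcount(either))
```

`iter_bits` touches the whole integer on every step, so it is fine for sparse results like `both` but slow for decoding tens of thousands of members. That is one of the things a dedicated structure such as intbitset can do better.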
Additionally, intbitset supports some unique features such as infinite sets, which are useful, e.g., for building a search engine where you have the concept of a universe (e.g., taking the union of an infinite set with a regular set returns an infinite set).
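To make the "universe" idea concrete without guessing at intbitset's own API, here is a toy pure-Python model. The class `UniverseSet` and its methods are hypothetical. It represents an infinite set as the complement of a finite one (a cofinite set), which is one possible design and not necessarily how intbitset implements infinite sets.

```python
class UniverseSet:
    """Hypothetical set of non-negative ints that may be infinite.

    Finite sets store their members as bits of ``bits``. Infinite sets
    store the members they EXCLUDE, i.e. they are cofinite.
    """

    def __init__(self, bits=0, infinite=False):
        self.bits = bits
        self.infinite = infinite

    @classmethod
    def of(cls, items):
        bits = 0
        for i in items:
            bits |= 1 << i
        return cls(bits)

    @classmethod
    def universe(cls):
        return cls(0, infinite=True)  # excludes nothing

    def __contains__(self, i):
        marked = bool((self.bits >> i) & 1)
        return not marked if self.infinite else marked

    def _split(self, other):
        # Return (infinite operand, finite operand) for mixed operations.
        return (self, other) if self.infinite else (other, self)

    def __or__(self, other):
        if not self.infinite and not other.infinite:
            return UniverseSet(self.bits | other.bits)
        if self.infinite and other.infinite:
            # not(A) | not(B) == not(A & B)
            return UniverseSet(self.bits & other.bits, True)
        inf, fin = self._split(other)
        # not(A) | F == not(A - F): the union stays infinite
        return UniverseSet(inf.bits & ~fin.bits, True)

    def __and__(self, other):
        if not self.infinite and not other.infinite:
            return UniverseSet(self.bits & other.bits)
        if self.infinite and other.infinite:
            # not(A) & not(B) == not(A | B)
            return UniverseSet(self.bits | other.bits, True)
        inf, fin = self._split(other)
        # not(A) & F == F - A: the intersection is finite
        return UniverseSet(fin.bits & ~inf.bits)


everything = UniverseSet.universe()
hits = UniverseSet.of([1, 5, 42])
print((everything | hits).infinite)  # True
print(7 in (everything & hits))      # False
```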
For more information about intbitset's performance compared with Python's built-in sets, see: http://intbitset.readthedocs.org/en/latest/#performance
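If you want to reproduce a comparison like this on your own machine without installing anything, a small stdlib-only harness is enough. The function `compare` and its defaults are hypothetical and only cover built-in `set` and native `int`. An intbitset row could be added the same way if the package is installed. No timings are shown because they depend on hardware and Python version.

```python
import random
import timeit


def compare(universe=1000000, k=10000, number=100):
    """Time & and | on built-in sets vs int bitmasks built from the same data."""
    x = random.sample(range(universe), k)
    y = random.sample(range(universe), k)
    sx, sy = set(x), set(y)
    mx = my = 0
    for i in x:
        mx |= 1 << i
    for i in y:
        my |= 1 << i
    # Sanity check: both representations agree on the intersection size.
    assert bin(mx & my).count("1") == len(sx & sy)

    cases = [("set &", lambda: sx & sy), ("set |", lambda: sx | sy),
             ("int &", lambda: mx & my), ("int |", lambda: mx | my)]
    results = {}
    for label, stmt in cases:
        # Best of 3 repeats, reported as microseconds per operation.
        best = min(timeit.repeat(stmt, number=number, repeat=3))
        results[label] = best / number * 1e6
    return results


if __name__ == "__main__":
    for label, usec in sorted(compare().items()):
        print("%s: %.1f usec" % (label, usec))
```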