I am trying to create a huge boolean matrix which is randomly filled with True and False with a given probability p. At f
So I tried to split it up into the generation of the single rows by doing this:
The way that np.random.choice works is by first generating a float64 in [0, 1) for every cell of your data, and then converting that into an index in your array using np.searchsorted. This intermediate representation is 8 times larger than the boolean array!
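To make the memory cost concrete, here is a rough sketch of the mechanism described above. It is not numpy's actual source; the helper name choice_sketch and the example probabilities are made up for illustration.

```python
import numpy as np

def choice_sketch(values, probs, size, rng):
    """Hypothetical re-creation of the idea: uniform floats -> indices -> values."""
    values = np.asarray(values)
    cdf = np.cumsum(probs)
    cdf /= cdf[-1]                    # guard against rounding so the last entry is 1.0
    u = rng.random(size)              # float64 intermediate: 8 bytes per cell
    idx = np.searchsorted(cdf, u, side="right")
    return values[idx], u

rng = np.random.default_rng(0)
sample, u = choice_sketch([False, True], [0.3, 0.7], (1000, 1000), rng)

# Ratio of the float64 intermediate to the boolean result
print(u.nbytes // sample.nbytes)  # → 8
```

The boolean result needs one byte per cell, so for a large N the temporary float array dominates peak memory.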
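Since your data is boolean, you can get a factor of two speedup with
np.random.rand(N, N) > p
which is True with probability 1 - p (use < p instead if p is meant to be the probability of True). Naturally, you could use this inside your looping solution.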
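A minimal sketch of what such a row-by-row version might look like, assuming the goal is to keep only one row of floats in memory at a time. The function name random_bool_matrix is hypothetical, and the sketch uses the newer np.random.default_rng generator rather than np.random.rand; the comparison is the same.

```python
import numpy as np

def random_bool_matrix(n, p, rng=None):
    """Fill an n x n boolean matrix row by row; each cell is True with probability 1 - p."""
    rng = np.random.default_rng() if rng is None else rng
    out = np.empty((n, n), dtype=bool)   # final array: 1 byte per cell
    for i in range(n):
        # Only n float64 values exist at any moment, not n * n
        out[i] = rng.random(n) > p
    return out

m = random_bool_matrix(2000, 0.3, np.random.default_rng(0))
print(m.shape, m.dtype, m.mean())
```

Generating a block of several rows per iteration instead of a single row would likely reduce the Python loop overhead while still bounding the temporary memory.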
It seems like np.random.choice could do with some buffering here; you might want to file an issue against numpy.
Another option would be to try and generate float32s instead of float64s. I'm not sure if numpy can do that right now, but you could request the feature.
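For what it's worth, a sketch of the float32 idea, assuming a NumPy version that provides np.random.default_rng (1.17 or later), where Generator.random accepts a dtype argument. The values of N and p below are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 2000, 0.3                             # placeholder size and probability

# float32 intermediate: 4 bytes per cell instead of 8
u32 = rng.random((N, N), dtype=np.float32)
mask = u32 > p                               # True with probability of about 1 - p

print(u32.dtype, u32.nbytes // mask.nbytes)  # → float32 4
```

This halves the temporary memory compared to float64, at the cost of a coarser grid of possible random values, which should rarely matter for a simple threshold.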