I am trying to create a huge boolean matrix which is randomly filled with True and False with a given probability p. At f
So I tried to split it up into the generation of the single rows by doing this:
np.random.choice works by first generating a float64 in [0, 1) for every cell of your data, and then converting each one into an index into your array using np.searchsorted. This intermediate representation is 8 times larger than the boolean array!
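To make the size difference concrete, here is a small sketch comparing the footprint of the intermediate floats with that of the final boolean array. The size n = 1000 is only an illustrative stand-in for your real N, and the 0.5 threshold is arbitrary.

```python
import numpy as np

n = 1000  # illustrative size only, far smaller than the real N

# Intermediate uniform samples: float64, 8 bytes per cell.
floats = np.random.rand(n, n)

# Final boolean result: 1 byte per cell.
bools = floats > 0.5

ratio = floats.nbytes // bools.nbytes
print(floats.nbytes, bools.nbytes, ratio)  # 8000000 1000000 8
```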
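Since your data is boolean, you can skip the index lookup entirely and get a factor of two speedup with:
np.random.rand(N, N) > p
Note that this yields True with probability 1 - p; use < p instead if p is meant to be the probability of True. Naturally, you could also use this inside your looping solution.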
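As a rough sketch of how the comparison could be combined with a loop, the following fills the matrix a block of rows at a time, so that only one block of float64 values exists at once. The function name random_bool_matrix and the rows_per_chunk parameter are hypothetical names chosen for illustration; they are not from your code or from numpy.

```python
import numpy as np

def random_bool_matrix(n, p, rows_per_chunk=1024):
    """Fill an n x n boolean matrix one block of rows at a time.

    Each cell is True with probability 1 - p, matching `rand > p`.
    (Hypothetical helper; name and parameters are illustrative.)
    """
    out = np.empty((n, n), dtype=bool)
    for start in range(0, n, rows_per_chunk):
        stop = min(start + rows_per_chunk, n)
        # Only (stop - start) * n float64 values are alive here,
        # instead of n * n for the one-shot version.
        out[start:stop] = np.random.rand(stop - start, n) > p
    return out

m = random_bool_matrix(2000, 0.3, rows_per_chunk=256)
print(m.shape, m.dtype, m.mean())
```

Larger blocks mean fewer Python-level iterations but a bigger temporary, so rows_per_chunk is a knob you would tune to your available memory.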
It seems like np.random.choice could do with some buffering here - you might want to file an issue against numpy.
Another option would be to try and generate float32s instead of float64s. I'm not sure if numpy can do that right now, but you could request the feature.
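For what it's worth, NumPy 1.17 and later do support this through the Generator API: the random method of the object returned by np.random.default_rng() accepts dtype=np.float32, which halves the size of the intermediate array. A minimal sketch, with n and p as placeholder values:

```python
import numpy as np

rng = np.random.default_rng()  # Generator API, available in NumPy >= 1.17

n, p = 2000, 0.3  # placeholder values for illustration

# float32 samples use 4 bytes per cell instead of 8.
samples = rng.random((n, n), dtype=np.float32)

# True with probability 1 - p, as with `rand > p`.
mask = samples > p
print(samples.dtype, mask.dtype, mask.mean())
```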