I am trying to create a huge boolean matrix which is randomly filled with True and False with a given probability p. At f
Another possibility could be to generate it in a batch (i.e. compute many sub-arrays and stack them together at the very end). But, consider not to update one array (mask) in a for loop as OP is doing. This would force the whole array to load in main memory during every indexing update.
Instead for example: to get 30000x30000, have 9000 100x100 separate arrays, update each of this 100x100 array accordingly in a for loop and finally stack these 9000 arrays together in a giant array. This would definitely need not more than 4GB of RAM and would be very fast as well.
Minimal Example:
In [9]: a
Out[9]:
array([[0, 1],
[2, 3]])
In [10]: np.hstack([np.vstack([a]*5)]*5)
Out[10]:
array([[0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
[2, 3, 2, 3, 2, 3, 2, 3, 2, 3],
[0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
[2, 3, 2, 3, 2, 3, 2, 3, 2, 3],
[0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
[2, 3, 2, 3, 2, 3, 2, 3, 2, 3],
[0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
[2, 3, 2, 3, 2, 3, 2, 3, 2, 3],
[0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
[2, 3, 2, 3, 2, 3, 2, 3, 2, 3]])
In [11]: np.hstack([np.vstack([a]*5)]*5).shape
Out[11]: (10, 10)