I have a very large 2D array which looks something like this:
a=
[[a1, b1, c1],
[a2, b2, c2],
...,
[an, bn, cn]]
Using numpy, is there a
An alternative is to use the choice method of the Generator class (see https://github.com/numpy/numpy/issues/10835):
import numpy as np
# generate the random array
A = np.random.randint(5, size=(10,3))
# use the choice method of the Generator class
# (samples whole rows along axis 0; with replacement by default)
rng = np.random.default_rng()
A_sampled = rng.choice(A, 2)
which gives the sampled rows, for example:
array([[1, 3, 2],
[1, 2, 1]])
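Note that rng.choice samples with replacement by default, so the same row can be drawn twice. If repeated rows are not wanted, a minimal sketch could pass replace=False. The seed and the arange-based array here are assumptions for illustration only, chosen so that all rows are distinct and repeats would be easy to spot:

```python
import numpy as np

# The seed is arbitrary and only makes the run reproducible (an assumption).
rng = np.random.default_rng(seed=0)

# Illustrative array whose 10 rows are all different.
A = np.arange(30).reshape(10, 3)

# replace=False draws each row at most once; axis=0 samples whole rows.
A_sampled = rng.choice(A, 2, replace=False, axis=0)
print(A_sampled)
```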
The running times of the three approaches compare as follows:
%timeit rng.choice(A, 2)
15.1 µs ± 115 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit np.random.permutation(A)[:2]
4.22 µs ± 83.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit A[np.random.randint(A.shape[0], size=2), :]
10.6 µs ± 418 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
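The %timeit lines above only work inside IPython or Jupyter. Outside of it, a rough equivalent could be sketched with the standard timeit module. The dictionary of candidates, the labels and the repeat counts are assumptions made for this illustration, and the absolute numbers will differ from machine to machine:

```python
import timeit

import numpy as np

A = np.random.randint(5, size=(10, 3))
rng = np.random.default_rng()

# The three approaches compared above, wrapped as zero-argument callables.
candidates = {
    "rng.choice": lambda: rng.choice(A, 2),
    "permutation": lambda: np.random.permutation(A)[:2],
    "randint index": lambda: A[np.random.randint(A.shape[0], size=2), :],
}

n_calls = 10_000  # calls per measurement (arbitrary choice)
timings = {}
for name, fn in candidates.items():
    # Take the best of 5 measurements and convert to seconds per call.
    best = min(timeit.repeat(fn, number=n_calls, repeat=5))
    timings[name] = best / n_calls
    print(f"{name}: {timings[name] * 1e6:.2f} µs per call")
```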
But when the array gets large, e.g. A = np.random.randint(10, size=(1000,300)), indexing with random row indices is the fastest:
%timeit A[np.random.randint(A.shape[0], size=50), :]
17.6 µs ± 657 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit rng.choice(A, 50)
22.3 µs ± 134 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit np.random.permutation(A)[:50]
143 µs ± 1.33 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
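So in these measurements the permutation method is the fastest of the three when the array is small, while indexing with random row indices is the fastest when the array is large. Note that the three are not strictly equivalent: permutation never repeats a row, whereas randint indexing and rng.choice with its default replace=True can return the same row more than once.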
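If the rows must not repeat but the array is large, one option is to combine the two ideas: draw row indices without replacement and then index the array, which avoids shuffling a copy of the whole array. This is only a sketch; sample_rows is a hypothetical helper name, not a NumPy function, and it was not part of the timings above:

```python
import numpy as np


def sample_rows(A, k, rng=None):
    """Return k rows of A picked at random, each row index used at most once.

    Hypothetical helper for illustration, not part of NumPy.
    """
    if rng is None:
        rng = np.random.default_rng()
    # Sample k different row indices from range(A.shape[0]).
    idx = rng.choice(A.shape[0], size=k, replace=False)
    # Fancy indexing copies only the selected rows.
    return A[idx]


A = np.random.randint(10, size=(1000, 300))
sample = sample_rows(A, 50)
print(sample.shape)  # (50, 300)
```

Passing an explicit rng, for example np.random.default_rng(42), makes the selection reproducible.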