Numpy: Get random set of rows from 2D array

挽巷 2020-11-28 01:49

I have a very large 2D array which looks something like this:

a=
[[a1, b1, c1],
 [a2, b2, c2],
 ...,
 [an, bn, cn]]

Using numpy, is there a

8 Answers

隐瞒了意图╮ (original poster)

2020-11-28 02:46

An alternative is to use the choice method of the Generator class (see https://github.com/numpy/numpy/issues/10835):

import numpy as np

# generate the random array
A = np.random.randint(5, size=(10,3))

# use the choice method of the Generator class
rng = np.random.default_rng()
A_sampled = rng.choice(A, 2)

which gives a sample such as

array([[1, 3, 2],
       [1, 2, 1]])
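
One thing worth knowing about the call above: Generator.choice samples with replacement by default, so the same row can come back more than once. If you need distinct rows, a minimal sketch (the seed and the arange-based array are just assumptions for illustration) would be:

```python
import numpy as np

# Seed chosen only so the illustration is repeatable (an assumption).
rng = np.random.default_rng(0)

# Illustrative array with 10 distinct rows (not the array from the answer).
A = np.arange(30).reshape(10, 3)

# replace=False draws each row at most once; axis=0 samples along rows.
A_sampled = rng.choice(A, size=4, replace=False, axis=0)
```

With replace=False, size cannot exceed the number of rows, otherwise NumPy raises a ValueError.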

The running times for this small array compare as follows:

%timeit rng.choice(A, 2)
15.1 µs ± 115 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit np.random.permutation(A)[:2]
4.22 µs ± 83.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit A[np.random.randint(A.shape[0], size=2), :]
10.6 µs ± 418 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

But when the array gets large, e.g. A = np.random.randint(10, size=(1000,300)), indexing with random row indices is the fastest:

%timeit A[np.random.randint(A.shape[0], size=50), :]
17.6 µs ± 657 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit rng.choice(A, 50)
22.3 µs ± 134 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%timeit np.random.permutation(A)[:50]
143 µs ± 1.33 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

So the permutation method seems to be the most efficient when the array is small, while indexing with random row indices is the best option when the array is large.
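
Note that indexing with np.random.randint can pick the same row twice, whereas the permutation approach never does. If you want the index-based approach but with distinct rows, one possible sketch is below. The helper name sample_rows is hypothetical, and I have not timed this variant against the numbers above.

```python
import numpy as np

def sample_rows(a, k, rng):
    """Return k distinct rows of a 2D array (hypothetical helper)."""
    # Draw k row indices without replacement, then index once,
    # so only the selected rows are copied.
    idx = rng.choice(a.shape[0], size=k, replace=False)
    return a[idx]

rng = np.random.default_rng()
A = rng.integers(10, size=(1000, 300))  # same shape as the large test array
A_sampled = sample_rows(A, 50, rng)     # shape (50, 300)
```

Only the index vector is shuffled here, not the full array, which is the same reason the indexing approach scales better than permutation(A)[:k] in the timings above.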
