Numpy shuffle multidimensional array by row only, keep column order unchanged


How can I shuffle a multidimensional array by row only in Python (that is, without shuffling the columns)?

I am looking for the most efficient solution, because my ma


6 answers

无人共我

2020-12-05 13:04

You can also use np.random.permutation to generate a random permutation of the row indices and then index into the rows of X using np.take with axis=0. np.take also accepts an out= argument, so the result can be written back into the input array X itself, which can save memory. The implementation would look like this -

np.take(X,np.random.permutation(X.shape[0]),axis=0,out=X)

Sample run -

In [23]: X
Out[23]:
array([[ 0.60511059,  0.75001599],
       [ 0.30968339,  0.09162172],
       [ 0.14673218,  0.09089028],
       [ 0.31663128,  0.10000309],
       [ 0.0957233 ,  0.96210485],
       [ 0.56843186,  0.36654023]])

In [24]: np.take(X,np.random.permutation(X.shape[0]),axis=0,out=X);

In [25]: X
Out[25]:
array([[ 0.14673218,  0.09089028],
       [ 0.31663128,  0.10000309],
       [ 0.30968339,  0.09162172],
       [ 0.56843186,  0.36654023],
       [ 0.0957233 ,  0.96210485],
       [ 0.60511059,  0.75001599]])
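As a quick sanity check of the np.take approach, here is a minimal sketch on a small made-up array (the array contents and variable names are assumptions for illustration only). It confirms that only the row order changes and that each row's contents stay together.

```python
import numpy as np

# Small illustrative array: row i holds the values [2*i, 2*i + 1].
X = np.arange(12).reshape(6, 2)
original_rows = {tuple(row) for row in X}

# Draw a random ordering of the row indices, then gather rows in that order.
# Passing out=X writes the gathered rows back into X.
perm = np.random.permutation(X.shape[0])
np.take(X, perm, axis=0, out=X)

# Every original row is still present, and no row has been split up.
shuffled_rows = {tuple(row) for row in X}
print(shuffled_rows == original_rows)
```

Because every row was built as a consecutive pair, checking that the second column minus the first is still 1 everywhere is another easy way to see that columns were not shuffled.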

Additional performance boost

Here's a trick to speed up np.random.permutation(X.shape[0]) with np.argsort() -

np.random.rand(X.shape[0]).argsort()

Speedup results -

In [32]: X = np.random.random((6000, 2000))

In [33]: %timeit np.random.permutation(X.shape[0])
1000 loops, best of 3: 510 µs per loop

In [34]: %timeit np.random.rand(X.shape[0]).argsort()
1000 loops, best of 3: 297 µs per loop

Thus, the shuffling solution could be modified to -

np.take(X,np.random.rand(X.shape[0]).argsort(),axis=0,out=X)
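To see why the argsort trick yields a usable row ordering, the sketch below (with an arbitrary size n chosen for illustration) draws one random key per index and checks that sorting the keys gives every index exactly once.

```python
import numpy as np

n = 8  # arbitrary illustrative size

# One uniform random key per row index; the positions that would sort
# the keys form a random ordering of 0..n-1.
keys = np.random.rand(n)
perm = keys.argsort()

# A valid permutation contains each index 0..n-1 exactly once.
is_permutation = np.array_equal(np.sort(perm), np.arange(n))
print(perm, is_permutation)
```

Whether this is actually faster than np.random.permutation depends on the NumPy version and array size, so it is worth re-timing on your own setup.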

Runtime tests -

These tests include the two approaches listed in this post and the np.random.shuffle based one from @Kasramvd's solution.

In [40]: X = np.random.random((6000, 2000))

In [41]: %timeit np.random.shuffle(X)
10 loops, best of 3: 25.2 ms per loop

In [42]: %timeit np.take(X,np.random.permutation(X.shape[0]),axis=0,out=X)
10 loops, best of 3: 53.3 ms per loop

In [43]: %timeit np.take(X,np.random.rand(X.shape[0]).argsort(),axis=0,out=X)
10 loops, best of 3: 53.2 ms per loop

So the np.take based approaches are roughly twice as slow here and seem worth using only if memory is a concern; otherwise the np.random.shuffle based solution looks like the way to go.


终归单人心

2020-12-05 13:07

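That's what numpy.random.shuffle() is for. It shuffles an array in place along its first axis only, so the rows are reordered while the contents of each row stay the same: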

>>> X = np.random.random((6, 2))
>>> X
array([[ 0.9818058 ,  0.67513579],
       [ 0.82312674,  0.82768118],
       [ 0.29468324,  0.59305925],
       [ 0.25731731,  0.16676408],
       [ 0.27402974,  0.55215778],
       [ 0.44323485,  0.78779887]])
>>> np.random.shuffle(X)
>>> X
array([[ 0.9818058 ,  0.67513579],
       [ 0.44323485,  0.78779887],
       [ 0.82312674,  0.82768118],
       [ 0.29468324,  0.59305925],
       [ 0.25731731,  0.16676408],
       [ 0.27402974,  0.55215778]])
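For reference, newer NumPy releases also offer the same row shuffle through the Generator interface. The sketch below assumes NumPy 1.18 or later (where the axis argument is available); the seed and the small array are illustrative choices, not part of the answer above.

```python
import numpy as np

# Seed chosen only so the run is repeatable; omit it for fresh randomness.
rng = np.random.default_rng(seed=0)

X = np.arange(12).reshape(6, 2)
before = sorted(map(tuple, X))

# In-place shuffle along the first axis (rows only).
rng.shuffle(X, axis=0)

# Alternative: leave X untouched and get a row-shuffled copy instead.
Y = rng.permutation(X, axis=0)

print(sorted(map(tuple, X)) == before)
print(sorted(map(tuple, Y)) == before)
```

Use shuffle when overwriting the array is acceptable, and permutation when the original order must be kept.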


谎友^

2020-12-05 13:08

After a bit of experimenting, I found that the most memory- and time-efficient way to shuffle the rows of an nd-array is to shuffle an index array and then fetch the data through the shuffled index:

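import numpy as np

rand_num2 = np.random.randint(5, size=(6000, 2000))
perm = np.arange(rand_num2.shape[0])   # row indices 0..5999
np.random.shuffle(perm)                # shuffle only the small index array
rand_num2 = rand_num2[perm]            # gather the rows in the shuffled order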
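One detail worth seeing in action: indexing with an integer array returns a new array rather than reordering the original in place. The sketch below wraps the index-shuffle idea in a helper (the name shuffled_rows and the small test array are hypothetical, chosen for illustration) and checks that the input is left untouched.

```python
import numpy as np

def shuffled_rows(a):
    """Return a copy of `a` with its rows in random order (hypothetical helper)."""
    perm = np.arange(a.shape[0])
    np.random.shuffle(perm)   # shuffle the small index array in place
    return a[perm]            # integer-array indexing gathers rows into a new array

data = np.arange(12).reshape(6, 2)
snapshot = data.copy()
result = shuffled_rows(data)

# The result is a separate array; the input keeps its original order.
print(np.shares_memory(data, result))
print(np.array_equal(data, snapshot))
```

If the old array is not referenced anywhere else, rebinding the name (as in `rand_num2 = rand_num2[perm]`) lets it be freed once the new one exists.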

In more detail:
Here I use memory_profiler to measure memory usage and Python's built-in time module to record run time, comparing all of the previous answers.

import time

import numpy as np
from memory_profiler import profile


@profile
def main():
    # shuffle data itself
    rand_num = np.random.randint(5, size=(6000, 2000))
    start = time.time()
    np.random.shuffle(rand_num)
    print('Time for direct shuffle: {0}'.format((time.time() - start)))

    # Shuffle index and get data from shuffled index
    rand_num2 = np.random.randint(5, size=(6000, 2000))
    start = time.time()
    perm = np.arange(rand_num2.shape[0])
    np.random.shuffle(perm)
    rand_num2 = rand_num2[perm]
    print('Time for shuffling index: {0}'.format((time.time() - start)))

    # using np.take()
    rand_num3 = np.random.randint(5, size=(6000, 2000))
    start = time.time()
    np.take(rand_num3, np.random.rand(rand_num3.shape[0]).argsort(), axis=0, out=rand_num3)
    print("Time taken by np.take, {0}".format((time.time() - start)))


if __name__ == '__main__':
    main()

Timing results

Time for direct shuffle: 0.03345608711242676      # about 33.5 ms
Time for shuffling index: 0.019818782806396484    # about 19.8 ms
Time taken by np.take, 0.06726956367492676        # about 67.3 ms

Memory profiler result

Line #    Mem usage    Increment   Line Contents
================================================
    39  117.422 MiB    0.000 MiB   @profile
    40                             def main():
    41                                 # shuffle data itself
    42  208.977 MiB   91.555 MiB       rand_num = np.random.randint(5, size=(6000, 2000))
    43  208.977 MiB    0.000 MiB       start = time.time()
    44  208.977 MiB    0.000 MiB       np.random.shuffle(rand_num)
    45  208.977 MiB    0.000 MiB       print('Time for direct shuffle: {0}'.format((time.time() - start)))
    46
    47                                 # Shuffle index and get data from shuffled index
    48  300.531 MiB   91.555 MiB       rand_num2 = np.random.randint(5, size=(6000, 2000))
    49  300.531 MiB    0.000 MiB       start = time.time()
    50  300.535 MiB    0.004 MiB       perm = np.arange(rand_num2.shape[0])
    51  300.539 MiB    0.004 MiB       np.random.shuffle(perm)
    52  300.539 MiB    0.000 MiB       rand_num2 = rand_num2[perm]
    53  300.539 MiB    0.000 MiB       print('Time for shuffling index: {0}'.format((time.time() - start)))
    54
    55                                 # using np.take()
    56  392.094 MiB   91.555 MiB       rand_num3 = np.random.randint(5, size=(6000, 2000))
    57  392.094 MiB    0.000 MiB       start = time.time()
    58  392.242 MiB    0.148 MiB       np.take(rand_num3, np.random.rand(rand_num3.shape[0]).argsort(), axis=0, out=rand_num3)
    59  392.242 MiB    0.000 MiB       print("Time taken by np.take, {0}".format((time.time() - start)))


灰色年华

2020-12-05 13:18

I tried many solutions, and in the end I used this simple one:

import numpy as np
from sklearn.utils import shuffle

x = np.array([[1, 2], [3, 4], [5, 6]])
print(shuffle(x, random_state=0))   # returns a row-shuffled copy; x is unchanged

Output:

[[5 6]
 [3 4]
 [1 2]]

If you have a 3D array, loop over the first axis (axis=0) and apply this function to each 2D slice, like this (array_3d stands for your 3D NumPy array):

np.array([shuffle(item) for item in array_3d])
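If you would rather avoid the Python-level loop, the same per-slice row shuffle can be sketched in plain NumPy. This is an alternative technique, not the sklearn call above: it draws an independent random row order for each 2D slice and gathers with np.take_along_axis. The array shape and variable names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng()

# 2 matrices, each with 4 rows and 3 columns.
arr = np.arange(24).reshape(2, 4, 3)

# One independent random row order per matrix: shape (2, 4).
order = rng.random(arr.shape[:2]).argsort(axis=1)

# Add a trailing axis so the row order broadcasts across the columns,
# then gather along the row axis: result[i, j, k] = arr[i, order[i, j], k].
result = np.take_along_axis(arr, order[:, :, None], axis=1)

for i in range(arr.shape[0]):
    same_rows = sorted(map(tuple, result[i])) == sorted(map(tuple, arr[i]))
    print(i, same_rows)
```

Each slice keeps its own rows, each row keeps its column order, and different slices get different row orders.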


执念已碎

2020-12-05 13:24

I have a question on this (or maybe it is the answer). Let's say we have a NumPy array X with shape=(1000,60,11,1). Suppose X is an array of 1000 images, each of size 60x11 with 1 channel (60x11x1).

What if I want to shuffle the order of all these images? To do that, I'll shuffle the indexes of X.

def shuffling(X):
    indx = np.arange(len(X))   # create an array with indexes for X data
    np.random.shuffle(indx)
    X = X[indx]
    return X

Will that work? Note that len(X) returns the size of the first axis (X.shape[0], here 1000), not the largest dimension; in this example the two happen to coincide.
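A small experiment can answer the question. The sketch below uses a much smaller stand-in for the (1000, 60, 11, 1) stack (the shape and names are assumptions for illustration) and checks both what len(X) returns and that each image stays intact after the index shuffle.

```python
import numpy as np

# Smaller stand-in for the image stack: 5 "images" of size 3x2 with 1 channel.
X = np.arange(5 * 3 * 2 * 1).reshape(5, 3, 2, 1)

# len() reports the size of the first axis.
first_axis_matches = len(X) == X.shape[0]

indx = np.arange(len(X))
np.random.shuffle(indx)
shuffled = X[indx]

# Image i of the result is exactly image indx[i] of the input.
images_intact = all(
    np.array_equal(shuffled[i], X[indx[i]]) for i in range(len(X))
)
print(first_axis_matches, images_intact)
```

Indexing with a 1D integer array only selects along the first axis, so the remaining three axes of every image are carried along unchanged.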


挽巷

2020-12-05 13:26

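You can shuffle each row of a two-dimensional array A using the np.vectorize() function. Note that this permutes the entries within every row independently; it does not change the order of the rows themselves:

shuffle = np.vectorize(np.random.permutation, signature='(n)->(n)')
A_shuffled = shuffle(A)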
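To make the behaviour of the np.vectorize version concrete, here is a minimal sketch on a made-up 3x4 array whose rows are already sorted. Sorting each output row and comparing with the input shows that values never move between rows.

```python
import numpy as np

# Rows are [0..3], [4..7], [8..11], each already in ascending order.
A = np.arange(12).reshape(3, 4)

shuffle = np.vectorize(np.random.permutation, signature='(n)->(n)')
A_shuffled = shuffle(A)

# Each row still holds the same values, in the same row position;
# only the order of entries inside each row may differ.
rows_keep_values = np.array_equal(np.sort(A_shuffled, axis=1), A)
print(A_shuffled)
print(rows_keep_values)
```

So this approach mixes up column positions inside each row, which is the opposite of reordering whole rows while keeping column order.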
