Question
I have a list of integers (di), and another list (rang_indx) made up of numpy sub-arrays of integers (code below). For each of these sub-arrays, I need to store in a separate list (indx) a number of random elements, given by the di list.
From what I can see, np.random.shuffle() will not shuffle the elements within the sub-arrays, but rather the sub-arrays themselves within rang_indx, which is not what I need. Hence, I use one for loop to shuffle each sub-array in place, and then a second one (combined with zip()) to generate the indx list.
This function is called millions of times as part of a larger program. Is there a way to speed up the process?
import numpy as np
def func(di, rang_indx):
    # Shuffle each sub-array in place.
    for _ in rang_indx:
        np.random.shuffle(_)
    # For each shuffled sub-array, only keep as many elements as those
    # indicated by the 'di' array.
    indx = [_[:i] for (_, i) in zip(*[rang_indx, di.astype(int)])]
    return indx
# This data is not fixed, and will change with each call to func()
di = np.array([ 4., 2., 0., 600., 12., 22., 13., 21., 25., 25., 12., 11.,
7., 12., 10., 13., 5., 10.])
rang_indx = [np.array([]), np.array([189, 195, 209, 214, 236, 237, 255, 286, 290, 296, 301, 304, 321,
323, 327, 329]), np.array([164, 171, 207, 217, 225, 240, 250, 263, 272, 279, 284, 285, 289]), np.array([101, 162, 168, 177, 179, 185, 258, 261, 264, 269, 270, 278, 281,
287, 293, 298]), np.array([111, 127, 143, 156, 159, 161, 181, 182, 183, 194, 196, 198, 204,
205, 210, 212, 235, 239, 267, 268, 297]), np.array([107, 116, 120, 128, 130, 136, 137, 144, 152, 155, 157, 166, 169,
170, 184, 186, 192, 218, 220, 226, 228, 241, 245, 246, 247, 251,
252, 253]), np.array([ 99, 114, 118, 121, 131, 134, 158, 216, 219, 221, 224, 231, 233,
234, 243, 244]), np.array([ 34, 37, 38, 48, 56, 78, 84, 100, 108, 117, 122, 123, 132,
149, 151, 153, 163, 178, 180, 191, 199, 202, 208, 211]), np.array([ 31, 40, 41, 45, 51, 53, 57, 60, 61, 66, 67, 69, 71,
75, 85, 90, 95, 96, 167, 173, 174, 176, 188, 190, 197, 206]), np.array([ 0, 1, 2, 3, 6, 11, 12, 13, 17, 25, 33, 36, 47,
58, 64, 76, 87, 94, 160, 165, 172, 175, 187, 193, 201, 203]), np.array([ 4, 16, 18, 19, 109, 113, 115, 124, 138, 142, 145, 150]), np.array([103, 105, 106, 112, 125, 135, 139, 140, 141, 146, 147, 154]), np.array([102, 104, 110, 119, 126, 129, 133, 148]), np.array([29, 32, 42, 43, 55, 63, 72, 77, 79, 83, 91, 92]), np.array([35, 49, 59, 73, 74, 81, 86, 88, 89, 97, 98]), np.array([30, 39, 44, 46, 50, 52, 54, 62, 65, 68, 80, 82, 93]), np.array([ 8, 10, 15, 27, 70]), np.array([ 5, 7, 9, 14, 20, 21, 22, 23, 24, 26, 28])]
func(di, rang_indx)
Answer 1:
Approach #1: The idea here is to do as little work as possible inside the loop, and to use a single loop only:
- Create a 2D random array in the interval [0,1) to cover the max. length of the subarrays.
- For each subarray, set the invalid places to 1.0. Get argsort for each row. Those 1s corresponding to the invalid places would stay at the back because there were no 1s in the original random array. Thus, we have the indices array.
- Slice each row of that indices array to the extent of the lengths listed in di.
- Start a loop and slice each subarray from rang_indx using those sliced indices.
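The first two steps can be tried on a tiny input. This is only a sketch: the three sub-arrays in `rows`, the seeded generator `rng`, and the use of a full `np.argsort` are assumptions made for illustration, not part of the original post.

```python
import numpy as np

# Hypothetical toy input: three sub-arrays of lengths 3, 0 and 2.
rows = [np.array([10, 11, 12]), np.array([], dtype=int), np.array([20, 21])]

lens = np.array([len(r) for r in rows])
max_len = lens.max()

# True wherever a column index lies beyond the end of that row's sub-array.
invalid_mask = lens[:, None] <= np.arange(max_len)

rng = np.random.default_rng(0)                 # seeded only to make the demo repeatable
rand_nums = rng.random((len(rows), max_len))   # values in [0, 1)
rand_nums[invalid_mask] = 1.0                  # strictly larger than every valid entry

# Sorting each row by its random keys yields a random permutation of the
# valid column indices, followed by the invalid ones.
order = np.argsort(rand_nums, axis=1)

for row, n, idx in zip(rows, lens, order):
    print(row[idx[:n]])   # a shuffled copy of the sub-array
```

Because the padding value 1.0 can never be drawn from [0, 1), the first `lens[i]` entries of row `i` in `order` are always valid positions, so slicing them further to `di` elements gives the random picks.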
Hence, the implementation -
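# Length of each sub-array, and the number of picks capped at that length.
lens = np.array([len(i) for i in rang_indx])
di0 = np.minimum(lens, di.astype(int))
# Mark the padded (invalid) places of the 2D array and set them to 1.
invalid_mask = lens[:,None] <= np.arange(lens.max())
rand_nums = np.random.rand(len(lens), lens.max())
rand_nums[invalid_mask] = 1
# argpartition (rather than a full argsort) is enough to move the valid
# indices to the front of each row.
shuffled_indx = np.argpartition(rand_nums, lens-1, axis=1)
out = []
for i,all_idx in enumerate(shuffled_indx):
    if lens[i]==0:
        out.append(np.array([]))
    else:
        slice_idx = all_idx[:di0[i]]
        out.append(rang_indx[i][slice_idx])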
Approach #2: Another way, which does most of the setup work efficiently inside the loop itself -
lens = np.array([len(i) for i in rang_indx])
di0 = np.minimum(lens, di.astype(int))
out = []
for i in range(len(lens)):
    if lens[i]==0:
        out.append(np.array([]))
    else:
        # Indices of the k smallest random keys: k distinct random positions.
        k = di0[i]
        slice_idx = np.argpartition(np.random.rand(lens[i]), k-1)[:k]
        out.append(rang_indx[i][slice_idx])
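To check Approach #2 against the loop from the question, both can be wrapped as functions and run on the same inputs. The sketch below makes several assumptions: the names `func_loop` and `func_argpartition` are hypothetical, the baseline shuffles copies rather than shuffling in place, and the small `di` and `rang_indx` are stand-ins for the real data.

```python
import timeit
import numpy as np

def func_loop(di, rang_indx):
    """Baseline from the question, applied to copies of the sub-arrays."""
    out = []
    for arr, k in zip(rang_indx, di.astype(int)):
        arr = arr.copy()
        np.random.shuffle(arr)
        out.append(arr[:k])
    return out

def func_argpartition(di, rang_indx):
    """Approach #2 wrapped as a function (hypothetical name)."""
    lens = np.array([len(a) for a in rang_indx])
    di0 = np.minimum(lens, di.astype(int))
    out = []
    for arr, n, k in zip(rang_indx, lens, di0):
        if n == 0:
            out.append(np.array([]))
        else:
            # Positions of the k smallest random keys.
            idx = np.argpartition(np.random.rand(n), k - 1)[:k]
            out.append(arr[idx])
    return out

# Hypothetical small inputs; substitute the real di and rang_indx.
di = np.array([4., 2., 0., 3.])
rang_indx = [np.array([]), np.arange(5), np.arange(10, 16), np.arange(20, 22)]

results = zip(func_loop(di, rang_indx), func_argpartition(di, rang_indx), rang_indx)
for a, b, src in results:
    assert len(a) == len(b)              # same number of picks
    assert np.isin(b, src).all()         # picks come from the right sub-array
    assert len(np.unique(b)) == len(b)   # no element picked twice

for f in (func_loop, func_argpartition):
    t = timeit.timeit(lambda: f(di, rang_indx), number=200)
    print(f.__name__, t)
```

Timings depend on the machine and on the sub-array sizes, so none are quoted here; it is worth measuring with inputs shaped like the real ones.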
Source: https://stackoverflow.com/questions/46078995/speed-up-sub-array-shuffling-and-storing