Question
I have multiple arrays of the following kind:
import numpy as np
orig_arr = np.full(shape=(5,10), fill_value=1) #only an example, actual entries different
Every entry in the array above is a key into a dictionary whose values are arrays holding further information:
toy_dict = {0:np.arange(13, 23, dtype=float), 1:np.arange(23, 33, dtype=float)}
My task is to replace each entry in orig_arr with the corresponding array stored in the dict (here, toy_dict).
My current approach is naive, and I am looking for a faster one:
goal_arr = np.full(shape=(orig_arr.shape[0], orig_arr.shape[1], 10), fill_value=2, dtype=float)
for row in range(orig_arr.shape[0]):
    for col in range(orig_arr.shape[1]):
        goal_arr[row,col] = toy_dict[0] # actual replacement happens here
As you can see, I use an intermediate step: creating goal_arr, which already has the desired shape.
My question: how can I add the third dimension faster, and which parts can I improve? Thanks in advance!
(Related questions I have looked into: "Error: setting an array element with a sequence", "Numpy append: Automatically cast an array of the wrong dimension", "Append 2D array to 3D array, extending third dimension")
Edit: Following mathfux's helpful answer, I timed his proposed code against mine on larger arrays (more realistic for my use case):
Imports:
import numpy as np
import time
first_dim = 50
second_dim = 20
depth_dim = 300
upper_count = 5000
toy_dict = {k:np.random.random_sample(size = depth_dim) for k in range(upper_count)}
My original version, after parameterization:
start = time.time()
orig_arr = np.random.randint(0, upper_count, size=(first_dim, second_dim))
goal_arr = np.empty(shape=(orig_arr.shape[0], orig_arr.shape[1], depth_dim), dtype=float)
for row in range(orig_arr.shape[0]):
    for col in range(orig_arr.shape[1]):
        goal_arr[row,col] = toy_dict[orig_arr[row, col]]
end = time.time()
print(end-start)
Time: 0.008016824722290039
Now the code from mathfux's answer:
start = time.time()
orig_arr = np.random.randint(0, upper_count, size=(first_dim,second_dim))
goal_arr = np.empty(shape=(orig_arr.shape[0], orig_arr.shape[1], depth_dim), dtype=float)
a = np.array(list(toy_dict.values())) #do not know if it can be optimized
idx = np.indices(orig_arr.shape)
goal_arr[idx[0], idx[1]] = a[orig_arr[idx[0], idx[1]]]
end = time.time()
print(end-start)
Time: 0.015697956085205078
Interestingly, the advanced-indexing version is slower. I think this is due to the dict -> list -> array conversion, which takes time.
Nevertheless, thank you for your answers.
Edit 2:
I ran the code again with the list conversion moved out of the timed second code block and performed beforehand:
Time: 0.002306699752807617
This supports my hypothesis. Since toy_dict will be created only once, the proposed solution is faster. Thanks.
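A minimal sketch of the timing setup described in Edit 2, assuming the same parameters as above. The name `lookup` is introduced here for illustration only, and `time.perf_counter` is swapped in for `time.time` because it is intended for measuring short intervals. The dict-to-array conversion happens once, outside the timed region, so only the indexing step is measured.

```python
import time
import numpy as np

first_dim, second_dim, depth_dim, upper_count = 50, 20, 300, 5000
toy_dict = {k: np.random.random_sample(size=depth_dim) for k in range(upper_count)}
orig_arr = np.random.randint(0, upper_count, size=(first_dim, second_dim))

# One-time conversion, done before timing starts.
# Assumption: the keys are 0..upper_count-1 and were inserted in that order.
lookup = np.array(list(toy_dict.values()))

start = time.perf_counter()
idx = np.indices(orig_arr.shape)
goal_arr = np.empty(shape=(first_dim, second_dim, depth_dim), dtype=float)
goal_arr[idx[0], idx[1]] = lookup[orig_arr[idx[0], idx[1]]]
elapsed = time.perf_counter() - start
print(elapsed)
```

The measured time will vary by machine, so no expected value is given here.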
Answer 1:
Avoid Python-level iteration, and avoid iterating over anything that is not itself a NumPy array. You can store the dictionary's values in a separate array and then use fancy indexing:
goal_arr = np.empty(shape=(orig_arr.shape[0], orig_arr.shape[1], 10), dtype=float)
a = np.array(list(toy_dict.values())) #do not know if it can be optimized
idx = np.indices(orig_arr.shape)
goal_arr[idx[0], idx[1]] = a[orig_arr[idx[0], idx[1]]]
As you can see, creating goal_arr is still necessary, but I've used np.empty instead of np.full since it's more efficient (it does not initialize the entries).
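As a side note, the index grid from np.indices may not be needed at all: indexing the value array directly with an integer array of shape (M, N) should already yield an array of shape (M, N, depth). The sketch below compares both forms on the toy data from the question; the names `via_grid` and `direct` and the small `orig_arr` are made up for illustration.

```python
import numpy as np

toy_dict = {0: np.arange(13, 23, dtype=float), 1: np.arange(23, 33, dtype=float)}
orig_arr = np.array([[0, 1, 1], [1, 0, 0]])  # illustrative entries

a = np.array(list(toy_dict.values()))  # shape (2, 10)

# Form used in the answer: explicit index grid plus preallocated target
idx = np.indices(orig_arr.shape)
via_grid = np.empty(shape=(orig_arr.shape[0], orig_arr.shape[1], 10), dtype=float)
via_grid[idx[0], idx[1]] = a[orig_arr[idx[0], idx[1]]]

# Direct integer-array indexing: one row of `a` per entry of orig_arr
direct = a[orig_arr]  # shape (2, 3, 10)

print(np.array_equal(via_grid, direct))
```

With the direct form, the separate allocation of the target array is no longer required, since the indexing expression creates the result itself.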
Remark: this works only if list(toy_dict.keys()) is a list of the form [0, 1, 2, ...]. Otherwise you need to work out how to apply a map toy_dict.keys() -> [0, 1, ...] to orig_arr. I found this task quite difficult, so I am leaving it out of scope.
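One possible way to build such a map, offered as a sketch rather than a tested solution for the real data: sort the keys, stack the values in that order, and use np.searchsorted to translate each entry of orig_arr into a row position. The dictionary contents and the names `sorted_keys`, `keys`, and `pos` are hypothetical. It assumes the keys are sortable numbers and that every entry of orig_arr is actually a key; a missing key would not raise an error but would silently yield a wrong position.

```python
import numpy as np

# Hypothetical dict whose keys are not 0, 1, 2, ...
toy_dict = {
    10: np.arange(0, 4, dtype=float),
    42: np.arange(4, 8, dtype=float),
    7: np.arange(8, 12, dtype=float),
}
orig_arr = np.array([[42, 7], [10, 42]])

sorted_keys = sorted(toy_dict)                     # [7, 10, 42]
keys = np.array(sorted_keys)
a = np.stack([toy_dict[k] for k in sorted_keys])   # row i holds toy_dict[sorted_keys[i]]

# Map each key in orig_arr to its row position in `a`
# (assumes every entry of orig_arr is present in `keys`)
pos = np.searchsorted(keys, orig_arr)              # same shape as orig_arr
goal_arr = a[pos]                                  # shape (2, 2, 4)
```

Sorting the keys explicitly also avoids relying on the insertion order of the dictionary.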
Usage
import numpy as np
toy_dict = {k:np.random.randint(10, size = 10) for k in range(9)}
orig_arr = np.random.randint(0, 8, size=(2,3))
# allocate goal_arr only after orig_arr exists, so that the shapes match
goal_arr = np.empty(shape=(orig_arr.shape[0], orig_arr.shape[1], 10), dtype=float)
a = np.array(list(toy_dict.values())) #do not know if it can be optimized
idx = np.indices(orig_arr.shape)
goal_arr[idx[0], idx[1]] = a[orig_arr[idx[0], idx[1]]]
Sample run:
print('orig_arr:\n', orig_arr)
print('toy_dict:\n', toy_dict)
print('goal arr:\n', goal_arr)
---------------------------------
orig_arr:
[[7 3 0]
[1 3 2]]
toy_dict:
{0: array([8, 7, 3, 4, 8, 8, 6, 6, 5, 2]), 1: array([7, 2, 4, 7, 5, 5, 6, 8, 6, 5]), 2: array([5, 3, 4, 7, 6, 8, 6, 4, 4, 7]), 3: array([9, 2, 5, 1, 1, 8, 1, 1, 7, 0]), 4: array([9, 6, 7, 2, 7, 2, 4, 4, 5, 8]), 5: array([4, 9, 5, 2, 8, 3, 9, 4, 7, 9]), 6: array([6, 0, 7, 8, 5, 4, 7, 8, 8, 2]), 7: array([6, 5, 9, 3, 6, 2, 0, 2, 3, 2]), 8: array([5, 3, 9, 3, 2, 3, 0, 8, 3, 5])}
goal arr:
[[[6. 5. 9. 3. 6. 2. 0. 2. 3. 2.]
[9. 2. 5. 1. 1. 8. 1. 1. 7. 0.]
[8. 7. 3. 4. 8. 8. 6. 6. 5. 2.]]
[[7. 2. 4. 7. 5. 5. 6. 8. 6. 5.]
[9. 2. 5. 1. 1. 8. 1. 1. 7. 0.]
[5. 3. 4. 7. 6. 8. 6. 4. 4. 7.]]]
You might also find this excellent tutorial about advanced indexing helpful.
Source: https://stackoverflow.com/questions/63774297/time-efficient-way-to-replace-numpy-entries