Question
1° Why is the following code returning False? I thought that np.broadcast_to would increase the dimension of the array the same way np.repeat does.
2° Can I reproduce the result given by np.repeat with np.broadcast_to?
import numpy as np
n = 100
d = 10
A = np.random.uniform(size=(n,d))
np.all(np.broadcast_to(A.reshape(n,1,d),(n,d-1,d))==np.repeat(A,d-1).reshape(n,d-1,d))
3° More generally, for a given array A of shape (n, d), how can I reproduce np.repeat(A, k).reshape((n, k, d)) with np.broadcast_to?
Answer 1:
There are a number of things going on here, and the issues are easier to see with a small array whose values are easy to track. Here is a sample that is easy to work with:
arr = np.array([[1, 2, 1, 2], [3, 4, 3, 4]]) # n = 2, d = 4
Let's see what broadcast_to does:
>>> arr.reshape(n, 1, d)
array([[[1, 2, 1, 2]],
[[3, 4, 3, 4]]])
>>> np.broadcast_to(_, (n, d - 1, d))
array([[[1, 2, 1, 2],
[1, 2, 1, 2],
[1, 2, 1, 2]],
[[3, 4, 3, 4],
[3, 4, 3, 4],
[3, 4, 3, 4]]])
You could achieve functionally similar arrays with tile, stack and concatenate. The main difference is that broadcast_to does not copy the data into the new dimension. Instead, it adjusts the strides so that the array appears to have the right shape; the repeated rows all refer to the same underlying buffer, and the view is read-only, which can lead to unexpected behavior if you're not careful (e.g. when trying to write to it):
np.tile(arr.reshape(n, 1, d), (1, d - 1, 1))
np.stack([arr] * (d - 1), axis=1)
np.concatenate([arr.reshape(n, 1, d)] * (d - 1), axis=1)
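As a quick illustration (my own sketch, using the sample arr above, not part of the original answer), you can compare the strides of the broadcast view with those of a tiled copy:
n, d = arr.shape
view = np.broadcast_to(arr.reshape(n, 1, d), (n, d - 1, d))
print(view.strides)           # middle stride is 0: the "repeated" rows reuse the same data
print(view.flags.writeable)   # False: the broadcast view is read-only
tiled = np.tile(arr.reshape(n, 1, d), (1, d - 1, 1))
print(tiled.strides)          # all strides non-zero: tile really copied the data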
Now let's take a look at repeat:
>>> np.repeat(arr, d - 1)
array([1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 3, 3, 3, 4, 4, 4])
This is the flattened array, each element of which is repeated d - 1 times. Clearly, this reshape is not going to give the same result as the broadcast/tiled version:
>>> _.reshape(n, d - 1, d)
array([[[1, 1, 1, 2],
[2, 2, 1, 1],
[1, 2, 2, 2]],
[[3, 3, 3, 4],
[4, 4, 3, 3],
[3, 4, 4, 4]]])
As you can see, the element-wise repetition is not identical to the broadcast. However, if you use the axis keyword properly, you do get the matching result:
>>> np.repeat(arr.reshape(n, 1, d), d - 1, axis=1)
array([[[1, 2, 1, 2],
[1, 2, 1, 2],
[1, 2, 1, 2]],
[[3, 4, 3, 4],
[3, 4, 3, 4],
[3, 4, 3, 4]]])
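A quick sanity check (my own addition) that the axis-aware repeat and the broadcast agree on the sample:
rep = np.repeat(arr.reshape(n, 1, d), d - 1, axis=1)
bc = np.broadcast_to(arr.reshape(n, 1, d), (n, d - 1, d))
print(np.array_equal(rep, bc))  # True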
If you wanted to go the other way, and have the data wrap around shorter rows, you could re-interpret the dimensions using a combination of transpose and reshape:
>>> np.broadcast_to(arr.reshape(n, 1, d), (n, d - 1, d)).transpose([0, 2, 1]).reshape(n, d - 1, d)
array([[[1, 1, 1, 2],
[2, 2, 1, 1],
[1, 2, 2, 2]],
[[3, 3, 3, 4],
[4, 4, 3, 3],
[3, 4, 4, 4]]])
Here is a walkthrough of the transformation:
>>> arr.reshape(n, 1, d)
array([[[1, 2, 1, 2]],
[[3, 4, 3, 4]]])
>>> np.broadcast_to(_, (n, d - 1, d))
array([[[1, 2, 1, 2],
[1, 2, 1, 2],
[1, 2, 1, 2]],
[[3, 4, 3, 4],
[3, 4, 3, 4],
[3, 4, 3, 4]]])
>>> _.transpose(0, 2, 1)
array([[[1, 1, 1],
[2, 2, 2],
[1, 1, 1],
[2, 2, 2]],
[[3, 3, 3],
[4, 4, 4],
[3, 3, 3],
[4, 4, 4]]])
>>> _.reshape(n, d - 1, d)
array([[[1, 1, 1, 2],
[2, 2, 1, 1],
[1, 2, 2, 2]],
[[3, 3, 3, 4],
[4, 4, 3, 3],
[3, 4, 4, 4]]])
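And a sanity check (again my own sketch) that this round trip reproduces the repeat-then-reshape result from the question:
wrapped = (np.broadcast_to(arr.reshape(n, 1, d), (n, d - 1, d))
           .transpose(0, 2, 1)
           .reshape(n, d - 1, d))
print(np.array_equal(wrapped, np.repeat(arr, d - 1).reshape(n, d - 1, d)))  # True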
Answer 2:
np.broadcast_to gives you an array-wise repeat, while np.repeat gives you an element-wise repeat; see the examples in the docs [1] and [2]. To get equal output in this case, you could change the reshaping as follows:
import numpy as np
n = 100
d = 10
A = np.random.uniform(size=(n,d))
A_bc = np.broadcast_to(A.reshape(n*d, 1), (n*d, d-1)).reshape(n, d-1, d)
A_rp = np.repeat(A, d-1).reshape(n, d-1, d)
np.all(A_rp == A_bc)
# True
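The same trick works for an arbitrary repeat count k (question 3°). Here is a hypothetical helper, repeat_via_broadcast, which is not part of the original answer but follows the same flatten-then-broadcast idea:
def repeat_via_broadcast(A, k):
    n, d = A.shape
    # flatten A to a column, broadcast k copies of each element,
    # then reshape; the reshape copies because the zero-stride view is not contiguous
    return np.broadcast_to(A.reshape(n * d, 1), (n * d, k)).reshape(n, k, d)

for k in (1, 3, 9, 15):
    assert np.array_equal(repeat_via_broadcast(A, k), np.repeat(A, k).reshape(n, k, d))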
Note: although timeit indicates that the broadcast_to option is slightly faster, I'm not sure whether it is actually more memory-efficient.
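One way to probe the memory question (my own check, not from the answer): the broadcast view itself adds no data, but the final reshape has to copy, because a view with a zero stride cannot be reshaped in place:
view = np.broadcast_to(A.reshape(n * d, 1), (n * d, d - 1))
print(view.strides)                                     # second stride is 0: no data duplicated yet
print(np.shares_memory(A, view))                        # True: still backed by A's buffer
print(np.shares_memory(A, view.reshape(n, d - 1, d)))   # False: the reshape made a copy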
Source: https://stackoverflow.com/questions/59667367/numpy-difference-between-np-repeat-and-np-broadcast-to