I have 3 numpy arrays and need to form the cartesian product between them. Dimensions of the arrays are not fixed, so they can take different values, one example could be A=
The following produces your expected result without relying on an intermediate three times the size of the result. It uses broadcasting.
Please note that almost any NumPy operation is broadcastable like this, so in practice there is probably no need for an explicit cartesian product:
#shared dimensions:
sh = a.shape[1:]
aba = (a[:, None, None] + b[None, :, None] - a[None, None, :]).reshape(-1, *sh)
aba
#array([[ 0. , 0.03],
# [-1. , 0.16],
# [ 1. , -0.1 ],
# [ 0. , 0.03]])
You may consider leaving out the reshape. That would allow you to address the rows in the result by combined index. If your component ID's are just 0,1,2,... like in your example this would be the same as the combined ID. For example aba[1,0,0] would correspond to the row obtained as second row of a + first row of b - first row of a.
Broadcasting: When for example adding two arrays their shapes do not have to be identical, only compatible because of broadcasting. Broadcasting is in a sense a generalization of adding scalars to arrays:
[[2], [[7], [[2],
7 + [3], equiv to [7], + [3],
[4]] [7]] [4]]
Broadcasting:
[[4], [[1, 2, 3], [[4, 4, 4],
[[1, 2, 3]] + [5], equiv to [1, 2, 3], + [5, 5, 5],
[6]] [1, 2, 3]] [6, 6, 6]]
For this to work each dimension of each operand must be either 1 or equal to the corresponding dimension in each other operand unless it is 1. If an operand has fewer dimensions than the others its shape is padded with ones on the left. Note that the equiv arrays shown in the illustration are not explicitly created.
In that case I don't see how you can possibly avoid using storage, so h5py or something like that it is.
This is just a matter of slicing:
a_no_id = a[:, 1:]
etc. Note that, unlike Python lists, NumPy arrays when sliced do not return a copy but a view. Therefore efficiency (memory or runtime) is not an issue here.