I have np matrix and I want to convert it to a 3d array with one hot encoding of the elements as third dimension. Is there a way to do with without looping over each row eg
Edit: I just realized that my answer is covered already in the accepted answer. Unfortunately, as an unregistered user, I cannot delete it any more.
As an addendum to the accepted answer: If you have a very small number of classes to encode and if you can accept np.bool arrays as output, I found the following to be even slightly faster:
def onehot_initialization_v3(a):
ncols = a.max() + 1
labels_one_hot = (a.ravel()[np.newaxis] == np.arange(ncols)[:, np.newaxis]).T
labels_one_hot.shape = a.shape + (ncols,)
return labels_one_hot
Timings (for 10 classes):
a = np.random.randint(0,10,(100,100))
assert np.all(onehot_initialization_v2(a) == onehot_initialization_v3(a))
%timeit onehot_initialization_v2(a)
%timeit onehot_initialization_v3(a)
# 102 µs ± 1.66 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 79.3 µs ± 815 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
This changes, however, if the number of classes increases (now 100 classes):
a = np.random.randint(0,100,(100,100))
assert np.all(onehot_initialization_v2(a) == one_hot_initialization_v3(a))
%timeit onehot_initialization_v2(a)
%timeit onehot_initialization_v3(a)
# 132 µs ± 1.4 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 639 µs ± 3.12 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
So, depending on your problem, either might be the faster version.