How can I hot one encode in Matlab? [duplicate]

回眸只為那壹抹淺笑 submitted on 2019-12-17 16:11:33

Question


Often you are given a vector of integer values representing your labels (aka classes), for example

[2; 1; 3; 3; 2]

and you would like to one-hot encode this vector, so that each row contains a single 1 in the column given by the corresponding entry of the labels vector, for example

[0 1 0;
 1 0 0;
 0 0 1;
 0 0 1;
 0 1 0]

Answer 1:


For speed and memory savings, you can use bsxfun combined with eq to accomplish the same thing. While your eye solution works, eye(max(X)) first builds a max(X)-by-max(X) identity matrix, so its memory usage grows quadratically with the largest label value in X.

Y = bsxfun(@eq, X(:), 1:max(X));

Or as an anonymous function if you prefer:

hotone = @(X)bsxfun(@eq, X(:), 1:max(X));

Or, if you're on Octave (or MATLAB R2016b or later), you can take advantage of automatic broadcasting and simply do the following, as suggested by @Tasos.

Y = X == 1:max(X);
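
As a quick check, the one-liners above can be applied to the label vector from the question. This is only a minimal sketch: the names Ybsx and Yimp are illustrative, and the second form assumes Octave or MATLAB R2016b or later.

```matlab
X = [2; 1; 3; 3; 2];                  % label vector from the question

% Compare each label (one per row) against the class indices 1..max(X)
% (one per column). The result is a logical matrix.
Ybsx = bsxfun(@eq, X(:), 1:max(X));

% Same comparison using implicit expansion (assumes R2016b+ or Octave).
Yimp = X(:) == 1:max(X);

disp(double(Ybsx))                    % convert to double if numeric output is needed
disp(isequal(Ybsx, Yimp))             % the two forms should agree
```

Note that both forms return a logical array; wrap the result in double() if later code expects numeric values.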

Benchmark

Here is a quick benchmark comparing the performance of the various answers for a varying number of elements in X and a varying number of unique values in X.

function benchit()

    nUnique = round(linspace(10, 1000, 10));
    nElements = round(linspace(10, 1000, 12));

    times1 = zeros(numel(nUnique), numel(nElements));
    times2 = zeros(numel(nUnique), numel(nElements));
    times3 = zeros(numel(nUnique), numel(nElements));
    times4 = zeros(numel(nUnique), numel(nElements));
    times5 = zeros(numel(nUnique), numel(nElements));

    for m = 1:numel(nUnique)
        for n = 1:numel(nElements)
            X = randi(nUnique(m), nElements(n), 1);
            times1(m,n) = timeit(@()bsxfunApproach(X));

            X = randi(nUnique(m), nElements(n), 1);
            times2(m,n) = timeit(@()eyeApproach(X));

            X = randi(nUnique(m), nElements(n), 1);
            times3(m,n) = timeit(@()sub2indApproach(X));

            X = randi(nUnique(m), nElements(n), 1);
            times4(m,n) = timeit(@()sparseApproach(X));

            X = randi(nUnique(m), nElements(n), 1);
            times5(m,n) = timeit(@()sparseFullApproach(X));
        end
    end

    colors = get(0, 'defaultaxescolororder');

    figure;

    surf(nElements, nUnique, times1 * 1000, 'FaceColor', colors(1,:), 'FaceAlpha', 0.5);
    hold on
    surf(nElements, nUnique, times2 * 1000, 'FaceColor', colors(2,:), 'FaceAlpha', 0.5);
    surf(nElements, nUnique, times3 * 1000, 'FaceColor', colors(3,:), 'FaceAlpha', 0.5);
    surf(nElements, nUnique, times4 * 1000, 'FaceColor', colors(4,:), 'FaceAlpha', 0.5);
    surf(nElements, nUnique, times5 * 1000, 'FaceColor', colors(5,:), 'FaceAlpha', 0.5);

    view([46.1000   34.8000])

    grid on
    xlabel('Elements')
    ylabel('Unique Values')
    zlabel('Execution Time (ms)')

    legend({'bsxfun', 'eye', 'sub2ind', 'sparse', 'full(sparse)'}, 'Location', 'Northwest')
end

function Y = bsxfunApproach(X)
    Y = bsxfun(@eq, X(:), 1:max(X));
end

function Y = eyeApproach(X)
    tmp = eye(max(X));
    Y = tmp(X, :);
end

function Y = sub2indApproach(X)
    LinearIndices = sub2ind([length(X),max(X)], [1:length(X)]', X);
    Y = zeros(length(X), max(X));
    Y(LinearIndices) = 1;
end

function Y = sparseApproach(X)
    Y = sparse(1:numel(X), X,1);
end

function Y = sparseFullApproach(X)
    Y = full(sparse(1:numel(X), X,1));
end

Results

If you need a non-sparse output, bsxfun performs best. If you can work with a sparse matrix (without converting it to a full matrix), then sparse is the fastest and most memory-efficient option.
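
The memory side of this comparison can be checked on your own machine. The sketch below is one possible way to do it, assuming hypothetical sizes n and K; it only reports the storage used by each variable and makes no claim about timing.

```matlab
n = 1000;                             % assumed number of labels
K = 1000;                             % assumed number of classes
X = randi(K, n, 1);                   % random labels in 1..K

Ysparse = sparse((1:n)', X, 1, n, K); % stores only the n nonzero entries
Yfull   = full(Ysparse);              % stores all n*K entries as doubles

s = whos('Ysparse');
f = whos('Yfull');
fprintf('sparse: %d bytes, full: %d bytes\n', s.bytes, f.bytes);
```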




Answer 2:


You can build an identity matrix and index into it with the input/labels vector. For example, suppose the labels vector X is a random integer vector:

X = randi(3,5,1)

X =

   2
   1
   2
   3
   3

then the following will one-hot encode X in Octave:

eye(max(X))(X,:)

which can be conveniently defined as a function using

hotone = @(v) eye(max(v))(v,:)

EDIT:

Although the solution above works in Octave, you have to modify it for Matlab as follows, because Matlab does not allow indexing directly into the result of a function call:

I = eye(max(X));
I(X,:)
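
The identity-matrix approach assumes the labels are the integers 1..K. If they are not, one option is to map them to consecutive indices first. The following is a sketch under that assumption; the example labels and the names classes and idx are hypothetical and not part of the original answer.

```matlab
labels = [10; 30; 20; 30; 10];        % hypothetical labels that are not 1..K

% unique returns the sorted distinct labels and, as its third output,
% the position of each original label within that sorted list.
[classes, ~, idx] = unique(labels);

I = eye(numel(classes));
Y = I(idx, :);                        % row k of Y encodes labels(k)

disp(classes')                        % column j of Y corresponds to classes(j)
disp(Y)
```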



Answer 3:


I think this is fast, especially as the matrix dimensions grow:

Y = sparse(1:numel(X), X,1);

or

Y = full(sparse(1:numel(X), X,1));
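
One thing to watch for: with three arguments, sparse infers the matrix size from the largest indices it sees, so a sample that happens to miss the highest class yields too few columns. A minimal sketch of passing the size explicitly, assuming the total number of classes K is known in advance:

```matlab
X = [2; 1; 2];                        % class 3 does not occur in this sample
K = 3;                                % assumed total number of classes
n = numel(X);

Yauto  = sparse((1:n)', X, 1);        % size inferred from the indices
Yfixed = sparse((1:n)', X, 1, n, K);  % size given explicitly as n-by-K

disp(size(Yauto))                     % has only max(X) columns
disp(size(Yfixed))                    % always has K columns
```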



Answer 4:


Just posting the sub2ind solution too to satisfy your curiosity :)
But I like your solution better :p

>> X = [2,1,2,3,3]'
>> LinearIndices = sub2ind([length(X),3], [1:length(X)]', X);
>> tmp = zeros(length(X), 3); 
>> tmp(LinearIndices) = 1
tmp =

     0     1     0
     1     0     0
     0     1     0
     0     0     1
     0     0     1



Answer 5:


Just in case someone is looking for the 2D case (as I was):

X = [2 1; ...
     3 3; ...
     2 4];
Y = zeros(3,2,4);
for i = 1:4
    Y(:,:,i) = (X == i);   % 1 where X equals class i, 0 elsewhere
end

gives a one-hot encoded matrix along the 3rd dimension.
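
The same 2D case can also be written without a loop by placing the class indices along the third dimension and comparing. This is a sketch rather than part of the original answer; the names K and classes are illustrative, and the result is a logical array.

```matlab
X = [2 1; ...
     3 3; ...
     2 4];
K = max(X(:));                        % number of classes, taken from the data

% Lay the class indices 1..K out along the 3rd dimension, then let
% bsxfun expand X against them.
classes = reshape(1:K, 1, 1, K);
Y = bsxfun(@eq, X, classes);          % size(X,1)-by-size(X,2)-by-K logical

disp(size(Y))
disp(Y(:,:,2))                        % slice marking where X equals 2
```

On Octave or MATLAB R2016b and later, X == classes should give the same result through implicit expansion.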



Source: https://stackoverflow.com/questions/38947948/how-can-i-hot-one-encode-in-matlab
