What is the fastest way to count elements in an array?

爱一瞬间的悲伤 2020-12-03 12:47

In my models, one of the most frequently repeated tasks is counting the number of occurrences of each element in an array. The counting is from a closed set, so I know there are

2 answers
  •  余生分开走
    2020-12-03 13:01

    We know that the input vector always contains integers, so why not use this to "squeeze" a bit more performance out of the algorithm?

    I've been experimenting with some optimizations of the two best binning methods suggested by the OP, and this is what I came up with:

    • The number of unique values (X in the question, or n in the example) should be explicitly converted to an (unsigned) integer type.
    • It's faster to compute an extra bin and then discard it than to "only process" valid values (see the accumi_new function below).

    The benchmark function below (q38941694) takes about 30 seconds to run on my machine, using MATLAB R2016a.


    function q38941694
    datestr(now)
    N = 25;
    func_times = zeros(N,4);
    for n = 1:N
        func_times(n,:) = timing_hist(2^n,500);
    end
    % Plotting:
    figure('Position',[572 362 758 608]);
    hP = plot(1:N,log10(func_times.*1000),'-o','MarkerEdgeColor','k','LineWidth',2);
    xlabel('Log_2(Array size)'); ylabel('Log_{10}(Execution time) (ms)')
    legend({'histcounts (double)','histcounts (uint)','accumarray (old)',...
      'accumarray (new)'},'FontSize',12,'Location','NorthWest')
    grid on; grid minor;
    set(hP([2,4]),'Marker','s'); set(gca,'Fontsize',16);
    datestr(now)
    end
    
    function out = timing_hist(N,n)
    % Convert n into an appropriate integer class:
    if n < intmax('uint8')
      classname = 'uint8';
      n = uint8(n);
    elseif n < intmax('uint16')
      classname = 'uint16';
      n = uint16(n);
    elseif n < intmax('uint32')
      classname = 'uint32';
      n = uint32(n);
    else % n < intmax('uint64')  
      classname = 'uint64';
      n = uint64(n);
    end
    % Generate an input:
    M = randi([0 n],N,1,classname);
    % Time different options:
    warning off 'MATLAB:timeit:HighOverhead'
    func_times = {'histcounts (double)','histcounts (uint)','accumarray (old)',...
      'accumarray (new)';
        timeit(@() histci(double(M),double(n))),...
        timeit(@() histci(M,n)),...
        timeit(@() accumi(M)),...
        timeit(@() accumi_new(M))
        };
    out = cell2mat(func_times(2,:));
    end
    
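    function spp = histci(M,n)
      % Bin edges 1..n+1 give one bin per value 1..n; zeros fall outside the edges.
      spp = histcounts(M,1:n+1);
    end
    
    function spp = accumi(M)
      % Drop zeros first, since accumarray requires positive subscripts.
      spp = accumarray(M(M>0),1);
    end
    
    function spp = accumi_new(M)
      % Shift by one so that zeros land in bin 1, then discard that bin.
      spp = accumarray(M+1,1);
      spp = spp(2:end);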
    end
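
    As a quick sanity check, the three counting variants can be compared on a tiny input. This is only a sketch: the vector M and the value of n are made-up illustrative values, not part of the benchmark, and it assumes the largest value n actually occurs in M (otherwise the accumarray results come out shorter than the histcounts result).

    ```matlab
    % Illustrative input (hypothetical values): integers from 0..n stored as uint8.
    M = uint8([0; 3; 1; 3; 0; 2; 3]);
    n = uint8(3);

    % Variant 1: filter out the zeros, then count.
    counts_old = accumarray(M(M > 0), 1);

    % Variant 2: shift by one so that 0 lands in bin 1, then drop that bin.
    counts_new = accumarray(M + 1, 1);
    counts_new = counts_new(2:end);

    % Variant 3: histcounts with integer edges 1..n+1 (returns a row vector).
    counts_hist = histcounts(M, 1:n+1);

    % All three should agree: the values 1, 2, 3 occur 1, 1 and 3 times.
    assert(isequal(counts_new, [1; 1; 3]));
    assert(isequal(counts_old, counts_new));
    assert(isequal(counts_hist(:), counts_new));
    ```

    Note that M + 1 saturates at intmax for integer classes, so the shifted variant needs one value of headroom in the chosen class. The strict comparisons n < intmax(...) in timing_hist provide that headroom.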
    
