Faster version of find for sorted vectors (MATLAB)

心在旅途 2020-11-27 17:51

I have code of the following kind in MATLAB:

indices = find([1 2 2 3 3 3 4 5 6 7 7] == 3)

This returns 4,5,6 - the indices of elements in t

5 answers
  •  广开言路
    2020-11-27 18:24

    I needed a function like this. Thanks for the post @Daniel!

    I reworked it a little because I needed to find several indexes in the same array, and I wanted to avoid the overhead of arrayfun (or the like) or of calling the function multiple times. You pass a set of values in range and get back one index into the array per value, in ascending order of the values because range is sorted first.

    function idx = findInSorted(x,range)
    % Author Dídac Rodríguez Arbonès (May 2018)
    % Based on Daniel Roeske's solution:
    %   Daniel Roeske 
    %   https://github.com/danielroeske/danielsmatlabtools/blob/master/matlab/data/findinsorted.m
    
        range = sort(range);
        idx = nan(size(range));
        for i=1:numel(range)
            idx(i) = aux(x, range(i));
        end
    end
    
    function b = aux(x, lim)
        a=1;
        b=numel(x);
        if lim<=x(1)
           b=a;
        end
        if lim>=x(end)
           a=b;
        end
    
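        % The post is cut off after "while (a+1"; the loop below is a
        % reconstruction as a plain bisection over the variables above.
        while (a+1<b)
            lw = floor((a+b)/2);
            if x(lw) < lim
                a = lw;
            else
                b = lw;
            end
        end
    end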

    I guess you can use a parfor or arrayfun instead. I have not tested myself at what size of range it pays off, though.
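
    A minimal sketch of the arrayfun variant, assuming a lower-bound convention: each value maps to the first index whose element is greater than or equal to it, or to numel(x)+1 if there is none. The names findInSortedArrayfun and lowerBound are made up for this illustration and are not taken from the code above.

    ```matlab
    function idx = findInSortedArrayfun(x, range)
    % Hypothetical variant: one bisection per value, driven by arrayfun.
    % x must be sorted in ascending order.
        idx = arrayfun(@(r) lowerBound(x, r), range);
    end

    function b = lowerBound(x, lim)
    % First index b with x(b) >= lim; numel(x)+1 if every element is smaller.
        a = 0;               % invariant: x(1..a) < lim
        b = numel(x) + 1;    % invariant: x(b..end) >= lim
        while a + 1 < b
            lw = floor((a + b) / 2);
            if x(lw) < lim
                a = lw;
            else
                b = lw;
            end
        end
    end
    ```

    Unlike the loop version, this one keeps the results in the order of range, since nothing is sorted. Whether it beats the explicit for loop probably depends on the MATLAB release and on the size of range, so it is worth timing both.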

    Another possible improvement would be to use the previously found indexes (if range is sorted) to shrink the search space. I am skeptical that it would save much CPU time, because each search already runs in O(log n).
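
    The idea of reusing the previous result could be sketched as follows. This is an illustration under assumptions, not a tested optimization: findInSortedWarmStart and lowerBoundFrom are hypothetical names, and the helper returns the first index whose element is greater than or equal to the value (numel(x)+1 if there is none).

    ```matlab
    function idx = findInSortedWarmStart(x, range)
    % Hypothetical sketch: each search starts where the previous one ended.
    % x must be sorted ascending; idx follows the order of sort(range).
        range = sort(range);
        idx = zeros(size(range));
        lo = 0;                      % x(1..lo) is known to be < range(i)
        for i = 1:numel(range)
            idx(i) = lowerBoundFrom(x, range(i), lo);
            lo = idx(i) - 1;         % larger values cannot land before idx(i)
        end
    end

    function b = lowerBoundFrom(x, lim, a)
    % First index b > a with x(b) >= lim, assuming x(1..a) < lim.
        b = numel(x) + 1;
        while a + 1 < b
            lw = floor((a + b) / 2);
            if x(lw) < lim
                a = lw;
            else
                b = lw;
            end
        end
    end
    ```

    Narrowing the interval only removes a few halving steps per search, which is consistent with the doubt expressed above: the gain is likely small unless the values in range sit close together.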


    The final function ended up running slightly faster. I used @randomatlabuser's framework for that:

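    N = 5e6;     % length of vector
    p = 0.99;    % probability
    KK = 100;    % number of instances
    rntm1 = zeros(KK, 1);    % runtime of the aux-based loop
    rntm2 = zeros(KK, 1);    % runtime of the inlined two-sided search
    for kk = 1:KK
        x = cumsum(rand(N, 1) > p);
        searchfor = x(ceil(4*N/5));
        tic
        range = sort(searchfor);
        idx = nan(size(range));
        for i=1:numel(range)
            idx(i) = aux(x, range(i));
        end
        rntm1(kk) = toc;
        tic
        a=1;
        b=numel(x);
        c=1;
        d=numel(x);
        % The post is garbled from here on; the two loops and aux below are reconstructed as plain bisections.
        while (a+1<b)    % lower end of the matching range ends up in b
            lw = floor((a+b)/2);
            if x(lw) < searchfor
                a = lw;
            else
                b = lw;
            end
        end
        while (c+1<d)    % upper end of the matching range ends up in c
            lw = floor((c+d)/2);
            if x(lw) <= searchfor
                c = lw;
            else
                d = lw;
            end
        end
        rntm2(kk) = toc;
    end

    function b = aux(x, lim)
        a=1;
        b=numel(x);
        if lim<=x(1)
           b=a;
        end
        if lim>=x(end)
           a=b;
        end
        while (a+1<b)
            lw = floor((a+b)/2);
            if x(lw) < lim
                a = lw;
            else
                b = lw;
            end
        end
    end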

    It is not a big improvement, but it helps because I need to run several thousand searches.

    % Mean of running time
    mean([rntm1 rntm2])
    % 9.9624e-05  5.6303e-05
    
    % Percentiles of running time
    prctile([rntm1 rntm2], [0 25 50 75 100])
    % 3.0435e-05  1.0524e-05
    % 3.4133e-05  1.2231e-05
    % 3.7262e-05  1.3369e-05
    % 3.9111e-05  1.4507e-05
    %  0.0027426   0.0020301
    

    I hope this can help somebody.


    EDIT

    If there is a significant chance of having exact matches, it pays off to use the very fast built-in ismember before calling the function:

    [found, idx] = ismember(range, x);
    idx(~found) = arrayfun(@(r) aux(x, r), range(~found));
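
    Wrapped into a single function, the combination might look like the sketch below. The names findWithIsmember and lowerBound are hypothetical, and the helper uses a lower-bound convention (first index whose element is greater than or equal to the value, numel(x)+1 if there is none) rather than reproducing aux exactly. MATLAB documents that ismember returns the lowest matching index when a value is repeated, which agrees with that convention.

    ```matlab
    function idx = findWithIsmember(x, range)
    % Hypothetical wrapper: exact matches via ismember, bisection for the rest.
    % x must be sorted in ascending order.
        [found, idx] = ismember(range, x);    % idx is 0 where nothing matched
        if any(~found(:))
            idx(~found) = arrayfun(@(r) lowerBound(x, r), range(~found));
        end
    end

    function b = lowerBound(x, lim)
    % First index b with x(b) >= lim; numel(x)+1 if every element is smaller.
        a = 0;
        b = numel(x) + 1;
        while a + 1 < b
            lw = floor((a + b) / 2);
            if x(lw) < lim
                a = lw;
            else
                b = lw;
            end
        end
    end
    ```

    For example, with x = [1 3 5 7] the call findWithIsmember(x, [3 4 8]) takes the index of 3 from ismember and falls back to the bisection for 4 and 8.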
    
