Time series aggregation efficiency

傲寒 2020-12-20 15:46

I commonly need to summarize an irregularly sampled time series with a given aggregation function (e.g., sum, average). However, the current solution that I have seem

5 Answers
  •  粉色の甜心
    2020-12-20 16:22

    Doing away with the inner loop, i.e.

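    function aggArray = aggregate(array, groupIndex, collapseFn)
    % Aggregate the rows of array group by group; a group is a unique row of groupIndex.
    
    groups = unique(groupIndex, 'rows');
    aggArray = nan(size(groups, 1), size(array, 2));
    
    for iGr = 1:size(groups,1)
        % Rows of groupIndex that match the current group in every column.
        grIdx = all(groupIndex == repmat(groups(iGr,:), [size(groupIndex,1), 1]), 2);
        % collapseFn should reduce along dimension 1, e.g. @(x)sum(x,1).
        aggArray(iGr,:) = collapseFn(array(grIdx,:));
    end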
    

    and calling the collapse function with a dimension parameter

    res=aggregate(a, b, @(x)sum(x,1));
    

    gives some speedup (3x on my machine) already. It also avoids a pitfall: without a dimension parameter, functions such as sum or mean, when given a group that contains only a single row of data, collapse across its columns rather than across the rows that share a label.
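
The single-row pitfall is easy to reproduce in isolation. The following is a minimal sketch; the variable names `oneRow`, `collapsedWrong`, `collapsedRight`, and `twoRows` are made up for illustration and do not appear in the original code.

```matlab
% A group that happens to contain only one row of a 3-column array.
oneRow = [1 2 3];

% Without a dimension argument, sum works along the first non-singleton
% dimension, which for a row vector is dimension 2: the columns are merged.
collapsedWrong = sum(oneRow);      % scalar 6

% With the dimension given explicitly, each column is kept separate.
collapsedRight = sum(oneRow, 1);   % 1-by-3 vector [1 2 3]

% For groups with two or more rows both calls agree.
twoRows = [1 2 3; 4 5 6];
assert(isequal(sum(twoRows), sum(twoRows, 1)));   % both give [5 7 9]
```

Because assigning a scalar to `aggArray(iGr,:)` fills the whole row with that value, the wrong result does not raise an error, which makes the explicit dimension argument worth the extra typing.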

    If you had just one group label vector, i.e. the same group labels for all columns of data, you could speed things up further:

    function aggArray = aggregate(array, groupIndex, collapseFn)
    
    ng=max(groupIndex);
    aggArray = nan(ng, size(array, 2));
    
    for iGr = 1:ng
        aggArray(iGr,:) = collapseFn(array(groupIndex==iGr,:));
    end
    

    The latter function gives identical results for your example, with a 6x speedup, but it cannot handle different group labels per data column, and it assumes that the labels are positive integers (ideally 1 to ng without gaps).
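
As a small worked example of the single-label-vector case, the sketch below inlines the loop from the function above on a 4-by-2 array. The data and the labels 10 and 25 are invented for illustration. The `unique` call at the top is one possible way (an addition, not part of the original answer) to turn arbitrary labels into the integers 1 to ng that the loop relies on.

```matlab
% Arbitrary group labels, one per row of the data (illustrative values).
labels = [10; 10; 25; 25];
array  = [1 10; 2 20; 3 30; 4 40];

% The third output of unique maps each label to its position among the
% sorted unique labels, i.e. to consecutive integers 1..ng.
[~, ~, groupIndex] = unique(labels);   % [1; 1; 2; 2]

collapseFn = @(x) sum(x, 1);
ng = max(groupIndex);
aggArray = nan(ng, size(array, 2));
for iGr = 1:ng
    % Logical indexing selects all rows that belong to group iGr.
    aggArray(iGr, :) = collapseFn(array(groupIndex == iGr, :));
end

disp(aggArray)   % rows: [3 30] and [7 70]
```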

    As a 2D test case for the group index, here is an example with 10 different columns for groupIndex:

    a = rand(20006,10);
    B = []; % make random-length periods for each of the 10 signals
    for i = 1:size(a,2)
        n0 = randi(10);
        b = transpose([ones(1,n0) 2*ones(1,11-n0) sort(repmat((3:4001), [1 5]))]);
        B = [B b];
    end
    tic; erg0 = aggregate(a, B, @sum); toc            % original method
    tic; erg1 = aggregate2(a, B, @(x)sum(x,1)); toc   % just remove the inner loop
    tic; erg2 = aggregate3(a, B, @(x)sum(x,1)); toc   % use function below
    

    Elapsed time is 2.646297 seconds.
    Elapsed time is 1.214365 seconds.
    Elapsed time is 0.039678 seconds. (!!!!)

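    function aggArray = aggregate3(array, groupIndex, collapseFn)
    % Assumes identical rows of groupIndex are contiguous (e.g. a sorted time index).
    
    [~, ix1]      = unique(groupIndex, 'rows', 'first');  % first row of each group
    [groups, ix2] = unique(groupIndex, 'rows', 'last');   % last row of each group
    
    ng = size(groups,1);
    aggArray = nan(ng, size(array, 2));
    
    for iGr = 1:ng
        % Each group is one contiguous block of rows, so a range index suffices.
        aggArray(iGr,:) = collapseFn(array(ix1(iGr):ix2(iGr),:));
    end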
    

    I think this is as fast as it gets without using MEX, thanks to the suggestion of Matthew Gunn. Profiling shows that 'unique' is really cheap here, and extracting just the first and last index of each run of repeated rows in groupIndex speeds things up considerably. This relies on identical rows of groupIndex being contiguous, as they are in the test case. I get an 88x speedup with this iteration of the aggregation.
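
To illustrate what the first/last indices look like, here is a minimal sketch on a tiny two-column group index. The data are invented, and the `sortrows` check is an added safeguard that is not part of the original function: if the rows are sorted lexicographically, every group forms one contiguous block, which is what the range indexing needs.

```matlab
% Two-column group index whose identical rows are contiguous.
groupIndex = [1 1; 1 1; 1 2; 2 2; 2 2; 2 2];
array      = (1:6)';

% Added safeguard (assumption check): sorted rows imply contiguous groups.
assert(isequal(sortrows(groupIndex), groupIndex));

[~, ix1]      = unique(groupIndex, 'rows', 'first');  % [1; 3; 4]
[groups, ix2] = unique(groupIndex, 'rows', 'last');   % [2; 3; 6]

aggArray = nan(size(groups, 1), size(array, 2));
for iGr = 1:size(groups, 1)
    % Rows ix1(iGr)..ix2(iGr) are exactly the members of group iGr.
    aggArray(iGr, :) = sum(array(ix1(iGr):ix2(iGr), :), 1);
end
% aggArray is [3; 3; 15]
```

If the check fails for real data, sorting both `groupIndex` and `array` with the permutation returned by `sortrows` beforehand would restore the contiguity assumption, at the cost of one extra sort.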
