Time series aggregation efficiency

傲寒 2020-12-20 15:46

I commonly need to summarize an irregularly sampled time series with a given aggregation function (e.g., sum, average). However, the current solution that I have seem

5 Answers
  •  佛祖请我去吃肉
    2020-12-20 16:30

    Method #1

    You can create the mask corresponding to grIdx for all groups in one go with bsxfun(@eq,..). When collapseFn is @sum, the aggregation then reduces to a matrix multiplication, which gives a completely vectorized approach:

    % M(i,j) is true when row i of groupIndex equals row j of groups
    M = squeeze(all(bsxfun(@eq,groupIndex,permute(groups,[3 2 1])),2));
    aggArray = M.'*array;   % per-group column sums
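
    To make the mask concrete, here is a small sketch on made-up data. It assumes groupIndex holds one integer tuple per row of array and that groups is unique(groupIndex,'rows'); the question does not show how groups is built, so that part is an assumption.

    ```matlab
    % Toy inputs, invented for illustration only
    array      = [1 10; 2 20; 3 30; 4 40];   % 4 observations, 2 series
    groupIndex = [1 1; 1 2; 1 1; 2 1];       % group tuple of each observation
    groups     = unique(groupIndex,'rows');  % assumed definition: [1 1; 1 2; 2 1]

    % 4x2 against 1x2x3 expands to 4x2x3; all(...,2) requires every column to match
    M = squeeze(all(bsxfun(@eq,groupIndex,permute(groups,[3 2 1])),2));

    % Row j of M.' selects the observations of group j, so the product sums them
    aggArray = M.'*array;   % [4 40; 2 20; 4 40]
    ```

    Note that squeeze would return the mask transposed if there were only a single observation, so this sketch assumes at least two rows in groupIndex.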
    

    When collapseFn is @mean, a bit more work is needed: normalize each column of the mask by its group size before multiplying, as shown here -

    M = squeeze(all(bsxfun(@eq,groupIndex,permute(groups,[3 2 1])),2));
    % Divide each mask column by its group size, then multiply as before
    aggArray = bsxfun(@rdivide,M,sum(M,1)).'*array;
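
    The normalization step can be checked on a toy case. The data below is invented, and groups is again assumed to be unique(groupIndex,'rows'). Dividing each column of M by its group size turns the mask into averaging weights:

    ```matlab
    % Toy inputs, invented for illustration only
    array      = [1 10; 2 20; 3 30; 4 40];
    groupIndex = [1 1; 1 2; 1 1; 2 1];
    groups     = unique(groupIndex,'rows');  % assumed definition

    M = squeeze(all(bsxfun(@eq,groupIndex,permute(groups,[3 2 1])),2));

    counts = sum(M,1);               % observations per group: [2 1 1]
    W = bsxfun(@rdivide,M,counts);   % every column of W sums to 1
    aggArray = W.'*array;            % [2 20; 2 20; 4 40]
    ```

    Because groups is derived from groupIndex here, no group is empty. A row of groups with no matching observation would have a count of 0 and produce NaN through 0/0.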
    

    Method #2

    In case you are working with a generic collapseFn, you can use the 2D mask M created with the previous method to index into the rows of array, thus changing the complexity from O(n^2) to O(n). Some quick tests suggest that this gives an appreciable speedup over the original loop-based code. Here's the implementation -

    n = size(groups,1);
    M = squeeze(all(bsxfun(@eq,groupIndex,permute(groups,[3 2 1])),2));
    out = zeros(n,size(array,2));
    for iGr = 1:n
        out(iGr,:) = collapseFn(array(M(:,iGr),:),1);
    end
    

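    Please note that the 1 in collapseFn(array(M(:,iGr),:),1) denotes the dimension along which collapseFn is applied. That 1 is essential: when a group contains a single row, functions such as sum or mean called without a dimension argument would collapse that row vector to a scalar instead of returning one value per column.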
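
    One caveat when choosing collapseFn: the loop calls it as collapseFn(x,1), which suits sum, mean or median, but max and min take the dimension as their third argument. A minimal sketch with a wrapper follows; the data is invented for illustration and groups is assumed to be unique(groupIndex,'rows').

    ```matlab
    % Toy inputs, invented for illustration only
    array      = [1 10; 2 20; 3 30; 4 40];
    groupIndex = [1 1; 1 2; 1 1; 2 1];
    groups     = unique(groupIndex,'rows');  % assumed definition

    % max(x,1) would compare x against the value 1, so wrap it to accept a dimension
    collapseFn = @(x,dim) max(x,[],dim);

    n = size(groups,1);
    M = squeeze(all(bsxfun(@eq,groupIndex,permute(groups,[3 2 1])),2));
    out = zeros(n,size(array,2));
    for iGr = 1:n
        % Logical indexing with column iGr of M picks the rows of group iGr
        out(iGr,:) = collapseFn(array(M(:,iGr),:),1);
    end
    % out is [3 30; 2 20; 4 40]
    ```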


    Bonus

    Judging by its name, groupIndex seems to hold integer values. That can be exploited to create M more efficiently: treat each row of groupIndex as an indexing tuple, convert it into a scalar, and so obtain a 1D array version of groupIndex. This should be more efficient, as the data size would be O(n) now. The resulting M can be fed to all the approaches listed in this post. So, we would have M like so -

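    dims = max(groupIndex,[],1);              % largest index in each column
    agg_dims = cumprod([1 dims(end:-1:2)]);   % mixed-radix place values
    % Collapse each row to one scalar key, then label the distinct keys 1..n
    [~,~,idx] = unique(groupIndex*agg_dims(end:-1:1).');
    
    m = size(groupIndex,1);
    M = false(m,max(idx));
    M((idx-1)*m + (1:m)') = true;             % set M(i,idx(i)) via linear indexing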
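
    As a sanity check, the sketch below compares this linearized mask against the bsxfun mask on invented data. It assumes groupIndex contains positive integers, since zeros or negative values could map two different tuples to the same scalar, and that groups is unique(groupIndex,'rows') so that both masks order their columns the same way.

    ```matlab
    % Toy inputs, invented for illustration only
    groupIndex = [1 1; 1 2; 1 1; 2 1];
    groups     = unique(groupIndex,'rows');  % assumed definition

    % Reference mask from the bsxfun approach
    Mref = squeeze(all(bsxfun(@eq,groupIndex,permute(groups,[3 2 1])),2));

    % Linearized mask
    dims = max(groupIndex,[],1);             % [2 2]
    agg_dims = cumprod([1 dims(end:-1:2)]);  % [1 2]
    key = groupIndex*agg_dims(end:-1:1).';   % [3; 4; 3; 5]
    [~,~,idx] = unique(key);                 % [1; 2; 1; 3]

    m = size(groupIndex,1);
    M = false(m,max(idx));
    M((idx-1)*m + (1:m)') = true;

    sameMask = isequal(M,Mref);              % true for this toy input
    ```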
    
