MATLAB: compute mean of each 1-minute interval of a time-series

小鲜肉 2020-12-09 20:16

I have a bunch of time series, each described by two components: a timestamp vector (in seconds) and a vector of measured values. The time vector is non-uniform (i.e. sampl

6 Answers
  • 2020-12-09 20:43

    The same approach as the accumarray answer, but with a parametric interval (window_size). It also fixes the mismatch between the vector lengths.

    window_size = 60;   % interval length in seconds; any positive value works (e.g. 60, 5, 0.1)
    
    t = sort((100:999)' + 3*rand(900,1));     % non-uniform time
    x = 5*rand(900,1) + 10;                   % x(i) is the value at time t(i)
    
    int = 1 + floor((t - t(1))/window_size);  % interval index of each sample
    tt = ( floor(t(1)):window_size:ceil(t(end)) )';
    
    % mean and standard deviation of the values in each interval
    mu = accumarray(int,x,[],@mean);
    sd = accumarray(int,x,[],@std);
    
    % tt can have more elements than mu (e.g. for window_size = 1 instead of 60),
    % so trim it to the same length before plotting
    tt = tt(1:numel(mu));
    
    errorbar(tt,mu,sd);
    
  • 2020-12-09 20:55

    Here's a way that uses binary search. It is 6-10x faster for 9900 elements and about 64x faster for 99900 elements. It was hard to get reliable timings with only 900 elements, so I'm not sure which is faster at that size. It uses almost no extra memory if you consider building tx directly from the generated data. Other than that, it needs just four extra float variables (prevind, first, mid, and last).

    % Sort the data by time so that we can use binary search (sorting takes O(n log n) time).
    % N, tt, mu and sd are the variables from the question; N must not exceed numel(tt)-1.
    tx = sortrows([t x]);
    
    prevind = 1;
    
    for i=1:N
        % Binary search for the first row whose time is >= tt(i+1)
        first = prevind;
        last = size(tx,1) + 1;   % one past the end, so the final sample can be included
        while first ~= last
            mid = floor((first+last)/2);
            if tt(i+1) > tx(mid,1)
                first = mid+1;
            else
                last = mid;
            end
        end
        mu(i) = mean( tx(prevind:last-1,2) );
        sd(i) = std( tx(prevind:last-1,2) );
        prevind = last;
    end
    

    It uses all of the variables that you had originally, so I hope it suits your needs. It is faster because binary search finds each interval boundary in O(log N) time, whereas the way you were doing it takes O(N).
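
    To make the search step concrete, here is a small self-contained sketch on made-up data. The names ts, bp and idx are introduced only for this illustration; the upper bound starts one past the end so that the search can also report that no element qualifies.

    ```matlab
    % Made-up sorted times (assumption) and one break point to search for
    ts = [1; 2; 61; 62; 63; 125];
    bp = 60;

    % Find the first index whose time is >= bp (numel(ts)+1 if there is none)
    first = 1;
    last  = numel(ts) + 1;
    while first ~= last
        mid = floor((first + last)/2);
        if bp > ts(mid)
            first = mid + 1;     % everything up to mid lies before the break point
        else
            last = mid;          % ts(mid) is a candidate for the first element >= bp
        end
    end
    idx = last;                  % same index that find(ts >= bp, 1) would return here
    ```

    Each pass halves the remaining range, which is where the O(log N) cost per boundary comes from.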

  • 2020-12-09 20:57

    You could try and create a cell array and apply mean and std via cellfun. It's ~10% slower than your solution for 900 entries, but ~10x faster for 90000 entries.

    [t,sortIdx]=sort(t); %# we only need to sort in case t is not monotonically increasing
    x = x(sortIdx);
    
    tIdx = floor(t/60); %# convert seconds to minutes - can also convert to 5 mins by dividing by 300
    tIdx = tIdx - min(tIdx) + 1; %# tIdx now is a vector of indices - i.e. it starts at 1, and should go like your iteration variable.
    
    %# the next few commands are to count how many 1's 2's 3's etc are in tIdx
    dt = [tIdx(2:end)-tIdx(1:end-1);1]; 
    stepIdx = [0;find(dt>0)];
    nIdx = stepIdx(2:end) - stepIdx(1:end-1); %# number of times each index appears
    
    %# convert to cell array
    xCell = mat2cell(x,nIdx,1);
    
    %# use cellfun to calculate the mean and sd
    %# stepIdx(2:end) marks the last sample of each group; indexing through tIdx leaves gaps for missing steps
    mu(tIdx(stepIdx(2:end))) = cellfun(@mean,xCell);
    sd(tIdx(stepIdx(2:end))) = cellfun(@std,xCell);
    

    Note: my solution does not give the exact same results as yours, since you skip a few time values at the end (1:60:90 is [1,61]), and since the start of the interval is not exactly the same.

  • 2020-12-09 20:59

    You can compute indices all at once using bsxfun:

    indices = ( bsxfun(@ge, t, tt(1:end-1)') & bsxfun(@lt, t, tt(2:end)') );
    

    This is faster than looping, but it requires storing the full logical matrix (one row per sample, one column per interval) in memory at once: a time vs. space tradeoff.
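
    A minimal sketch of how that logical matrix could then be turned into per-interval statistics, assuming t and x are column vectors and tt is a column of break points. The data are made up, and n is a name introduced here for the per-interval sample counts.

    ```matlab
    % Made-up example data (assumption): column vectors
    t  = [1; 2; 61; 62; 63; 125];           % sample times in seconds
    x  = [10; 12; 20; 22; 24; 30];          % x(i) is the value at time t(i)
    tt = (0:60:180)';                       % break points: [0,60), [60,120), [120,180)

    % indices(i,j) is true when t(i) falls in the interval [tt(j), tt(j+1))
    indices = bsxfun(@ge, t, tt(1:end-1)') & bsxfun(@lt, t, tt(2:end)');

    n  = sum(indices, 1)';                  % number of samples per interval
    mu = (double(indices)' * x) ./ n;       % per-interval mean (NaN for empty intervals)

    % Per-interval standard deviation from the sums of x and x.^2
    % (NaN when an interval holds fewer than two samples)
    sd = sqrt((double(indices)' * x.^2 - n .* mu.^2) ./ (n - 1));
    ```

    The matrix products replace the loop over intervals, at the cost of the memory noted above.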

  • 2020-12-09 21:04

    Disclaimer: I worked this out on paper, but haven't yet had the opportunity to check it "in silico"...

    You may be able to avoid loops or using cell arrays by doing some tricky cumulative sums, indexing, and calculating the means and standard deviations yourself. Here's some code that I believe will work, although I am unsure how it stacks up speed-wise to the other solutions:

    [t,sortIndex] = sort(t(:));  %# Sort the time points (as a column vector)
    x = x(sortIndex);            %# Sort the data values
    x = x(:);                    %# Make sure x is a column vector as well
    interval = 60;               %# Interval size, in seconds
    
    intervalIndex = floor((t-t(1))./interval)+1;  %# Collect t into intervals
    nIntervals = max(intervalIndex);              %# The number of intervals
    mu = zeros(nIntervals,1);                     %# Preallocate mu
    sd = zeros(nIntervals,1);                     %# Preallocate sd
    
    sumIndex = [find(diff(intervalIndex)); ...
                numel(intervalIndex)];   %# Find indices of the interval ends
    n = diff([0; sumIndex]);             %# Number of samples per interval
    xSum = cumsum(x);                    %# Cumulative sum of x
    xSum = diff([0; xSum(sumIndex)]);    %# Sum per interval
    xxSum = cumsum(x.^2);                %# Cumulative sum of x^2
    xxSum = diff([0; xxSum(sumIndex)]);  %# Squared sum per interval
    
    intervalIndex = intervalIndex(sumIndex);  %# Find index into mu and sd
    mu(intervalIndex) = xSum./n;                             %# Compute mean
    sd(intervalIndex) = sqrt((xxSum-xSum.*xSum./n)./(n-1));  %# Compute std dev
    

    The above computes the standard deviation using the simplification of the formula found on this Wikipedia page.
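
    The simplification in question is presumably the sum-based form sd = sqrt((sum(x.^2) - sum(x)^2/n)/(n-1)); the link itself did not survive, so that is an assumption. A quick sanity check against the built-in std, on a made-up vector v:

    ```matlab
    % Made-up sample (assumption) to check the sum-based formula against std
    v = [10; 12; 15; 11];
    n = numel(v);

    sdFromSums = sqrt((sum(v.^2) - sum(v)^2/n) / (n - 1));
    sdBuiltin  = std(v);   % MATLAB's std normalizes by n-1 by default

    difference = abs(sdFromSums - sdBuiltin);   % expected to be at rounding-error level
    ```

    Note that this form subtracts two potentially large numbers, so it can lose precision when the mean is large compared to the spread of the data.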

  • 2020-12-09 21:05

    The only logical solution seems to be...

    Ok. I find it funny that to me there is only one logical solution, but many others find other solutions. Regardless, the solution does seem simple. Given the vectors x and t, and a set of equally spaced break points tt,

    t = sort((100:999)' + 3*rand(900,1));     % non-uniform time
    x = 5*rand(900,1) + 10;             % x(i) is the value at time t(i)
    
    tt = ( floor(t(1)):1*60:ceil(t(end)) )';
    

    (Note that I sorted t above.)

    I would do this in three fully vectorized lines of code. First, if the breaks were arbitrary and potentially unequal in spacing, I would use histc to determine which intervals the data series falls in. Given they are uniform, just do this:

    int = 1 + floor((t - t(1))/60);
    

    Again, if the elements of t were not known to be sorted, I would have used min(t) instead of t(1). Having done that, use accumarray to reduce the results into a mean and standard deviation.

    mu = accumarray(int,x,[],@mean);
    sd = accumarray(int,x,[],@std);
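
    For the case of arbitrary, unequally spaced breaks mentioned above, a rough sketch with histc might look like this. The data and the break points in edges are made up, and bin and inRange are names introduced for the illustration.

    ```matlab
    % Made-up example data (assumption): sorted column vectors
    t = [1; 2; 61; 62; 63; 125];
    x = [10; 12; 20; 22; 24; 30];
    edges = [0; 60; 100; 200];       % hypothetical, unequally spaced break points

    [~, bin] = histc(t, edges);      % bin(i) = k when edges(k) <= t(i) < edges(k+1)
    inRange = bin > 0;               % histc returns 0 for samples outside the edges

    % Reduce the values in each bin to a mean and a standard deviation
    mu = accumarray(bin(inRange), x(inRange), [], @mean);
    sd = accumarray(bin(inRange), x(inRange), [], @std);
    ```

    One caveat: histc puts samples exactly equal to edges(end) into an extra last bin, so it is safest to choose the final break point beyond the last sample. Newer MATLAB releases recommend histcounts or discretize in place of histc.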
    