MATLAB: compute mean of each 1-minute interval of a time-series

前端未结

I have a bunch of time series, each described by two components: a timestamp vector (in seconds) and a vector of measured values. The time vector is non-uniform (i.e. sampl

6 Answers

广开言路

2020-12-09 20:43

Same approach as the accumarray answer, but with a parametric interval length (window_size). The issue with mismatched vector lengths is solved as well.

```
window_size = 60;   % interval length in seconds; any positive value works (e.g. 60, 5, 0.1)

t = sort((100:999)' + 3*rand(900,1));     % non-uniform time
x = 5*rand(900,1) + 10;                   % x(i) is the value at time t(i)

int = 1 + floor((t - t(1))/window_size);  % window index of each sample
tt = ( floor(t(1)):window_size:ceil(t(end)) )';

% mean value and standard deviation of x in each window
mu = accumarray(int,x,[],@mean);
sd = accumarray(int,x,[],@std);

% tt can have more entries than mu (e.g. for window_size = 1 instead of 60),
% so keep one time stamp per window
tt = tt(1:numel(mu));

errorbar(tt,mu,sd);
```
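A minimal sketch on made-up data with a gap (the sample values are hypothetical): by default accumarray fills windows that receive no samples with 0, which errorbar would then plot as real zeros. Passing NaN as accumarray's fifth argument (the fill value) marks them as missing instead. Building the time stamps from t(1), the same origin used for the window index, gives exactly one time stamp per element of mu, so no trimming is needed.

```matlab
% Hypothetical data with a gap: no samples fall in the second window
t = [0.5; 3; 7; 21; 24];
x = [1; 3; 5; 7; 9];
window_size = 10;

int = 1 + floor((t - t(1))/window_size);   % window index per sample: [1;1;1;3;3]

% Empty windows get NaN instead of accumarray's default fill value of 0
mu = accumarray(int, x, [], @mean, NaN);   % [3; NaN; 8]
sd = accumarray(int, x, [], @std, NaN);    % [2; NaN; sqrt(2)]

% Start time of each window, one entry per element of mu
tt = t(1) + window_size*(0:numel(mu)-1)';  % [0.5; 10.5; 20.5]
```

With the NaN fill value, errorbar(tt,mu,sd) leaves a gap for the empty window rather than drawing a point at zero.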

渐次进展

2020-12-09 20:55
Here's a way that uses binary search. It is 6-10x faster for 9900 elements and about 64x faster for 99900 elements. It was hard to get reliable timings with only 900 elements, so I'm not sure which is faster at that size. It uses almost no extra memory if you build tx directly from the generated data; other than that, it needs only four extra scalar variables (prevind, first, mid, and last).
```
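% Uses t, x, tt (interval edges) and N (number of intervals, at most numel(tt)-1)
% from the original code in the question.

% Sort the data so that we can use binary search (sorting takes O(N log N) time).
tx = sortrows([t x]);

mu = zeros(N,1);   % preallocate the outputs
sd = zeros(N,1);
prevind = 1;

for i=1:N
    % Binary search for the first row at or after tt(i+1), i.e. the end of this section.
    % The upper bound size(tx,1)+1 means "past the end", so the last sample is not dropped.
    first = prevind;
    last = size(tx,1) + 1;
    while first ~= last
        mid = floor((first+last)/2);
        if tt(i+1) > tx(mid,1)
            first = mid+1;
        else
            last = mid;
        end
    end
    % Rows prevind..last-1 belong to interval i
    mu(i) = mean( tx(prevind:last-1,2) );
    sd(i) = std( tx(prevind:last-1,2) );
    prevind = last;
end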
```
It uses all of the variables from your original code. I hope it suits your needs. It is faster because finding the end of each interval takes O(log N) with binary search, versus O(N) the way you were doing it.
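To sanity-check the search, here is a self-contained run on a handful of made-up samples (the data and edges are hypothetical). The search range is prevind..size(tx,1)+1, where the extra position stands for "past the end", so a sample in the final interval is still counted. With an upper bound of length(tx) instead, the sample at t = 179 would be left out of the last mean.

```matlab
% Hypothetical test data: 7 samples, 60 s intervals
t  = [5; 20; 70; 100; 130; 150; 179];
x  = [1; 3; 2; 4; 6; 5; 7];
tt = (0:60:180)';          % interval edges
N  = numel(tt) - 1;        % number of intervals

tx = sortrows([t x]);
mu = zeros(N, 1);
prevind = 1;
for i = 1:N
    % Lower-bound search: first row whose time is >= tt(i+1)
    first = prevind;
    last  = size(tx, 1) + 1;   % "past the end" sentinel
    while first ~= last
        mid = floor((first + last)/2);
        if tt(i+1) > tx(mid, 1)
            first = mid + 1;
        else
            last = mid;
        end
    end
    mu(i) = mean(tx(prevind:last-1, 2));
    prevind = last;
end
% mu is [2; 3; 6], matching accumarray(1 + floor(t/60), x, [], @mean)
```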

没有蜡笔的小新

2020-12-09 20:57

You could try to create a cell array and apply mean and std via cellfun. It's ~10% slower than your solution for 900 entries, but ~10x faster for 90000 entries.

```
%# assumes t and x are column vectors
[t,sortIdx]=sort(t); %# we only need to sort in case t is not monotonically increasing
x = x(sortIdx);

tIdx = floor(t/60); %# convert seconds to minutes - can also convert to 5 mins by dividing by 300
tIdx = tIdx - min(tIdx) + 1; %# tIdx now is a vector of indices - i.e. it starts at 1, and should go like your iteration variable.

%# the next few commands are to count how many 1's 2's 3's etc are in tIdx
dt = [tIdx(2:end)-tIdx(1:end-1);1];
stepIdx = [0;find(dt>0)];
nIdx = stepIdx(2:end) - stepIdx(1:end-1); %# number of times each index appears
runStart = stepIdx(1:end-1) + 1;          %# first element of each run

%# convert to cell array
xCell = mat2cell(x,nIdx,1);

%# use cellfun to calculate the mean and sd
mu = zeros(tIdx(end),1); %# preallocate; minutes without samples stay 0
sd = zeros(tIdx(end),1);
mu(tIdx(runStart)) = cellfun(@mean,xCell); %# the indexing is like that since there may be missing steps
sd(tIdx(runStart)) = cellfun(@std,xCell);
```

Note: my solution does not give exactly the same results as yours, since you skip a few time values at the end (1:60:90 is [1,61]), and since my intervals start at multiples of 60 s (floor(t/60)), so the interval start points differ from yours.
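The run-length counting step is the trickiest part, so here is a small, hand-checkable sketch of it on hypothetical, already-sorted minute indices in which minute 3 has no samples. Taking the run starts as stepIdx(1:end-1)+1 gives exactly one start per cell of xCell, so both sides of the assignment have the same length.

```matlab
% Hypothetical, already-sorted minute indices; minute 3 has no samples
tIdx = [1; 1; 2; 2; 2; 4];
x    = [1; 3; 2; 4; 6; 8];

dt = [diff(tIdx); 1];            % nonzero where a run of equal indices ends
stepIdx = [0; find(dt > 0)];     % 0, then the last row of each run: [0; 2; 5; 6]
nIdx = diff(stepIdx);            % run lengths: [2; 3; 1]
runStart = stepIdx(1:end-1) + 1; % first row of each run: [1; 3; 6]

xCell = mat2cell(x, nIdx, 1);    % {[1;3]; [2;4;6]; 8}
mu = zeros(tIdx(end), 1);
mu(tIdx(runStart)) = cellfun(@mean, xCell);   % [2; 4; 0; 8]
```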

梦如初夏

2020-12-09 20:59
You can compute indices all at once using bsxfun:
```
indices = ( bsxfun(@ge, t, tt(1:end-1)') & bsxfun(@lt, t, tt(2:end)') );
```
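This gives a numel(t)-by-(numel(tt)-1) logical matrix in which column i marks the samples with tt(i) <= t < tt(i+1). It is faster than looping, but it stores all of the indices at once (a time vs. space trade-off).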
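As a sketch of how the logical matrix can then be turned into per-interval statistics (the small data set and edges are hypothetical): multiplying its transpose by x sums the samples in each interval, and its column sums give the counts. The standard deviation here uses the sum-of-squares identity; dividing by max(n-1,1) keeps single-sample intervals at 0, as std does, while empty intervals come out as NaN.

```matlab
% Hypothetical samples and 60 s interval edges
t  = [0.5; 10; 59; 61; 130];
x  = [1; 2; 3; 4; 5];
tt = (0:60:180)';

% indices(k,i) is true when tt(i) <= t(k) < tt(i+1)
indices = ( bsxfun(@ge, t, tt(1:end-1)') & bsxfun(@lt, t, tt(2:end)') );

w  = double(indices);    % numeric copy for the matrix products
n  = sum(w, 1)';         % samples per interval: [3; 1; 1]
mu = (w' * x) ./ n;      % per-interval mean: [2; 4; 5]

% Sample standard deviation from the per-interval sums of x and x.^2
sd = sqrt((w' * x.^2 - n .* mu.^2) ./ max(n - 1, 1));   % [1; 0; 0]
```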

离开以前

2020-12-09 21:04

Disclaimer: I worked this out on paper, but haven't yet had the opportunity to check it "in silico"...

You may be able to avoid loops and cell arrays by doing some tricky cumulative sums and indexing, and by calculating the means and standard deviations yourself. Here's some code that I believe will work, although I am unsure how it stacks up speed-wise against the other solutions:

```
t = t(:);  x = x(:);      %# Work with column vectors
[t,sortIndex] = sort(t);  %# Sort the time points
x = x(sortIndex);         %# Sort the data values
interval = 60;            %# Interval size, in seconds

intervalIndex = floor((t-t(1))./interval)+1;  %# Collect t into intervals
nIntervals = max(intervalIndex);              %# The number of intervals
mu = zeros(nIntervals,1);                     %# Preallocate mu
sd = zeros(nIntervals,1);                     %# Preallocate sd

sumIndex = [find(diff(intervalIndex)); ...
            numel(intervalIndex)];  %# Find indices of the interval ends
n = diff([0; sumIndex]);            %# Number of samples per interval
xSum = cumsum(x);                   %# Cumulative sum of x
xSum = diff([0; xSum(sumIndex)]);   %# Sum per interval
xxSum = cumsum(x.^2);               %# Cumulative sum of x^2
xxSum = diff([0; xxSum(sumIndex)]); %# Sum of squares per interval

intervalIndex = intervalIndex(sumIndex);  %# Find index into mu and sd
mu(intervalIndex) = xSum./n;                             %# Compute mean
sd(intervalIndex) = sqrt((xxSum-xSum.*xSum./n)./(n-1));  %# Compute std dev (NaN when n == 1)
```

The above computes the standard deviation using the simplification of the formula found on this Wikipedia page.
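For reference, the simplification used in the code is the usual computational form of the sample standard deviation, written in terms of the per-interval count $n$ (n), the sum $\sum x_i$ (xSum) and the sum of squares $\sum x_i^2$ (xxSum):

$$
s = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2}{n-1}}
$$

This form can lose precision when the values are large relative to their spread, because it subtracts two nearly equal quantities.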

青春惊慌失措

2020-12-09 21:05
The only logical solution seems to be...

OK. I find it funny that to me there is only one logical solution, while others find different ones. Regardless, the solution does seem simple. Given the vectors x and t, and a set of equally spaced break points tt:
```
t = sort((100:999)' + 3*rand(900,1));     % non-uniform time
x = 5*rand(900,1) + 10;             % x(i) is the value at time t(i)

tt = ( floor(t(1)):1*60:ceil(t(end)) )';
```
(Note that I sorted t above.)

I would do this in three fully vectorized lines of code. First, if the breaks were arbitrary and potentially unequally spaced, I would use histc to determine which interval each data point falls in. Given they are uniform, just do this:
```
int = 1 + floor((t - t(1))/60);
```
Again, if the elements of t were not known to be sorted, I would have used min(t) instead of t(1). Having done that, use accumarray to reduce the results into a mean and standard deviation.
```
mu = accumarray(int,x,[],@mean);
sd = accumarray(int,x,[],@std);
```
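For the arbitrary, unequally spaced breaks mentioned above, a minimal sketch with histc might look like this (the break points and data are hypothetical). The second output of histc gives the bin of each sample; samples outside [edges(1), edges(end)] get bin 0, which accumarray rejects, so the edges need to cover all of t. Newer MATLAB releases also provide discretize for computing bin indices.

```matlab
% Hypothetical, unequally spaced break points (seconds) and samples
edges = [0; 3; 10; 40];
t = [1; 2; 5; 12; 30; 31];
x = [10; 20; 30; 40; 50; 60];

% bin(k) = j when edges(j) <= t(k) < edges(j+1)
[~, bin] = histc(t, edges);          % [1; 1; 2; 3; 3; 3]

mu = accumarray(bin, x, [], @mean);  % [15; 30; 50]
sd = accumarray(bin, x, [], @std);   % [sqrt(50); 0; 10]
```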