Question
I have an array of dates and I would like to discard any date that does not have at least one other date within a specific time interval, for example 5 minutes. I need a smart way to do it, as loops take forever with a larger dataset.
Input data:
2009 07 07 16:01:30
2009 07 07 16:04:06
2009 07 07 16:05:00
2009 07 07 16:12:00
2009 07 07 16:19:43
2009 07 07 16:24:00
Expected results:
2009 07 07 16:01:30
2009 07 07 16:04:06
2009 07 07 16:05:00
2009 07 07 16:19:43
2009 07 07 16:24:00
The value 2009 07 07 16:12:00 was discarded because it was more than 5 minutes away from any other timestamp.
Thanks, Cristi
Secondary issue:
Both Dan and nkjt suggested an implementation that worked, thanks! What if the dates belong to two groups, A and B, and I want to check whether each date in group A has a corresponding date in group B within a given number of seconds or minutes? If not, the date should simply be removed from group A.
Answer 1:
You can use diff. You'll need to use datenum to convert your data into a vector of values. In MATLAB datenums, "1" is a single day, so you can define a datenum step in terms of a time unit divided by the number of those in a day:
s = num_mins/(24*60);
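As a quick sanity check of that conversion, here is a small sketch. It assumes num_mins = 5 (the interval from the question's example) and uses two of the question's timestamps; the variable names gap_in_days, gap_in_secs and is_close are only illustrative.

```matlab
num_mins = 5;                        % interval from the question's example
s = num_mins/(24*60);                % 5 minutes as a fraction of a day

% Two timestamps from the question, 54 seconds apart
t1 = datenum(2009, 7, 7, 16, 4, 6);
t2 = datenum(2009, 7, 7, 16, 5, 0);

gap_in_days = t2 - t1;               % difference in datenum units (days)
gap_in_secs = gap_in_days*24*60*60;  % about 54, up to floating-point error
is_close = gap_in_days <= s;         % true: the two are within 5 minutes
```

Seconds work the same way: divide by 24*60*60 instead of 24*60.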
Here's the trick with diff:
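x = datenum(mydata);   % serial date numbers, where 1.0 is one day
x = x(:).';            % force a row vector so the concatenations below work
s = num_mins/(24*60);  % interval length expressed in days
% for increasing times we shouldn't need the `abs` but to be safe
d = abs(diff(x));
% q is true where the gap to the next AND the previous value exceeds s
q = [d (s+1)]>s & [(s+1) d]>s;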
(You can use datestr to convert back, or apply q to the original data. Note that q is true for the isolated points, so index with ~q to keep the rest.)
How it works:
The output of diff is one shorter than the original - it's just the difference between neighbouring values. We need it to be directional - to check each value against the one that comes before and after.
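To make the "one shorter" point concrete, here is a minimal sketch on made-up numbers. The values in x are hypothetical, and Inf is used as the padding value purely for illustration (the answer itself pads with s+1).

```matlab
x = [1 4 5 12];          % any increasing row vector (hypothetical values)
d = diff(x);             % neighbouring differences: [3 1 7]

% d(k) is the gap between x(k) and x(k+1), so numel(d) == numel(x) - 1.
% Padding one end restores the original length and gives each element
% a "gap to the next" and a "gap to the previous" value.
gap_after  = [d Inf];    % [3 1 7 Inf]: the last element has no successor
gap_before = [Inf d];    % [Inf 3 1 7]: the first element has no predecessor
```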
[d (s+1)]>s makes a vector the same length as the original, and checks if the difference values are larger than s. Because we set the last value to be s+1, the final value will always return true. This checks whether there's a gap between a value and the one following it (so for the final value this is always true).
[(s+1) d]>s does the same but on the other side. Again, we are setting one value, this time the first, to be larger than s so it's always true.
Combining these with & gives us the points where the difference is more than five minutes on both sides (or for the end points, on their one side), i.e. the isolated timestamps.
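Putting it together on the question's data, a sketch might look like the following. It assumes the timestamps are stored as rows of date vectors [Y M D H MN S] in mydata and that they are already sorted; kept is an illustrative name, not part of the original answer.

```matlab
% Timestamps from the question as date vectors [Y M D H MN S]
mydata = [2009 7 7 16  1 30;
          2009 7 7 16  4  6;
          2009 7 7 16  5  0;
          2009 7 7 16 12  0;
          2009 7 7 16 19 43;
          2009 7 7 16 24  0];
num_mins = 5;

x = datenum(mydata).';          % row vector of serial date numbers
s = num_mins/(24*60);           % 5 minutes expressed in days
d = abs(diff(x));               % gaps between neighbouring timestamps
q = [d (s+1)]>s & [(s+1) d]>s;  % true for isolated timestamps

kept = mydata(~q, :);           % rows with a neighbour within 5 minutes
disp(datestr(datenum(kept), 'yyyy mm dd HH:MM:SS'))
```

Here only the 16:12:00 row is flagged by q (it is 7 minutes from the previous timestamp and 7 minutes 43 seconds from the next), so kept should contain the five remaining timestamps from the question's expected result.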
Source: https://stackoverflow.com/questions/25888344/how-to-group-dates-within-a-certain-time-interval