How to resample / downsample an irregular timestamp list?

Submitted by 半城伤御伤魂 on 2019-12-12 16:22:37

Question


Simple question, but I haven't been able to find a simple answer.

I have a list of timestamps, in seconds, at which events occur:

[200.0 420.0 560.0 1100.0 1900.0 2700.0 3400.0 3900.0 4234.2 4800.0 etc..]

I want to count how many events occur each hour (3600 seconds) and create a new list of these counts.

I understand this is called downsampling, but all the information I can find is related to traditional time series.

For the example above the new list would look like:

[7 3 etc..]

Any help would be greatly appreciated.


Answer 1:


all_events = [
    200.0, 420.0, 560.0, 1100.0, 1900.0, 2700.0, 3400.0, 3900.0, 4234.2, 4800.0]

def get_events_by_hour(all_events):
    # For each hour of the day, count the events whose timestamp falls in that hour.
    return [
        len([x for x in all_events if int(x / 3600.0) == hour])
        for hour in range(24)
    ]

print(get_events_by_hour(all_events))

Note that all_events should contain events for a single day only, because the function counts hours 0 through 23.
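The list comprehension above rescans the whole list once per hour and stops at 24 hours. As a minimal sketch of a single-pass variant, assuming non-negative timestamps, the following counts events per bin with collections.Counter. The names count_events_per_bin and bin_seconds are illustrative and not part of the original answer.

```python
from collections import Counter

def count_events_per_bin(timestamps, bin_seconds=3600.0):
    """Count events per fixed-width bin; one list entry per bin, starting at t = 0."""
    if not timestamps:
        return []
    # Map each timestamp to its bin index (0 for the first hour, 1 for the second, ...).
    counts = Counter(int(t // bin_seconds) for t in timestamps)
    # A Counter returns 0 for missing keys, so empty bins show up as 0.
    return [counts[i] for i in range(max(counts) + 1)]

all_events = [
    200.0, 420.0, 560.0, 1100.0, 1900.0, 2700.0, 3400.0, 3900.0, 4234.2, 4800.0]
print(count_events_per_bin(all_events))  # [7, 3]
```

The output list is as long as the data requires, so it is not tied to a single day.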




Answer 2:


The act of sampling means taking data f_i (samples) at certain discrete times t_i. The number of samples per time unit gives the sampling rate. Downsampling is a special case of resampling, which means mapping the sampled data onto a different set of sampling points t_i', here onto one with a smaller sampling rate, making the sample more coarse.

Your first list contains the sample points t_i (in seconds) and, indirectly, the number of events n_i, which corresponds to the index i, for example n_i = i + 1.

If you reduce the list at regular intervals, after a period T (in seconds), you are resampling to a new set n_i' at times t_i' = i * T. I did not write downsampling, because it is possible that nothing happens within a period T, in which case you are effectively upsampling, since you now take more data points.

For the calculation, first check whether the input list is empty; in that case n' = 0 should go into your output list. Otherwise you have m entries in your input list, measured over time T, and you can use the equation below:

n' = m * 3600 / T

This n' goes into your output list; it is scaled to events per hour.
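The scaling above can be sketched as follows. This is one possible reading of the answer, assuming non-negative timestamps and windows that start at t = 0; the names events_per_hour and window_seconds are hypothetical and not from the original answer.

```python
def events_per_hour(timestamps, window_seconds):
    """Return one rate (events per hour) for each window of length window_seconds."""
    if not timestamps:
        return []
    # Number of windows needed to cover the latest timestamp.
    n_windows = int(max(timestamps) // window_seconds) + 1
    counts = [0] * n_windows
    for t in timestamps:
        counts[int(t // window_seconds)] += 1
    # n' = m * 3600 / T; a window with no events yields 0.
    return [m * 3600.0 / window_seconds for m in counts]

events = [200.0, 420.0, 560.0, 1100.0, 1900.0, 2700.0, 3400.0, 3900.0, 4234.2, 4800.0]
print(events_per_hour(events, 1800.0))  # [8.0, 6.0, 6.0]
print(events_per_hour(events, 3600.0))  # [7.0, 3.0]
```

With T = 3600 the rate equals the raw count per hour, which is what the question asks for.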




Answer 3:


The question has the scipy tag, and scipy depends on numpy, so I assume an answer using numpy is acceptable.

To get the hour associated with a timestamp t, you can take the integer part of t/3600. Then, to get the number of events in each hour, you can count the number of occurrences of these integers. The numpy function bincount can do that for you.

Here's a numpy one-liner for the calculation. I put the timestamps in a numpy array t:

In [48]: import numpy

In [49]: t = numpy.array([200.0, 420.0, 560.0, 1100.0, 1900.0, 2700.0, 3400.0, 3900.0, 4234.2, 4800.0, 8300.0, 8400.0, 9500.0, 10000.0, 14321.0, 15999.0, 16789.0, 17000.0])

In [50]: t
Out[50]: 
array([   200. ,    420. ,    560. ,   1100. ,   1900. ,   2700. ,
         3400. ,   3900. ,   4234.2,   4800. ,   8300. ,   8400. ,
         9500. ,  10000. ,  14321. ,  15999. ,  16789. ,  17000. ])

Here's your calculation:

In [51]: numpy.bincount((t/3600).astype(int))
Out[51]: array([7, 3, 4, 1, 3])
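
As an alternative sketch (not part of the original answer), numpy.histogram with explicit bin edges gives the same counts and makes the bin width easy to change. It assumes non-negative timestamps; bin_seconds, n_bins and edges are illustrative names.

```python
import numpy as np

t = np.array([200.0, 420.0, 560.0, 1100.0, 1900.0, 2700.0, 3400.0, 3900.0,
              4234.2, 4800.0, 8300.0, 8400.0, 9500.0, 10000.0, 14321.0,
              15999.0, 16789.0, 17000.0])

bin_seconds = 3600.0
# Enough bins to cover the latest timestamp.
n_bins = int(t.max() // bin_seconds) + 1
# Bin edges at 0, 3600, 7200, ...; each bin is half-open [lo, hi) except the last.
edges = np.arange(n_bins + 1) * bin_seconds
counts, _ = np.histogram(t, bins=edges)
print(counts)  # [7 3 4 1 3]
```

Unlike the bincount one-liner, this form also returns the bin edges, which can be handy for labelling or plotting the result.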


Source: https://stackoverflow.com/questions/28430323/how-to-resample-downsample-an-irregular-timestamp-list
