Splitting a dataframe into multiple 5-second dataframes in Python

Posted by 强颜欢笑 on 2019-12-24 00:37:37

Question


I have a relatively big dataset that I want to split into multiple dataframes in Python based on a column containing datetime values. The values in the column I want to split the dataframe by are in the following format:

  1. 2015-11-01 00:00:05

You may assume the dataframe looks like this.

How can I split the dataframe into 5-second intervals in the following way:

  1. 1st dataframe 2015-11-01 00:00:00 - 2015-11-01 00:00:05,

  2. 2nd dataframe 2015-11-01 00:00:05 - 2015-11-01 00:00:10, and so on.
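The two ranges above share the endpoint 00:00:05. The pandas binning used in the answers below treats each 5-second interval as half-open, [start, start + 5 s), so a timestamp exactly on a boundary lands in the later interval (Answer 1's output shows 00:00:05 in the second group). A minimal sketch of that bin assignment with `Series.dt.floor`, using a few hypothetical timestamps:

```python
import pandas as pd

# Hypothetical timestamps around the first two 5-second boundaries.
ts = pd.Series(pd.to_datetime([
    "2015-11-01 00:00:00",
    "2015-11-01 00:00:04",
    "2015-11-01 00:00:05",  # exactly on a boundary
    "2015-11-01 00:00:09",
    "2015-11-01 00:00:10",
]))

# floor('5s') maps each timestamp to the start of its 5-second interval,
# so every interval covers [start, start + 5s).
starts = ts.dt.floor("5s")
print(pd.DataFrame({"timestamp": ts, "interval_start": starts}))
```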

I also need to count the number of observations in each of the resulting dataframes. In other words, it would be nice to get another dataframe with 2 columns: the 1st identifies the interval (its values don't matter; they could simply be 1, 2, 3, ... in the order of the 5-second intervals), and the 2nd shows the number of observations in that interval.


Answer 1:


I think the best way to store multiple DataFrames is a dict:

import pandas as pd

# 100 rows of sample data, one per second ('s' = seconds)
rng = pd.date_range('2015-11-01 00:00:00', periods=100, freq='s')
df = pd.DataFrame({'Date': rng, 'a': range(100)})
print (df.head(10))
                 Date  a
0 2015-11-01 00:00:00  0
1 2015-11-01 00:00:01  1
2 2015-11-01 00:00:02  2
3 2015-11-01 00:00:03  3
4 2015-11-01 00:00:04  4
5 2015-11-01 00:00:05  5
6 2015-11-01 00:00:06  6
7 2015-11-01 00:00:07  7
8 2015-11-01 00:00:08  8
9 2015-11-01 00:00:09  9

# One DataFrame per 5-second bin, keyed by the bin's start time as a string
dfs = {k.strftime('%Y-%m-%d %H:%M:%S'): v
       for k, v in df.groupby(pd.Grouper(key='Date', freq='5s'))}

print (dfs['2015-11-01 00:00:00'])
                 Date  a
0 2015-11-01 00:00:00  0
1 2015-11-01 00:00:01  1
2 2015-11-01 00:00:02  2
3 2015-11-01 00:00:03  3
4 2015-11-01 00:00:04  4

print (dfs['2015-11-01 00:00:05'])
                 Date  a
5 2015-11-01 00:00:05  5
6 2015-11-01 00:00:06  6
7 2015-11-01 00:00:07  7
8 2015-11-01 00:00:08  8
9 2015-11-01 00:00:09  9
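The question also asks for a table of observation counts per interval. Since each value in `dfs` is one interval's DataFrame, a minimal sketch is to take the length of each group; the column names `interval` and `n_obs` are hypothetical choices, and the sample data is the same as above:

```python
import pandas as pd

# Same sample data as in this answer: 100 rows, one per second.
rng = pd.date_range("2015-11-01 00:00:00", periods=100, freq="s")
df = pd.DataFrame({"Date": rng, "a": range(100)})

dfs = {k.strftime("%Y-%m-%d %H:%M:%S"): v
       for k, v in df.groupby(pd.Grouper(key="Date", freq="5s"))}

# Two-column table: interval number (1, 2, 3, ...) and row count.
# groupby yields the bins in time order and dicts keep insertion order,
# so the numbering follows the order of the 5-second intervals.
counts = pd.DataFrame({
    "interval": range(1, len(dfs) + 1),
    "n_obs": [len(g) for g in dfs.values()],
})
print(counts.head())
```

With one row per second, every 5-second interval here holds 5 rows; on real data the counts will vary.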



Answer 2:


You can group by the Date column floored to 5-second boundaries:

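import pandas as pd

# Format a Timestamp key as 'YYYY-MM-DD HH:MM:SS'
f = '{:%Y-%m-%d %H:%M:%S}'.format

# floor('5s') maps each timestamp to the start of its 5-second interval
dfs = {f(k): g for k, g in df.groupby(df.Date.dt.floor('5s'))}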
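Grouping by the floored timestamps only creates groups for intervals that actually contain rows. If the count table from the question should also list empty intervals, one option is to reindex the group sizes onto a full `pd.date_range` of interval starts. A minimal sketch, assuming hypothetical timestamps with a gap and hypothetical column names `interval` and `n_obs`:

```python
import pandas as pd

# Hypothetical data with a gap: no rows between 00:00:07 and 00:00:16.
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2015-11-01 00:00:01",
        "2015-11-01 00:00:03",
        "2015-11-01 00:00:06",
        "2015-11-01 00:00:07",
        "2015-11-01 00:00:16",
    ]),
    "a": range(5),
})

# Row count per 5-second interval start (only intervals that contain rows).
counts = df.groupby(df["Date"].dt.floor("5s")).size()

# Reindex onto every interval start so empty intervals appear with 0.
all_starts = pd.date_range(counts.index.min(), counts.index.max(), freq="5s")
counts = counts.reindex(all_starts, fill_value=0)

# Number the intervals 1, 2, 3, ... in time order.
result = pd.DataFrame({
    "interval": range(1, len(counts) + 1),
    "n_obs": counts.to_numpy(),
})
print(result)
```

Dropping the `reindex` step gives the same table without the empty intervals, which matches what the dict of dataframes contains.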


Source: https://stackoverflow.com/questions/47131431/spliting-a-dataframe-into-multiple-5-second-dataframes-in-python
