How to properly pivot or reshape a timeseries dataframe in Pandas?

折月煮酒 submitted on 2019-12-04 19:02:50
import numpy as np
import pandas as pd
import io

data = '''\
                      val
2007-08-07 18:00:00    1
2007-08-08 00:00:00    2
2007-08-08 06:00:00    3
2007-08-08 12:00:00    4
2007-08-08 18:00:00    5
2007-11-02 18:00:00    6
2007-11-03 00:00:00    7
2007-11-03 06:00:00    8
2007-11-03 12:00:00    9
2007-11-03 18:00:00   10'''

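# data is a str, so wrap it in StringIO (BytesIO only accepts bytes on Python 3).
# A regex separator needs a raw string and the Python parsing engine.
df = pd.read_csv(io.StringIO(data), sep=r'\s{2,}', engine='python', parse_dates=True)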

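chunksize = 5                  # rows per block; assumes len(df) is a multiple of this
chunks = len(df)//chunksize

# Label every row with the date on which its block starts.
df['Date'] = np.repeat(df.index.date[::chunksize], chunksize)[:len(df)]
# Remember the times of day of the first block to use as the final row labels.
index = df.index.time[:chunksize]
# Position of each row within its block: 0, 1, ..., chunksize-1, repeated.
df['Time'] = np.tile(np.arange(chunksize), chunks)
df = df.set_index(['Date', 'Time'], append=False)

# Move the block start dates into the columns, then restore the time labels.
df = df['val'].unstack('Date')
df.index = index
print(df)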

yields

Date      2007-08-07  2007-11-02
18:00:00           1           6
00:00:00           2           7
06:00:00           3           8
12:00:00           4           9
18:00:00           5          10

Note that the final DataFrame has an index with non-unique entries. (The 18:00:00 is repeated.) Some DataFrame operations are problematic when the index has repeated entries, so in general it is better to avoid this if possible.
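One way to keep the five-row layout without repeating index labels is to index each row by the time elapsed since the start of its block rather than by time of day. The following is only a sketch under the same assumption as above (blocks of exactly 5 consecutive rows); the names block_start, offset and wide are made up for illustration.

```python
import pandas as pd

# Same ten 6-hourly readings as in the example above.
idx = pd.to_datetime([
    '2007-08-07 18:00', '2007-08-08 00:00', '2007-08-08 06:00',
    '2007-08-08 12:00', '2007-08-08 18:00',
    '2007-11-02 18:00', '2007-11-03 00:00', '2007-11-03 06:00',
    '2007-11-03 12:00', '2007-11-03 18:00',
])
df = pd.DataFrame({'val': range(1, 11)}, index=idx)

chunksize = 5  # assumption: every block has exactly 5 rows

# First timestamp of each block, repeated once per row of that block.
block_start = df.index[::chunksize].repeat(chunksize)

# Elapsed time since the block started: 0h, 6h, 12h, 18h, 24h (unique per block).
offset = df.index - block_start

wide = pd.DataFrame({
    'offset': offset,
    'start': block_start.date,
    'val': df['val'].to_numpy(),
}).pivot(index='offset', columns='start', values='val')

print(wide)
print(wide.index.is_unique)
```

Here the last row is labelled with an offset of one day instead of a second 18:00:00, so wide.index.is_unique is True and label-based lookups stay unambiguous.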

First of all, I'm assuming your datetime column is actually of datetime type. If it is not, use df['t'] = pd.to_datetime(df['t']) to convert it.

Then set your index to a MultiIndex of (time, date) and unstack:

df.index = pd.MultiIndex.from_tuples(df['t'].apply(lambda x: [x.time(),x.date()]))
df['v'].unstack()
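For reference, here is a self-contained sketch of the same idea that can be run as-is. The small frame is made up for illustration; only the column names 't' and 'v' come from the snippet above. It builds the MultiIndex with from_arrays and the .dt accessor instead of apply, which is a substitution of mine and not part of the original answer.

```python
import pandas as pd

# Hypothetical sample frame: 't' holds timestamps as strings, 'v' the values.
df = pd.DataFrame({
    't': ['2007-08-07 18:00:00', '2007-08-08 00:00:00', '2007-08-08 06:00:00',
          '2007-11-02 18:00:00', '2007-11-03 00:00:00'],
    'v': [1, 2, 3, 6, 7],
})
df['t'] = pd.to_datetime(df['t'])

# Level 0 = time of day (becomes the rows), level 1 = calendar date (becomes the columns).
df.index = pd.MultiIndex.from_arrays(
    [df['t'].dt.time, df['t'].dt.date], names=['time', 'date'])

# unstack() moves the innermost level ('date') into the columns.
wide = df['v'].unstack()
print(wide)
```

Combinations of time and date that never occur in the data come out as NaN, which is why the result is a float frame.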

This would be a canonical approach for pandas:

First, set up the imports and data:

import pandas as pd
from io import StringIO  # Python 3; the Python 2 StringIO module no longer exists


txt = '''2007-08-07 18:00:00 1
2007-08-08 00:00:00 2
2007-08-08 06:00:00 3
2007-08-08 12:00:00 4
2007-08-08 18:00:00 5
2007-11-02 18:00:00 6
2007-11-03 00:00:00 7
2007-11-03 06:00:00 8
2007-11-03 12:00:00 9
2007-11-03 18:00:00 10'''

Now read in the DataFrame, and pivot on the correct columns:

# Columns: d = date, t = time of day, n = value
df1 = pd.read_csv(StringIO(txt), sep=' ',
                  names=['d', 't', 'n'])
print(df1.pivot(index='t', columns='d', values='n'))

prints a pivoted df:

d         2007-08-07  2007-08-08  2007-11-02  2007-11-03
t                                                       
00:00:00         NaN           2         NaN           7
06:00:00         NaN           3         NaN           8
12:00:00         NaN           4         NaN           9
18:00:00           1           5           6          10

You won't get a length of 5, though. The following,

          2007-08-07  2007-11-02
18:00:00      1           6
00:00:00      2           7
06:00:00      3           8
12:00:00      4           9
18:00:00      5          10

is incorrect: it lists 18:00:00 twice under the same date column, whereas in your initial data those two readings belong to different dates (2007-08-07 and 2007-08-08).
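To see why pivot cannot produce that five-row table, here is a small sketch that labels each row with the date its block starts on and then pivots on time of day. The 'block' column and the fixed block size of 5 are assumptions added for illustration; they are not in the original data.

```python
import pandas as pd

ts = pd.to_datetime([
    '2007-08-07 18:00', '2007-08-08 00:00', '2007-08-08 06:00',
    '2007-08-08 12:00', '2007-08-08 18:00',
    '2007-11-02 18:00', '2007-11-03 00:00', '2007-11-03 06:00',
    '2007-11-03 12:00', '2007-11-03 18:00',
])

long = pd.DataFrame({
    'block': ts[::5].repeat(5).date,  # hypothetical label: date the block starts on
    't': ts.time,                     # time of day
    'n': range(1, 11),
})

# (18:00:00, 2007-08-07) now occurs twice, so pivot has two values for one cell.
try:
    long.pivot(index='t', columns='block', values='n')
    error = None
except ValueError as exc:
    error = str(exc)

print(error)
```

pivot refuses to reshape because one (row, column) cell would have to hold two values, which is the same ambiguity described above.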
