Pandas dataframe : Multiple Time/Date columns to single Date index

♀尐吖头ヾ submitted on 2021-02-18 18:56:35

Question


I have a dataframe with products as the row index and 12 months of sales (one column per month). I'd like to 'pivot' the dataframe so that the dates end up in a single date index instead of spread across columns.

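Example data:

import pandas as pd
import numpy as np
# An explicit start date keeps the columns in 2014, matching the output below.
# On pandas >= 2.2 the month-end alias 'M' is deprecated in favour of 'ME'.
df = pd.DataFrame(np.random.randint(10, 1000, size=(2,12)), index=['PrinterBlue', 'PrinterBetter'], columns=pd.date_range('2014-01-01', periods=12, freq='M'))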

yielding:

>>> df
           2014-01-31  2014-02-28  2014-03-31  2014-04-30  2014-05-31  \
PrinterBlue           176          77          89         279          81   
PrinterBetter         801         660         349         608         322   

           2014-06-30  2014-07-31  2014-08-31  2014-09-30  2014-10-31  \
PrinterBlue           286         831         114         996         904   
PrinterBetter         994         374         895         586         646   

           2014-11-30  2014-12-31  
PrinterBlue           458         117  
PrinterBetter         366         196  

Desired result :

   Brand           Date          Sales
PrinterBlue    2014-01-31          176
               2014-02-28           77
               2014-03-31           89
                  [...]
               2014-11-30          458
               2014-12-31          117
PrinterBetter  2014-01-31          801
               2014-02-28          660
               2014-03-31          349
                  [...]
               2014-11-30          366
               2014-12-31          196

I can imagine getting the result by:

  1. Building 12 sub-dataframes, each containing only one month of information
  2. Pivoting each dataframe
  3. Concatenating them
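
For illustration, those three steps could be sketched roughly as follows. This is only a sketch under assumptions: the three-month sample values are copied from the display above, and the names `pieces` and `manual` are made up for the example.

```python
import pandas as pd

# Small fixed sample (first three months of the display above).
dates = pd.to_datetime(["2014-01-31", "2014-02-28", "2014-03-31"])
df = pd.DataFrame(
    [[176, 77, 89], [801, 660, 349]],
    index=["PrinterBlue", "PrinterBetter"],
    columns=dates,
)

pieces = []
for date in df.columns:
    # Step 1: take a single month of sales.
    # Step 2: reshape it into Brand / Date / Sales rows.
    month = df[date].rename("Sales").rename_axis("Brand").reset_index()
    month.insert(1, "Date", date)
    pieces.append(month)

# Step 3: concatenate the monthly pieces and index them by brand and date.
# sort_index() orders the brands alphabetically, not in their original order.
manual = (
    pd.concat(pieces, ignore_index=True)
    .set_index(["Brand", "Date"])
    .sort_index()
)
print(manual)
```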

But that seems like a pretty complicated way to do the transformation. Is there a better / simpler way?


Answer 1:


I think pandas' melt function provides the functionality you are looking for:

http://pandas.pydata.org/pandas-docs/stable/reshaping.html#reshaping-by-melt

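import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(10, 1000, size=(2,12)), index=['PrinterBlue', 'PrinterBetter'], columns=pd.date_range('2014-01-01', periods=12, freq='M'))

# Transpose so that each date is a row and each brand is a column
dft = df.T
# Copy the date index into a regular column so melt can keep it as an identifier
dft["date"] = dft.index
# Unpivot the brand columns into one (date, brand, sales) row per observation
result = pd.melt(dft, id_vars=["date"])
result.columns = ["date", "brand", "sales"]
print(result)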

outputs this:

         date          brand  sales
0  2014-01-31    PrinterBlue    242
1  2014-02-28    PrinterBlue    670
2  2014-03-31    PrinterBlue    142
3  2014-04-30    PrinterBlue    571
4  2014-05-31    PrinterBlue    826
5  2014-06-30    PrinterBlue    515
6  2014-07-31    PrinterBlue    568
7  2014-08-31    PrinterBlue     90
8  2014-09-30    PrinterBlue    652
9  2014-10-31    PrinterBlue    488
10 2014-11-30    PrinterBlue    671
11 2014-12-31    PrinterBlue    767
12 2014-01-31  PrinterBetter    294
13 2014-02-28  PrinterBetter     77
14 2014-03-31  PrinterBetter     59
15 2014-04-30  PrinterBetter    373
16 2014-05-31  PrinterBetter    228
17 2014-06-30  PrinterBetter    708
18 2014-07-31  PrinterBetter     16
19 2014-08-31  PrinterBetter    542
20 2014-09-30  PrinterBetter    577
21 2014-10-31  PrinterBetter    141
22 2014-11-30  PrinterBetter    358
23 2014-12-31  PrinterBetter    290
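
The melt result above is a flat table, whereas the question asks for brand and date as the index. One possible alternative, not part of the original answer, is `DataFrame.stack()`, which moves the column labels into an inner index level. The sketch below assumes a small fixed sample taken from the question's display; the names `sales` and `flat` are illustrative only.

```python
import pandas as pd

# Small fixed sample (first three months of the question's display).
dates = pd.to_datetime(["2014-01-31", "2014-02-28", "2014-03-31"])
df = pd.DataFrame(
    [[176, 77, 89], [801, 660, 349]],
    index=["PrinterBlue", "PrinterBetter"],
    columns=dates,
)

# Name both axes, then stack: the date columns become the inner level
# of a (Brand, Date) MultiIndex and the values form a single Series.
sales = (
    df.rename_axis(index="Brand", columns="Date")
    .stack()
    .rename("Sales")
)
print(sales)

# If a flat table like the melt output is preferred, reset the index.
flat = sales.reset_index()
```

With this approach the rows keep the original brand order, and `sales.loc["PrinterBlue"]` returns that brand's sales as a date-indexed Series.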


Source: https://stackoverflow.com/questions/21928814/pandas-dataframe-multiple-time-date-columns-to-single-date-index
