DataFrame of DataFrames with pandas

Submitted by 折月煮酒 on 2019-12-23 08:24:19

Question


I have the following DataFrame, which gathers daily stats on two measures, A and B:

                  A             B
count  17266.000000  17266.000000
std        0.179003      0.178781
75%      101.102251    101.053214
min      100.700993    100.651956
mean     101.016747    100.964003
max      101.540214    101.491178
50%      100.988465    100.938694
25%      100.885251    100.830048

Below is a piece of code that creates it:

import pandas

# Summary statistics for one day, as produced by describe()
day1 = {
    'A': {
        'count': 17266.0,
        'std': 0.17900265293286116,
        'min': 100.70099294189714,
        'max': 101.54021448871775,
        '50%': 100.98846526697825,
        '25%': 100.88525124427971,
        '75%': 101.10225131847992,
        'mean': 101.01674677794136
    },
    'B': {
        'count': 17266.0,
        'std': 0.17878125983374854,
        'min': 100.65195609992342,
        'max': 101.49117764674403,
        '50%': 100.93869409089723,
        '25%': 100.83004837814667,
        '75%': 101.05321447650618,
        'mean': 100.96400305527138
    }
}
# Transpose so that the measures are columns and the statistics are rows
df = pandas.DataFrame.from_dict(day1, orient='index').T

The data come straight from a describe() call. I have several such summaries (one per day) and would like to gather them all into a single DataFrame indexed by date.

The most obvious way to do that would be to stack all the daily raw data into one DataFrame, group it by day, and run the stats on the result. However, I would like an alternative, because I run into a MemoryError with the amount of data I process.

The final outcome should look like this:

                             A             B
2014-12-24 count  15895.000000  15895.000000
           mean      99.943618     99.968860
           std        0.012468      0.011932
           min       99.877695     99.928778
           25%       99.934890     99.960445
           50%       99.943453     99.968847
           75%       99.952340     99.977571
           max       99.982930    100.002507
2014-12-25 count  16278.000000  16278.000000
           mean      99.937056     99.962203
           std        0.012395      0.012661
           min       99.884501     99.910567
           25%       99.928078     99.953758
           50%       99.936754     99.962411
           75%       99.945914     99.971473
           max       99.981512    100.003770

Answer 1:


If you are able to make a dict of {date: describe_df_for_that_day}, then you can pass that dict to pd.concat. The dict keys become the outer level of the resulting MultiIndex.

Starting with your df (and with pandas imported as pd):

In [14]: d = {'2014-12-24': df, '2014-12-25': df}

In [15]: pd.concat(d)
Out[15]:
                             A             B
2014-12-24 count  17266.000000  17266.000000
           std        0.179003      0.178781
           75%      101.102251    101.053214
           min      100.700993    100.651956
           mean     101.016747    100.964003
           max      101.540214    101.491178
           50%      100.988465    100.938694
           25%      100.885251    100.830048
2014-12-25 count  17266.000000  17266.000000
           std        0.179003      0.178781
           75%      101.102251    101.053214
           min      100.700993    100.651956
           mean     101.016747    100.964003
           max      101.540214    101.491178
           50%      100.988465    100.938694
           25%      100.885251    100.830048

You can of course make the keys real dates instead of strings.
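
As a rough sketch of how this might fit the memory constraint in the question: each day's raw data can be summarised on its own, so that only the small describe() result is kept, keyed by a real pd.Timestamp. The load_day function below is hypothetical and just generates random numbers; it stands in for however each day's data is actually read. The level names "date" and "stat" are likewise only illustrative.

```python
import numpy as np
import pandas as pd


def load_day(day):
    """Hypothetical loader: returns one day's raw data (random here)."""
    rng = np.random.default_rng(day.toordinal())
    return pd.DataFrame({
        "A": rng.normal(100.0, 0.01, size=1000),
        "B": rng.normal(100.0, 0.01, size=1000),
    })


days = pd.date_range("2014-12-24", periods=2, freq="D")

# Only one day's raw data is alive at a time; the dict keeps
# just the 8-row describe() summary for each day.
summaries = {day: load_day(day).describe() for day in days}

# The Timestamp keys become the outer index level.
result = pd.concat(summaries, names=["date", "stat"])
print(result)

# With real dates as keys, one day's block can be selected directly.
print(result.loc[pd.Timestamp("2014-12-25")])
```

Because the outer level holds real dates rather than strings, date-based selection and sorting on that level behave as they would on any datetime index.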



Source: https://stackoverflow.com/questions/28368598/dataframe-of-dataframes-with-pandas
