Computing diffs within groups of a dataframe

你的背包 2020-11-30 19:21

Say I have a dataframe with 3 columns: Date, Ticker, Value (no index, at least to start with). I have many dates and many tickers, but each (ticker, date) tupl

6 answers
  •  天涯浪人
    2020-11-30 19:56

    You can use pivot to convert the dataframe into a wide table with one row per date and one column per ticker. Here is an example.

    Create the test data first:

    import pandas as pd
    import numpy as np
    import random
    from itertools import product
    
    # ten consecutive dates, formatted as plain "YYYY-MM-DD" strings
    dates = pd.date_range(start="2013-12-01", periods=10).strftime("%Y-%m-%d")
    ticks = "ABCDEF"
    # every (date, ticker) pair, shuffled, with the last 5 dropped to simulate gaps
    pairs = list(product(dates, ticks))
    random.shuffle(pairs)
    pairs = pairs[:-5]
    values = np.random.rand(len(pairs))
    
    dates, ticks = zip(*pairs)
    df = pd.DataFrame({"date":dates, "tick":ticks, "value":values})
    

    Convert the dataframe to wide format with pivot (dates become the index, tickers become the columns):

    df2 = df.pivot(index="date", columns="tick", values="value")
    

    Forward-fill the NaN values left by the missing (date, ticker) pairs:

    df2 = df2.ffill()
    

    Call the diff() method to get the change from the previous date for each ticker:

    df2.diff()
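
To see how the forward fill interacts with diff(), here is a small deterministic sketch. The frame `toy`, its values, and the names `wide`, `filled` and `changes` are made up for illustration and are not part of the answer above; ticker B deliberately has no row for 2013-12-02.

```python
import pandas as pd

# Hypothetical toy data: ticker B is missing on 2013-12-02.
toy = pd.DataFrame({
    "date": ["2013-12-01", "2013-12-02", "2013-12-03",
             "2013-12-01", "2013-12-03"],
    "tick": ["A", "A", "A", "B", "B"],
    "value": [1.0, 3.0, 6.0, 10.0, 14.0],
})

# One row per date, one column per ticker; the missing pair becomes NaN.
wide = toy.pivot(index="date", columns="tick", values="value")

# Carry the last known value forward, so B on 2013-12-02 becomes 10.0.
filled = wide.ffill()

# Row-to-row difference within each column; the first row is always NaN.
changes = filled.diff()
print(changes)
```

With the forward fill, B shows a change of 0.0 on the filled date and 4.0 on the next one. If the fill is skipped, both of those entries are NaN instead, so which behaviour is preferable depends on how a missing observation should be interpreted.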
    

    Here is what df2 looks like after the forward fill (the values are random, so yours will differ):

    tick               A         B         C         D         E         F
    date                                                                  
    2013-12-01  0.077260  0.084008  0.711626  0.071267  0.811979  0.429552
    2013-12-02  0.106349  0.141972  0.457850  0.338869  0.721703  0.217295
    2013-12-03  0.330300  0.893997  0.648687  0.628502  0.543710  0.217295
    2013-12-04  0.640902  0.827559  0.243816  0.819218  0.543710  0.190338
    2013-12-05  0.263300  0.604084  0.655723  0.299913  0.756980  0.135087
    2013-12-06  0.278123  0.243264  0.907513  0.723819  0.506553  0.717509
    2013-12-07  0.960452  0.243264  0.357450  0.160799  0.506553  0.194619
    2013-12-08  0.670322  0.256874  0.637153  0.582727  0.628581  0.159636
    2013-12-09  0.226519  0.284157  0.388755  0.325461  0.957234  0.810376
    2013-12-10  0.958412  0.852611  0.472012  0.832173  0.957234  0.723234
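
As an alternative to pivoting, the differences can also be computed while the data stays in long format, by sorting and then grouping on the ticker. This is only a sketch under assumptions: `long_df` and its values are hypothetical, and the dates are ISO-formatted strings, so sorting them as text gives chronological order.

```python
import pandas as pd

# Hypothetical toy data in long format, deliberately unsorted.
long_df = pd.DataFrame({
    "date": ["2013-12-03", "2013-12-01", "2013-12-01",
             "2013-12-02", "2013-12-03"],
    "tick": ["A", "A", "B", "A", "B"],
    "value": [6.0, 1.0, 10.0, 3.0, 14.0],
})

# Put each ticker's rows in date order before differencing.
long_df = long_df.sort_values(["tick", "date"]).reset_index(drop=True)

# diff() is applied separately within each ticker group,
# so the first row of every ticker is NaN.
long_df["diff"] = long_df.groupby("tick")["value"].diff()
print(long_df)
```

Unlike the pivot approach, no rows are created for missing (date, ticker) pairs here: B's difference on 2013-12-03 is taken against its previous available row, 2013-12-01.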
    
