Rolling 3 previous months with unique counts after groupby in pandas dataframe

Posted by 早过忘川 on 2021-02-07 10:10:25

Question


The following is the dataframe:

Date        Name     data
01/01/2017  Alpha     A      
02/01/2017  Alpha     A
03/01/2017  Alpha     B
01/01/2017  Beta      A
01/20/2017  Beta      D
03/01/2017  Beta      C
04/01/2017  Beta      C
05/01/2017  Beta      B

Expected Output:

Date        Name     data
Jan 2017     Alpha     1      
Feb 2017     Alpha     1
Mar 2017     Alpha     2
Jan 2017     Beta      2
Mar 2017     Beta      3
Apr 2017     Beta      1
May 2017     Beta      2

I am looking for the number of unique "data" values per "Name" on a rolling 3-month basis. Take "March 2017" and "Name" -> "Beta" as an example: the months considered are Jan 2017, Feb 2017 and March 2017 for the Name "Beta", and the unique count is 3 (A, D and C). The other rows follow the same rule. Please note that each "data" value should be counted only once within those 3 months.

Any help is appreciated.


Answer 1:


Group by month and Name, then unstack and resample to monthly frequency, so that every month is present and there is one column per Name:

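# Assumes df has a DatetimeIndex named Date.
# pd.TimeGrouper is no longer available in current pandas; pd.Grouper(freq='M') replaces it
# (pandas 2.2+ spells the month-end alias 'ME').
# asfreq('M') adds a row for any missing month; cells without data may display as NaN instead of None.
df2 = df.groupby([pd.Grouper(freq='M'), 'Name'])['data'].apply(set).unstack().asfreq('M')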

df2

Name        Alpha   Beta
Date        
2017-01-31  {A}     {A, D}
2017-02-28  {A}     None
2017-03-31  {B}     {C}
2017-04-30  None    {C}
2017-05-31  None    {B}
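
The answer does not show how df was built. As a minimal sketch, assuming the question's table is loaded with Date as a DatetimeIndex (month/day/year), the monthly table of sets can be reproduced as follows. The names months and monthly_sets are illustrative and not from the answer, and to_period('M') is used here for bucketing instead of a frequency alias, because the month-end alias name differs between pandas versions.

```python
import pandas as pd

# Sample data copied from the question; Date is parsed as month/day/year.
df = pd.DataFrame(
    {
        "Name": ["Alpha", "Alpha", "Alpha", "Beta", "Beta", "Beta", "Beta", "Beta"],
        "data": ["A", "A", "B", "A", "D", "C", "C", "B"],
    },
    index=pd.to_datetime(
        ["01/01/2017", "02/01/2017", "03/01/2017", "01/01/2017",
         "01/20/2017", "03/01/2017", "04/01/2017", "05/01/2017"],
        format="%m/%d/%Y",
    ).rename("Date"),
)

# Bucket rows by calendar month, then collect the distinct values per (month, Name).
months = df.index.to_period("M")
monthly_sets = df.groupby([months, "Name"])["data"].apply(set).unstack()
print(monthly_sets)
```

With this variant the index holds monthly periods rather than month-end timestamps, and a month with no rows for a Name shows up as a missing value.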

Multiple iterator

Some itertools magic to iterate over the same column several times at once, with each iterator starting one item later than the previous one:

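import itertools

def multiple_iterator(iterable, r=2):
    """Return r iterators over iterable; iterator i starts i items in."""
    iterators = itertools.tee(iterable, r)
    for i, it in enumerate(iterators):
        # Skip the first i items; the default keeps next() from raising
        # StopIteration when the input is shorter than r.
        for _ in range(i):
            next(it, None)
    return iterators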
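
To see what the helper produces, here is a small usage sketch on a plain string instead of a dataframe column. The function is restated so the snippet runs on its own; skipping with next(it, None) is a choice made for this sketch. Zipping the staggered iterators yields one tuple per sliding window.

```python
import itertools

def multiple_iterator(iterable, r=2):
    # r independent iterators; iterator i is advanced by i items
    iterators = itertools.tee(iterable, r)
    for i, it in enumerate(iterators):
        for _ in range(i):
            next(it, None)
    return iterators

# zip stops at the shortest iterator, so only full windows of length 3 appear
windows = list(zip(*multiple_iterator("ABCDE", 3)))
print(windows)  # [('A', 'B', 'C'), ('B', 'C', 'D'), ('C', 'D', 'E')]
```

Because zip stops as soon as the most advanced iterator is exhausted, an input with fewer than r items produces no windows at all.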

The real work

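def get_unique_items_rolling(df, period):
    # Yields (column name, Series of sets); each set is the union of the
    # sets found in the last `period` rows of that column.
    for col_name, col in df.items():  # iteritems() was removed in pandas 2.0
        results = {}
        for idx, *window in zip(col.index[period-1:], *multiple_iterator(col, period)):
            # Ignore empty cells (None/NaN) and merge the remaining sets
            results[idx] = set(itertools.chain.from_iterable(
                i for i in window if isinstance(i, set)))
        yield col_name, pd.Series(results, dtype=object)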

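# DataFrame.from_items is no longer available in current pandas; a dict of Series works instead.
# period=3 gives the 3-month window asked for in the question.
df3 = pd.DataFrame(dict(get_unique_items_rolling(df2, period=3)))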

            Alpha   Beta
2017-03-31  {A, B}  {A, D, C}
2017-04-30  {A, B}  {C}
2017-05-31  {B}     {B, C}

df3.stack().apply(len)

Date        Name 
2017-03-31  Alpha    2
            Beta     3
2017-04-30  Alpha    2
            Beta     1
2017-05-31  Alpha    1
            Beta     2
dtype: int64
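
The result above starts in March because only full 3-month windows are emitted, while the expected output in the question also lists January and February (with shorter windows) and only the months in which a Name has rows. A rough sketch of a variant that follows the question's table is shown below. It is not the answer's method: rolling_unique_counts and its window parameter are hypothetical names, and it uses a plain loop over monthly periods rather than itertools.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["01/01/2017", "02/01/2017", "03/01/2017", "01/01/2017",
         "01/20/2017", "03/01/2017", "04/01/2017", "05/01/2017"],
        format="%m/%d/%Y"),
    "Name": ["Alpha", "Alpha", "Alpha", "Beta", "Beta", "Beta", "Beta", "Beta"],
    "data": ["A", "A", "B", "A", "D", "C", "C", "B"],
})

def rolling_unique_counts(frame, window=3):
    """Count distinct 'data' values per Name over the current and previous window-1 months."""
    frame = frame.assign(month=frame["Date"].dt.to_period("M"))
    rows = []
    for name, grp in frame.groupby("Name"):
        # Only months in which this Name has at least one row are reported.
        for month in sorted(grp["month"].unique()):
            in_window = (grp["month"] > month - window) & (grp["month"] <= month)
            count = grp.loc[in_window, "data"].nunique()
            rows.append((month.strftime("%b %Y"), name, count))
    return pd.DataFrame(rows, columns=["Date", "Name", "data"])

result = rolling_unique_counts(df)
print(result)
```

The loop is quadratic in the number of rows per Name, which is fine for small frames like this one; for large data the set-based approach above is likely the better starting point.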


Source: https://stackoverflow.com/questions/44815995/rolling-3-previous-months-with-unique-counts-after-groupby-in-pandas-dataframe
