Question

I have the following DataFrame (dates are month-first, MM/DD/YYYY):

Date        Name     data
01/01/2017  Alpha     A      
02/01/2017  Alpha     A
03/01/2017  Alpha     B
01/01/2017  Beta      A
01/20/2017  Beta      D
03/01/2017  Beta      C
04/01/2017  Beta      C
05/01/2017  Beta      B

Expected Output:

Date        Name     data
Jan 2017     Alpha     1      
Feb 2017     Alpha     1
Mar 2017     Alpha     2
Jan 2017     Beta      2
Mar 2017     Beta      3
Apr 2017     Beta      1
May 2017     Beta      2

I am looking for the number of unique "data" values per "Name" over a rolling 3-month window (the current month plus the two before it). Take March 2017 and "Name" = "Beta" as an example: the months considered are Jan 2017, Feb 2017 and Mar 2017. Beta's "data" values in those months are A, D and C, so the unique count is 3. The other rows are computed the same way. Note that the count must be of values that are unique across the whole 3-month window, not summed month by month.
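To make the window definition concrete, here is a minimal sketch that rebuilds the sample data and counts the distinct "data" values for Beta in the three months ending March 2017. It assumes the dates are month-first (01/20/2017 implies this). The variable names are illustrative only.

```python
import pandas as pd

# Sample data from the question; dates are month-first (MM/DD/YYYY)
df = pd.DataFrame({
    'Date': ['01/01/2017', '02/01/2017', '03/01/2017', '01/01/2017',
             '01/20/2017', '03/01/2017', '04/01/2017', '05/01/2017'],
    'Name': ['Alpha', 'Alpha', 'Alpha', 'Beta', 'Beta', 'Beta', 'Beta', 'Beta'],
    'data': ['A', 'A', 'B', 'A', 'D', 'C', 'C', 'B'],
})
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y')
month = df['Date'].dt.to_period('M')

# Window for March 2017 = Jan, Feb and Mar 2017 (current month plus the 2 before it)
end = pd.Period('2017-03', freq='M')
in_window = (month > end - 3) & (month <= end)
beta_window = df.loc[in_window & (df['Name'] == 'Beta'), 'data']

print(sorted(beta_window.unique()))  # ['A', 'C', 'D']
print(beta_window.nunique())         # 3
```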

Any help is appreciated.

Answer 1:

Group by month and Name, collecting each group's values into a set. Then unstack and fill in the missing months, so that every month is present and there is one column per Name:

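df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y')  # dates are month-first
df2 = df.groupby([pd.Grouper(key='Date', freq='M'), 'Name'])['data'].apply(set).unstack().asfreq('M')  # 'M' = month end; 'ME' on pandas >= 2.2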

df2

Name        Alpha   Beta
Date        
2017-01-31  {A}     {A, D}
2017-02-28  {A}     None
2017-03-31  {B}     {C}
2017-04-30  None    {C}
2017-05-31  None    {B}
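The 'M' month-end alias used above was deprecated in pandas 2.2 in favour of 'ME', and pd.TimeGrouper was removed in pandas 1.0. The following is a minimal, version-independent sketch of the same first step, assuming the sample data from the question. It groups on monthly Periods instead of month-end timestamps, collects each Name's values per month into a set, and reindexes over the full month range. A month with no rows at all therefore still gets a (NaN) row, so that 3 consecutive rows always mean 3 calendar months.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['01/01/2017', '02/01/2017', '03/01/2017', '01/01/2017',
             '01/20/2017', '03/01/2017', '04/01/2017', '05/01/2017'],
    'Name': ['Alpha', 'Alpha', 'Alpha', 'Beta', 'Beta', 'Beta', 'Beta', 'Beta'],
    'data': ['A', 'A', 'B', 'A', 'D', 'C', 'C', 'B'],
})
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y')

# Monthly periods ('M' is the period alias) sidestep the 'M' vs 'ME' offset change
months = df['Date'].dt.to_period('M').rename('Month')
df2 = df.groupby([months, 'Name'])['data'].agg(set).unstack()

# Reindex over every calendar month so gaps become NaN rows instead of disappearing
full_range = pd.period_range(df2.index.min(), df2.index.max(), freq='M', name='Month')
df2 = df2.reindex(full_range)
print(df2)
```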

Multiple iterator

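Some itertools magic to iterate over the same column several times at once: copy i of the iterator is advanced by i items, so zipping the copies together yields consecutive windows.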

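import itertools

def multiple_iterator(iterable, r=2):
    """Return r independent copies of iterable, copy i advanced by i items."""
    iterators = itertools.tee(iterable, r)
    for i, it in enumerate(iterators):
        for _ in range(i):
            # if iterable is shorter than r this copy is simply exhausted, and zip() yields nothing
            next(it, None)
    return iterators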
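For comparison, the same offset-copies idea can be written as a generic sliding-window helper. sliding_windows is a hypothetical name, not part of itertools. It makes n copies with itertools.tee, skips the first i items of copy i with itertools.islice, and lets zip stop as soon as the most advanced copy runs out.

```python
import itertools

def sliding_windows(iterable, n):
    """Yield tuples of n consecutive items (hypothetical helper, not in the stdlib)."""
    copies = itertools.tee(iterable, n)
    # copy i starts i items later, so zip() lines up items k, k+1, ..., k+n-1
    shifted = (itertools.islice(it, i, None) for i, it in enumerate(copies))
    return zip(*shifted)

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May']
for window in sliding_windows(months, 3):
    print(window)
# ('Jan', 'Feb', 'Mar')
# ('Feb', 'Mar', 'Apr')
# ('Mar', 'Apr', 'May')
```

Using islice instead of calling next() in a loop keeps the helper free of any StopIteration handling. A too-short input just produces no windows.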

The real work

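def get_unique_items_rolling(df, period):
    """Yield (column name, Series of rolling set unions) for each column of df."""
    for col_name, col in df.items():  # DataFrame.iteritems() was removed in pandas 2.0
        s = {}
        # idx is the last month of each window; window holds the `period` sets ending there
        for idx, *window in zip(col.index[period-1:], *multiple_iterator(col, period)):
            # union of the window's sets, skipping NaN/None and empty months
            s[idx] = set(itertools.chain.from_iterable(i for i in window if pd.notnull(i) and i))
        yield col_name, pd.Series(s, dtype=object)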

period = 3  # window length in months
df3 = pd.DataFrame(dict(get_unique_items_rolling(df2, period)))

            Alpha   Beta
2017-03-31  {A, B}  {A, D, C}
2017-04-30  {A, B}  {C}
2017-05-31  {B}     {B, C}

df3.stack().apply(len)

Date        Name 
2017-03-31  Alpha    2
            Beta     3
2017-04-30  Alpha    2
            Beta     1
2017-05-31  Alpha    1
            Beta     2
dtype: int64
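This result differs from the Expected Output in the question in two ways. It only starts at the third month, where the first full 3-month window ends, and it also reports months in which a Name has no rows (Alpha in April and May). The following is a minimal alternative sketch, not the answer's method. It assumes windows may be partial at the start and that only months in which the Name actually has data should be reported. It loops over each Name and month directly, trading speed for readability.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['01/01/2017', '02/01/2017', '03/01/2017', '01/01/2017',
             '01/20/2017', '03/01/2017', '04/01/2017', '05/01/2017'],
    'Name': ['Alpha', 'Alpha', 'Alpha', 'Beta', 'Beta', 'Beta', 'Beta', 'Beta'],
    'data': ['A', 'A', 'B', 'A', 'D', 'C', 'C', 'B'],
})
df['Month'] = pd.to_datetime(df['Date'], format='%m/%d/%Y').dt.to_period('M')

WINDOW = 3  # current month plus the 2 calendar months before it
rows = []
for name, group in df.groupby('Name'):
    # only months in which this Name has at least one row get an output row
    for month in sorted(group['Month'].unique()):
        in_window = (group['Month'] > month - WINDOW) & (group['Month'] <= month)
        rows.append({
            'Date': month.to_timestamp().strftime('%b %Y'),  # e.g. 'Jan 2017'
            'Name': name,
            'data': group.loc[in_window, 'data'].nunique(),
        })

result = pd.DataFrame(rows)
print(result)
```

On the sample data this reproduces the Expected Output row for row: Alpha 1, 1, 2 and Beta 2, 3, 1, 2.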

Source: https://stackoverflow.com/questions/44815995/rolling-3-previous-months-with-unique-counts-after-groupby-in-pandas-dataframe

Tags

python

python-3.x

pandas

Rolling 3 previous months with unique counts after groupby in pandas dataframe
