Question
The following is the DataFrame:
Date Name data
01/01/2017 Alpha A
02/01/2017 Alpha A
03/01/2017 Alpha B
01/01/2017 Beta A
01/20/2017 Beta D
03/01/2017 Beta C
04/01/2017 Beta C
05/01/2017 Beta B
Expected Output:
Date Name data
Jan 2017 Alpha 1
Feb 2017 Alpha 1
Mar 2017 Alpha 2
Jan 2017 Beta 2
Mar 2017 Beta 3
Apr 2017 Beta 1
May 2017 Beta 2
I am looking for the unique count of "data", grouped by "Name", on a rolling 3-month basis. For example, take "March 2017" and Name "Beta": the months considered are Jan 2017, Feb 2017 and Mar 2017, and the unique count is 3. The same applies to the other rows. Note that the count of "data" must be unique across those 3 months.
Any help is appreciated.
Answer 1:
Group by month and Name, then unstack and resample to monthly frequency, so that every month is present and you get one column per Name.
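The grouping step groups on the index, so it presumably expects `df` to carry the dates as a DatetimeIndex. The question does not show how `df` was built, so the following is only a minimal sketch of one way to construct it from the sample data, assuming the dates are in month/day/year order (which the value 01/20/2017 suggests):

```python
import pandas as pd

# Sample data copied from the question; the construction itself is an assumption.
df = pd.DataFrame({
    "Date": ["01/01/2017", "02/01/2017", "03/01/2017", "01/01/2017",
             "01/20/2017", "03/01/2017", "04/01/2017", "05/01/2017"],
    "Name": ["Alpha", "Alpha", "Alpha", "Beta", "Beta", "Beta", "Beta", "Beta"],
    "data": ["A", "A", "B", "A", "D", "C", "C", "B"],
})

# Parse the month/day/year strings and move them into the index.
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")
df = df.set_index("Date")
```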
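# assumes df has a DatetimeIndex named 'Date'; pd.Grouper(freq='M') replaces the deprecated pd.TimeGrouper('M')
df2 = df.groupby([pd.Grouper(freq='M'), 'Name'])['data'].apply(set).unstack().resample('M').sum()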
df2
Name Alpha Beta
Date
2017-01-31 {A} {A, D}
2017-02-28 {A} None
2017-03-31 {B} {C}
2017-04-30 None {C}
2017-05-31 None {B}
Multiple iterator
Some itertools magic to iterate over the same column several times at once, with each iterator starting one position later than the previous one.
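import itertools

def multiple_iterator(iterable, r=2):
    # make r independent iterators over the same data
    iterators = itertools.tee(iterable, r)
    try:
        # advance the i-th iterator by i positions
        for i, it in enumerate(iterators):
            for j in range(i):
                next(it)
    except StopIteration:
        # the iterable has fewer than r - 1 items
        return None
    return iterators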
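To see what the staggered iterators produce, here is a small usage sketch on a plain list. The function is repeated so the snippet runs on its own; the list and the name `windows` are illustrative only. Zipping the iterators yields each window of `r` consecutive items:

```python
import itertools

def multiple_iterator(iterable, r=2):
    # r independent iterators; the i-th one is advanced by i positions
    iterators = itertools.tee(iterable, r)
    try:
        for i, it in enumerate(iterators):
            for j in range(i):
                next(it)
    except StopIteration:
        return None
    return iterators

# zip stops as soon as the most advanced iterator is exhausted
windows = list(zip(*multiple_iterator([1, 2, 3, 4, 5], 3)))
print(windows)  # [(1, 2, 3), (2, 3, 4), (3, 4, 5)]
```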
The real work
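def get_unique_items_rolling(df, period):
    # for each Name column, union the sets of the current and the previous (period - 1) months
    for col_name, col in df.items():  # iteritems() was removed in pandas 2.0
        rows = {}
        # print(f'---{col_name}---')
        for idx, *iterators in zip(col.index[period-1:], *multiple_iterator(col, period)):
            # skip missing months and empty values
            result = set(itertools.chain.from_iterable(i for i in iterators if pd.notnull(i) and i))
            # print(idx, result)
            rows[idx] = result
        yield col_name, pd.Series(rows, dtype=object)

period = 3
df3 = pd.DataFrame(dict(get_unique_items_rolling(df2, period)))  # DataFrame.from_items() was removed in pandas 1.0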
Alpha Beta
2017-03-31 {A, B} {A, D, C}
2017-04-30 {A, B} {C}
2017-05-31 {B} {B, C}
df3.stack().apply(len)
Date Name
2017-03-31 Alpha 2
Beta 3
2017-04-30 Alpha 2
Beta 1
2017-05-31 Alpha 1
Beta 2
dtype: int64
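As a cross-check of the logic, the following is a minimal plain-Python sketch of the same rolling unique count, without pandas. It is not the method used above; the names `rows` and `rolling_unique_counts` are hypothetical, and it assumes month/day/year dates. Unlike the result above, it also reports months with an incomplete window (Jan and Feb) and only months in which a Name actually has data, which is how the question's expected output is laid out:

```python
from collections import defaultdict
from datetime import datetime

# Sample data copied from the question.
rows = [
    ("01/01/2017", "Alpha", "A"), ("02/01/2017", "Alpha", "A"),
    ("03/01/2017", "Alpha", "B"), ("01/01/2017", "Beta", "A"),
    ("01/20/2017", "Beta", "D"), ("03/01/2017", "Beta", "C"),
    ("04/01/2017", "Beta", "C"), ("05/01/2017", "Beta", "B"),
]

def rolling_unique_counts(rows, window=3):
    # Collect the distinct values per (name, month number),
    # where month number = year * 12 + (month - 1).
    seen = defaultdict(set)
    for date_str, name, value in rows:
        d = datetime.strptime(date_str, "%m/%d/%Y")
        seen[(name, d.year * 12 + d.month - 1)].add(value)

    # For every (name, month) with data, union the sets of the
    # current month and the (window - 1) months before it.
    counts = {}
    for name, month in list(seen):
        values = set()
        for m in range(month - window + 1, month + 1):
            values |= seen.get((name, m), set())
        counts[(name, month // 12, month % 12 + 1)] = len(values)
    return counts

counts = rolling_unique_counts(rows)
for (name, year, month), n in sorted(counts.items()):
    print(f"{year}-{month:02d} {name} {n}")
```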
Source: https://stackoverflow.com/questions/44815995/rolling-3-previous-months-with-unique-counts-after-groupby-in-pandas-dataframe