Adding a grouped, aggregate nunique column to pandas dataframe

白昼怎懂夜的黑 posted on 2019-12-19 08:18:11

Question


I want to add an aggregate, grouped, nunique column to my pandas dataframe but not aggregate the entire dataframe. I'm trying to do this in one line and avoid creating a new aggregated object and merging that, etc.

My df has track, type, and id columns. I want the number of unique ids for each track/type combination as a new column in the table, without collapsing the track/type combinations in the resulting df: same number of rows, one more column.

Something like this isn't working:

df['n_unique_id'] = df.groupby(['track', 'type'])['id'].nunique()

Nor is this:

df['n_unique_id'] = df.groupby(['track', 'type'])['id'].transform(nunique)

This last form works with some aggregating functions but not others. The following works (but is meaningless on my dataset):

df['n_unique_id'] = df.groupby(['track', 'type'])['id'].transform(sum)

In R this is easily done in data.table with

df[, n_unique_id := uniqueN(id), by = c('track', 'type')]

thanks!


Answer 1:


df.groupby(['track', 'type'])['id'].transform(nunique)

This implies that there is a name nunique in the namespace that refers to some function. transform accepts either a function or a string naming a function it knows about, and 'nunique' is definitely one of those strings.
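To make the difference concrete, here is a minimal sketch using the same toy frame as the demo in this answer. It shows the string form next to two callable forms, and what happens with the bare name. The variable names (grouped, by_name, by_method, by_lambda, error_message) are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "track": list("11112222"),
    "type": list("AAAABBBB"),
    "id": list("XXYZWWWW"),
})

grouped = df.groupby(["track", "type"])["id"]

# String form: pandas looks up its own groupby implementation of nunique.
by_name = grouped.transform("nunique")

# Callable forms: the function is called on each group's Series and the
# scalar it returns is broadcast back to every row of that group.
by_method = grouped.transform(pd.Series.nunique)
by_lambda = grouped.transform(lambda s: s.nunique())

# Bare name: Python evaluates the argument before calling transform, and no
# variable called nunique exists, so this fails before pandas is involved.
try:
    grouped.transform(nunique)
except NameError as exc:
    error_message = str(exc)

print(by_name.tolist())
print(error_message)
```

All three working forms should give the same per-row counts; the bare name fails with a NameError raised by Python itself rather than by pandas.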

As pointed out by @root, the methods pandas uses to perform a transformation named by one of these strings are often optimized, and should generally be preferred over passing your own functions. This is true even for passing NumPy functions in some cases.

For example, transform('sum') should be preferred over transform(sum).
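As a small illustration, the sketch below checks that the string form and a callable form give the same values; only the code path pandas takes differs. The numeric column value and its contents are made up for this example and are not part of the original question.

```python
import pandas as pd

# Hypothetical numeric data, invented for illustration only.
df = pd.DataFrame({
    "track": list("11112222"),
    "value": [1, 2, 3, 4, 10, 20, 30, 40],
})

grouped = df.groupby("track")["value"]

# String form: dispatched to pandas' built-in grouped sum.
by_name = grouped.transform("sum")

# Callable form: an arbitrary Python function called once per group.
by_callable = grouped.transform(lambda s: s.sum())

print(by_name.tolist())
print(by_callable.tolist())
```

To see how much the difference matters on real data, both forms can be timed on a larger frame, for instance with the standard timeit module.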

Try this instead:

df.groupby(['track', 'type'])['id'].transform('nunique')

Demo:

import pandas as pd

df = pd.DataFrame(dict(
    track=list('11112222'), type=list('AAAABBBB'), id=list('XXYZWWWW')))
print(df)

  id track type
0  X     1    A
1  X     1    A
2  Y     1    A
3  Z     1    A
4  W     2    B
5  W     2    B
6  W     2    B
7  W     2    B

df.groupby(['track', 'type'])['id'].transform('nunique')

0    3
1    3
2    3
3    3
4    1
5    1
6    1
7    1
Name: id, dtype: int64
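Putting this back into the form the question asks for, the sketch below assigns the transformed result as a new column. It also contrasts it with the plain .nunique() aggregation, which returns one row per group, indexed by the group keys rather than by the original row labels, so it cannot be lined up with df row by row. The names per_group and per_row are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "track": list("11112222"),
    "type": list("AAAABBBB"),
    "id": list("XXYZWWWW"),
})

# Aggregation: one value per (track, type) group, indexed by the group keys.
per_group = df.groupby(["track", "type"])["id"].nunique()

# Transformation: one value per original row, indexed like df itself.
per_row = df.groupby(["track", "type"])["id"].transform("nunique")

# Because per_row shares df's index, it can be assigned directly.
df["n_unique_id"] = per_row

print(len(per_group), len(per_row))
print(df)
```

The frame keeps its original eight rows and gains one column, which is the one-line equivalent of the data.table expression in the question.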


Source: https://stackoverflow.com/questions/43726631/adding-a-grouped-aggregate-nunique-column-to-pandas-dataframe
