pandas-groupby

Adding a grouped, aggregate nunique column to pandas dataframe

Submitted by 白昼怎懂夜的黑 on 2019-12-19 08:18:11
Question: I want to add an aggregated, grouped nunique column to my pandas DataFrame, but without aggregating the entire DataFrame. I'm trying to do this in one line, avoiding the creation of a separate aggregated object that then has to be merged back. My df has columns track, type, and id. I want the number of unique ids for each track/type combination as a new column in the table (without collapsing the track/type combinations in the resulting df): the same number of rows, with one more column. Something like this isn't working: df['n_unique_id'] = df
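The usual one-liner for this, assuming the columns are literally named track, type, and id as described, is groupby(...).transform('nunique'), which broadcasts the per-group count back onto every row. A minimal sketch on made-up data:

```python
import pandas as pd

# Hypothetical data with the column names from the question
df = pd.DataFrame({
    "track": [1, 1, 1, 2, 2],
    "type":  ["a", "a", "b", "a", "a"],
    "id":    [10, 11, 10, 12, 12],
})

# transform('nunique') returns one value per original row,
# so the DataFrame keeps its shape: same rows, one new column
df["n_unique_id"] = df.groupby(["track", "type"])["id"].transform("nunique")
print(df)
```

Unlike agg, transform never collapses the groups, which is exactly the "no merge needed" behavior asked for.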

How to use groupby and cumcount on unique names in a Pandas column

Submitted by 谁说胖子不能爱 on 2019-12-19 04:49:07
Question: I have a DataFrame that looks like this:

ID ... config_name  config_version
aa     A            0
ab     A            7
ad     A            7
ad     A            27
bb     B            0
cc     C            0
cd     C            8

I want to group by config_name and apply a cumulative count over each unique config_version, so that I get an additional column like:

ID ... config_name  config_version  config_version_count
aa     A            0               0
ab     A            7               1
ad     A            7               1
ad     A            27              2
bb     B            0               0
cc     C            0               0
cd     C            8               1

But I can't seem to work out how to do it. I tried using unique_count = df.groupby('config_name')['config_version'].cumcount(
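cumcount numbers rows, not distinct values, so it can't produce this on its own. One way to get the desired config_version_count, sketched on the sample data from the question, is to factorize each group's versions: pd.factorize numbers distinct values in order of first appearance, which is exactly the enumeration shown above.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": ["aa", "ab", "ad", "ad", "bb", "cc", "cd"],
    "config_name":    ["A", "A", "A", "A", "B", "C", "C"],
    "config_version": [0, 7, 7, 27, 0, 0, 8],
})

# factorize() assigns 0, 1, 2, ... to distinct values in order of
# first appearance; applying it per group enumerates each group's versions
df["config_version_count"] = (
    df.groupby("config_name")["config_version"]
      .transform(lambda s: pd.factorize(s)[0])
)
print(df)
```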

Is there an “ungroup by” operation opposite to .groupby in pandas?

Submitted by 雨燕双飞 on 2019-12-18 14:48:08
Question: Suppose we take a pandas DataFrame...

   name   age  family
0  john   1    1
1  jason  36   1
2  jane   32   1
3  jack   26   2
4  james  30   2

Then do a groupby():

group_df = df.groupby('family')
group_df = group_df.aggregate({'name': name_join, 'age': pd.np.mean})

Then do some aggregate/summarize operation (in my example, my function name_join aggregates the names):

def name_join(list_names, concat='-'):
    return concat.join(list_names)

The grouped, summarized output is thus:

        age  name
family
1       23   john-jason-jane
2       28
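There is no literal "ungroup by" in pandas, but the two usual equivalents are transform, which broadcasts a group-level result back onto the original rows, and merging the aggregated frame back on the grouping key. A minimal sketch on the question's data (using numpy's mean directly, since the pd.np alias has been removed from modern pandas):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["john", "jason", "jane", "jack", "james"],
    "age": [1, 36, 32, 26, 30],
    "family": [1, 1, 1, 2, 2],
})

# Option 1: transform keeps one value per original row,
# i.e. the "ungrouped" view of a group-level aggregate
df["family_mean_age"] = df.groupby("family")["age"].transform("mean")

# Option 2: aggregate first, then merge the summary back on the key
summary = df.groupby("family", as_index=False)["age"].mean()
merged = df.merge(summary.rename(columns={"age": "mean_age"}), on="family")
print(df)
```

Option 1 avoids the intermediate object entirely; option 2 is the explicit "aggregate then re-attach" route.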

Use Pandas groupby() + apply() with arguments

Submitted by 妖精的绣舞 on 2019-12-18 11:44:55
Question: I would like to use df.groupby() in combination with apply() to apply a function to each row per group. I normally use the following code, which usually works (note that this is without groupby()):

df.apply(myFunction, args=(arg1,))

With groupby() I tried the following:

df.groupby('columnName').apply(myFunction, args=(arg1,))

However, I get the following error:

TypeError: myFunction() got an unexpected keyword argument 'args'

Hence, my question is: how can I use groupby() and apply()
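The error arises because, unlike DataFrame.apply, GroupBy.apply has no args= keyword: its signature is apply(func, *args, **kwargs), and any extra positional or keyword arguments are forwarded straight to func. So the argument should be passed directly. A sketch with a hypothetical myFunction:

```python
import pandas as pd

df = pd.DataFrame({"columnName": ["a", "a", "b"], "value": [1, 2, 3]})

# Hypothetical stand-in for the question's myFunction:
# scale each group's value sum by a factor
def myFunction(group, arg1):
    return group["value"].sum() * arg1

# GroupBy.apply forwards *args/**kwargs directly, no args=(...) wrapper:
result = df.groupby("columnName").apply(myFunction, 10)
print(result)
```

Passing it by its own name, apply(myFunction, arg1=10), works equally well.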

Groupby in python pandas: Fast Way

Submitted by 六月ゝ 毕业季﹏ on 2019-12-18 10:29:46
Question: I want to improve the running time of a groupby in python pandas. I have this code:

df["Nbcontrats"] = df.groupby(['Client', 'Month'])['Contrat'].transform(len)

The objective is to count how many contracts a client has in a month and add this information in a new column (Nbcontrats), where:

Client: client code
Month: month of data extraction
Contrat: contract number

I want to improve the time. Below I am only working with a subset of my real data:

%timeit df["Nbcontrats"] = df.groupby(['Client', 'Month
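Passing the Python builtin len forces a slow per-group Python call. The named aggregation 'size' counts rows per group in pandas' optimized code path and typically runs much faster on large data, while producing the same result:

```python
import pandas as pd

df = pd.DataFrame({
    "Client": [1, 1, 1, 2],
    "Month": ["2019-01", "2019-01", "2019-02", "2019-01"],
    "Contrat": [100, 101, 102, 103],
})

# transform("size") counts rows per group in optimized code,
# unlike transform(len), which invokes Python's len per group
df["Nbcontrats"] = df.groupby(["Client", "Month"])["Contrat"].transform("size")
print(df)
```

Note that "size" counts all rows, whereas "count" would skip NaN values in Contrat; for contract counting the distinction usually doesn't matter, but it is worth knowing.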

what is different between groupby.first, groupby.nth, groupby.head when as_index=False

Submitted by 血红的双手。 on 2019-12-18 09:39:22
Question: Edit: the rookie mistake I made, using the string "np.nan" rather than the actual NaN value, was pointed out by @coldspeed, @wen-ben, and @ALollz. The answers are quite good, so I'm not deleting this question, in order to keep those answers. Original: I have read this question/answer: "What's the difference between groupby.first() and groupby.head(1)?" That answer explained that the differences lie in the handling of NaN values. However, when I call groupby with as_index=False, they both pick up NaN fine. Furthermore, pandas has groupby.nth with similar
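A small demonstration of the distinction on a toy frame with a real np.nan (not the string): first() returns the first non-null value per column within each group, while head(1) returns each group's leading row unchanged, so they only diverge when that leading row contains NaN. nth(0) likewise selects the leading row without skipping NaN, though its exact return shape has varied across pandas versions, so it is left unasserted here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": ["x", "x", "y", "y"], "B": [np.nan, 1.0, 2.0, 3.0]})

g = df.groupby("A")

# first(): first *non-null* value per column within each group,
# so the leading NaN in group x is skipped
first_vals = g["B"].first()

# head(1): the leading row of each group, NaN and all,
# returned as a slice of the original frame
head_rows = g.head(1)

print(first_vals)
print(head_rows)
```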

Pandas Groupby How to Show Zero Counts in DataFrame

Submitted by 此生再无相见时 on 2019-12-18 07:25:29
Question: I have the following pandas DataFrame:

Name   | EventSignupNo | Attended | Points
Smith  | 0145          | Y        | 20.24
Smith  | 0174          | Y        | 29.14
Smith  | 0239          | N        | 0
Adams  | 0145          | N        | 0
Adams  | 0174          | Y        | 33.43
Morgan | 0239          | Y        | 31.23
Morgan | 0244          | Y        | 23.15

What I'd like is a count of the number of events attended and not attended per person, plus the sum of their points, per person. So I do a groupby:

df.groupby(['Name', 'Attended']).agg({"Attended": "count", "Points": "sum"}).rename(columns = {
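A plain groupby only produces rows for combinations that exist, so Morgan's zero "N" count disappears. One way to surface the zeros, sketched on the question's data, is to count with pd.crosstab (or groupby + unstack(fill_value=0)), which fills absent Name/Attended combinations with 0, and then join the point sums alongside:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Smith", "Smith", "Smith", "Adams", "Adams", "Morgan", "Morgan"],
    "Attended": ["Y", "Y", "N", "N", "Y", "Y", "Y"],
    "Points": [20.24, 29.14, 0, 0, 33.43, 31.23, 23.15],
})

# crosstab produces every Name x Attended cell, filling missing ones with 0
counts = pd.crosstab(df["Name"], df["Attended"])

# per-person point totals, joined next to the counts
points = df.groupby("Name")["Points"].sum()
result = counts.join(points)
print(result)
```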

Why doesn't first and last in a groupby give me first and last

Submitted by 空扰寡人 on 2019-12-18 06:26:05
Question: I'm posting this because the topic just got brought up in another question/answer and the behavior isn't very well documented. Consider the DataFrame df:

df = pd.DataFrame(dict(
    A=list('xxxyyy'),
    B=[np.nan, 1, 2, 3, 4, np.nan]
))

   A    B
0  x  NaN
1  x  1.0
2  x  2.0
3  y  3.0
4  y  4.0
5  y  NaN

I wanted to get the first and last rows of each group defined by column 'A'. I tried:

df.groupby('A').B.agg(['first', 'last'])

   first  last
A
x    1.0   2.0
y    3.0   4.0

However, this doesn't give me the np.NaNs that I
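The named aggregations 'first' and 'last' skip nulls by design. To get the literal first and last values of each group, NaN included, one sketch is to aggregate with positional iloc lambdas (head(1)/tail(1) are the row-level alternatives):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(dict(A=list("xxxyyy"), B=[np.nan, 1, 2, 3, 4, np.nan]))

# iloc-based lambdas take the positional first/last value of each group,
# without the null-skipping that 'first'/'last' perform
out = df.groupby("A")["B"].agg(
    first_row=lambda s: s.iloc[0],
    last_row=lambda s: s.iloc[-1],
)
print(out)
```

Here group x keeps its leading NaN and group y keeps its trailing NaN, which is the behavior the question was after.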
