pandas-groupby

Adding a grouped, aggregate nunique column to pandas dataframe

Submitted by 白昼怎懂夜的黑 on 2019-12-19 08:18:11
Question: I want to add an aggregated, grouped nunique column to my pandas DataFrame, but without aggregating the entire DataFrame. I'm trying to do this in one line, avoiding the creation of a separate aggregated object that then has to be merged back. My df has columns track, type, and id. I want the number of unique ids for each track/type combination as a new column in the table (without collapsing the track/type combinations in the resulting df): the same number of rows, with one more column. Something like this isn't working: df['n_unique_id'] = df
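The usual one-liner for this, assuming the columns are literally named track, type, and id as described, is groupby(...).transform('nunique'), which broadcasts the per-group count back onto every row. A minimal sketch on made-up data:

```python
import pandas as pd

# Hypothetical data with the column names from the question
df = pd.DataFrame({
    "track": [1, 1, 1, 2, 2],
    "type":  ["a", "a", "b", "a", "a"],
    "id":    [10, 11, 10, 12, 12],
})

# transform('nunique') returns one value per original row,
# so the DataFrame keeps its shape: same rows, one new column
df["n_unique_id"] = df.groupby(["track", "type"])["id"].transform("nunique")
print(df)
```

Unlike agg, transform never collapses the groups, which is exactly the "no merge needed" behavior asked for.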

How to use groupby and cumcount on unique names in a Pandas column

Submitted by 谁说胖子不能爱 on 2019-12-19 04:49:07
Question: I have a DataFrame that looks like this:

ID ... config_name  config_version
aa     A            0
ab     A            7
ad     A            7
ad     A            27
bb     B            0
cc     C            0
cd     C            8

I want to group by config_name and apply a cumulative count over each unique config_version, so that I get an additional column like:

ID ... config_name  config_version  config_version_count
aa     A            0               0
ab     A            7               1
ad     A            7               1
ad     A            27              2
bb     B            0               0
cc     C            0               0
cd     C            8               1

But I can't seem to work out how to do it. I tried using unique_count = df.groupby('config_name')['config_version'].cumcount(
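cumcount numbers rows, not distinct values, so it can't produce this on its own. One way to get the desired config_version_count, sketched on the sample data from the question, is to factorize each group's versions: pd.factorize numbers distinct values in order of first appearance, which is exactly the enumeration shown above.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": ["aa", "ab", "ad", "ad", "bb", "cc", "cd"],
    "config_name":    ["A", "A", "A", "A", "B", "C", "C"],
    "config_version": [0, 7, 7, 27, 0, 0, 8],
})

# factorize() assigns 0, 1, 2, ... to distinct values in order of
# first appearance; applying it per group enumerates each group's versions
df["config_version_count"] = (
    df.groupby("config_name")["config_version"]
      .transform(lambda s: pd.factorize(s)[0])
)
print(df)
```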

Is there an “ungroup by” operation opposite to .groupby in pandas?

Submitted by 雨燕双飞 on 2019-12-18 14:48:08
Question: Suppose we take a pandas DataFrame...

   name   age  family
0  john   1    1
1  jason  36   1
2  jane   32   1
3  jack   26   2
4  james  30   2

Then do a groupby():

group_df = df.groupby('family')
group_df = group_df.aggregate({'name': name_join, 'age': pd.np.mean})

Then do some aggregate/summarize operation (in my example, my function name_join aggregates the names):

def name_join(list_names, concat='-'):
    return concat.join(list_names)

The grouped, summarized output is thus:

        age  name
family
1       23   john-jason-jane
2       28
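There is no literal "ungroup by" in pandas, but the two usual equivalents are transform, which broadcasts a group-level result back onto the original rows, and merging the aggregated frame back on the grouping key. A minimal sketch on the question's data (using numpy's mean directly, since the pd.np alias has been removed from modern pandas):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["john", "jason", "jane", "jack", "james"],
    "age": [1, 36, 32, 26, 30],
    "family": [1, 1, 1, 2, 2],
})

# Option 1: transform keeps one value per original row,
# i.e. the "ungrouped" view of a group-level aggregate
df["family_mean_age"] = df.groupby("family")["age"].transform("mean")

# Option 2: aggregate first, then merge the summary back on the key
summary = df.groupby("family", as_index=False)["age"].mean()
merged = df.merge(summary.rename(columns={"age": "mean_age"}), on="family")
print(df)
```

Option 1 avoids the intermediate object entirely; option 2 is the explicit "aggregate then re-attach" route.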

Use Pandas groupby() + apply() with arguments

Submitted by 妖精的绣舞 on 2019-12-18 11:44:55
Question: I would like to use df.groupby() in combination with apply() to apply a function to each row per group. I normally use the following code, which usually works (note that this is without groupby()):

df.apply(myFunction, args=(arg1,))

With groupby() I tried the following:

df.groupby('columnName').apply(myFunction, args=(arg1,))

However, I get the following error:

TypeError: myFunction() got an unexpected keyword argument 'args'

Hence, my question is: how can I use groupby() and apply()
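The error arises because, unlike DataFrame.apply, GroupBy.apply has no args= keyword: its signature is apply(func, *args, **kwargs), and any extra positional or keyword arguments are forwarded straight to func. So the argument should be passed directly. A sketch with a hypothetical myFunction:

```python
import pandas as pd

df = pd.DataFrame({"columnName": ["a", "a", "b"], "value": [1, 2, 3]})

# Hypothetical stand-in for the question's myFunction:
# scale each group's value sum by a factor
def myFunction(group, arg1):
    return group["value"].sum() * arg1

# GroupBy.apply forwards *args/**kwargs directly, no args=(...) wrapper:
result = df.groupby("columnName").apply(myFunction, 10)
print(result)
```

Passing it by its own name, apply(myFunction, arg1=10), works equally well.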

Groupby in python pandas: Fast Way

Submitted by 六月ゝ 毕业季﹏ on 2019-12-18 10:29:46
Question: I want to improve the running time of a groupby in python pandas. I have this code:

df["Nbcontrats"] = df.groupby(['Client', 'Month'])['Contrat'].transform(len)

The objective is to count how many contracts a client has in a month and add this information in a new column (Nbcontrats), where:

Client: client code
Month: month of data extraction
Contrat: contract number

I want to improve the time. Below I am only working with a subset of my real data:

%timeit df["Nbcontrats"] = df.groupby(['Client', 'Month
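Passing the Python builtin len forces a slow per-group Python call. The named aggregation 'size' counts rows per group in pandas' optimized code path and typically runs much faster on large data, while producing the same result:

```python
import pandas as pd

df = pd.DataFrame({
    "Client": [1, 1, 1, 2],
    "Month": ["2019-01", "2019-01", "2019-02", "2019-01"],
    "Contrat": [100, 101, 102, 103],
})

# transform("size") counts rows per group in optimized code,
# unlike transform(len), which invokes Python's len per group
df["Nbcontrats"] = df.groupby(["Client", "Month"])["Contrat"].transform("size")
print(df)
```

Note that "size" counts all rows, whereas "count" would skip NaN values in Contrat; for contract counting the distinction usually doesn't matter, but it is worth knowing.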

what is different between groupby.first, groupby.nth, groupby.head when as_index=False

Submitted by 血红的双手。 on 2019-12-18 09:39:22
Question: Edit: the rookie mistake I made, using the string "np.nan" rather than the actual NaN value, was pointed out by @coldspeed, @wen-ben, and @ALollz. The answers are quite good, so I'm not deleting this question, in order to keep those answers. Original: I have read this question/answer: "What's the difference between groupby.first() and groupby.head(1)?" That answer explained that the differences lie in the handling of NaN values. However, when I call groupby with as_index=False, they both pick up NaN fine. Furthermore, pandas has groupby.nth with similar
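A small demonstration of the distinction on a toy frame with a real np.nan (not the string): first() returns the first non-null value per column within each group, while head(1) returns each group's leading row unchanged, so they only diverge when that leading row contains NaN. nth(0) likewise selects the leading row without skipping NaN, though its exact return shape has varied across pandas versions, so it is left unasserted here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": ["x", "x", "y", "y"], "B": [np.nan, 1.0, 2.0, 3.0]})

g = df.groupby("A")

# first(): first *non-null* value per column within each group,
# so the leading NaN in group x is skipped
first_vals = g["B"].first()

# head(1): the leading row of each group, NaN and all,
# returned as a slice of the original frame
head_rows = g.head(1)

print(first_vals)
print(head_rows)
```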

Pandas Groupby How to Show Zero Counts in DataFrame

Submitted by 此生再无相见时 on 2019-12-18 07:25:29
Question: I have the following pandas DataFrame:

Name   | EventSignupNo | Attended | Points
Smith  | 0145          | Y        | 20.24
Smith  | 0174          | Y        | 29.14
Smith  | 0239          | N        | 0
Adams  | 0145          | N        | 0
Adams  | 0174          | Y        | 33.43
Morgan | 0239          | Y        | 31.23
Morgan | 0244          | Y        | 23.15

What I'd like is a count of the number of events attended and not attended per person, plus the sum of their points, per person. So I do a groupby:

df.groupby(['Name', 'Attended']).agg({"Attended": "count", "Points": "sum"}).rename(columns = {
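A plain groupby only produces rows for combinations that exist, so Morgan's zero "N" count disappears. One way to surface the zeros, sketched on the question's data, is to count with pd.crosstab (or groupby + unstack(fill_value=0)), which fills absent Name/Attended combinations with 0, and then join the point sums alongside:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Smith", "Smith", "Smith", "Adams", "Adams", "Morgan", "Morgan"],
    "Attended": ["Y", "Y", "N", "N", "Y", "Y", "Y"],
    "Points": [20.24, 29.14, 0, 0, 33.43, 31.23, 23.15],
})

# crosstab produces every Name x Attended cell, filling missing ones with 0
counts = pd.crosstab(df["Name"], df["Attended"])

# per-person point totals, joined next to the counts
points = df.groupby("Name")["Points"].sum()
result = counts.join(points)
print(result)
```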

Why doesn't first and last in a groupby give me first and last

Submitted by 空扰寡人 on 2019-12-18 06:26:05
Question: I'm posting this because the topic just got brought up in another question/answer and the behavior isn't very well documented. Consider the DataFrame df:

df = pd.DataFrame(dict(
    A=list('xxxyyy'),
    B=[np.nan, 1, 2, 3, 4, np.nan]
))

   A    B
0  x  NaN
1  x  1.0
2  x  2.0
3  y  3.0
4  y  4.0
5  y  NaN

I wanted to get the first and last rows of each group defined by column 'A'. I tried:

df.groupby('A').B.agg(['first', 'last'])

   first  last
A
x    1.0   2.0
y    3.0   4.0

However, this doesn't give me the np.NaNs that I
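The named aggregations 'first' and 'last' skip nulls by design. To get the literal first and last values of each group, NaN included, one sketch is to aggregate with positional iloc lambdas (head(1)/tail(1) are the row-level alternatives):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(dict(A=list("xxxyyy"), B=[np.nan, 1, 2, 3, 4, np.nan]))

# iloc-based lambdas take the positional first/last value of each group,
# without the null-skipping that 'first'/'last' perform
out = df.groupby("A")["B"].agg(
    first_row=lambda s: s.iloc[0],
    last_row=lambda s: s.iloc[-1],
)
print(out)
```

Here group x keeps its leading NaN and group y keeps its trailing NaN, which is the behavior the question was after.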
