pandas-groupby

Modify Value of Pandas dataframe Groups

最后都变了 - Submitted on 2019-12-24 09:39:11
Question: We have the following DataFrame (df) with 3 columns. The goal is to make sure that the sum of "Load" for each group of IDs equals 1.

    pd.DataFrame({'ID': ['AEC', 'AEC', 'CIZ', 'CIZ', 'CIZ'],
                  'Load': [0.2093275, 0.5384086, 0.1465657, 0.7465657, 0.1465657]})

    Num  ID   Load
    1    AEC  0.2093275
    2    AEC  0.5384086
    3    CIZ  0.1465657
    4    CIZ  0.7465657
    5    CIZ  0.1465657

If a group's total load is less or more than 1, we want to add to or subtract from only one member of the group to make the sum equal 1.
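
One way to do this (a minimal sketch, assuming the residual should be absorbed by the last row of each group; the question does not say which member to adjust):

    import pandas as pd

    df = pd.DataFrame({'ID': ['AEC', 'AEC', 'CIZ', 'CIZ', 'CIZ'],
                       'Load': [0.2093275, 0.5384086, 0.1465657, 0.7465657, 0.1465657]})

    # How far each group is from summing to 1, broadcast to every row
    residual = 1 - df.groupby('ID')['Load'].transform('sum')

    # Add the whole residual to the last row of each group only
    is_last = ~df.duplicated('ID', keep='last')
    df.loc[is_last, 'Load'] += residual[is_last]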

Assigning a value to a user depending on the cluster they come from

断了今生、忘了曾经 - Submitted on 2019-12-24 09:32:44
Question: I have two DataFrames: one with users and the songs they listen to, and one with users and their cluster.

DATA 1:

    user  song
    A     11
    A     22
    B     99
    B     11
    C     11
    D     44
    C     66
    E     66
    D     33
    E     55
    F     11
    F     77

DATA 2:

    user  cluster
    A     1
    B     2
    C     3
    D     1
    E     2
    F     3

Using the above data sets, I was able to work out which songs are listened to by the users of each cluster:

    cluster  songs
    1        [11, 22, 33, 44]
    2        [11, 99, 66, 55]
    3        [11, 66, 88, 77]

I need to assign the song of a particular cluster to that particular user who has …
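
The intermediate step the asker describes (songs per cluster) can be reproduced with a merge followed by a groupby; a minimal sketch using the data above:

    import pandas as pd

    data1 = pd.DataFrame({'user': list('AABBCDCEDEFF'),
                          'song': [11, 22, 99, 11, 11, 44, 66, 66, 33, 55, 11, 77]})
    data2 = pd.DataFrame({'user': list('ABCDEF'),
                          'cluster': [1, 2, 3, 1, 2, 3]})

    # Attach each user's cluster, then collect the unique songs per cluster
    merged = data1.merge(data2, on='user')
    cluster_songs = merged.groupby('cluster')['song'].agg(lambda s: sorted(s.unique()))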

Upsample in pandas multi-index

倖福魔咒の - Submitted on 2019-12-24 09:09:45
Question: I am trying to upsample within a grouped DataFrame but am unsure how to get it to upsample only within the groups. I have a DataFrame that looks like:

    cat  weekstart                  date
    0.0  2016-07-04 00:00:00+00:00  2016-07-04    1
                                    2016-07-06    1
                                    2016-07-07    2
         2016-08-15 00:00:00+00:00  2016-08-16    1
                                    2016-08-19    1
         2016-09-19 00:00:00+00:00  2016-09-20    1
                                    2016-09-21    1
         2016-12-19 00:00:00+00:00  2016-12-19    1
                                    2016-12-21    1
    1.0  2016-07-25 00:00:00+00:00  2016-07-26    2
         2016-08-01 00:00:00+00:00  2016-08-03    1
         2016-08-08 00:00:00…
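
A sketch of one way to upsample only within each (cat, weekstart) group, assuming df is a Series indexed by (cat, weekstart, date) with daily counts and that the date level is datetime (the excerpt does not show the column names):

    # Group on the outer two index levels, then resample the remaining
    # 'date' level to daily frequency inside each group.
    def upsample_group(g):
        g = g.reset_index(['cat', 'weekstart'], drop=True)  # keep only the date level
        return g.resample('D').asfreq().fillna(0)

    upsampled = df.groupby(level=['cat', 'weekstart']).apply(upsample_group)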

Used groupby to select the most recent data; want to append a column that returns the date of the data

倾然丶 夕夏残阳落幕 - Submitted on 2019-12-24 07:49:31
Question: I originally had a DataFrame that looked like this:

                              industry  population  % of rural land
    country        date
    Australia      2017-01-01       NaN         NaN              NaN
                   2016-01-01  24.327571   18.898304               12
                   2015-01-01  25.396251   18.835267               12
                   2014-01-01  27.277007   18.834835               13
    United States  2017-01-01       NaN         NaN              NaN
                   2016-01-01        NaN   19.028231              NaN
                   2015-01-01  20.027274   19.212860              NaN
                   2014-01-01  20.867359   19.379071              NaN

I applied the following code, which pulled the most recent data for each of the columns for each of the countries, and resulted …
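
Since the dates are sorted newest-first, GroupBy.first() (which skips NaNs) is one way to get the most recent available value per column, and first_valid_index() can recover the date it came from; a sketch, not necessarily the asker's code:

    import pandas as pd

    # Most recent non-null value per column for each country
    latest = df.groupby(level='country').first()

    # Date each value came from: the second element of the (country, date)
    # index label of the first non-null entry in each column
    dates = df.groupby(level='country').agg(
        lambda col: col.first_valid_index()[1] if col.first_valid_index() else pd.NaT)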

pandas.core.groupby.DataFrameGroupBy.idxmin() is very slow; how can I make my code faster?

匆匆过客 - Submitted on 2019-12-24 07:30:24
Question: I am trying to do the same thing as a SQL GROUP BY that takes the minimum value:

    select id, min(value), other_fields...
    from table
    group by id

I tried:

    dfg = df.groupby('id', sort=False)
    idx = dfg['value'].idxmin()
    df = df.loc[idx, list(df.columns.values)]

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.groupby.DataFrameGroupBy.idxmin.html

But the idxmin() in line 2 takes more than half an hour on a df of ~4M rows, while the groupby takes less than 1 second. What am I missing? Is …
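
A common faster alternative that avoids the per-group idxmin call entirely is to sort by value and keep the first row per id; a sketch:

    # Sort so each id's smallest value comes first, then keep one row per id.
    # A stable sort keeps idxmin's first-occurrence tie-breaking behavior.
    result = (df.sort_values('value', kind='stable')
                .drop_duplicates('id', keep='first'))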

Sum & count of a column based on the content of the last value in each group after group-by

╄→尐↘猪︶ㄣ - Submitted on 2019-12-24 06:49:27
Question: I have a DataFrame as below:

    id      val  type
    aa      0    C
    aa      1    T
    aa      2    T
    aa      3    T
    aa      0    M
    aa      1    M
    aa      2    C
    aa      3    M
    bbb     0    C
    bbb     1    T
    bbb     2    T
    bbb     3    T
    bbb     0    M
    bbb     1    M
    bbb     2    C
    bbb     3    T
    cccccc  0    C
    cccccc  1    T
    cccccc  2    T
    cccccc  3    T
    cccccc  0    M
    cccccc  1    M
    cccccc  0    C
    cccccc  1    C

I want to group by "id" and then sum & count the rows in column "val", but only the rows whose "type" matches the last value of "type" in each group. For example, the last row of the group has …
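
One way to express this (a sketch): broadcast each group's last "type" to every row with transform, filter to the matching rows, then aggregate:

    # Type of the last row in each id group, aligned to every row
    last_type = df.groupby('id')['type'].transform('last')

    # Keep only rows whose type matches their group's last type, then aggregate
    result = df[df['type'] == last_type].groupby('id')['val'].agg(['sum', 'count'])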

Create lag features based on multiple columns

大憨熊 - Submitted on 2019-12-24 06:06:03
Question: I have a time-series dataset and need to extract lag features. I am using the code below but get all NaNs:

    df.groupby(['week', 'id1', 'id2', 'id3'], as_index=False)['value'].shift(1)

Input:

    week,id1,id2,id3,value
    1,101,123,001,45
    1,102,231,004,89
    1,203,435,099,65
    2,101,123,001,48
    2,102,231,004,75
    2,203,435,099,90

Expected output:

    week,id1,id2,id3,value,t-1
    1,101,123,001,45,NAN
    1,102,231,004,89,NAN
    1,203,435,099,65,NAN
    2,101,123,001,48,45
    2,102,231,004,75,89
    2,203,435,099,90,65

Answer 1: You want to shift to the …
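
The all-NaN result comes from including 'week' in the group keys: every (week, id1, id2, id3) combination then contains a single row, so shift(1) has nothing to look back at. Grouping by the id columns alone gives the intended lag; a sketch:

    # Sort by week so shift(1) looks at the previous week within each id group
    df = df.sort_values('week')
    df['t-1'] = df.groupby(['id1', 'id2', 'id3'])['value'].shift(1)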

GroupBy two columns with margins for first level

北战南征 - Submitted on 2019-12-24 06:00:40
Question: I am grouping a DataFrame by 2 columns and aggregating the other columns with a sum. How can I also have a total for the first grouped column in the same DataFrame? For example, my DataFrame is:

    np.random.seed(0)
    df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
                       'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                       'C': np.random.randn(8),
                       'D': np.random.randn(8)})

The result of:

    grouped = df.groupby(by=['A', 'B']).sum()

is:

              C  D
    A   B
    bar one   0…
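
One way to append per-'A' subtotals to the grouped frame (a sketch; the subtotal label 'Total' is an assumption, not from the question):

    import numpy as np
    import pandas as pd

    # Subtotals over the first level only, relabelled so they share the
    # (A, B) MultiIndex and can be concatenated with the grouped frame
    totals = df.groupby('A')[['C', 'D']].sum()
    totals.index = pd.MultiIndex.from_product([totals.index, ['Total']],
                                              names=['A', 'B'])
    with_margins = pd.concat([grouped, totals]).sort_index()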

How to assign a unique ID for different groups in pandas dataframe?

允我心安 - Submitted on 2019-12-24 05:44:06
Question: How can I assign unique IDs to groups created in a pandas DataFrame based on certain conditions? For example, I have a DataFrame named df with the following structure: Name identifies the user, and Datetime identifies the date/time at which the user accesses a resource.

    Name    Datetime
    Bob     26-04-2018 12:00:00
    Claire  26-04-2018 12:00:00
    Bob     26-04-2018 12:10:00
    Bob     26-04-2018 12:30:00
    Grace   27-04-2018 08:30:00
    Bob     27-04-2018 09:30:00
    Bob     27-04-2018 09:40:00
    Bob     27-04-2018 10:00:00
    Bob     27-04-2018 …
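
GroupBy.ngroup() is the usual tool for numbering groups; a sketch assuming the groups are (Name, calendar day) pairs (the excerpt cuts off before stating the actual conditions):

    import pandas as pd

    # Parse the day-first timestamps, then number each (Name, day) group 0, 1, 2, ...
    df['Datetime'] = pd.to_datetime(df['Datetime'], dayfirst=True)
    df['group_id'] = df.groupby(['Name', df['Datetime'].dt.date]).ngroup()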