pandas-groupby

Modify Value of Pandas dataframe Groups

最后都变了 - Submitted on 2019-12-24 09:39:11
Question: We have the following DataFrame (df) with 3 columns. The goal is to make sure that the sum of "Load" for each group of IDs equals 1.

    pd.DataFrame({'ID': ['AEC', 'AEC', 'CIZ', 'CIZ', 'CIZ'],
                  'Load': [0.2093275, 0.5384086, 0.1465657, 0.7465657, 0.1465657]})

    Num  ID   Load
    1    AEC  0.2093275
    2    AEC  0.5384086
    3    CIZ  0.1465657
    4    CIZ  0.7465657
    5    CIZ  0.1465657

If a group's total load is less or more than 1, we want to add to or subtract from only one member of the group to make the sum equal 1.
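
One way to do this (a minimal sketch, assuming the residual should be absorbed by the last row of each group; the question does not say which member to adjust):

    import pandas as pd

    df = pd.DataFrame({'ID': ['AEC', 'AEC', 'CIZ', 'CIZ', 'CIZ'],
                       'Load': [0.2093275, 0.5384086, 0.1465657, 0.7465657, 0.1465657]})

    # How far each group is from summing to 1, broadcast to every row
    residual = 1 - df.groupby('ID')['Load'].transform('sum')

    # Add the whole residual to the last row of each group only
    is_last = ~df.duplicated('ID', keep='last')
    df.loc[is_last, 'Load'] += residual[is_last]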

Assigning a value to a user depending on the cluster they come from

断了今生、忘了曾经 - Submitted on 2019-12-24 09:32:44
Question: I have two DataFrames: one with users and the songs they listen to, and one with users and their cluster.

DATA 1:

    user  song
    A     11
    A     22
    B     99
    B     11
    C     11
    D     44
    C     66
    E     66
    D     33
    E     55
    F     11
    F     77

DATA 2:

    user  cluster
    A     1
    B     2
    C     3
    D     1
    E     2
    F     3

Using the above data sets, I was able to work out which songs are listened to by the users of each cluster:

    cluster  songs
    1        [11, 22, 33, 44]
    2        [11, 99, 66, 55]
    3        [11, 66, 88, 77]

I need to assign the song of a particular cluster to that particular user who has …
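
The intermediate step the asker describes (songs per cluster) can be reproduced with a merge followed by a groupby; a minimal sketch using the data above:

    import pandas as pd

    data1 = pd.DataFrame({'user': list('AABBCDCEDEFF'),
                          'song': [11, 22, 99, 11, 11, 44, 66, 66, 33, 55, 11, 77]})
    data2 = pd.DataFrame({'user': list('ABCDEF'),
                          'cluster': [1, 2, 3, 1, 2, 3]})

    # Attach each user's cluster, then collect the unique songs per cluster
    merged = data1.merge(data2, on='user')
    cluster_songs = merged.groupby('cluster')['song'].agg(lambda s: sorted(s.unique()))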

Upsample in pandas multi-index

倖福魔咒の - Submitted on 2019-12-24 09:09:45
Question: I am trying to upsample within a grouped DataFrame but am unsure how to get it to upsample only within the groups. I have a DataFrame that looks like:

    cat  weekstart                  date
    0.0  2016-07-04 00:00:00+00:00  2016-07-04    1
                                    2016-07-06    1
                                    2016-07-07    2
         2016-08-15 00:00:00+00:00  2016-08-16    1
                                    2016-08-19    1
         2016-09-19 00:00:00+00:00  2016-09-20    1
                                    2016-09-21    1
         2016-12-19 00:00:00+00:00  2016-12-19    1
                                    2016-12-21    1
    1.0  2016-07-25 00:00:00+00:00  2016-07-26    2
         2016-08-01 00:00:00+00:00  2016-08-03    1
         2016-08-08 00:00:00…
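
A sketch of one way to upsample only within each (cat, weekstart) group, assuming df is a Series indexed by (cat, weekstart, date) with daily counts and that the date level is datetime (the excerpt does not show the column names):

    # Group on the outer two index levels, then resample the remaining
    # 'date' level to daily frequency inside each group.
    def upsample_group(g):
        g = g.reset_index(['cat', 'weekstart'], drop=True)  # keep only the date level
        return g.resample('D').asfreq().fillna(0)

    upsampled = df.groupby(level=['cat', 'weekstart']).apply(upsample_group)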

Used groupby to select the most recent data; want to append a column that returns the date of the data

倾然丶 夕夏残阳落幕 - Submitted on 2019-12-24 07:49:31
Question: I originally had a DataFrame that looked like this:

                              industry  population  % of rural land
    country        date
    Australia      2017-01-01       NaN         NaN              NaN
                   2016-01-01  24.327571   18.898304               12
                   2015-01-01  25.396251   18.835267               12
                   2014-01-01  27.277007   18.834835               13
    United States  2017-01-01       NaN         NaN              NaN
                   2016-01-01        NaN   19.028231              NaN
                   2015-01-01  20.027274   19.212860              NaN
                   2014-01-01  20.867359   19.379071              NaN

I applied the following code, which pulled the most recent data for each of the columns for each of the countries, and resulted …
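
Since the dates are sorted newest-first, GroupBy.first() (which skips NaNs) is one way to get the most recent available value per column, and first_valid_index() can recover the date it came from; a sketch, not necessarily the asker's code:

    import pandas as pd

    # Most recent non-null value per column for each country
    latest = df.groupby(level='country').first()

    # Date each value came from: the second element of the (country, date)
    # index label of the first non-null entry in each column
    dates = df.groupby(level='country').agg(
        lambda col: col.first_valid_index()[1] if col.first_valid_index() else pd.NaT)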

pandas.core.groupby.DataFrameGroupBy.idxmin() is very slow; how can I make my code faster?

匆匆过客 - Submitted on 2019-12-24 07:30:24
Question: I am trying to do the same thing as a SQL GROUP BY that takes the minimum value:

    select id, min(value), other_fields...
    from table
    group by id

I tried:

    dfg = df.groupby('id', sort=False)
    idx = dfg['value'].idxmin()
    df = df.loc[idx, list(df.columns.values)]

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.groupby.DataFrameGroupBy.idxmin.html

But the idxmin() in line 2 takes more than half an hour on a df of ~4M rows, while the groupby takes less than 1 second. What am I missing? Is …
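
A common faster alternative that avoids the per-group idxmin call entirely is to sort by value and keep the first row per id; a sketch:

    # Sort so each id's smallest value comes first, then keep one row per id.
    # A stable sort keeps idxmin's first-occurrence tie-breaking behavior.
    result = (df.sort_values('value', kind='stable')
                .drop_duplicates('id', keep='first'))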

Sum & count of a column based on the content of the last value in each group after group-by

╄→尐↘猪︶ㄣ - Submitted on 2019-12-24 06:49:27
Question: I have a DataFrame as below:

    id      val  type
    aa      0    C
    aa      1    T
    aa      2    T
    aa      3    T
    aa      0    M
    aa      1    M
    aa      2    C
    aa      3    M
    bbb     0    C
    bbb     1    T
    bbb     2    T
    bbb     3    T
    bbb     0    M
    bbb     1    M
    bbb     2    C
    bbb     3    T
    cccccc  0    C
    cccccc  1    T
    cccccc  2    T
    cccccc  3    T
    cccccc  0    M
    cccccc  1    M
    cccccc  0    C
    cccccc  1    C

I want to group by "id" and then sum & count the rows in column "val", but only the rows whose "type" matches the last value of "type" in each group. For example, the last row of the group has …
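
One way to express this (a sketch): broadcast each group's last "type" to every row with transform, filter to the matching rows, then aggregate:

    # Type of the last row in each id group, aligned to every row
    last_type = df.groupby('id')['type'].transform('last')

    # Keep only rows whose type matches their group's last type, then aggregate
    result = df[df['type'] == last_type].groupby('id')['val'].agg(['sum', 'count'])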

Create lag features based on multiple columns

大憨熊 - Submitted on 2019-12-24 06:06:03
Question: I have a time-series dataset and need to extract lag features. I am using the code below but get all NaNs:

    df.groupby(['week', 'id1', 'id2', 'id3'], as_index=False)['value'].shift(1)

Input:

    week,id1,id2,id3,value
    1,101,123,001,45
    1,102,231,004,89
    1,203,435,099,65
    2,101,123,001,48
    2,102,231,004,75
    2,203,435,099,90

Expected output:

    week,id1,id2,id3,value,t-1
    1,101,123,001,45,NAN
    1,102,231,004,89,NAN
    1,203,435,099,65,NAN
    2,101,123,001,48,45
    2,102,231,004,75,89
    2,203,435,099,90,65

Answer 1: You want to shift to the …
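
The all-NaN result comes from including 'week' in the group keys: every (week, id1, id2, id3) combination then contains a single row, so shift(1) has nothing to look back at. Grouping by the id columns alone gives the intended lag; a sketch:

    # Sort by week so shift(1) looks at the previous week within each id group
    df = df.sort_values('week')
    df['t-1'] = df.groupby(['id1', 'id2', 'id3'])['value'].shift(1)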

GroupBy two columns with margins for first level

北战南征 - Submitted on 2019-12-24 06:00:40
Question: I am grouping a DataFrame by 2 columns and aggregating the other columns with a sum. How can I also have a total for the first grouped column in the same DataFrame? For example, my DataFrame is:

    np.random.seed(0)
    df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
                       'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                       'C': np.random.randn(8),
                       'D': np.random.randn(8)})

The result of:

    grouped = df.groupby(by=['A', 'B']).sum()

is:

              C  D
    A   B
    bar one   0…
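
One way to append per-'A' subtotals to the grouped frame (a sketch; the subtotal label 'Total' is an assumption, not from the question):

    import numpy as np
    import pandas as pd

    # Subtotals over the first level only, relabelled so they share the
    # (A, B) MultiIndex and can be concatenated with the grouped frame
    totals = df.groupby('A')[['C', 'D']].sum()
    totals.index = pd.MultiIndex.from_product([totals.index, ['Total']],
                                              names=['A', 'B'])
    with_margins = pd.concat([grouped, totals]).sort_index()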

How to assign a unique ID for different groups in pandas dataframe?

允我心安 - Submitted on 2019-12-24 05:44:06
Question: How can I assign unique IDs to groups created in a pandas DataFrame based on certain conditions? For example, I have a DataFrame named df with the following structure: Name identifies the user, and Datetime identifies the date/time at which the user accesses a resource.

    Name    Datetime
    Bob     26-04-2018 12:00:00
    Claire  26-04-2018 12:00:00
    Bob     26-04-2018 12:10:00
    Bob     26-04-2018 12:30:00
    Grace   27-04-2018 08:30:00
    Bob     27-04-2018 09:30:00
    Bob     27-04-2018 09:40:00
    Bob     27-04-2018 10:00:00
    Bob     27-04-2018 …
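
GroupBy.ngroup() is the usual tool for numbering groups; a sketch assuming the groups are (Name, calendar day) pairs (the excerpt cuts off before stating the actual conditions):

    import pandas as pd

    # Parse the day-first timestamps, then number each (Name, day) group 0, 1, 2, ...
    df['Datetime'] = pd.to_datetime(df['Datetime'], dayfirst=True)
    df['group_id'] = df.groupby(['Name', df['Datetime'].dt.date]).ngroup()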