pandas-groupby

Aggregating string columns using pandas GroupBy

十年热恋 submitted on 2020-01-23 11:06:08

Question: I have a DataFrame such as the following:

df =
    vid pos value sente
    1   a   A     21
    2   b   B     21
    3   b   A     21
    3   a   A     21
    1   d   B     22
    1   a   C     22
    1   a   D     22
    2   b   A     22
    3   a   A     22

Now I want to combine all rows with the same value for sente and vid into one row, with the values for value joined by a " ":

df2 =
    vid pos   value sente
    1   a     A     21
    2   b     B     21
    3   b a   A A   21
    1   d a a B C D 22
    2   b     A     22
    3   a     A     22

I suppose a modification of this should do the trick:

    df2 = df.groupby("sente").agg(lambda x: " ".join(x))

But I can't seem to figure out …
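A minimal sketch of one way to do this, assuming the sample data above: `groupby` is a method (so it takes parentheses, not square brackets), and grouping on both vid and sente keeps the 21 and 22 blocks apart while joining the string columns with a space.

```python
import pandas as pd

# sample data reconstructed from the question
df = pd.DataFrame({
    'vid':   [1, 2, 3, 3, 1, 1, 1, 2, 3],
    'pos':   ['a', 'b', 'b', 'a', 'd', 'a', 'a', 'b', 'a'],
    'value': ['A', 'B', 'A', 'A', 'B', 'C', 'D', 'A', 'A'],
    'sente': [21, 21, 21, 21, 22, 22, 22, 22, 22],
})

# group by BOTH key columns, then space-join each string column per group
df2 = (df.groupby(['vid', 'sente'], as_index=False)
         .agg({'pos': ' '.join, 'value': ' '.join}))
```

Grouping on sente alone would merge rows of different vids, which is why both keys appear in the `groupby` call.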

pandas GroupBy and cumulative mean of previous rows in group

℡╲_俬逩灬. submitted on 2020-01-21 11:51:46

Question: I have a DataFrame which looks like this:

    pd.DataFrame({'category': [1, 1, 1, 2, 2, 2, 3, 3, 3, 4],
                  'order_start': [1, 2, 3, 1, 2, 3, 1, 2, 3, 1],
                  'time': [1, 4, 3, 6, 8, 17, 14, 12, 13, 16]})

    Out[40]:
       category  order_start  time
    0         1            1     1
    1         1            2     4
    2         1            3     3
    3         2            1     6
    4         2            2     8
    5         2            3    17
    6         3            1    14
    7         3            2    12
    8         3            3    13
    9         4            1    16

I would like to create a new column which contains the mean of the previous times of the same category. How can I create it? The new column should look like this: pd.DataFrame({'category': [1,1,1,2,2,2 …
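One possible sketch: shift each group's times down by one so the current row is excluded, then take an expanding mean. The first row of every category stays NaN, since it has no previous times.

```python
import pandas as pd

df = pd.DataFrame({'category': [1, 1, 1, 2, 2, 2, 3, 3, 3, 4],
                   'order_start': [1, 2, 3, 1, 2, 3, 1, 2, 3, 1],
                   'time': [1, 4, 3, 6, 8, 17, 14, 12, 13, 16]})

# mean of the *previous* rows only: shift by one inside each group,
# then take the expanding (cumulative) mean
df['mean_time'] = (df.groupby('category')['time']
                     .transform(lambda s: s.shift().expanding().mean()))
```

For category 1 (times 1, 4, 3) this yields NaN, 1.0, 2.5, matching a running mean of strictly earlier rows.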

Pandas: Selecting rows for which groupby.sum() satisfies condition

大城市里の小女人 submitted on 2020-01-21 11:30:13

Question: In pandas I have a DataFrame of the form:

    >>> import pandas as pd
    >>> df = pd.DataFrame({'ID': [51, 51, 51, 24, 24, 24, 31], 'x': [0, 1, 0, 0, 1, 1, 0]})
    >>> df
       ID  x
       51  0
       51  1
       51  0
       24  0
       24  1
       24  1
       31  0

For every 'ID' the value of 'x' is recorded several times; it is either 0 or 1. I want to select those rows from df that contain an 'ID' for which 'x' is 1 at least twice. For every 'ID' I manage to count the number of times 'x' is 1 with:

    >>> df.groupby('ID')['x'].sum()
    ID
    51    1
    24    2
    31    0

But I don't know how …
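A sketch of one common approach: `transform('sum')` broadcasts each group's sum back onto that group's rows, producing a Series aligned with the original frame that can serve directly as a boolean mask.

```python
import pandas as pd

df = pd.DataFrame({'ID': [51, 51, 51, 24, 24, 24, 31],
                   'x':  [0, 1, 0, 0, 1, 1, 0]})

# keep rows whose ID has x == 1 at least twice; transform aligns the
# per-group sums with the original rows, unlike a plain .sum()
out = df[df.groupby('ID')['x'].transform('sum') >= 2]
```

Here only the three rows with ID 24 survive, since that is the only ID whose x-values sum to 2 or more.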

pandas group by and assign a group id then ungroup

独自空忆成欢 submitted on 2020-01-19 06:08:59

Question: I have a large data set in the following format:

    id, socialmedia
    1, facebook
    2, facebook
    3, google
    4, google
    5, google
    6, twitter
    7, google
    8, twitter
    9, snapchat
    10, twitter
    11, facebook

I want to group by socialmedia, assign a group_id column, and then ungroup (expand) back to individual records:

    id, socialmedia, groupId
    1, facebook, 1
    2, facebook, 1
    3, google, 2
    4, google, 2
    5, google, 2
    6, twitter, 3
    7, google, 2
    8, twitter, 3
    9, snapchat, 4
    10, twitter, 3
    11, facebook, 1

I tried the following but …
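No explicit group/ungroup round trip is needed for this: a minimal sketch using `GroupBy.ngroup`, which assigns every group an integer label. With `sort=False` the groups are numbered in order of first appearance, matching the desired output, and `+ 1` makes the ids 1-based.

```python
import pandas as pd

df = pd.DataFrame({
    'id': range(1, 12),
    'socialmedia': ['facebook', 'facebook', 'google', 'google', 'google',
                    'twitter', 'google', 'twitter', 'snapchat', 'twitter',
                    'facebook'],
})

# ngroup() labels each row with its group's number; the frame itself
# is never reshaped, so there is nothing to "ungroup" afterwards
df['groupId'] = df.groupby('socialmedia', sort=False).ngroup() + 1
```

An equivalent alternative is `pd.factorize(df['socialmedia'])[0] + 1`, which also numbers values by first appearance.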

Pandas groupby with delimiter join

妖精的绣舞 submitted on 2020-01-16 09:13:06

Question: I tried to use groupby to group rows with multiple values.

    col  val
    A    Cat
    A    Tiger
    B    Ball
    B    Bat

    import pandas as pd
    df = pd.read_csv("Inputfile.txt", sep='\t')
    group = df.groupby(['col'])['val'].sum()

I got:

    A    CatTiger
    B    BallBat

I want to introduce a delimiter, so that my output looks like:

    A    Cat-Tiger
    B    Ball-Bat

I tried:

    group = df.groupby(['col'])['val'].sum().apply(lambda x: '-'.join(x))

This yielded:

    A    C-a-t-T-i-g-e-r
    B    B-a-l-l-B-a-t

What is the issue here? Thanks, AP

Answer 1: Alternatively you …
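The issue: `sum()` has already concatenated each group into a single string ("CatTiger"), and joining a string iterates over its characters. A sketch of the fix is to hand the join to `agg` directly, so it receives the group's values as a sequence of whole strings:

```python
import pandas as pd

# small frame standing in for the tab-separated Inputfile.txt
df = pd.DataFrame({'col': ['A', 'A', 'B', 'B'],
                   'val': ['Cat', 'Tiger', 'Ball', 'Bat']})

# agg passes each group's values to '-'.join as whole strings,
# so nothing is split character by character
group = df.groupby('col')['val'].agg('-'.join)
```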

Strange behavior when trying to append a row to each group in a group by object

安稳与你 submitted on 2020-01-14 14:27:09

Question: This question is about a function behaving in an unexpected manner when applied to two different DataFrames, more precisely, groupby objects. Either I'm missing something obvious or there's a bug in pandas. I wrote the function below to append a row to each group in a groupby object. A related question about this function has been asked separately.

    def myfunction(g, now):
        '''This function appends a row to each group and populates the DTM column value of that row with the …
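The function in the question is cut off, so the following is only a guessed reconstruction of the idea; the sample data, the contents of the appended row, and the use of the DTM column are all assumptions. It sketches appending one row per group via `GroupBy.apply`, which forwards extra keyword arguments (here `now`) to the function.

```python
import pandas as pd

df = pd.DataFrame({'grp': ['a', 'a', 'b'], 'DTM': [1, 2, 5]})

def append_row(g, now):
    # copy the group's last row, stamp its DTM with `now`, and append it
    extra = g.iloc[[-1]].copy()
    extra['DTM'] = now
    return pd.concat([g, extra], ignore_index=True)

# apply runs the function once per group and concatenates the results
out = df.groupby('grp', group_keys=False).apply(append_row, now=99)
```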

Pandas split CSV into multiple CSV's (or DataFrames) by a column

时间秒杀一切 submitted on 2020-01-14 12:36:59

Question: I'm very lost with a problem, and some help or tips would be appreciated.

The problem: I have a CSV file with a column that can take multiple values, like:

    Fruit;Color;The_evil_column
    Apple;Red;something1
    Apple;Green;something1
    Orange;Orange;something1
    Orange;Green;something2
    Apple;Red;something2
    Apple;Red;something3

I've loaded the data into a DataFrame, and I need to split that DataFrame into multiple DataFrames based on the value of the column "The_evil_column":

    df1
    Fruit;Color;The …
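A sketch of one way to split it, relying on the fact that a groupby object iterates as (key, sub-DataFrame) pairs; the inline CSV stands in for the real file, and the output file names in the comment are illustrative.

```python
import pandas as pd
from io import StringIO

csv_text = """Fruit;Color;The_evil_column
Apple;Red;something1
Apple;Green;something1
Orange;Orange;something1
Orange;Green;something2
Apple;Red;something2
Apple;Red;something3"""

df = pd.read_csv(StringIO(csv_text), sep=';')

# one sub-DataFrame per distinct value of The_evil_column
parts = {key: sub for key, sub in df.groupby('The_evil_column')}

# each part could then be written back out, e.g.:
# for key, sub in parts.items():
#     sub.to_csv(f'{key}.csv', sep=';', index=False)
```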
