pandas-groupby

function returning pandas dataframe

Submitted by 点点圈 on 2021-02-10 11:48:46

Question: I was not clear about my issue, so I am revising the question. I have a function that manipulates a generic dataframe (it removes and renames columns and records):

```python
def manipulate_df(df_local):
    df_local.rename(columns={'A': 'grouping_column'}, inplace=True)
    df_local.drop('B', axis=1, inplace=True)
    df_local.drop(df.query("grouping_column not in ('1', '0')").index, inplace=True)
    df_local = df_local.groupby(['grouping_column'])['C'].sum().to_frame().reset_index().copy()
    print("this is what I
```
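The snippet above rebinds `df_local` to a brand-new frame on the `groupby` line, so that result never leaves the function, and it also queries the global `df` instead of `df_local`. A minimal sketch of a return-based version (the sample data here is invented for illustration):

```python
import pandas as pd

def manipulate_df(df_local):
    # Build the result without mutating the caller's frame, then return it.
    out = df_local.rename(columns={'A': 'grouping_column'}).drop(columns='B')
    # Keep only the grouping values of interest (note: df_local, not a global df).
    out = out[out['grouping_column'].isin(['1', '0'])]
    return out.groupby('grouping_column')['C'].sum().reset_index()

df = pd.DataFrame({'A': ['0', '1', '2', '0'],
                   'B': [9, 9, 9, 9],
                   'C': [1, 2, 3, 4]})
result = manipulate_df(df)
```

The caller then writes `df = manipulate_df(df)` instead of relying on `inplace=True`.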

How to groupby two columns and calculate the summation of rows using Pandas?

Submitted by 谁说我不能喝 on 2021-02-10 07:36:17

Question: I have a pandas data frame df like:

    Name  Hour  Activity
    A     4     TT
    A     3     TT
    A     5     UU
    B     1     TT
    C     1     TT
    D     1     TT
    D     2     TT
    D     3     UU
    D     4     UU

The next step is to get the summation where rows have identical values in the Name and Activity columns. For example, the case Name: A, Activity: TT gives a summation of 7. The result is then presented as below:

       TT  UU
    A   7   5
    B   1   0
    C   1   0
    D   3   7

Is it possible to do something like this using pandas groupby?

Answer 1: Try groupby.sum and unstack:

```python
df_final = df.groupby(['Name',
```
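The answer is cut off in the source; completing it under the assumption it continues with `sum()` and `unstack()` (sample data reproduced from the question):

```python
import pandas as pd

df = pd.DataFrame({'Name':     list('AAABCDDDD'),
                   'Hour':     [4, 3, 5, 1, 1, 1, 2, 3, 4],
                   'Activity': ['TT', 'TT', 'UU', 'TT', 'TT', 'TT', 'TT', 'UU', 'UU']})

# Sum hours per (Name, Activity) pair, then pivot Activity into columns;
# fill_value=0 covers pairs that never occur (e.g. B/UU).
df_final = df.groupby(['Name', 'Activity'])['Hour'].sum().unstack(fill_value=0)
```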

Groupby and resample timeseries so date ranges are consistent

Submitted by 萝らか妹 on 2021-02-09 10:55:23

Question: I have a dataframe which is basically several timeseries stacked on top of one another. Each time series has a unique label (group) and they have different date ranges.

```python
date = pd.to_datetime(pd.Series(['2010-01-01', '2010-01-02', '2010-01-03',
                                 '2010-01-06', '2010-01-01', '2010-01-03']))
group = [1, 1, 1, 1, 2, 2]
value = [1, 2, 3, 4, 5, 6]
df = pd.DataFrame({'date': date, 'group': group, 'value': value})
df
```

            date  group  value
    0 2010-01-01      1      1
    1 2010-01-02      1      2
    2 2010-01-03      1      3
    3 2010-01-06      1      4
    4 2010-01-01
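The entry cuts off before any answer; one common approach (an assumption, not the accepted answer) is to reindex every group onto a single shared daily range so all groups cover the same dates:

```python
import pandas as pd

date = pd.to_datetime(['2010-01-01', '2010-01-02', '2010-01-03',
                       '2010-01-06', '2010-01-01', '2010-01-03'])
df = pd.DataFrame({'date': date, 'group': [1, 1, 1, 1, 2, 2],
                   'value': [1, 2, 3, 4, 5, 6]})

# One shared daily index spanning the whole frame, so every group
# ends up with an identical date range after reindexing.
full_range = pd.date_range(df['date'].min(), df['date'].max(),
                           freq='D', name='date')
out = (df.set_index('date')
         .groupby('group')['value']
         .apply(lambda s: s.reindex(full_range))   # missing dates become NaN
         .reset_index())
```

Each of the two groups now spans all six days (2010-01-01 through 2010-01-06), with NaN where a group had no observation.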

pandas sum the differences between two columns in each group

Submitted by 自作多情 on 2021-02-08 10:48:33

Question: I have a df that looks like:

             A           B  C        D
    2017-10-01  2017-10-11  M  2017-10
    2017-10-02  2017-10-03  M  2017-10
    2017-11-01  2017-11-04  B  2017-11
    2017-11-08  2017-11-09  B  2017-11
    2018-01-01  2018-01-03  A  2018-01

The dtypes of A and B are datetime64; C and D are strings. I would like to group by C and D and get the differences between B and A:

```python
df.groupby(['C', 'D']).apply(lambda row: row['B'] - row['A'])
```

but I don't know how to sum such differences within each group and assign the values to a new column, say E,
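One way to finish the thought: compute the row-wise difference once, sum it within each (C, D) group, and broadcast the group total back onto every row with `transform` (sample data reproduced from the question):

```python
import pandas as pd

df = pd.DataFrame({
    'A': pd.to_datetime(['2017-10-01', '2017-10-02', '2017-11-01',
                         '2017-11-08', '2018-01-01']),
    'B': pd.to_datetime(['2017-10-11', '2017-10-03', '2017-11-04',
                         '2017-11-09', '2018-01-03']),
    'C': ['M', 'M', 'B', 'B', 'A'],
    'D': ['2017-10', '2017-10', '2017-11', '2017-11', '2018-01'],
})

# Per-row timedelta, grouped by the C and D columns; transform('sum')
# returns a result aligned to the original rows, ready to assign as E.
df['E'] = (df['B'] - df['A']).groupby([df['C'], df['D']]).transform('sum')
```

For the M/2017-10 group this gives 10 days + 1 day = 11 days on both of its rows.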

Summing columns in Dataframe that have matching column headers

Submitted by 試著忘記壹切 on 2021-02-08 06:22:08

Question: I have a dataframe that currently looks somewhat like this:

```python
import pandas as pd

In [161]: pd.DataFrame(np.c_[s, t], columns=["M1", "M2", "M1", "M2"])
Out[161]:
          M1  M2  M1  M2
    6/7    1   2   3   5
    6/8    2   4   7   8
    6/9    3   6   9   9
    6/10   4   8   8  10
    6/11   5  10  20  40
```

Except, instead of just four columns, there are approximately 1000 columns, from M1 up to roughly M340 (there are multiple columns with the same headers). I want to sum the values of matching columns, aligned on the index. Ideally, the result dataframe
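The question cuts off before the answer; a sketch of one way to sum duplicate-named columns is to group on the column labels themselves (data reproduced from the question; `groupby(..., axis=1)` would also work but is deprecated in pandas 2.x, so this transposes instead):

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 5],
                   [2, 4, 7, 8],
                   [3, 6, 9, 9],
                   [4, 8, 8, 10],
                   [5, 10, 20, 40]],
                  index=['6/7', '6/8', '6/9', '6/10', '6/11'],
                  columns=['M1', 'M2', 'M1', 'M2'])

# Transpose so the duplicate headers become the row index, group on that
# index level, sum the duplicates, and transpose back.
summed = df.T.groupby(level=0).sum().T
```

This scales to ~1000 columns unchanged, since the grouping is driven entirely by the header labels.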

python group by and count() multiple columns

Submitted by 六眼飞鱼酱① on 2021-02-08 05:03:37

Question: I have a data frame like this:

    Country  A  B  C
    UK       1  0  1
    US       1  1  1
    GB       0  1  1
    UK       1  1  1
    US       0  1  1
    GB       0  1  1

I need to group by country and count, across all columns, where the value is 1. I'm stuck on setting the condition columns == 1 for all of them. The result should be something like:

    Country  A  B  C
    UK       2  0  2
    US       1  2  2
    GB       0  2  2

Answer 1: Because you are counting 1s, you can just groupby and sum:

```python
df['country'] = df.index  # to generate a new column
result = df.groupby(['country']).sum()
```

This gives you the result: a b
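A runnable sketch of the answer's idea, assuming Country starts as an ordinary column rather than the index (because the columns are 0/1 indicators, a per-group sum is the same as counting the 1s):

```python
import pandas as pd

df = pd.DataFrame({'Country': ['UK', 'US', 'GB', 'UK', 'US', 'GB'],
                   'A': [1, 1, 0, 1, 0, 0],
                   'B': [0, 1, 1, 1, 1, 1],
                   'C': [1, 1, 1, 1, 1, 1]})

# Summing 0/1 columns per group counts how many rows have a 1.
result = df.groupby('Country').sum()
```

If the columns could hold values other than 0 and 1, count explicitly instead: `(df[['A', 'B', 'C']] == 1).groupby(df['Country']).sum()`.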

Create new column based on condition on other categorical column

Submitted by 我只是一个虾纸丫 on 2021-02-07 23:57:12

Question: I have a dataframe as shown below:

    Category  Value
    A         10
    B         22
    A         2
    C         30
    B         23
    B         4
    C         8
    C         24
    A         9

I need to create a column Flag based on the following conditions:

- If the value for Category A is greater than or equal to 5, then Flag = 1, else 0
- If the value for Category B is greater than or equal to 20, then Flag = 1, else 0
- If the value for Category C is greater than or equal to 25, then Flag = 1, else 0

Expected output:

    Category  Value  Flag
    A         10     1
    B         22     1
    A         2      0
    C         30     1
    B         23     1
    B         4      0
    C         8      0
    C         24     0
    A         9      1

I tried
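The question cuts off at "I tried"; one vectorised way to build Flag (an assumption, not the accepted answer) is to map each category to its threshold and compare the whole column at once:

```python
import pandas as pd

df = pd.DataFrame({'Category': ['A', 'B', 'A', 'C', 'B', 'B', 'C', 'C', 'A'],
                   'Value': [10, 22, 2, 30, 23, 4, 8, 24, 9]})

# One threshold per category; map() aligns it row by row, so the
# comparison runs over the whole column in a single vectorised step.
thresholds = {'A': 5, 'B': 20, 'C': 25}
df['Flag'] = (df['Value'] >= df['Category'].map(thresholds)).astype(int)
```

Adding a new category later only requires one more entry in the `thresholds` dict, rather than another branch of nested conditions.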