pandas-groupby

Grouping on identical column names in pandas

雨燕双飞 submitted on 2019-12-08 08:39:31
Question

    time     A1  A1  A2  A2  A2  A3  A3
    2017-01  a1  a2  b1  b2  c   .....
    2017-02  a3  a4  b3  b4  c
    2017-03  a5  a6  b5  b6  c   ....

There is a dataframe as shown above. How can I get the mean of the columns that share the same name, as shown below?

    time     A1         A2           A3
    2017-01  (a1+a2)/2  (b1+b2+c)/3  c
    2017-02  .....
    2017-03

Answer 1: Use groupby with level=0 and axis=1.

    df.groupby(level=0, axis=1).mean()

    np.random.seed(0)
    df = pd.DataFrame(np.random.choice(10, (3, 5)), columns=list('AAABB'))
    df

       A  A  A  B  B
    0  5  0  3  3  7
    1  9  3  5  2  4
    2  7  6
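A minimal sketch of the same idea on a small frame. The values are made up for illustration and are not taken from the question. Since `axis=1` in `groupby` is deprecated in recent pandas releases, this variant transposes, groups on the index labels, and transposes back, which should give the same per-name means:

```python
import pandas as pd

# Hypothetical data: duplicate column labels, as in the question.
df = pd.DataFrame(
    [[1, 3, 10, 20, 30], [2, 4, 40, 50, 60]],
    columns=["A1", "A1", "A2", "A2", "A2"],
    index=["2017-01", "2017-02"],
)

# Transpose so the duplicate labels become index labels, average the rows
# that share a label, then transpose back to the original orientation.
means = df.T.groupby(level=0).mean().T
print(means)
```

Each output column holds the row-wise mean of all input columns that carried that name.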

Group dataframe by multiple columns and append the result to the dataframe

北慕城南 submitted on 2019-12-08 07:29:09
Question

This is similar to "Attach a calculated column to an existing dataframe"; however, that solution doesn't work when grouping by more than one column in pandas v0.14. For example:

    $ df = pd.DataFrame([
        [1, 1, 1],
        [1, 2, 1],
        [1, 2, 2],
        [1, 3, 1],
        [2, 1, 1]],
        columns=['id', 'country', 'source'])

The following calculation works:

    $ df.groupby(['id','country'])['source'].apply(lambda x: x.unique().tolist())
    0       [1]
    1    [1, 2]
    2    [1, 2]
    3       [1]
    4       [1]
    Name: source, dtype: object

But assigning the output to a
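The excerpt cuts off before the failing assignment, so the following is only one possible way to attach a per-group result to every row, not the accepted answer. It aggregates first and then joins the result back on the two key columns. The column name `sources` is a hypothetical choice:

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 1, 1], [1, 2, 1], [1, 2, 2], [1, 3, 1], [2, 1, 1]],
    columns=["id", "country", "source"],
)

# One list of unique sources per (id, country) pair, indexed by those keys.
per_group = (
    df.groupby(["id", "country"])["source"]
    .apply(lambda x: x.unique().tolist())
    .rename("sources")  # hypothetical column name
)

# join(on=...) matches the two key columns against the MultiIndex levels,
# so every original row receives its group's list.
result = df.join(per_group, on=["id", "country"])
print(result)
```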

Pandas: combining results from function on subset of dataframe with the original dataframe

 ̄綄美尐妖づ submitted on 2019-12-08 06:43:09
Question

I am new to Pandas, so please forgive my inexperience. Nonetheless, I have worked on a lot of the parts of my question here. For simplicity, let's take the example from the wiki article on Quantile Normalization:

    A  5  4  3
    B  2  1  4
    C  3  4  6
    D  4  2  8

and update it to fit the data structure that I am dealing with:

    df = pd.DataFrame({
        'gene': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'd', 'e', 'e', 'e', 'f', 'f', 'f'],
        'rep': [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3],
        'val':
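The question is truncated before its data and goal are complete, so this is only a sketch of one plausible reading: quantile-normalize the `val` column across replicates and keep the result next to the original rows. It uses the four-row wiki table above in long form rather than the asker's data, and the column names `rank` and `norm` are assumptions. Ties are broken by order of appearance, which is a simplification:

```python
import pandas as pd

# The wiki example in long form: one row per (gene, replicate).
df = pd.DataFrame({
    "gene": ["a", "a", "a", "b", "b", "b", "c", "c", "c", "d", "d", "d"],
    "rep": [1, 2, 3] * 4,
    "val": [5, 4, 3, 2, 1, 4, 3, 4, 6, 4, 2, 8],
})

# Rank each value within its replicate (1 = smallest).
df["rank"] = df.groupby("rep")["val"].rank(method="first").astype(int)

# Mean of the k-th smallest value across replicates, for each rank k.
rank_mean = df.groupby("rank")["val"].mean()

# Map every row's rank to that mean; the result aligns with the original rows.
df["norm"] = df["rank"].map(rank_mean)
print(df)
```

Because both steps return values aligned to the original index, no merge is needed to combine the result with the original dataframe.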

Histogram per hour - matplotlib

依然范特西╮ submitted on 2019-12-08 06:03:52
Question

I'm analyzing public data on transport accidents in the UK. My dataframe looks like this:

    Index  Time
    0      02:30
    1      00:37
    2      01:25
    3      09:15
    4      07:53
    5      09:29
    6      08:53
    7      10:05

I'm trying to plot a histogram showing the accident distribution by time of day. Here is my code:

    import matplotlib
    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    import datetime as dt
    import matplotlib.dates as mdates

    df['hour']=pd.to_datetime(df['Time'],format='%H:%M')
    df.set_index('hour', drop=False, inplace=True)
    df['hour']
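The asker's code is cut off, so the following is a sketch of the counting step only, under the assumption that one bar per hour of the day is wanted. It uses the eight sample times from the question and leaves the drawing to the reader:

```python
import pandas as pd

df = pd.DataFrame({
    "Time": ["02:30", "00:37", "01:25", "09:15", "07:53", "09:29", "08:53", "10:05"],
})

# Parse the HH:MM strings and keep only the hour component (0-23).
hours = pd.to_datetime(df["Time"], format="%H:%M").dt.hour

# Count accidents per hour; reindex so hours without accidents show up as 0.
counts = hours.value_counts().reindex(range(24), fill_value=0)
print(counts)
```

With matplotlib installed, passing `counts` to a bar plot (for example `counts.plot(kind="bar")`) should give the per-hour histogram.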

Pandas: GroupBy to DataFrame

别等时光非礼了梦想. submitted on 2019-12-08 05:44:06
Question

There is a very popular S.O. question regarding groupby to dataframe (see here). Unfortunately, I do not think this particular use case is the most useful. Suppose you have what could be a hierarchical dataset in a flattened form, e.g.

       key  val
    0  'a'  2
    1  'a'  1
    2  'b'  3
    3  'b'  4

What I wish to do is convert that dataframe to this structure:

       'a'  'b'
    0  2    3
    1  1    4

I thought this would be as simple as pd.DataFrame(df.groupby('key').groups), but it is not. So how can I make this transformation?

Answer 1: df
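The answer is truncated in this excerpt, so here is a hedged sketch of one way to get that shape, not necessarily the approach the answerer took. It numbers the rows within each key using `cumcount` and pivots on that counter. The helper column name `n` is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "b", "b"], "val": [2, 1, 3, 4]})

# Position of each row inside its key group: 0, 1, 0, 1.
with_pos = df.assign(n=df.groupby("key").cumcount())

# One column per key, one row per within-group position.
wide = with_pos.pivot(index="n", columns="key", values="val")
print(wide)
```

If the groups have different sizes, the shorter columns would be padded with NaN.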

How to use pandas Grouper with 7d frequency and fill missing days with 0?

别等时光非礼了梦想. submitted on 2019-12-08 03:52:35
Question

I have the following sample dataset

    df = pd.DataFrame({
        'names': ['joe', 'joe', 'joe'],
        'dates': [dt.datetime(2019,6,1), dt.datetime(2019,6,5), dt.datetime(2019,7,1)],
        'values': [5,2,13]
    })

and I want to group by names and by weeks (7 days), which I can achieve with

    df_grouped = df.groupby(['names', pd.Grouper(key='dates', freq='7d')]).sum()

                      values
    names dates
    joe   2019-06-01       7
          2019-06-29      13

But what I am looking for is something like this, with all the dates listed explicitly:

                      values
    names dates
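The desired output is cut off above, so this sketch assumes the goal is one row per 7-day bin between the first and last date of each name, with empty bins summed to 0. It swaps `pd.Grouper` for a per-group `resample`, which emits the empty bins as well:

```python
import datetime as dt

import pandas as pd

df = pd.DataFrame({
    "names": ["joe", "joe", "joe"],
    "dates": [dt.datetime(2019, 6, 1), dt.datetime(2019, 6, 5), dt.datetime(2019, 7, 1)],
    "values": [5, 2, 13],
})

# Resample each name's series into consecutive 7-day bins; summing an
# empty bin yields 0, so the missing weeks appear explicitly.
weekly = (
    df.set_index("dates")
    .groupby("names")["values"]
    .resample("7D")
    .sum()
)
print(weekly)
```

The bins start at the first observed day, so the two non-empty bins match the `pd.Grouper` result shown in the question.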

pandas create boolean column using groupby transform

感情迁移 submitted on 2019-12-08 02:41:52
Question

I am trying to create a boolean column using GroupBy.transform on a df like this:

    id  type
    1   1.00000
    1   1.00000
    2   2.00000
    2   3.00000
    3   2.00000

The code is:

    df['has_two'] = df.groupby('id')['type'].transform(lambda x: x == 2)

But instead of boolean values, has_two has float values, e.g. 0.0. I am wondering why that is.

UPDATE

I created a test case:

    df = pd.DataFrame({'id':['1', '1', '2', '2', '3'], 'type':[1.0, 1.0, 2.0, 1.0, 2.0]})
    df['has_2'] = df.groupby('id')['type'].transform(lambda x:
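The test case is truncated, so the intent of the lambda is an assumption here. This sketch covers two plausible readings and computes the comparison before grouping, so the result comes out as a boolean column. The column name `is_two` is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["1", "1", "2", "2", "3"],
    "type": [1.0, 1.0, 2.0, 1.0, 2.0],
})

# Reading 1: flag each row whose own type equals 2 (no groupby needed).
df["is_two"] = df["type"].eq(2)

# Reading 2: flag every row of a group that contains at least one 2.
df["has_2"] = df["type"].eq(2).groupby(df["id"]).transform("any")
print(df)
```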

How to use pandas Grouper on multiple keys?

孤街浪徒 submitted on 2019-12-07 23:46:28
Question

I need to groupby-transform a dataframe by a datetime column AND another str (object) column, applying a function per group and assigning the result to each row of the group. I understand the groupby workflow but cannot make a pandas.Grouper for both conditions at the same time. Thus: how do I use pandas.Grouper on multiple columns?

Answer 1: Use DataFrame.groupby with a list of pandas.Grouper objects as the by argument, like this:

    df['result'] = df.groupby([
        pd.Grouper('dt', freq='D'),
        pd
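The answer's code is cut off, so the following is a self-contained sketch of the pattern it describes rather than a completion of it. The frame, the column names `label` and `value`, and the choice of `sum` are all assumptions. A plain column name is used for the string key, which `groupby` accepts alongside a `pd.Grouper` in the same list:

```python
import pandas as pd

# Hypothetical data: a timestamp column, a string column, and a value.
df = pd.DataFrame({
    "dt": pd.to_datetime([
        "2019-01-01 08:00",
        "2019-01-01 09:00",
        "2019-01-01 17:00",
        "2019-01-02 10:00",
    ]),
    "label": ["x", "y", "x", "x"],
    "value": [1, 3, 2, 4],
})

# Group by calendar day of dt AND by label, then broadcast each group's
# sum back onto the rows that belong to it.
df["result"] = (
    df.groupby([pd.Grouper(key="dt", freq="D"), "label"])["value"]
    .transform("sum")
)
print(df)
```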

Transform wide to long but with repetition of a specific column

痞子三分冷 submitted on 2019-12-07 23:44:57
Question

I have a dataframe as shown below:

    df2 = pd.DataFrame({
        'pid': [1,2,3,4],
        'BP1Date': ['12/11/2016','12/21/2016','12/31/2026',np.nan],
        'BP1di': [21,24,25,np.nan],
        'BP1sy': [123,125,127,np.nan],
        'BP2Date': ['12/31/2016','12/31/2016','12/31/2016','12/31/2016'],
        'BP2di': [21,26,28,30],
        'BP2sy': [123,130,135,145],
        'BP3Date': ['12/31/2017','12/31/2018','12/31/2019','12/31/2116'],
        'BP3di': [21,31,36,np.nan],
        'BP3sy': [123,126,145,np.nan]
    })

It looks as shown below. I expect my output to look as shown below. This
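The expected output did not survive in this excerpt, so the target layout below is an assumption: one row per patient and visit, with the date, diastolic, and systolic readings as columns. The sketch uses a trimmed-down, hypothetical version of the frame and moves the visit number to the end of each column name so that `pd.wide_to_long` can split it off. The name `visit` is also an assumption:

```python
import re

import pandas as pd

# Trimmed-down, hypothetical version of the question's frame.
df2 = pd.DataFrame({
    "pid": [1, 2],
    "BP1Date": ["12/11/2016", "12/21/2016"],
    "BP1di": [21, 24],
    "BP1sy": [123, 125],
    "BP2Date": ["12/31/2016", "12/31/2016"],
    "BP2di": [21, 26],
    "BP2sy": [123, 130],
})

# Move the visit number to the end of the name: BP1Date -> BPDate1.
renamed = df2.rename(columns=lambda c: re.sub(r"^BP(\d+)(\w+)$", r"BP\2\1", c))

# Split the trailing number into a "visit" column; pid repeats per visit.
long_df = pd.wide_to_long(
    renamed, stubnames=["BPDate", "BPdi", "BPsy"], i="pid", j="visit"
).reset_index()
print(long_df)
```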

AttributeError: Cannot access callable attribute 'reset_index' of 'DataFrameGroupBy' objects, try using the 'apply' method

£可爱£侵袭症+ submitted on 2019-12-07 18:55:00
Question

I am very new to pandas and am trying to use groupby. I have a df with multiple columns. I want to group by a particular column and then sort each group based on a different column. I am getting the following error:

    AttributeError: Cannot access callable attribute 'reset_index' of 'DataFrameGroupBy' objects, try using the 'apply' method

Any help would be much appreciated! Thanks!

    col1 | col2 | col3 | col4 | col5
    =================================
    A    | A1   | A2   | A3   | DATE1
    A    | B1   | B2   | B3   | DATE2
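The failing code is not shown in this excerpt, so this is a sketch of the likely cause and one workaround. `groupby` on its own returns a `DataFrameGroupBy` object, which only becomes a dataframe again after an aggregation or `apply`, so calling `reset_index` on it directly raises the error above. If the goal is simply rows ordered by date within each `col1` group, sorting on both columns avoids `groupby` altogether. The data is made up for illustration:

```python
import pandas as pd

# Hypothetical data following the question's column names.
df = pd.DataFrame({
    "col1": ["A", "B", "A", "B"],
    "col2": ["A1", "C1", "B1", "D1"],
    "col5": pd.to_datetime(["2019-03-01", "2019-01-15", "2019-01-01", "2019-02-01"]),
})

# Sort by the group key first and the date second: rows stay together per
# group and are ordered by date inside each group.
sorted_df = df.sort_values(["col1", "col5"]).reset_index(drop=True)
print(sorted_df)
```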