pandas-groupby | 易学教程

Grouping on identical column names in pandas

Question: There is a dataframe as shown below.

    time     A1  A1  A2  A2  A2  A3  A3
    2017-01  a1  a2  b1  b2  c   .....
    2017-02  a3  a4  b3  b4  c
    2017-03  a5  a6  b5  b6  c
    ....

How can I get the mean of the columns that share the same name, as shown below?

    time     A1         A2           A3
    2017-01  (a1+a2)/2  (b1+b2+c)/3  c
    2017-02  .....
    2017-03

Answer 1: Use groupby with level=0 and axis=1.

    df.groupby(level=0, axis=1).mean()

    np.random.seed(0)
    df = pd.DataFrame(np.random.choice(10, (3, 5)), columns=list('AAABB'))
    df

       A  A  A  B  B
    0  5  0  3  3  7
    1  9  3  5  2  4
    2  7  6
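
The answer's idea can be sketched without the `axis=1` argument, which recent pandas releases deprecate. The version below transposes the frame, groups on the (now duplicated) index labels, and transposes back. The numeric values are hypothetical stand-ins for a1, a2, b1 and so on; they are not from the original question.

```python
import pandas as pd

# Hypothetical numbers standing in for a1, a2, b1, ... in the question.
df = pd.DataFrame(
    [[1.0, 3.0, 2.0, 4.0, 6.0, 5.0],
     [2.0, 4.0, 1.0, 1.0, 1.0, 7.0]],
    index=["2017-01", "2017-02"],
    columns=["A1", "A1", "A2", "A2", "A2", "A3"],
)

# Transpose so the duplicated column names become index labels,
# group on those labels, average, then transpose back.
means = df.T.groupby(level=0).mean().T
print(means)
```

Each output column holds the row-wise mean of every input column that carried that name.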

Group dataframe by multiple columns and append the result to the dataframe

Question: This is similar to "Attach a calculated column to an existing dataframe"; however, that solution doesn't work when grouping by more than one column in pandas v0.14. For example:

    df = pd.DataFrame([
        [1, 1, 1],
        [1, 2, 1],
        [1, 2, 2],
        [1, 3, 1],
        [2, 1, 1]],
        columns=['id', 'country', 'source'])

The following calculation works:

    df.groupby(['id','country'])['source'].apply(lambda x: x.unique().tolist())

    0       [1]
    1    [1, 2]
    2    [1, 2]
    3       [1]
    4       [1]
    Name: source, dtype: object

But assigning the output to a
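
One common way to attach a per-group result to the original rows is to compute it once per group and merge it back on the grouping keys. The sketch below assumes that this is the goal of the question; the column name `sources` is a hypothetical choice, and behaviour on pandas v0.14 itself has not been checked.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 1, 1], [1, 2, 1], [1, 2, 2], [1, 3, 1], [2, 1, 1]],
    columns=["id", "country", "source"],
)

# One row per (id, country) group with the list of unique sources.
unique_sources = (
    df.groupby(["id", "country"])["source"]
      .apply(lambda x: x.unique().tolist())
      .rename("sources")  # hypothetical column name
      .reset_index()
)

# Left-merge on the keys so every original row receives its group's list.
out = df.merge(unique_sources, on=["id", "country"], how="left")
print(out)
```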

Pandas: combining results from function on subset of dataframe with the original dataframe

Question: I am new to Pandas, so please forgive my inexperience. Nonetheless, I have worked on a lot of the parts of my question here. For simplicity, let's take the example from the wiki article on Quantile Normalization:

    A  5  4  3
    B  2  1  4
    C  3  4  6
    D  4  2  8

and update it to fit the data structure that I am dealing with:

    df = pd.DataFrame({
        'gene': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'd', 'e', 'e', 'e', 'f', 'f', 'f'],
        'rep': [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3],
        'val':

Histogram per hour - matplotlib

Question: I'm analyzing public data on transport accidents in the UK. My dataframe looks like this:

    Index  Time
    0      02:30
    1      00:37
    2      01:25
    3      09:15
    4      07:53
    5      09:29
    6      08:53
    7      10:05

I'm trying to plot a histogram showing the accident distribution by time of day. Here is my code:

    import pandas as pd
    import matplotlib
    import matplotlib.pyplot as plt
    import numpy as np
    import datetime as dt
    import matplotlib.dates as mdates

    df['hour']=pd.to_datetime(df['Time'],format='%H:%M')
    df.set_index('hour', drop=False, inplace=True)
    df['hour']
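
A minimal sketch of the counting step, assuming the goal is one bar per hour of the day: extract the hour from the parsed times and count occurrences. The name `per_hour` is hypothetical, and the plotting call itself is left out so the snippet depends on pandas only.

```python
import pandas as pd

# The sample times from the question.
df = pd.DataFrame(
    {"Time": ["02:30", "00:37", "01:25", "09:15",
              "07:53", "09:29", "08:53", "10:05"]}
)

# Parse the HH:MM strings and keep only the hour component (0-23).
hours = pd.to_datetime(df["Time"], format="%H:%M").dt.hour

# Count accidents per hour; reindex so hours with no accidents show as 0.
per_hour = hours.value_counts().reindex(range(24), fill_value=0)
print(per_hour)
```

The resulting Series has one entry per hour, so it could be passed to a bar plot such as `per_hour.plot(kind="bar")`.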

Pandas: GroupBy to DataFrame

Question: There is a very popular S.O. question regarding groupby to dataframe (see here). Unfortunately, I do not think that particular use case is the most useful. Suppose you have what could be a hierarchical dataset in a flattened form, e.g.

       key  val
    0  'a'  2
    1  'a'  1
    2  'b'  3
    3  'b'  4

What I wish to do is convert that dataframe to this structure:

       'a'  'b'
    0  2    3
    1  1    4

I thought this would be as simple as pd.DataFrame(df.groupby('key').groups), but it is not. So how can I make this transformation?

Answer 1: df
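
The answer is cut off in this excerpt, so the following is only one possible approach and not necessarily the one the answer gives: number the rows within each group with `cumcount` and pivot on that position. The helper column name `pos` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "b", "b"], "val": [2, 1, 3, 4]})

# Position of each row inside its own group: 0, 1, 0, 1.
pos = df.groupby("key").cumcount()

# Use the within-group position as the new row index and the keys as columns.
wide = df.assign(pos=pos).pivot(index="pos", columns="key", values="val")
print(wide)
```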

How to use pandas Grouper with 7d frequency and fill missing days with 0?

Question: I have the following sample dataset

    df = pd.DataFrame({
        'names': ['joe', 'joe', 'joe'],
        'dates': [dt.datetime(2019,6,1), dt.datetime(2019,6,5), dt.datetime(2019,7,1)],
        'values': [5,2,13]
    })

and I want to group by names and by weeks (7 days), which I can achieve with

    df_grouped = df.groupby(['names', pd.Grouper(key='dates', freq='7d')]).sum()

                      values
    names dates
    joe   2019-06-01       7
          2019-06-29      13

But what I am looking for is something like this, with all the dates listed explicitly:

                      values
    names dates
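
The desired output is cut off in this excerpt. Assuming the goal is to list every 7-day bin, including the empty ones, with a value of 0, one option is to resample within each name instead of using `pd.Grouper`. This is a sketch under that assumption, not the accepted answer.

```python
import datetime as dt

import pandas as pd

df = pd.DataFrame({
    "names": ["joe", "joe", "joe"],
    "dates": [dt.datetime(2019, 6, 1), dt.datetime(2019, 6, 5), dt.datetime(2019, 7, 1)],
    "values": [5, 2, 13],
})

# Resample each name's rows into consecutive 7-day bins.
# Summing an empty bin yields 0, so the missing weeks appear explicitly.
weekly = (
    df.set_index("dates")
      .groupby("names")["values"]
      .resample("7D")
      .sum()
)
print(weekly)
```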

pandas create boolean column using groupby transform

Question: I am trying to create a boolean column using GroupBy.transform on a df like this:

    id  type
    1   1.00000
    1   1.00000
    2   2.00000
    2   3.00000
    3   2.00000

The code is:

    df['has_two'] = df.groupby('id')['type'].transform(lambda x: x == 2)

but instead of boolean values, has_two has float values, e.g. 0.0. I am wondering why that is.

UPDATE: I created a test case:

    df = pd.DataFrame({'id':['1', '1', '2', '2', '3'], 'type':[1.0, 1.0, 2.0, 1.0, 2.0]})
    df['has_2'] = df.groupby('id')['type'].transform(lambda x:
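
A sketch of two ways to obtain a boolean column, using the test case from the update. Doing the comparison before grouping keeps the data boolean from the start, so the result does not depend on how `transform` handles dtypes. The column names `is_two` and `group_has_two` are hypothetical, and which of the two the asker wants is an assumption.

```python
import pandas as pd

df = pd.DataFrame({"id": ["1", "1", "2", "2", "3"],
                   "type": [1.0, 1.0, 2.0, 1.0, 2.0]})

# Row-wise flag: no groupby is needed for an elementwise comparison.
df["is_two"] = df["type"].eq(2)

# Group-wise flag: True for every row whose id has at least one type == 2.
df["group_has_two"] = df["type"].eq(2).groupby(df["id"]).transform("any")
print(df)
```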

How to use pandas Grouper on multiple keys?

Question: I need to groupby-transform a dataframe by a datetime column AND another str (object) column, to apply a function by group and assign the result to each row in the group. I understand the groupby workflow but cannot make a pandas.Grouper for both conditions at the same time. Thus: how to use pandas.Grouper on multiple columns?

Answer 1: Use DataFrame.groupby with a list of pandas.Grouper objects as the by argument, like this:

    df['result'] = df.groupby([
        pd.Grouper('dt', freq='D'),
        pd
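
The answer's code is cut off above, so here is a self-contained sketch of the same idea under stated assumptions: the column names `name` and `val`, the sample rows, and the use of `sum` as the applied function are all hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "dt": pd.to_datetime(["2020-01-01 08:00", "2020-01-01 09:00",
                          "2020-01-01 17:00", "2020-01-02 10:00"]),
    "name": ["x", "y", "x", "x"],
    "val": [1, 3, 2, 4],
})

# Group by calendar day of `dt` AND by `name`, then broadcast each
# group's sum back onto the rows that belong to it.
df["result"] = df.groupby(
    [pd.Grouper(key="dt", freq="D"), pd.Grouper(key="name")]
)["val"].transform("sum")
print(df)
```

A plain column label such as `"name"` could be used in place of `pd.Grouper(key="name")`; the Grouper is only needed where a frequency is involved.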

Transform wide to long but with repetition of a specific column

Question: I have a dataframe as shown below:

    df2 = pd.DataFrame({
        'pid':[1,2,3,4],
        'BP1Date':['12/11/2016','12/21/2016','12/31/2026',np.nan],
        'BP1di':[21,24,25,np.nan],
        'BP1sy':[123,125,127,np.nan],
        'BP2Date':['12/31/2016','12/31/2016','12/31/2016','12/31/2016'],
        'BP2di':[21,26,28,30],
        'BP2sy':[123,130,135,145],
        'BP3Date':['12/31/2017','12/31/2018','12/31/2019','12/31/2116'],
        'BP3di':[21,31,36,np.nan],
        'BP3sy':[123,126,145,np.nan]})

It looks as shown below. I expect my output to look as shown below. This

AttributeError: Cannot access callable attribute 'reset_index' of 'DataFrameGroupBy' objects, try using the 'apply' method

Question: I am very new to pandas and am trying to use groupby. I have a df with multiple columns. I want to group by a particular column and then sort each group based on a different column. I am getting the following error:

    AttributeError: Cannot access callable attribute 'reset_index' of 'DataFrameGroupBy' objects, try using the 'apply' method

Any help would be much appreciated! Thanks!

    col1 | col2 | col3 | col4 | col5
    =================================
    A    | A1   | A2   | A3   | DATE1
    A    | B1   | B2   | B3   | DATE2
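
The asker's code is not shown in this excerpt, so the following is only a sketch of one way to get rows ordered within each group. `reset_index` is a DataFrame method, so it has to be called on a DataFrame and not on the GroupBy object. Sorting by the group column first and the date column second avoids groupby altogether. The sample values are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["B", "A", "A", "B"],
    "col2": ["x", "y", "z", "w"],
    "col5": pd.to_datetime(["2020-03-01", "2020-02-01",
                            "2020-01-01", "2020-01-15"]),
})

# Sorting by (group column, date column) keeps each group's rows together
# and orders them by date; reset_index is then valid on the DataFrame.
sorted_df = df.sort_values(["col1", "col5"]).reset_index(drop=True)
print(sorted_df)
```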