pandas-groupby | 易学教程

Get the row corresponding to the max in pandas GroupBy

Question: Simple DataFrame:

    df = pd.DataFrame({'A': [1,1,2,2], 'B': [0,1,2,3], 'C': ['a','b','c','d']})
    df

       A  B  C
    0  1  0  a
    1  1  1  b
    2  2  2  c
    3  2  3  d

For every value (groupby) of column A, I want to get the value of column C for which column B is maximum. For example, for group 1 of column A, the maximum of column B is 1, so I want the value "b" of column C:

       A  C
    0  1  b
    1  2  d

There is no need to assume column B is sorted. Performance is the top priority, then elegance.

Answer 1: Check with sort_values + drop_duplicates df
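The answer excerpt is cut off after naming sort_values + drop_duplicates, so the exact call chain it used is not visible here. The following is a minimal sketch of how those two methods could be combined on the question's data; the final column selection and index reset are assumptions made only to match the desired output.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [0, 1, 2, 3], 'C': ['a', 'b', 'c', 'd']})

# Sort by B so the largest B of each group comes last,
# then keep only the last row seen for each value of A.
result = (
    df.sort_values('B')
      .drop_duplicates('A', keep='last')
      .sort_values('A')[['A', 'C']]   # assumed tidy-up to match the desired output
      .reset_index(drop=True)
)
print(result)
```

Because the frame is sorted explicitly, this does not rely on column B already being in order.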

Add GroupBy mean result as a new column in pandas

Question: I have a dataframe that gives the upper and lower values of each indicator, as follows:

    df = pd.DataFrame(
        {'indicator': ['indicator 1', 'indicator 1', 'indicator 2', 'indicator 2'],
         'year': [2014, 2014, 2015, 2015],
         'value type': ['upper', 'lower', 'upper', 'lower'],
         'value': [12.3, 10.2, 15.4, 13.2]},
        index=[1, 2, 3, 4])

I want to remove the upper and lower values and replace them with the mean of the two values. How can I do that?

Answer 1: You could groupby and transform by mean. df['value'] = df.groupby(
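The answer's code is truncated after `df.groupby(`, so the grouping key it used is unknown. A minimal sketch of the groupby-and-transform idea, assuming the rows are grouped by the 'indicator' column and that the 'value type' column is dropped afterwards:

```python
import pandas as pd

df = pd.DataFrame(
    {'indicator': ['indicator 1', 'indicator 1', 'indicator 2', 'indicator 2'],
     'year': [2014, 2014, 2015, 2015],
     'value type': ['upper', 'lower', 'upper', 'lower'],
     'value': [12.3, 10.2, 15.4, 13.2]},
    index=[1, 2, 3, 4])

# Assumption: 'indicator' is the grouping key (the source is cut off here).
# transform('mean') returns one value per original row, so it can be assigned back.
df['value'] = df.groupby('indicator')['value'].transform('mean')

# Assumed follow-up: drop the now meaningless 'value type' column
# and collapse the identical rows to one row per indicator.
result = df.drop(columns='value type').drop_duplicates()
print(result)
```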

Fill the NA value in one column according to values of similar columns

Question: I want to fill the NaN value in the given data as follows:

    df = pd.DataFrame({'A' : ['aa', 'bb', 'cc', 'aa'], 'B': ['xx', 'yy', 'zz','xx'], 'C': ['2', '3','8', np.nan]})
    print (df)

     A   B    C
    aa  xx    2
    bb  yy    3
    cc  zz    8
    aa  xx  NaN

Expected Output:

     A   B    C
    aa  xx    2
    bb  yy    3
    cc  zz    8
    aa  xx    2

Since the row with A = aa and B = xx has the value 2 in column C, the last row, which has the same A and B, should also have 2 in column C.

Answer 1: Use GroupBy.ffill with DataFrame.sort_values and DataFrame.sort_index to move NaNs to the end of groups
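The answer names the three methods but its code is not included in the excerpt. One way they could fit together is sketched below; sorting on column C alone is an assumption chosen for simplicity, relying on sort_values placing NaN last by default.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['aa', 'bb', 'cc', 'aa'],
                   'B': ['xx', 'yy', 'zz', 'xx'],
                   'C': ['2', '3', '8', np.nan]})

# sort_values puts NaN last by default, so within each (A, B) group
# the missing entries come after the known ones.
ordered = df.sort_values('C')

# Forward-fill inside each (A, B) group only.
filled = ordered.groupby(['A', 'B'])['C'].ffill()

# Restore the original row order before assigning back.
df['C'] = filled.sort_index()
print(df)
```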

How to pivot a dataframe

Question: What is pivot? How do I pivot? Is this a pivot? Long format to wide format? I've seen a lot of questions that ask about pivot tables. Even if they don't know that they are asking about pivot tables, they usually are. It is virtually impossible to write a canonical question and answer that encompasses all aspects of pivoting... But I'm going to give it a go. The problem with existing questions and answers is that often the question is focused on a nuance that the OP has trouble
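As a small illustration of the "long format to wide format" case mentioned in the question, here is a sketch using DataFrame.pivot. The frame and its column names ('row', 'col', 'val') are hypothetical and do not come from the original post.

```python
import pandas as pd

# Hypothetical long-format data: one observation per line.
long_df = pd.DataFrame({
    'row': ['r1', 'r1', 'r2', 'r2'],
    'col': ['c1', 'c2', 'c1', 'c2'],
    'val': [1, 2, 3, 4],
})

# Wide format: one line per 'row', one column per distinct 'col' value.
wide = long_df.pivot(index='row', columns='col', values='val')
print(wide)
```

DataFrame.pivot raises an error if an (index, columns) pair occurs more than once; pivot_table with an aggregation function is the usual choice in that case.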

Groupby cumulative sum in pandas based on specific condition

Question: I have a data frame as shown below.

    B_ID  No_Show  Session  slot_num  Patient_count
       1      0.4       S1         1              1
       2      0.3       S1         2              1
       3      0.8       S1         3              1
       4      0.3       S1         3              2
       5      0.6       S1         4              1
       6      0.8       S1         5              1
       7      0.9       S1         5              2
       8      0.4       S1         5              3
       9      0.6       S1         5              4
      12      0.9       S2         1              1
      13      0.5       S2         1              2
      14      0.3       S2         2              1
      15      0.7       S2         3              1
      20      0.7       S2         4              1
      16      0.6       S2         5              1
      17      0.8       S2         5              2
      19      0.3       S2         5              3

From the above I would like to find the cumulative No_Show by Session:

    df['Cum_No_show'] = df.groupby(['Session'])['No_Show'].cumsum()

Now we get

    B_ID  No_Show  Session  slot_num  Patient_count
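The excerpt ends before the resulting table and before the "specific condition" from the title is stated, so neither is reproduced here. As a sketch of what the cumsum line from the question does, the following runs it on a few of the question's rows (a subset chosen only for brevity):

```python
import pandas as pd

# A subset of the question's rows: three from session S1, two from S2.
df = pd.DataFrame({
    'B_ID': [1, 2, 3, 12, 13],
    'No_Show': [0.4, 0.3, 0.8, 0.9, 0.5],
    'Session': ['S1', 'S1', 'S1', 'S2', 'S2'],
})

# The running total restarts whenever the Session value changes.
df['Cum_No_show'] = df.groupby(['Session'])['No_Show'].cumsum()
print(df)
```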

pandas get minimum of one column in group when groupby another

Question: I have a pandas dataframe that looks like this:

       c  y
    0  9  0
    1  8  0
    2  3  1
    3  6  2
    4  1  3
    5  2  3
    6  5  3
    7  4  4
    8  0  4
    9  7  4

I'd like to groupby y and get the min and max of c so that my new dataframe would look like this:

       c  y  min  max
    0  9  0    8    9
    1  8  0    8    9
    2  3  1    3    3
    3  6  2    6    6
    4  1  3    1    5
    5  2  3    1    5
    6  5  3    1    5
    7  4  4    0    7
    8  0  4    0    7
    9  7  4    0    7

I tried using df['min'] = df.groupby(['y'])['c'].min() but that gave me some weird results. The first 175 rows were populated in the min column but then it went to NaN for all
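No answer is included in this excerpt. A likely explanation for the NaN values is that `groupby(...).min()` returns a Series indexed by the values of y, so assigning it to a column aligns on the row index rather than broadcasting per group. A minimal sketch of one common alternative, GroupBy.transform, which returns a result with the same index as the original frame:

```python
import pandas as pd

df = pd.DataFrame({'c': [9, 8, 3, 6, 1, 2, 5, 4, 0, 7],
                   'y': [0, 0, 1, 2, 3, 3, 3, 4, 4, 4]})

grouped = df.groupby('y')['c']

# transform broadcasts each group's aggregate back to every row of that group.
df['min'] = grouped.transform('min')
df['max'] = grouped.transform('max')
print(df)
```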

Pandas row filters and division from specific rows and columns

Question: I have the following dataframe:

    traffic_type  date        region  total_views
    desktop       01/04/2018  aug              50
    mobileweb     01/04/2018  aug              60
    total         01/04/2018  aug             100
    desktop       01/04/2018  world            20
    mobileweb     01/04/2018  world            30
    total         01/04/2018  world            40

I need to group by traffic_type, date, and region, filter the rows with traffic_type total, and in the same row create a desktop_share column, which is the total_views of traffic_type == desktop divided by the total_views of traffic_type == total. The rest of the rows are blank for
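The question is cut off and no answer is shown, so the following is only a sketch of one possible reading: the desktop views for each (date, region) pair are merged back onto every row, and the ratio is kept only on the 'total' rows. The helper column name `desktop_views` and the use of NaN for "blank" are assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    'traffic_type': ['desktop', 'mobileweb', 'total'] * 2,
    'date': ['01/04/2018'] * 6,
    'region': ['aug'] * 3 + ['world'] * 3,
    'total_views': [50, 60, 100, 20, 30, 40],
})

# Desktop views per (date, region); 'desktop_views' is a hypothetical helper name.
desktop = (
    df.loc[df['traffic_type'] == 'desktop', ['date', 'region', 'total_views']]
      .rename(columns={'total_views': 'desktop_views'})
)

# Attach the desktop figure to every row of the same date and region.
merged = df.merge(desktop, on=['date', 'region'], how='left')

# Keep the ratio only on 'total' rows; other rows become NaN (assumed meaning of "blank").
ratio = merged['desktop_views'] / merged['total_views']
merged['desktop_share'] = ratio.where(merged['traffic_type'] == 'total')
print(merged)
```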

Pandas - very slow performance when using stack(), groupby() and apply()

Question: I am seeing very slow performance when calling stack, groupby and apply on a large dataframe in Pandas (1498829 rows). The code gives the differences of pairs (by this I mean the difference of xx's for all i2 at every i1). The part of the code that is running slow is:

    def get_diff(x):
        teams = x.index.get_level_values(1)
        tmp = pd.DataFrame(x[:,None]-x[None,:], columns = teams.values, index=teams.values).stack()
        return tmp[tmp.index.get_level_values(0)!=tmp.index.get_level_values(1)]

    new
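The rest of the question and any answers are missing from this excerpt. As an illustration of the pairwise-difference step inside `get_diff`, here is a sketch that computes the same off-diagonal differences with NumPy broadcasting, without building an intermediate DataFrame or calling stack(). The function name `pairwise_diff` and the sample data are hypothetical, and the sketch assumes the labels within a group are unique. Whether it is faster on the asker's data is not something this example establishes.

```python
import numpy as np
import pandas as pd

def pairwise_diff(values, labels):
    """Return values[i] - values[j] for every ordered pair with i != j."""
    values = np.asarray(values)
    labels = np.asarray(labels)
    # Broadcasting builds the full n x n matrix of differences.
    diff = values[:, None] - values[None, :]
    # Positions of all off-diagonal entries, in row-major order.
    i, j = np.nonzero(~np.eye(len(values), dtype=bool))
    index = pd.MultiIndex.from_arrays([labels[i], labels[j]])
    return pd.Series(diff[i, j], index=index)

# Hypothetical group of three labelled values.
out = pairwise_diff([1.0, 3.0, 6.0], ['a', 'b', 'c'])
print(out)
```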
