pandas-groupby

Exclude a specific date based on a condition using pandas

梦想的初衷 submitted on 2020-07-13 15:09:08
Question:

    df2 = pd.DataFrame({'person_id': [11, 11, 11, 11, 11, 12, 12, 13, 13, 14, 14, 14, 14],
                        'admit_date': ['01/01/2011', '01/01/2009', '12/31/2013', '12/31/2017',
                                       '04/03/2014', '08/04/2016', '03/05/2014', '02/07/2011',
                                       '08/08/2016', '12/31/2017', '05/01/2011', '05/21/2014',
                                       '07/12/2016']})
    df2 = df2.melt('person_id', value_name='dates')
    df2['dates'] = pd.to_datetime(df2['dates'])

What I would like to do is: a) exclude/filter out records from the data frame if a subject has both Dec 31st and Jan 1st in its records. Please note …
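
The excerpt cuts off before the full requirements, but the first part, dropping every person_id whose records contain both a Dec 31st and a Jan 1st date, could be sketched as below. The boolean-flag approach is an assumption, not necessarily the accepted answer, and it reuses the df2 built above.

    # Flag Dec 31st and Jan 1st rows, then find person_ids that have both.
    is_dec31 = (df2['dates'].dt.month == 12) & (df2['dates'].dt.day == 31)
    is_jan01 = (df2['dates'].dt.month == 1) & (df2['dates'].dt.day == 1)

    flags = (df2.assign(dec31=is_dec31, jan01=is_jan01)
                .groupby('person_id')[['dec31', 'jan01']]
                .any())
    exclude_ids = flags.index[flags['dec31'] & flags['jan01']]
    filtered = df2[~df2['person_id'].isin(exclude_ids)]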

How to pivot a dataframe

早过忘川 submitted on 2020-07-10 07:34:22
Question: What is a pivot? How do I pivot? Is this a pivot? Long format to wide format? I've seen a lot of questions that ask about pivot tables. Even if they don't know that they are asking about pivot tables, they usually are. It is virtually impossible to write a canonical question and answer that encompasses all aspects of pivoting... but I'm going to give it a go. The problem with existing questions and answers is that often the question is focused on a nuance that the OP has trouble …
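
The excerpt stops before the question's own examples, so as a quick, hypothetical illustration of the long-to-wide reshaping it refers to (the row/col/val column names are made up here):

    import pandas as pd

    long_df = pd.DataFrame({'row': ['r0', 'r0', 'r1', 'r1'],
                            'col': ['c0', 'c1', 'c0', 'c1'],
                            'val': [1, 2, 3, 4]})

    # Long to wide: one column becomes the new index, one the new columns,
    # one supplies the cell values.
    wide = long_df.pivot(index='row', columns='col', values='val')

    # pivot_table does the same reshape but aggregates duplicate (row, col) pairs.
    wide_agg = long_df.pivot_table(index='row', columns='col', values='val', aggfunc='mean')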

Pandas: Group by a column that meets a condition

纵饮孤独 submitted on 2020-07-08 11:22:07
Question: I have a data set with three columns: rating, breed, and dog.

    import pandas as pd

    dogs = {'breed': ['Chihuahua', 'Chihuahua', 'Dalmatian', 'Sphynx'],
            'dog': [True, True, True, False],
            'rating': [8.0, 9.0, 10.0, 7.0]}
    df = pd.DataFrame(data=dogs)

I would like to calculate the mean rating per breed where dog is True. This would be the expected output:

           breed  rating
    0  Chihuahua     8.5
    1  Dalmatian    10.0

This has been my attempt:

    df.groupby('breed')['rating'].mean().where(dog == True)

And this is the error …
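
The error message is cut off in the excerpt (the attempt references a bare name dog, which is not defined outside the DataFrame, so it would raise a NameError). A common pattern, filtering the rows before grouping, is sketched below as one plausible fix rather than the accepted answer:

    # Keep only rows where dog is True, then average the rating per breed.
    result = (df[df['dog']]
              .groupby('breed', as_index=False)['rating']
              .mean())
    print(result)
    #        breed  rating
    # 0  Chihuahua     8.5
    # 1  Dalmatian    10.0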

How to get the first group in a groupby of multiple columns?

北城余情 submitted on 2020-07-06 12:56:11
Question: I've been trying to figure out how I can return just the first group after I apply groupby. My code looks like this:

    gb = df.groupby(['col1', 'col2', 'col3', 'col4'])['col5'].sum()

What I want is for that first group to be output. I've been trying the get_group method, but it keeps failing (maybe because I am grouping by multiple columns?). Here is an example of my output:

    col1  col2  col3    col4  'sum'
    1     34    green   10    0.0
                yellow  30    1.5
                orange  20    1.1
    2     89    green   10    3.0
                yellow  5     0.0
                orange  10    1.0
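
The excerpt ends before any answer. The sketch below assumes the question's df with columns col1 through col5 and shows two common ways to pull out a first group, plus how to slice the summed Series if that is what is meant; the exact intent is an assumption here.

    grouped = df.groupby(['col1', 'col2', 'col3', 'col4'])

    # get_group needs the key as a tuple when grouping by several columns.
    first_key = list(grouped.groups)[0]
    first_group = grouped.get_group(first_key)

    # Equivalent: iterate over the GroupBy and stop after the first (key, frame) pair.
    first_key, first_group = next(iter(grouped))

    # If "first group" instead means the first block of the summed Series,
    # slice it on the first value of the outer index level:
    gb = grouped['col5'].sum()
    first_block = gb.loc[[gb.index[0][0]]]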

Create a function to create new rows in a data frame based on the given parameters as a list in pandas

孤街醉人 submitted on 2020-07-01 06:53:45
Question: I have a data frame as shown below, where the data will always have one session. That means the number of unique values in the 'Session' column will always be one.

df:

    B_ID  No_Show  Session  slot_num  Cumulative_no_show
    1     0.4      S1       1         0.4
    2     0.3      S1       2         0.7
    3     0.8      S1       3         1.5
    4     0.3      S1       4         1.8
    5     0.6      S1       5         2.4
    6     0.8      S1       6         3.2
    7     0.9      S1       7         4.1
    8     0.4      S1       8         4.5
    9     0.6      S1       9         5.1

I tried the code below to create the above df:

    df = pd.DataFrame({'B_ID': [1, 2, 3, 4, 5, 6, 7, 8, 9],
                       'No_Show': [0.4, 0.3, 0.8, 0.3, 0.6, 0.8, 0.9, 0.4, 0.6], …
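
The construction code above is cut off mid-dictionary in this excerpt; a completed version, filled in only from the values visible in the table (the cumulative column can equivalently be derived with cumsum), might look like:

    import pandas as pd

    df = pd.DataFrame({'B_ID': [1, 2, 3, 4, 5, 6, 7, 8, 9],
                       'No_Show': [0.4, 0.3, 0.8, 0.3, 0.6, 0.8, 0.9, 0.4, 0.6],
                       'Session': ['S1'] * 9,
                       'slot_num': [1, 2, 3, 4, 5, 6, 7, 8, 9]})

    # The running total shown in the table is the cumulative sum of No_Show.
    df['Cumulative_no_show'] = df['No_Show'].cumsum()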

How to create the group by in pandas only in one level

こ雲淡風輕ζ submitted on 2020-06-28 09:22:20
Question: I am importing the df3 dataframe below from my Excel file and want to group by Name only; the rest of the duplicate data should be reflected as below. Note: each month's data will be added month-wise.

    df3 = pd.read_excel('Data')
    print(df3)

    Name  ID  Month  Shift
    Jon   1   Feb    A
    Jon   1   Jan    B
    Jon   1   Mar    C
    Mike  1   Jan    A
    Mike  1   Jan    B
    Jon   1   Feb    C
    Jon   1   Jan    A

and I want the output to look like the below, in the same format. Please help me with this, as I am stuck here. I will be grateful for any help and support.

Answer 1: You can …
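
The expected output and the rest of the answer are missing from this excerpt. One common reading of the request, show each Name only on its first row and blank the repeats, Excel-style, is sketched below; both this interpretation and the approach are assumptions rather than the accepted answer.

    import pandas as pd

    df3 = pd.DataFrame({'Name': ['Jon', 'Jon', 'Jon', 'Mike', 'Mike', 'Jon', 'Jon'],
                        'ID': [1, 1, 1, 1, 1, 1, 1],
                        'Month': ['Feb', 'Jan', 'Mar', 'Jan', 'Jan', 'Feb', 'Jan'],
                        'Shift': ['A', 'B', 'C', 'A', 'B', 'C', 'A']})

    # Sort so each person's rows are contiguous, then blank out repeated names
    # so each Name is printed only once per block.
    out = df3.sort_values('Name', kind='mergesort').reset_index(drop=True)
    out.loc[out['Name'].duplicated(), 'Name'] = ''
    print(out)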

pandas and groupby: how to calculate weighted averages within an agg

删除回忆录丶 submitted on 2020-06-26 16:42:13
Question: I calculate a number of aggregate functions using groupby and agg, because I need different aggregate functions for different variables, e.g. not the sum of everything, but the sum and mean of x, the mean of y, etc. Is there a way to calculate a weighted average using agg? I have found lots of examples, but none with agg. I can calculate the weighted average manually, as in the code below (note the lines with **), but I was wondering if there is a more elegant and direct way? Can I create my own function …
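
The manual code the question refers to is not included in this excerpt. As a sketch of one common workaround, compute the weighted average with apply (agg only sees one column at a time) and join it onto the other aggregates; the column names g, x, y and w are made up for illustration.

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'g': ['a', 'a', 'b', 'b'],
                       'x': [1.0, 2.0, 3.0, 4.0],
                       'y': [10.0, 20.0, 30.0, 40.0],
                       'w': [1.0, 3.0, 1.0, 1.0]})

    # Ordinary aggregates: different functions for different columns.
    agg_part = df.groupby('g').agg(x_sum=('x', 'sum'),
                                   x_mean=('x', 'mean'),
                                   y_mean=('y', 'mean'))

    # A weighted average needs two columns at once, so use apply instead of agg.
    wavg = df.groupby('g').apply(lambda d: np.average(d['x'], weights=d['w']))
    result = agg_part.assign(x_wavg=wavg)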

Fast, efficient pandas Groupby sum / mean without aggregation

风流意气都作罢 submitted on 2020-06-26 06:00:40
Question: It is easy and fast to perform grouping and aggregation in pandas. However, performing simple groupby-apply functions that pandas already has built in C, but without aggregation, is far slower, at least the way I do it, because of a lambda function.

    # Form data
    >>> import numpy as np
    >>> import pandas as pd
    >>> df = pd.DataFrame(np.random.random((100, 3)), columns=['a', 'b', 'c'])
    >>> df['g'] = np.random.randint(0, 3, 100)
    >>> df.head()
              a         b         c  g
    0  0.901610  0.643869  0.094082  1
    1  0.536437  0.836622  0 …
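
The slow lambda version is cut off from the excerpt; the built-in way to broadcast a group sum or mean back onto every row without collapsing the groups is groupby().transform with a string function name, sketched here against the df built above:

    # Broadcast per-group statistics back onto the original rows (no aggregation),
    # using pandas' cythonized code paths instead of a Python lambda.
    df['a_group_sum'] = df.groupby('g')['a'].transform('sum')
    df['a_group_mean'] = df.groupby('g')['a'].transform('mean')

    # The slower equivalent the question alludes to would be something like:
    # df.groupby('g')['a'].transform(lambda s: s.sum())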

Get the row corresponding to the max in pandas GroupBy

喜你入骨 submitted on 2020-06-25 21:45:48
Question: Simple DataFrame:

    df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [0, 1, 2, 3], 'C': ['a', 'b', 'c', 'd']})
    df

       A  B  C
    0  1  0  a
    1  1  1  b
    2  2  2  c
    3  2  3  d

I wish, for every value (groupby) of column A, to get the value of column C for which column B is maximum. For example, for group 1 of column A, the maximum of column B is 1, so I want the value "b" of column C:

       A  C
    0  1  b
    1  2  d

No need to assume column B is sorted; performance is the top priority, then elegance.

Answer 1: Check with sort_values + drop_duplicates:

    df …
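
The answer's code is truncated right after df; a sketch of the sort_values + drop_duplicates approach it names (the exact keyword arguments here are an assumption), along with an idxmax alternative, would be:

    # Sort by B so the max of each A-group comes last, then keep only that last row.
    out = (df.sort_values('B')
             .drop_duplicates('A', keep='last')
             .loc[:, ['A', 'C']]
             .reset_index(drop=True))

    # Alternative: locate the row of the max B within each A-group directly.
    out2 = df.loc[df.groupby('A')['B'].idxmax(), ['A', 'C']].reset_index(drop=True)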