pandas-groupby | 易学教程

Understanding the execution of DataFrame in Python

Question: I am new to Python and want to understand how execution takes place in a DataFrame. Take the Titanic: Machine Learning from Disaster dataset from kaggle.com as an example. I wanted to replace each NaN age with the mean age for the respective sex, i.e. a NaN for a man should be replaced by the mean of the men's ages, and likewise for women. I achieved this with the following line of code: _data['new_age']=_data['new_age'].fillna(_data.groupby('Sex')['Age'].transform(

Get unique values of multiple columns as a new dataframe in pandas

Question: Given a pandas DataFrame df with at least the columns C1, C2, C3, how would you get all the unique (C1, C2, C3) combinations as a new DataFrame? In other words, something similar to:

SELECT C1,C2,C3 FROM T GROUP BY C1,C2,C3

I tried print(df.groupby(by=['C1','C2','C3'])), but I am getting <pandas.core.groupby.DataFrameGroupBy object at 0x000000000769A9E8>.

Answer 1: I believe you need drop_duplicates if you want all unique triples: df = df.drop_duplicates(subset=['C1','C2','C3']). If you want to use groupby, add first: df = df.groupby(by
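The answer is truncated, so its groupby variant is not fully visible. As a rough sketch under that caveat, the two routes might look like this. The sample data and the 'other' column are hypothetical, and the groupby route uses size(), which is one possible choice rather than necessarily what the answer went on to show.

```python
import pandas as pd

# Hypothetical data: three key columns plus one unrelated column.
df = pd.DataFrame({
    "C1": [1, 1, 2, 2],
    "C2": ["a", "a", "b", "b"],
    "C3": ["x", "x", "y", "z"],
    "other": [10, 20, 30, 40],
})

cols = ["C1", "C2", "C3"]

# Closest to SELECT C1, C2, C3 ... GROUP BY C1, C2, C3:
# keep only the key columns, then drop repeated rows.
unique_keys = df[cols].drop_duplicates().reset_index(drop=True)

# groupby route: size() produces one row per distinct triple, and
# reset_index turns the group keys back into ordinary columns.
via_groupby = df.groupby(cols).size().reset_index(name="n_rows")[cols]

print(unique_keys)
```

A bare df.groupby(...) only builds a lazy grouping object, which is why printing it shows the DataFrameGroupBy repr instead of data. Some reduction (size, first, and so on) has to follow before a DataFrame comes back.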

How to group and sum some results into "Others" (style format in euros)

Question: I want to make a pie chart over Europe and some specific countries. I need to group some countries or companies and sum them into a group called "Others", for example all the companies that have a budget of less than 10000 euros.

import pandas as pd
from pandas import Series, DataFrame
import numpy as np
import matplotlib.pyplot as plt

   Year  Project                Entity                    Participation  Country  Budget
0  2015  671650 - MMMAGIC - 5G  FUNDACION IMDEA NETWORK*  Participant    Spain    € 304,000
1  2015  671650 - MMMAGIC - 5G  ROHDE &
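Since the data excerpt is cut off, the following is only a sketch of one way to fold small entries into an "Others" slice. The entity names and budgets are invented, the 10000 euro threshold is taken from the example in the question, and the Budget column is assumed to be a string of the form "€ 304,000".

```python
import pandas as pd

# Hypothetical data; entity names and budgets are invented for illustration.
df = pd.DataFrame({
    "Entity": ["A", "B", "C", "D", "E"],
    "Budget": ["€ 304,000", "€ 8,000", "€ 120,500", "€ 1,500", "€ 9,999"],
})

# Strip the euro sign and thousands separators so the budgets become numbers.
df["Budget_num"] = (
    df["Budget"]
    .str.replace("€", "", regex=False)
    .str.replace(",", "", regex=False)
    .str.strip()
    .astype(float)
)

THRESHOLD = 10_000  # assumed cut-off, from the example in the question

# Keep the entity name where the budget reaches the threshold,
# otherwise relabel the row as "Others"; then sum per label.
label = df["Entity"].where(df["Budget_num"] >= THRESHOLD, "Others")
totals = df.groupby(label)["Budget_num"].sum().sort_values(ascending=False)

# Euro-style display strings, e.g. "€ 304,000".
formatted = totals.map("€ {:,.0f}".format)

print(formatted)
```

With matplotlib installed, totals.plot.pie(autopct="%1.1f%%") would then draw the chart from the numeric totals, while the formatted strings are only meant for display.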

Groupby count based on year and specific condition

Question: I have a dataframe as shown below:

Tenancy_ID  Unit_ID  Tenancy_End_Date
1           A        2012-09-06 11:34:15
2           B        2013-09-08 10:35:18
3           A        2014-09-06 11:34:15
4           C        2014-09-06 11:34:15
5           B        2015-09-06 11:34:15
6           A        2014-09-06 11:34:15
5           A        2015-09-06 11:34:15
7           A        2019-09-06 11:34:15
4           C        2014-01-06 11:34:15
5           C        2014-05-06 11:34:15

From the above I would like to generate the dataframe below.

Expected output:

Unit_ID  NoC_2012  NoC_2013  NoC_2014  NoC_2015  NoC_2016  NoC_2017  NoC_2018  NoC_2019
A        1         0         2         1         0         0         0         1
B        0         1         0         1         0         0         0
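One possible way to build such a table is a cross-tabulation of Unit_ID against the year of Tenancy_End_Date, reindexed so that years with no tenancies still appear as zero columns. This is a sketch only: the "specific condition" mentioned in the title is not visible in the excerpt, so it is not applied here, and the 2012 to 2019 range is read off the expected output.

```python
import pandas as pd

# The rows shown in the question.
df = pd.DataFrame({
    "Tenancy_ID": [1, 2, 3, 4, 5, 6, 5, 7, 4, 5],
    "Unit_ID": ["A", "B", "A", "C", "B", "A", "A", "A", "C", "C"],
    "Tenancy_End_Date": [
        "2012-09-06 11:34:15", "2013-09-08 10:35:18", "2014-09-06 11:34:15",
        "2014-09-06 11:34:15", "2015-09-06 11:34:15", "2014-09-06 11:34:15",
        "2015-09-06 11:34:15", "2019-09-06 11:34:15", "2014-01-06 11:34:15",
        "2014-05-06 11:34:15",
    ],
})
df["Tenancy_End_Date"] = pd.to_datetime(df["Tenancy_End_Date"])

# Count rows per (unit, year of the end date).
years = df["Tenancy_End_Date"].dt.year
counts = pd.crosstab(df["Unit_ID"], years)

# Add the years that never occur (2016-2018) as zero columns, then rename.
counts = counts.reindex(columns=range(2012, 2020), fill_value=0).add_prefix("NoC_")
counts.columns.name = None

print(counts)
```

The same counts could be obtained with df.groupby(["Unit_ID", years]).size().unstack(fill_value=0); crosstab is used here only because it does the count and the pivot in one call.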

How to generate unique id and sub_id for each group

Question: My goal is to generate an id (trajectory id) and a sub id (sub-trajectory id) for each group (u_uuid and p_uuid). I tried the ngroup function and it didn't work.

data = [
    {'u_uuid': 110, 'p_uuid': 'aaa', 'mode': 'walk', 'dest': 'work'},
    {'u_uuid': 110, 'p_uuid': 'aaa', 'mode': 'walk', 'dest': 'work'},
    {'u_uuid': 110, 'p_uuid': 'aaa', 'mode': 'bus', 'dest': 'work'},
    {'u_uuid': 110, 'p_uuid': 'aaa', 'mode': 'bus', 'dest': 'work'},
    {'u_uuid': 110, 'p_uuid': 'aaa', 'mode': 'walk', 'dest': 'work'},
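The expected output is not visible in the excerpt, so the sketch below rests on a guess: the id numbers each (u_uuid, p_uuid) pair, and the sub id numbers consecutive runs of the same 'mode' inside that pair. Both the interpretation and the second group of rows are assumptions made for illustration.

```python
import pandas as pd

data = [
    {"u_uuid": 110, "p_uuid": "aaa", "mode": "walk", "dest": "work"},
    {"u_uuid": 110, "p_uuid": "aaa", "mode": "walk", "dest": "work"},
    {"u_uuid": 110, "p_uuid": "aaa", "mode": "bus", "dest": "work"},
    {"u_uuid": 110, "p_uuid": "aaa", "mode": "bus", "dest": "work"},
    {"u_uuid": 110, "p_uuid": "aaa", "mode": "walk", "dest": "work"},
    # Hypothetical second group, added so that more than one id appears.
    {"u_uuid": 111, "p_uuid": "bbb", "mode": "car", "dest": "home"},
    {"u_uuid": 111, "p_uuid": "bbb", "mode": "walk", "dest": "home"},
]
df = pd.DataFrame(data)

keys = ["u_uuid", "p_uuid"]

# ngroup() numbers the groups from 0; add 1 for ids that start at 1.
df["id"] = df.groupby(keys).ngroup() + 1

# Previous row's mode within the same group (missing for each group's first row).
prev_mode = df.groupby(keys)["mode"].shift()

# 1 where the mode differs from the previous row of the group, else 0;
# the first row of each group counts as a change.
changed = df["mode"].ne(prev_mode).astype(int)

# A running total of changes per group numbers the consecutive runs.
df["sub_id"] = changed.groupby(df["id"]).cumsum()

print(df)
```

Under this reading, walk, walk, bus, bus, walk yields sub ids 1, 1, 2, 2, 3. If the sub id should instead number distinct modes regardless of order, a different construction would be needed.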

Pandas simple correlation of two grouped DataFrame columns

Question: Is there a good way to get the simple correlation of two grouped DataFrame columns? It seems that, no matter what, the pandas .corr() functions want to return a correlation matrix. E.g.,

i = pd.MultiIndex.from_product([['A','B','C'], np.arange(1, 11, 1)], names=['Name','Num'])
test = pd.DataFrame(np.random.randn(30, 2), i, columns=['X', 'Y'])
test.groupby(['Name'])[['X','Y']].corr()

returns

               X         Y
Name
A    X  1.000000  0.152663
     Y  0.152663  1.000000
B    X  1.000000 -0.155113
     Y -0.155113  1.000000
C    X  1.000000
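No answer is included in the excerpt. As a sketch, two ways of getting a single X-Y correlation per group are shown below: computing Series.corr inside each group, or slicing the off-diagonal entry out of the grouped correlation matrix. The seeded random generator is an addition so that the example is reproducible.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded for reproducibility (not in the question)
i = pd.MultiIndex.from_product(
    [["A", "B", "C"], np.arange(1, 11)], names=["Name", "Num"]
)
test = pd.DataFrame(rng.standard_normal((30, 2)), index=i, columns=["X", "Y"])

grouped = test.groupby(level="Name")[["X", "Y"]]

# Option 1: one scalar per group, computed with Series.corr.
xy_corr = grouped.apply(lambda g: g["X"].corr(g["Y"]))

# Option 2: build the full per-group matrix, then keep only the (X, Y) entry.
from_matrix = grouped.corr()["Y"].xs("X", level=1)

print(xy_corr)
```

Both results are Series indexed by Name with one value per group. Option 1 avoids computing the redundant diagonal and mirrored entries, while option 2 stays within built-in groupby methods.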

Pandas grouping and resampling for a bar plot

Question: I have a dataframe that records concentrations for several different locations in different years, with a high temporal frequency (<1 hour). I am trying to make a bar/multibar plot showing mean concentrations at different locations in different years. To calculate the mean concentration, I have to apply quality control filters to daily and monthly data. My approach is to first apply the filters and resample per year, and then group by location and year. Also, out of all the locations (in the
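The question does not spell out its quality control filters, so the sketch below uses a stand-in rule: a day is kept only if it has at least 18 hourly readings. The column names (time, location, conc), the site names, and the synthetic data are all hypothetical. It shows one way to go from high-frequency readings to a location-by-year table of means.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly readings for two locations over two years.
rng = np.random.default_rng(0)
idx = pd.date_range("2018-01-01", "2019-12-31 23:00", freq="h")
df = pd.concat(
    [
        pd.DataFrame({"time": idx, "location": loc, "conc": rng.random(len(idx))})
        for loc in ["site_1", "site_2"]
    ],
    ignore_index=True,
)

# Simulate a data gap: site_1 has only 4 readings on 2018-01-01.
gap = (df["location"] == "site_1") & (df["time"] < "2018-01-01 20:00")
df = df[~gap]

MIN_HOURS_PER_DAY = 18  # assumed quality-control rule, not from the question

# Daily mean and number of readings per location.
daily = (
    df.groupby(["location", pd.Grouper(key="time", freq="D")])["conc"]
    .agg(["mean", "count"])
    .reset_index()
)

# Quality control: drop days with too few readings.
daily = daily[daily["count"] >= MIN_HOURS_PER_DAY]

# Mean of the accepted daily means per location and year,
# with one column per year (rows are locations).
year = daily["time"].dt.year.rename("year")
yearly = daily.groupby(["location", year])["mean"].mean().unstack("year")

print(yearly)
```

With matplotlib installed, yearly.plot.bar() would draw one cluster of bars per location with one bar per year. A monthly filter could be added in the same way as an intermediate groupby step between the daily and yearly stages.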

python: pandas: how to find max value in a column based on groupby another column

Question: I want to group my dataframe by one column, SERVER, and then find the max value in another column, JOB_ID.

DF:

   SERVER   JOB_ID  LOG_FILE                   TIME
0  abc_123  1       1/abc_123/dep2/1/123.log   2019-12-05T05:06:16.346Z
1  abc_123  10      1/abc_123/dep2/10/123.log  2019-12-04T17:05:28.335Z
2  abc_123  11      1/abc_123/dep2/11/123.log  2019-12-04T20:27:03.988Z
3  abc_123  12      1/abc_123/dep2/12/123.log  2019-12-04T20:35:49.039Z
4  abc_123  13      1/abc_123/dep2/13/123.log  2019-12-04T20:42:36.890Z
5  abc_123  14      1/abc_123/dep2/14/123.log  2019
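A minimal sketch of two common approaches follows: max() when only the largest JOB_ID per server is needed, and idxmax() when the whole row carrying that maximum is wanted. The first three rows come from the question; the xyz_789 rows are hypothetical and added so that there is more than one group.

```python
import pandas as pd

df = pd.DataFrame({
    # xyz_789 is a hypothetical second server, not from the question.
    "SERVER": ["abc_123", "abc_123", "abc_123", "xyz_789", "xyz_789"],
    "JOB_ID": [1, 10, 11, 3, 2],
    "LOG_FILE": [
        "1/abc_123/dep2/1/123.log",
        "1/abc_123/dep2/10/123.log",
        "1/abc_123/dep2/11/123.log",
        "1/xyz_789/dep2/3/123.log",
        "1/xyz_789/dep2/2/123.log",
    ],
})

# If JOB_ID was read in as text, convert it first so that 10 > 2 numerically;
# on an already numeric column this is a no-op.
df["JOB_ID"] = pd.to_numeric(df["JOB_ID"])

# Largest JOB_ID per server, as a Series indexed by SERVER.
max_per_server = df.groupby("SERVER")["JOB_ID"].max()

# Full rows holding each server's largest JOB_ID:
# idxmax returns the row label of the maximum within each group.
rows_with_max = df.loc[df.groupby("SERVER")["JOB_ID"].idxmax()]

print(rows_with_max)
```

The numeric conversion matters because string comparison would rank "2" above "10", which is a frequent source of surprising maxima with id columns.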
