pandas-groupby

Groupby column and find min and max of each group

Submitted by 大城市里の小女人 on 2019-12-18 04:25:18
Question: I have the following dataset:

             Day  Element  Data_Value
    6786   01-01  TMAX     112
    9333   01-01  TMAX     101
    9330   01-01  TMIN      60
    11049  01-01  TMIN       0
    6834   01-01  TMIN      25
    11862  01-01  TMAX     113
    1781   01-01  TMAX     115
    11042  01-01  TMAX     105
    1110   01-01  TMAX     111
    651    01-01  TMIN      44
    11350  01-01  TMIN      83
    1798   01-02  TMAX      70
    4975   01-02  TMAX      79
    12774  01-02  TMIN       0
    3977   01-02  TMIN      60
    2485   01-02  TMAX      73
    4888   01-02  TMIN      31
    11836  01-02  TMIN      26
    11368  01-02  TMAX      71
    2483   01-02  TMIN      26

I want to group by the Day and then find the overall …
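The question text is cut off above; assuming the goal is the min and max of Data_Value for each Day (possibly split by Element), a minimal sketch on a reduced copy of the data:

```python
import pandas as pd

# Reduced version of the dataset shown in the question
df = pd.DataFrame({
    "Day": ["01-01"] * 4 + ["01-02"] * 4,
    "Element": ["TMAX", "TMAX", "TMIN", "TMIN"] * 2,
    "Data_Value": [112, 101, 60, 0, 70, 79, 0, 60],
})

# Overall min and max of Data_Value for each Day
overall = df.groupby("Day")["Data_Value"].agg(["min", "max"])

# Or per Day *and* Element, if TMIN/TMAX should stay separate
per_element = df.groupby(["Day", "Element"])["Data_Value"].agg(["min", "max"])
```

Passing a list of function names to `agg` produces one output column per function.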

Pandas Dataframe: how to add column with number of occurrences in other column

Submitted by 此生再无相见时 on 2019-12-18 04:17:22
Question: I have the following df:

    Col1   Col2
    test   Something
    test2  Something
    test3  Something
    test   Something
    test2  Something
    test5  Something

I want to get:

    Col1   Col2       Occur
    test   Something  2
    test2  Something  2
    test3  Something  1
    test   Something  2
    test2  Something  2
    test5  Something  1

I've tried:

    df["Occur"] = df["Col1"].value_counts()

but it didn't help; I got an Occur column full of NaN.

Answer 1: groupby on 'Col1' and then apply transform on Col2 to return a Series with its index aligned to the original df …
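The answer is cut off above; a runnable sketch of the `transform` approach it describes:

```python
import pandas as pd

df = pd.DataFrame({
    "Col1": ["test", "test2", "test3", "test", "test2", "test5"],
    "Col2": ["Something"] * 6,
})

# transform('count') returns a Series aligned to df's index, so every
# row receives the size of its own Col1 group (unlike value_counts(),
# whose index is the distinct group labels, hence the NaNs)
df["Occur"] = df.groupby("Col1")["Col2"].transform("count")
```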

Sample each group after pandas groupby

Submitted by 早过忘川 on 2019-12-17 18:34:08
Question: I know this must have been answered somewhere, but I just could not find it. Problem: sample each group after a groupby operation.

    import pandas as pd
    df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6, 7], 'b': [1, 1, 1, 0, 0, 0, 0]})
    grouped = df.groupby('b')
    # now sample from each group, e.g. I want 30% of each group

Answer 1: Apply a lambda and call sample with the param frac:

    grouped = df.groupby('b')
    grouped.apply(lambda x: x.sample(frac=0.3))
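As a runnable sketch: the apply/lambda route from the answer, plus the direct GroupBy.sample method that pandas 1.1+ provides (frac=0.5 is used here only so each group yields at least one row):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6, 7],
                   'b': [1, 1, 1, 0, 0, 0, 0]})

# The apply/lambda route from the answer
sampled = df.groupby('b').apply(lambda x: x.sample(frac=0.5, random_state=0))

# Since pandas 1.1 there is also a direct GroupBy.sample
sampled2 = df.groupby('b').sample(frac=0.5, random_state=0)
```

`random_state` just makes the draw reproducible; drop it for a fresh sample each run.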

Python (Pandas) Add subtotal on each lvl of multiindex dataframe

Submitted by 雨燕双飞 on 2019-12-17 17:52:43
Question: Assuming I have the following dataframe:

    a       b       c      Sce1  Sce2  Sce3  Sce4  Sce5  Sc6
    Animal  Ground  Dog    0.0   0.9   0.5   0.0   0.3   0.4
    Animal  Ground  Cat    0.6   0.5   0.3   0.5   1.0   0.2
    Animal  Air     Eagle  1.0   0.1   0.1   0.6   0.9   0.1
    Animal  Air     Owl    0.3   0.1   0.5   0.3   0.5   0.9
    Object  Metal   Car    0.3   0.3   0.8   0.6   0.5   0.6
    Object  Metal   Bike   0.5   0.1   0.4   0.7   0.4   0.2
    Object  Wood    Chair  0.9   0.6   0.1   0.9   0.2   0.8
    Object  Wood    Table  0.9   0.6   0.6   0.1   0.9   0.7

I want to create a MultiIndex which will contain the sum at each level. The output will …
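The desired output is cut off above. One common way to build per-level subtotals is to compute a groupby sum per level, label the summed levels "Total", and concat everything back; a sketch on a one-scenario version of the frame (the "Total" label and the overall approach are assumptions, not the asker's spec):

```python
import pandas as pd

# Shortened version of the frame above (one scenario column for brevity)
df = pd.DataFrame({
    "a": ["Animal"] * 4 + ["Object"] * 4,
    "b": ["Ground", "Ground", "Air", "Air", "Metal", "Metal", "Wood", "Wood"],
    "c": ["Dog", "Cat", "Eagle", "Owl", "Car", "Bike", "Chair", "Table"],
    "Sce1": [0.0, 0.6, 1.0, 0.3, 0.3, 0.5, 0.9, 0.9],
})

detail = df.set_index(["a", "b", "c"])

# Subtotal per (a, b): sum over c, labelled "Total" at the c level
lvl2 = df.groupby(["a", "b"]).sum(numeric_only=True)
lvl2["c"] = "Total"
lvl2 = lvl2.set_index("c", append=True)

# Subtotal per a: sum over both b and c
lvl1 = df.groupby("a").sum(numeric_only=True)
lvl1["b"] = "Total"
lvl1["c"] = "Total"
lvl1 = lvl1.set_index(["b", "c"], append=True)

out = pd.concat([detail, lvl2, lvl1]).sort_index()
```

`sort_index()` interleaves each "Total" row with the detail rows of its group.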

How to drop duplicates based on two or more subsets criteria in Pandas data-frame

Submitted by 寵の児 on 2019-12-17 17:22:17
Question: Let's say this is my data-frame:

    df = pd.DataFrame({
        'bio': ['1', '1', '1', '4'],
        'center': ['one', 'one', 'two', 'three'],
        'outcome': ['f', 't', 'f', 'f']
    })

It looks like this:

      bio  center  outcome
    0   1     one        f
    1   1     one        t
    2   1     two        f
    3   4   three        f

I want to drop row 1 because it has the same bio & center as row 0. I want to keep row 2 because it has the same bio but a different center than row 0. Something like this won't work given drop_duplicates' input structure, but it's what I am trying to do: df …
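The attempted call is cut off above; `drop_duplicates` does support this directly via its `subset` parameter, which restricts the duplicate check to the named columns:

```python
import pandas as pd

df = pd.DataFrame({
    'bio': ['1', '1', '1', '4'],
    'center': ['one', 'one', 'two', 'three'],
    'outcome': ['f', 't', 'f', 'f'],
})

# Rows count as duplicates only when BOTH bio and center match;
# keep='first' keeps row 0 and drops row 1, while row 2 survives
# because its center differs
deduped = df.drop_duplicates(subset=['bio', 'center'], keep='first')
```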

Save grouped by results into separate CSV files

Submitted by 为君一笑 on 2019-12-17 16:53:20
Question: I have code that creates groups from CSV data and writes each group to a new file. I read my CSV file and then work with it. The problem is that when my function runs and creates the new files, each file is named after its group, and I don't want that:

    ID      Inventory     Domain                 Requests  Impressions  Fill Rate
    123456  au_to/8       neighborhoodscout.com  11402     26           0.23
    123456  au_to/8       sinembargo.mx          10334     24           0.23
    123456  au_to/8       elsalvadortimes.com    9893      17           0.17
    155444  cami_oneta/8  …
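The asker's code is not shown, but the usual pattern is a loop over the groupby object, writing each group with `to_csv`; the file name is built independently of the group key, so it can be anything (the `output_{i}` scheme below is just an illustration, as is the temp directory used so the sketch runs anywhere):

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({
    "ID": [123456, 123456, 155444],
    "Domain": ["neighborhoodscout.com", "sinembargo.mx", "elsalvadortimes.com"],
    "Requests": [11402, 10334, 9893],
})

outdir = tempfile.mkdtemp()
written = []
for i, (group_id, group) in enumerate(df.groupby("ID"), start=1):
    # The file name is whatever you build here; it does not have to be
    # the group key (group_id) itself
    path = os.path.join(outdir, f"output_{i}.csv")
    group.to_csv(path, index=False)
    written.append(path)
```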

How to do group by on a multiindex in pandas?

Submitted by 人盡茶涼 on 2019-12-17 15:56:13
Question: Below is my dataframe. I made some transformations to create the category column and dropped the original column it was derived from. Now I need to do a group-by to remove the dupes, e.g. Love and Fashion can be rolled up via a groupby sum.

    df.columns = array([category, clicks, revenue, date, impressions, size], dtype=object)
    df.values =
    [[Love     0    0.36823    2013-11-04  380     300x250]
     [Love     183  474.81522  2013-11-04  374242  300x250]
     [Fashion  0    0.19434    2013-11-04  197     300x250]
     [Fashion  9    18.26422   2013-11 …
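A sketch of the rollup on a reduced copy of the data, grouping on all the non-numeric columns and summing the rest:

```python
import pandas as pd

df = pd.DataFrame({
    "category": ["Love", "Love", "Fashion", "Fashion"],
    "clicks": [0, 183, 0, 9],
    "revenue": [0.36823, 474.81522, 0.19434, 18.26422],
    "date": ["2013-11-04"] * 4,
    "size": ["300x250"] * 4,
})

# Sum the numeric columns within each (category, date, size) combination;
# as_index=False keeps the keys as ordinary columns in the result
rolled = df.groupby(["category", "date", "size"],
                    as_index=False)[["clicks", "revenue"]].sum()
```

If those keys already live in a MultiIndex rather than in columns, `df.groupby(level=["category", "date", "size"]).sum()` performs the same rollup directly on the index levels.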

pandas dataframe groupby datetime month

Submitted by 吃可爱长大的小学妹 on 2019-12-17 03:27:02
Question: Consider a csv file:

    string,date,number
    a string,2/5/11 9:16am,1.0
    a string,3/5/11 10:44pm,2.0
    a string,4/22/11 12:07pm,3.0
    a string,4/22/11 12:10pm,4.0
    a string,4/29/11 11:59am,1.0
    a string,5/2/11 1:41pm,2.0
    a string,5/2/11 2:02pm,3.0
    a string,5/2/11 2:56pm,4.0
    a string,5/2/11 3:00pm,5.0
    a string,5/2/14 3:02pm,6.0
    a string,5/2/14 3:18pm,7.0

I can read this in, and reformat the date column into datetime format:

    b = pd.read_csv('b.dat')
    b['date'] = pd.to_datetime(b['date'], format='%m/%d/%y %I:%M%p')
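Continuing the truncated snippet: one way to group by month is to key on `dt.to_period('M')`, which is year-aware (a sketch; a slice of the CSV is inlined via StringIO so it runs without `b.dat`):

```python
from io import StringIO

import pandas as pd

# Inlined slice of the CSV from the question
csv = """string,date,number
a string,2/5/11 9:16am,1.0
a string,3/5/11 10:44pm,2.0
a string,5/2/11 1:41pm,2.0
a string,5/2/11 2:02pm,3.0
a string,5/2/14 3:02pm,6.0
"""
b = pd.read_csv(StringIO(csv))
b['date'] = pd.to_datetime(b['date'], format='%m/%d/%y %I:%M%p')

# to_period('M') keys each row by its calendar month within its year,
# so May 2011 and May 2014 land in different groups
by_month = b.groupby(b['date'].dt.to_period('M'))['number'].sum()
```

Grouping on `b['date'].dt.month` instead would merge May 2011 with May 2014, which is usually not what is wanted.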

Multiple aggregations of the same column using pandas GroupBy.agg()

Submitted by 烂漫一生 on 2019-12-17 00:45:13
Question: Is there a pandas built-in way to apply two different aggregating functions f1, f2 to the same column df["returns"], without having to call agg() multiple times? Example dataframe:

    import pandas as pd
    import numpy as np
    import datetime as dt

    np.random.seed(0)
    df = pd.DataFrame({
        "date": [dt.date(2012, x, 1) for x in range(1, 11)],
        "returns": 0.05 * np.random.randn(10),
        "dummy": np.repeat(1, 10)
    })

The syntactically wrong, but intuitively right, way to do it would be:

    # Assume `f1` and `f2` are defined …
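The intended snippet is cut off above, but `agg` does accept a list of functions for one column, and since pandas 0.25 named aggregation gives the outputs explicit names:

```python
import datetime as dt

import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame({
    "date": [dt.date(2012, x, 1) for x in range(1, 11)],
    "returns": 0.05 * np.random.randn(10),
    "dummy": np.repeat(1, 10),
})

# A list of functions applied to one column in a single agg() call;
# the result has one output column per function
stats = df.groupby("dummy")["returns"].agg(["mean", "std"])

# Named aggregation (pandas >= 0.25): explicit output column names
named = df.groupby("dummy").agg(
    mean_ret=("returns", "mean"),
    std_ret=("returns", "std"),
)
```

User-defined callables work in both forms wherever a string like "mean" appears.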