pandas-groupby

How to use column values in a groupby

Submitted by ⅰ亾dé卋堺 on 2019-12-11 10:35:03
Question: I need to get the top 1 and top 2 ratings watched by 'ma' and 'young'. Here I need to group by specific values rather than by a column name.

data:

gender  age    rating
ma      young  PG
fe      young  PG
ma      adult  PG
fe      adult  PG
ma      young  PG
fe      young  PG
ma      adult  R
fe      adult  R
ma      young  R
fe      young  R

code:

top1 = df.groupby(['ma','young'])['rating'].apply(lambda x: x.value_counts().index[0])
top2 = df.groupby(['ma','young'])['rating'].apply(lambda x: x.value_counts().index[1])

Please let me know how do
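One way to read the question is that 'ma' and 'young' are *values* of the gender and age columns (column names assumed here), so a plain boolean filter followed by value_counts gives the two most frequent ratings — a minimal sketch:

```python
import pandas as pd

# Sample frame rebuilt from the question's data (column names are assumptions)
df = pd.DataFrame({
    'gender': ['ma', 'fe', 'ma', 'fe', 'ma', 'fe', 'ma', 'fe', 'ma', 'fe'],
    'age':    ['young', 'young', 'adult', 'adult', 'young', 'young',
               'adult', 'adult', 'young', 'young'],
    'rating': ['PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'R', 'R', 'R', 'R'],
})

# Filter on the values first, then rank the ratings by frequency
subset = df[(df['gender'] == 'ma') & (df['age'] == 'young')]
counts = subset['rating'].value_counts()
top1, top2 = counts.index[0], counts.index[1]
print(top1, top2)  # PG R
```

`value_counts()` already sorts descending, so indexing its index gives the top-k labels directly.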

Elegant way to fill in a column with row values based on groups in pandas

Submitted by 安稳与你 on 2019-12-11 07:47:29
Question: I have a dataframe as given below:

data_file = pd.DataFrame({
    'person_id': [1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    'ob.date': [np.nan, np.nan, np.nan, np.nan, np.nan,
                np.nan, np.nan, np.nan, np.nan, np.nan],
    'observation': ['Age', 'interviewdate', 'marital_status', 'interviewdate',
                    'Age', 'interviewdate', 'marital_status',
                    'Age', 'interviewdate', 'marital_status'],
    'answer': [21, '21/08/2017', 'Single', '22/05/2217',
               26, '11/03/2010', 'Single', 41, '31/09/2012', 'Married']
})

What I would like to do is fetch the date values from answer
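One hedged approach, assuming the goal is to fill `ob.date` for every row with that person's interview date: mask `answer` down to the interviewdate rows, then broadcast the first date per person with a grouped `transform`:

```python
import numpy as np
import pandas as pd

data_file = pd.DataFrame({
    'person_id': [1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    'ob.date': [np.nan] * 10,
    'observation': ['Age', 'interviewdate', 'marital_status', 'interviewdate',
                    'Age', 'interviewdate', 'marital_status',
                    'Age', 'interviewdate', 'marital_status'],
    'answer': [21, '21/08/2017', 'Single', '22/05/2217',
               26, '11/03/2010', 'Single', 41, '31/09/2012', 'Married'],
})

# Keep answer only where it is an interview date, then broadcast the
# first non-missing date per person to every row of that person
dates = data_file['answer'].where(data_file['observation'] == 'interviewdate')
data_file['ob.date'] = dates.groupby(data_file['person_id']).transform('first')
```

`transform('first')` skips NaN, so each person's ob.date becomes their first recorded interview date (person 1 has two; this keeps '21/08/2017').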

Python, count frequency of occurrence for value in another column

Submitted by 元气小坏坏 on 2019-12-11 07:16:57
Question: I've been scouring Stack Overflow for solutions to similar problems and keep hitting walls. I am new to Python, and to using pandas for ETL, so forgive me if I don't describe my situation adequately. I have two dataframes.

df1 looks like:

   Subscriber Key  OtherID  AnotherID
1  'abc'           '12'     '23'
2  'bcd'           '45'     '56'
3  'abc'           '12'     '23'
4  'abc'           '12'     '23'
5  'cde'           '78'     '90'
6  'bcd'           '45'     '56'

df2 looks like:

   Subscriber Key  OtherID  AnotherID
1  'abc'           '12'     '23'
2  'bcd'           '45'     '56'
3  'cde'           '78'     '90'

I am
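The question is cut off, but given the title, one plausible reading is: df2 holds the unique key combinations and the goal is a frequency column counting how often each appears in df1. A sketch under that assumption (`Frequency` is a made-up column name):

```python
import pandas as pd

df1 = pd.DataFrame({
    'Subscriber Key': ['abc', 'bcd', 'abc', 'abc', 'cde', 'bcd'],
    'OtherID':        ['12', '45', '12', '12', '78', '45'],
    'AnotherID':      ['23', '56', '23', '23', '90', '56'],
})

# De-duplicate to get df2, then map each key to its occurrence count in df1
df2 = df1.drop_duplicates(subset='Subscriber Key').reset_index(drop=True)
df2['Frequency'] = df2['Subscriber Key'].map(df1['Subscriber Key'].value_counts())
```

`value_counts()` builds the lookup once; `map` attaches it without an explicit merge.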

Python PANDAS: Resampling Multivariate Time Series with a Groupby

Submitted by 冷暖自知 on 2019-12-11 07:06:37
Question: I have data in the following general format that I would like to resample to 30-day time-series windows:

'customer_id','transaction_dt','product','price','units'
1,2004-01-02,thing1,25,47
1,2004-01-17,thing2,150,8
2,2004-01-29,thing2,150,25
3,2017-07-15,thing3,55,17
3,2016-05-12,thing3,55,47
4,2012-02-23,thing2,150,22
4,2009-10-10,thing1,25,12
4,2014-04-04,thing2,150,2
5,2008-07-09,thing2,150,43

I would like the 30-day windows to start on 2014-01-01 and end on 2018-12-31. It is NOT guaranteed
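A sketch of the core step, assuming the intent is per-customer sums in 30-day bins anchored to 2014-01-01 (note the sample dates also fall outside 2014–2018; restricting to that range would need an extra filter). The `origin=` argument of `pd.Grouper` requires pandas ≥ 1.1:

```python
import pandas as pd

df = pd.DataFrame(
    [[1, '2004-01-02', 'thing1', 25, 47],
     [1, '2004-01-17', 'thing2', 150, 8],
     [2, '2004-01-29', 'thing2', 150, 25],
     [3, '2017-07-15', 'thing3', 55, 17],
     [3, '2016-05-12', 'thing3', 55, 47],
     [4, '2012-02-23', 'thing2', 150, 22],
     [4, '2009-10-10', 'thing1', 25, 12],
     [4, '2014-04-04', 'thing2', 150, 2],
     [5, '2008-07-09', 'thing2', 150, 43]],
    columns=['customer_id', 'transaction_dt', 'product', 'price', 'units'])
df['transaction_dt'] = pd.to_datetime(df['transaction_dt'])

# 30-day bins anchored at 2014-01-01, computed per customer
out = (df.set_index('transaction_dt')
         .groupby(['customer_id',
                   pd.Grouper(freq='30D', origin='2014-01-01')])[['price', 'units']]
         .sum())
```

Every transaction lands in exactly one (customer, window) bucket, so the aggregated totals are conserved.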

Plot the result of a groupby operation in pandas

Submitted by 偶尔善良 on 2019-12-11 07:01:40
Question: I have this sample table:

     ID        Date  Days  Volume/Day
0   111  2016-01-01    20          50
1   111  2016-02-01    25          40
2   111  2016-03-01    31          35
3   111  2016-04-01    30          30
4   111  2016-05-01    31          25
5   111  2016-06-01    30          20
6   111  2016-07-01    31          20
7   111  2016-08-01    31          15
8   111  2016-09-01    29          15
9   111  2016-10-01    31          10
10  111  2016-11-01    29           5
11  111  2016-12-01    27           0
0   112  2016-01-01    31          55
1   112  2016-02-01    26          45
2   112  2016-03-01    31          40
3   112  2016-04-01    30          35
4   112  2016-04-01    31          30
5   112  2016-05-01    30          25
6   112  2016-06-01    31          25
7   112  2016
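If the goal is one line per ID over time, a common pattern is to pivot so each ID becomes its own column, then plot the result — a sketch on a shortened version of the sample (the `Agg` backend is only there to keep the sketch headless; drop it for interactive use):

```python
import pandas as pd
import matplotlib
matplotlib.use('Agg')  # headless backend for this sketch
import matplotlib.pyplot as plt

df = pd.DataFrame({
    'ID': [111, 111, 111, 112, 112, 112],
    'Date': ['2016-01-01', '2016-02-01', '2016-03-01',
             '2016-01-01', '2016-02-01', '2016-03-01'],
    'Days': [20, 25, 31, 31, 26, 31],
    'Volume/Day': [50, 40, 35, 55, 45, 40],
})
df['Date'] = pd.to_datetime(df['Date'])

# One column per ID, dates on the x axis, one plotted line per ID
pivoted = df.pivot_table(index='Date', columns='ID', values='Volume/Day')
ax = pivoted.plot()
```

The same `pivoted` frame also works with `groupby('ID')` loops if each ID should go on its own figure.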

Mean of a grouped-by pandas dataframe

Submitted by 末鹿安然 on 2019-12-11 06:43:57
Question: I need to calculate the mean per day of the columns duration and km, separately for rows with value == 1 and rows with value == 0.

df
Out[20]:
                         Date  duration   km  value
0  2015-03-28 09:07:00.800001         0    0      0
1  2015-03-28 09:36:01.819998         1    2      1
2  2015-03-30 09:36:06.839997         1    3      1
3  2015-03-30 09:37:27.659997       nan    5      0
4  2015-04-22 09:51:40.440003         3    7      0
5  2015-04-23 10:15:25.080002         0  nan      1

How can I modify this solution in order to have the means duration_value0, duration_value1, km_value0 and km_value1?

df = df.set_index(
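One way to get exactly those four columns: group by calendar day *and* value, take the mean, then unstack value into the columns and flatten the names — a sketch on the sample data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Date': pd.to_datetime(['2015-03-28 09:07:00.800001', '2015-03-28 09:36:01.819998',
                            '2015-03-30 09:36:06.839997', '2015-03-30 09:37:27.659997',
                            '2015-04-22 09:51:40.440003', '2015-04-23 10:15:25.080002']),
    'duration': [0, 1, 1, np.nan, 3, 0],
    'km': [0, 2, 3, 5, 7, np.nan],
    'value': [0, 1, 1, 0, 0, 1],
})

# Mean per calendar day and per value, then pivot value into the columns
out = (df.groupby([df['Date'].dt.date, 'value'])[['duration', 'km']]
         .mean()
         .unstack('value'))

# Flatten the (column, value) MultiIndex into duration_value0, km_value1, ...
out.columns = [f'{col}_value{val}' for col, val in out.columns]
```

Days that have rows for only one of the two values get NaN in the other value's columns, which mirrors pandas' usual unstack behaviour.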

Difference between dates in Pandas dataframe

Submitted by 风流意气都作罢 on 2019-12-11 06:23:24
Question: This is related to this question, but now I need to find the difference between dates stored as 'YYYY-MM-DD'. Essentially we need the difference between values in the count column, normalized by the number of days between each row. My dataframe is:

date,site,country_code,kind,ID,rank,votes,sessions,avg_score,count
2017-03-20,website1,US,0,84,226,0.0,15.0,3.370812,53.0
2017-03-21,website1,US,0,84,214,0.0,15.0,3.370812,53.0
2017-03-22,website1,US,0,84,226,0.0,16.0,3.370812
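A per-group `diff` on both the date and the count gives the normalized change directly. A sketch on the sample rows (the third row's count is cut off in the question, so 57.0 below is a made-up value purely for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2017-03-20', '2017-03-21', '2017-03-22'],
    'ID': [84, 84, 84],
    'count': [53.0, 53.0, 57.0],  # 57.0 is hypothetical; the real value is truncated
})
df['date'] = pd.to_datetime(df['date'])
df = df.sort_values(['ID', 'date'])

# Change in count divided by the number of days between consecutive rows,
# computed independently for each ID
day_gap = df.groupby('ID')['date'].diff().dt.days
df['count_per_day'] = df.groupby('ID')['count'].diff() / day_gap
```

The first row of each ID has no predecessor, so its `count_per_day` is NaN by construction.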

Group duplicate columns and sum the corresponding column values using pandas [duplicate]

Submitted by 限于喜欢 on 2019-12-11 06:08:43
Question: This question already has answers here: Pandas group-by and sum (6 answers). Closed last year.

I am preprocessing Apache server log data. I have 3 columns: ID, TIME, and BYTES. Example:

ID  TIME   BYTES
1   13:00  10
2   13:02  30
3   13:03  40
4   13:02  50
5   13:03  70

I want to achieve something like this:

ID  TIME   BYTES
1   13:00
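Since the desired output still carries an ID per time, one sketch is a named aggregation that keeps the first ID and sums the bytes for each duplicated TIME (named aggregation needs pandas ≥ 0.25):

```python
import pandas as pd

log = pd.DataFrame({'ID': [1, 2, 3, 4, 5],
                    'TIME': ['13:00', '13:02', '13:03', '13:02', '13:03'],
                    'BYTES': [10, 30, 40, 50, 70]})

# Collapse duplicate times: keep the first ID seen, sum the bytes
out = log.groupby('TIME', as_index=False).agg(ID=('ID', 'first'),
                                              BYTES=('BYTES', 'sum'))
```

For this sample, 13:02 collapses to 30 + 50 = 80 bytes and 13:03 to 40 + 70 = 110.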

Using groupby in Pandas to get the top 3 rows by column value

Submitted by 梦想的初衷 on 2019-12-11 06:06:19
Question: I have this dataframe:

   person_code  type  growth  size
0          231    32    0.54    32
1          233    43    0.12   333
2          432    32    0.44    21
3          431    56    0.32    23
4          654    89    0.12    89
5          764    32    0.20   211
6          434    32    0.82    90
...

(This dataframe is pretty big; I made a simplification here.) I want to create one dataframe for each type, holding the 3 persons with the highest growth, ordered by it. I want to be able to call it by type. In this case, let's use type 32, so the output df should look something like this:

person_code
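"Callable by type" suggests a dict keyed by type, where each value is that type's top-3 rows by growth. A minimal sketch using `nlargest`:

```python
import pandas as pd

df = pd.DataFrame({
    'person_code': [231, 233, 432, 431, 654, 764, 434],
    'type':        [32, 43, 32, 56, 89, 32, 32],
    'growth':      [0.54, 0.12, 0.44, 0.32, 0.12, 0.20, 0.82],
    'size':        [32, 333, 21, 23, 89, 211, 90],
})

# One frame per type, each holding that type's top 3 rows ordered by growth
top3 = {t: g.nlargest(3, 'growth') for t, g in df.groupby('type')}
print(top3[32])
```

For type 32 this returns person_codes 434, 231, 432 (growth 0.82, 0.54, 0.44); types with fewer than 3 rows simply return what they have.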

Splitting groupby() in pandas into smaller groups and combining them

Submitted by 雨燕双飞 on 2019-12-11 05:59:48
Question:

            city      temperature  windspeed  event
day
2017-01-01  new york  32           6          Rain
2017-01-02  new york  36           7          Sunny
2017-01-03  new york  28           12         Snow
2017-01-04  new york  33           7          Sunny
2017-01-05  new york  31           7          Rain
2017-01-06  new york  33           5          Sunny
2017-01-07  new york  27           12         Rain
2017-01-08  new york  23           7          Rain
2017-01-01  mumbai    90           5          Sunny
2017-01-02  mumbai    85           12         Fog
2017-01-03  mumbai    87           15         Fog
2017-01-04  mumbai    92           5          Rain
2017-01-05  mumbai    89           7          Sunny
2017-01-06  mumbai    80           10         Fog
2017-01-07  mumbai    85           9          Sunny
2017-01-08
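The question text is cut off, but given the title, one common pattern for "splitting a groupby into smaller groups" is to number the rows within each city with `cumcount()` and integer-divide to form fixed-size chunks, then group on both keys. A sketch with a chunk size of 3 (the size and the `max_temp` aggregation are illustrative choices, not from the question):

```python
import pandas as pd

days = ['2017-01-0%d' % d for d in range(1, 9)]
df = pd.DataFrame({
    'day': days + days[:7],
    'city': ['new york'] * 8 + ['mumbai'] * 7,
    'temperature': [32, 36, 28, 33, 31, 33, 27, 23, 90, 85, 87, 92, 89, 80, 85],
    'windspeed': [6, 7, 12, 7, 7, 5, 12, 7, 5, 12, 15, 5, 7, 10, 9],
    'event': ['Rain', 'Sunny', 'Snow', 'Sunny', 'Rain', 'Sunny', 'Rain', 'Rain',
              'Sunny', 'Fog', 'Fog', 'Rain', 'Sunny', 'Fog', 'Sunny'],
})

# Number each row within its city, then integer-divide to form chunks of 3 rows
chunk = df.groupby('city').cumcount() // 3

# Grouping on (city, chunk) yields the smaller sub-groups, ready to aggregate
out = df.groupby(['city', chunk]).agg(max_temp=('temperature', 'max'))
```

With 8 New York rows and 7 Mumbai rows this produces 3 sub-groups per city (the last ones partially filled), which can then be aggregated or recombined like any other groupby result.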