pandas-groupby

How to use column values in a groupby

Submitted by ⅰ亾dé卋堺 on 2019-12-11 10:35:03
Question: I need to get the top 1 and top 2 ratings watched by 'ma' and 'young'. Here I need to group by specific values rather than by a column name.

data:

gender  age    rating
ma      young  PG
fe      young  PG
ma      adult  PG
fe      adult  PG
ma      young  PG
fe      young  PG
ma      adult  R
fe      adult  R
ma      young  R
fe      young  R

code:

top1 = df.groupby(['ma','young'])['rating'].apply(lambda x: x.value_counts().index[0])
top2 = df.groupby(['ma','young'])['rating'].apply(lambda x: x.value_counts().index[1])

Please let me know how do
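One way to read the question is that 'ma' and 'young' are *values* of the gender and age columns (column names assumed here), so a plain boolean filter followed by value_counts gives the two most frequent ratings — a minimal sketch:

```python
import pandas as pd

# Sample frame rebuilt from the question's data (column names are assumptions)
df = pd.DataFrame({
    'gender': ['ma', 'fe', 'ma', 'fe', 'ma', 'fe', 'ma', 'fe', 'ma', 'fe'],
    'age':    ['young', 'young', 'adult', 'adult', 'young', 'young',
               'adult', 'adult', 'young', 'young'],
    'rating': ['PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'R', 'R', 'R', 'R'],
})

# Filter on the values first, then rank the ratings by frequency
subset = df[(df['gender'] == 'ma') & (df['age'] == 'young')]
counts = subset['rating'].value_counts()
top1, top2 = counts.index[0], counts.index[1]
print(top1, top2)  # PG R
```

`value_counts()` already sorts descending, so indexing its index gives the top-k labels directly.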

Elegant way to fill in a column with row values based on groups in pandas

Submitted by 安稳与你 on 2019-12-11 07:47:29
Question: I have a dataframe as given below:

data_file = pd.DataFrame({
    'person_id': [1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    'ob.date': [np.nan, np.nan, np.nan, np.nan, np.nan,
                np.nan, np.nan, np.nan, np.nan, np.nan],
    'observation': ['Age', 'interviewdate', 'marital_status', 'interviewdate',
                    'Age', 'interviewdate', 'marital_status',
                    'Age', 'interviewdate', 'marital_status'],
    'answer': [21, '21/08/2017', 'Single', '22/05/2217',
               26, '11/03/2010', 'Single', 41, '31/09/2012', 'Married']
})

What I would like to do is fetch the date values from answer
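One hedged approach, assuming the goal is to fill `ob.date` for every row with that person's interview date: mask `answer` down to the interviewdate rows, then broadcast the first date per person with a grouped `transform`:

```python
import numpy as np
import pandas as pd

data_file = pd.DataFrame({
    'person_id': [1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    'ob.date': [np.nan] * 10,
    'observation': ['Age', 'interviewdate', 'marital_status', 'interviewdate',
                    'Age', 'interviewdate', 'marital_status',
                    'Age', 'interviewdate', 'marital_status'],
    'answer': [21, '21/08/2017', 'Single', '22/05/2217',
               26, '11/03/2010', 'Single', 41, '31/09/2012', 'Married'],
})

# Keep answer only where it is an interview date, then broadcast the
# first non-missing date per person to every row of that person
dates = data_file['answer'].where(data_file['observation'] == 'interviewdate')
data_file['ob.date'] = dates.groupby(data_file['person_id']).transform('first')
```

`transform('first')` skips NaN, so each person's ob.date becomes their first recorded interview date (person 1 has two; this keeps '21/08/2017').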

Python, count frequency of occurrence for value in another column

Submitted by 元气小坏坏 on 2019-12-11 07:16:57
Question: I've been scouring Stack Overflow for solutions to similar problems and keep hitting walls. I am new to Python, and to using pandas for ETL, so forgive me if I don't describe my situation adequately. I have two dataframes.

df1 looks like:

   Subscriber Key  OtherID  AnotherID
1  'abc'           '12'     '23'
2  'bcd'           '45'     '56'
3  'abc'           '12'     '23'
4  'abc'           '12'     '23'
5  'cde'           '78'     '90'
6  'bcd'           '45'     '56'

df2 looks like:

   Subscriber Key  OtherID  AnotherID
1  'abc'           '12'     '23'
2  'bcd'           '45'     '56'
3  'cde'           '78'     '90'

I am
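The question is cut off, but given the title, one plausible reading is: df2 holds the unique key combinations and the goal is a frequency column counting how often each appears in df1. A sketch under that assumption (`Frequency` is a made-up column name):

```python
import pandas as pd

df1 = pd.DataFrame({
    'Subscriber Key': ['abc', 'bcd', 'abc', 'abc', 'cde', 'bcd'],
    'OtherID':        ['12', '45', '12', '12', '78', '45'],
    'AnotherID':      ['23', '56', '23', '23', '90', '56'],
})

# De-duplicate to get df2, then map each key to its occurrence count in df1
df2 = df1.drop_duplicates(subset='Subscriber Key').reset_index(drop=True)
df2['Frequency'] = df2['Subscriber Key'].map(df1['Subscriber Key'].value_counts())
```

`value_counts()` builds the lookup once; `map` attaches it without an explicit merge.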

Python PANDAS: Resampling Multivariate Time Series with a Groupby

Submitted by 冷暖自知 on 2019-12-11 07:06:37
Question: I have data in the following general format that I would like to resample to 30-day time-series windows:

'customer_id','transaction_dt','product','price','units'
1,2004-01-02,thing1,25,47
1,2004-01-17,thing2,150,8
2,2004-01-29,thing2,150,25
3,2017-07-15,thing3,55,17
3,2016-05-12,thing3,55,47
4,2012-02-23,thing2,150,22
4,2009-10-10,thing1,25,12
4,2014-04-04,thing2,150,2
5,2008-07-09,thing2,150,43

I would like the 30-day windows to start on 2014-01-01 and end on 2018-12-31. It is NOT guaranteed
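A sketch of the core step, assuming the intent is per-customer sums in 30-day bins anchored to 2014-01-01 (note the sample dates also fall outside 2014–2018; restricting to that range would need an extra filter). The `origin=` argument of `pd.Grouper` requires pandas ≥ 1.1:

```python
import pandas as pd

df = pd.DataFrame(
    [[1, '2004-01-02', 'thing1', 25, 47],
     [1, '2004-01-17', 'thing2', 150, 8],
     [2, '2004-01-29', 'thing2', 150, 25],
     [3, '2017-07-15', 'thing3', 55, 17],
     [3, '2016-05-12', 'thing3', 55, 47],
     [4, '2012-02-23', 'thing2', 150, 22],
     [4, '2009-10-10', 'thing1', 25, 12],
     [4, '2014-04-04', 'thing2', 150, 2],
     [5, '2008-07-09', 'thing2', 150, 43]],
    columns=['customer_id', 'transaction_dt', 'product', 'price', 'units'])
df['transaction_dt'] = pd.to_datetime(df['transaction_dt'])

# 30-day bins anchored at 2014-01-01, computed per customer
out = (df.set_index('transaction_dt')
         .groupby(['customer_id',
                   pd.Grouper(freq='30D', origin='2014-01-01')])[['price', 'units']]
         .sum())
```

Every transaction lands in exactly one (customer, window) bucket, so the aggregated totals are conserved.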

Plot the result of a groupby operation in pandas

Submitted by 偶尔善良 on 2019-12-11 07:01:40
Question: I have this sample table:

     ID        Date  Days  Volume/Day
0   111  2016-01-01    20          50
1   111  2016-02-01    25          40
2   111  2016-03-01    31          35
3   111  2016-04-01    30          30
4   111  2016-05-01    31          25
5   111  2016-06-01    30          20
6   111  2016-07-01    31          20
7   111  2016-08-01    31          15
8   111  2016-09-01    29          15
9   111  2016-10-01    31          10
10  111  2016-11-01    29           5
11  111  2016-12-01    27           0
0   112  2016-01-01    31          55
1   112  2016-02-01    26          45
2   112  2016-03-01    31          40
3   112  2016-04-01    30          35
4   112  2016-04-01    31          30
5   112  2016-05-01    30          25
6   112  2016-06-01    31          25
7   112  2016
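If the goal is one line per ID over time, a common pattern is to pivot so each ID becomes its own column, then plot the result — a sketch on a shortened version of the sample (the `Agg` backend is only there to keep the sketch headless; drop it for interactive use):

```python
import pandas as pd
import matplotlib
matplotlib.use('Agg')  # headless backend for this sketch
import matplotlib.pyplot as plt

df = pd.DataFrame({
    'ID': [111, 111, 111, 112, 112, 112],
    'Date': ['2016-01-01', '2016-02-01', '2016-03-01',
             '2016-01-01', '2016-02-01', '2016-03-01'],
    'Days': [20, 25, 31, 31, 26, 31],
    'Volume/Day': [50, 40, 35, 55, 45, 40],
})
df['Date'] = pd.to_datetime(df['Date'])

# One column per ID, dates on the x axis, one plotted line per ID
pivoted = df.pivot_table(index='Date', columns='ID', values='Volume/Day')
ax = pivoted.plot()
```

The same `pivoted` frame also works with `groupby('ID')` loops if each ID should go on its own figure.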

Mean of a grouped-by pandas dataframe

Submitted by 末鹿安然 on 2019-12-11 06:43:57
Question: I need to calculate the mean per day of the columns duration and km, separately for rows with value == 1 and rows with value == 0.

df
Out[20]:
                         Date  duration   km  value
0  2015-03-28 09:07:00.800001         0    0      0
1  2015-03-28 09:36:01.819998         1    2      1
2  2015-03-30 09:36:06.839997         1    3      1
3  2015-03-30 09:37:27.659997       nan    5      0
4  2015-04-22 09:51:40.440003         3    7      0
5  2015-04-23 10:15:25.080002         0  nan      1

How can I modify this solution in order to have the means duration_value0, duration_value1, km_value0 and km_value1?

df = df.set_index(
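One way to get exactly those four columns: group by calendar day *and* value, take the mean, then unstack value into the columns and flatten the names — a sketch on the sample data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Date': pd.to_datetime(['2015-03-28 09:07:00.800001', '2015-03-28 09:36:01.819998',
                            '2015-03-30 09:36:06.839997', '2015-03-30 09:37:27.659997',
                            '2015-04-22 09:51:40.440003', '2015-04-23 10:15:25.080002']),
    'duration': [0, 1, 1, np.nan, 3, 0],
    'km': [0, 2, 3, 5, 7, np.nan],
    'value': [0, 1, 1, 0, 0, 1],
})

# Mean per calendar day and per value, then pivot value into the columns
out = (df.groupby([df['Date'].dt.date, 'value'])[['duration', 'km']]
         .mean()
         .unstack('value'))

# Flatten the (column, value) MultiIndex into duration_value0, km_value1, ...
out.columns = [f'{col}_value{val}' for col, val in out.columns]
```

Days that have rows for only one of the two values get NaN in the other value's columns, which mirrors pandas' usual unstack behaviour.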

Difference between dates in Pandas dataframe

Submitted by 风流意气都作罢 on 2019-12-11 06:23:24
Question: This is related to this question, but now I need to find the difference between dates stored as 'YYYY-MM-DD'. Essentially we need the difference between values in the count column, normalized by the number of days between each row. My dataframe is:

date,site,country_code,kind,ID,rank,votes,sessions,avg_score,count
2017-03-20,website1,US,0,84,226,0.0,15.0,3.370812,53.0
2017-03-21,website1,US,0,84,214,0.0,15.0,3.370812,53.0
2017-03-22,website1,US,0,84,226,0.0,16.0,3.370812
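A per-group `diff` on both the date and the count gives the normalized change directly. A sketch on the sample rows (the third row's count is cut off in the question, so 57.0 below is a made-up value purely for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2017-03-20', '2017-03-21', '2017-03-22'],
    'ID': [84, 84, 84],
    'count': [53.0, 53.0, 57.0],  # 57.0 is hypothetical; the real value is truncated
})
df['date'] = pd.to_datetime(df['date'])
df = df.sort_values(['ID', 'date'])

# Change in count divided by the number of days between consecutive rows,
# computed independently for each ID
day_gap = df.groupby('ID')['date'].diff().dt.days
df['count_per_day'] = df.groupby('ID')['count'].diff() / day_gap
```

The first row of each ID has no predecessor, so its `count_per_day` is NaN by construction.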

Group duplicate columns and sum the corresponding column values using pandas [duplicate]

Submitted by 限于喜欢 on 2019-12-11 06:08:43
Question: This question already has answers here: Pandas group-by and sum (6 answers). Closed last year.

I am preprocessing Apache server log data. I have 3 columns: ID, TIME, and BYTES. Example:

ID  TIME   BYTES
1   13:00  10
2   13:02  30
3   13:03  40
4   13:02  50
5   13:03  70

I want to achieve something like this:

ID  TIME   BYTES
1   13:00
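Since the desired output still carries an ID per time, one sketch is a named aggregation that keeps the first ID and sums the bytes for each duplicated TIME (named aggregation needs pandas ≥ 0.25):

```python
import pandas as pd

log = pd.DataFrame({'ID': [1, 2, 3, 4, 5],
                    'TIME': ['13:00', '13:02', '13:03', '13:02', '13:03'],
                    'BYTES': [10, 30, 40, 50, 70]})

# Collapse duplicate times: keep the first ID seen, sum the bytes
out = log.groupby('TIME', as_index=False).agg(ID=('ID', 'first'),
                                              BYTES=('BYTES', 'sum'))
```

For this sample, 13:02 collapses to 30 + 50 = 80 bytes and 13:03 to 40 + 70 = 110.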

Using groupby in Pandas to get the top 3 rows by column value

Submitted by 梦想的初衷 on 2019-12-11 06:06:19
Question: I have this dataframe:

   person_code  type  growth  size
0          231    32    0.54    32
1          233    43    0.12   333
2          432    32    0.44    21
3          431    56    0.32    23
4          654    89    0.12    89
5          764    32    0.20   211
6          434    32    0.82    90
...

(This dataframe is pretty big; I made a simplification here.) I want to create one dataframe for each type, holding the 3 persons with the highest growth, ordered by it. I want to be able to call it by type. In this case, let's use type 32, so the output df should look something like this:

person_code
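"Callable by type" suggests a dict keyed by type, where each value is that type's top-3 rows by growth. A minimal sketch using `nlargest`:

```python
import pandas as pd

df = pd.DataFrame({
    'person_code': [231, 233, 432, 431, 654, 764, 434],
    'type':        [32, 43, 32, 56, 89, 32, 32],
    'growth':      [0.54, 0.12, 0.44, 0.32, 0.12, 0.20, 0.82],
    'size':        [32, 333, 21, 23, 89, 211, 90],
})

# One frame per type, each holding that type's top 3 rows ordered by growth
top3 = {t: g.nlargest(3, 'growth') for t, g in df.groupby('type')}
print(top3[32])
```

For type 32 this returns person_codes 434, 231, 432 (growth 0.82, 0.54, 0.44); types with fewer than 3 rows simply return what they have.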

Splitting groupby() in pandas into smaller groups and combining them

Submitted by 雨燕双飞 on 2019-12-11 05:59:48
Question:

            city      temperature  windspeed  event
day
2017-01-01  new york  32           6          Rain
2017-01-02  new york  36           7          Sunny
2017-01-03  new york  28           12         Snow
2017-01-04  new york  33           7          Sunny
2017-01-05  new york  31           7          Rain
2017-01-06  new york  33           5          Sunny
2017-01-07  new york  27           12         Rain
2017-01-08  new york  23           7          Rain
2017-01-01  mumbai    90           5          Sunny
2017-01-02  mumbai    85           12         Fog
2017-01-03  mumbai    87           15         Fog
2017-01-04  mumbai    92           5          Rain
2017-01-05  mumbai    89           7          Sunny
2017-01-06  mumbai    80           10         Fog
2017-01-07  mumbai    85           9          Sunny
2017-01-08
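The question text is cut off, but given the title, one common pattern for "splitting a groupby into smaller groups" is to number the rows within each city with `cumcount()` and integer-divide to form fixed-size chunks, then group on both keys. A sketch with a chunk size of 3 (the size and the `max_temp` aggregation are illustrative choices, not from the question):

```python
import pandas as pd

days = ['2017-01-0%d' % d for d in range(1, 9)]
df = pd.DataFrame({
    'day': days + days[:7],
    'city': ['new york'] * 8 + ['mumbai'] * 7,
    'temperature': [32, 36, 28, 33, 31, 33, 27, 23, 90, 85, 87, 92, 89, 80, 85],
    'windspeed': [6, 7, 12, 7, 7, 5, 12, 7, 5, 12, 15, 5, 7, 10, 9],
    'event': ['Rain', 'Sunny', 'Snow', 'Sunny', 'Rain', 'Sunny', 'Rain', 'Rain',
              'Sunny', 'Fog', 'Fog', 'Rain', 'Sunny', 'Fog', 'Sunny'],
})

# Number each row within its city, then integer-divide to form chunks of 3 rows
chunk = df.groupby('city').cumcount() // 3

# Grouping on (city, chunk) yields the smaller sub-groups, ready to aggregate
out = df.groupby(['city', chunk]).agg(max_temp=('temperature', 'max'))
```

With 8 New York rows and 7 Mumbai rows this produces 3 sub-groups per city (the last ones partially filled), which can then be aggregated or recombined like any other groupby result.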