pandas-groupby | 易学教程

divide a column based on groupby or looping conditions in pandas


Question: I have a data frame as shown below.

B_ID  No_Show  Session  slot_num  Patient_count
1     0.2      S1       1         1
2     0.3      S1       2         1
3     0.8      S1       3         1
4     0.3      S1       3         2
5     0.6      S1       4         1
6     0.8      S1       5         1
7     0.9      S1       5         2
8     0.4      S1       5         3
9     0.6      S1       5         4
12    0.9      S2       1         1
13    0.5      S2       1         2
14    0.3      S2       2         1
15    0.7      S2       3         1
20    0.7      S2       4         1
16    0.6      S2       5         1
17    0.8      S2       5         2
19    0.3      S2       5         3

where No_Show is the probability of a no-show. Assume that:

threshold probability = 0.2
Duration for each slot = 30 (minutes)

From the above I would like to calculate the data frame below. Step 1: sort the

Include missing group keys as NaN in pandas GroupBy output


Question: I have a dataframe in pandas.

    test_df = pd.DataFrame({'date': ['2018-12-28', '2018-12-28', '2018-12-29', '2018-12-29', '2018-12-30', '2018-12-30'],
                            'transaction': ['aa', 'bb', 'cc', 'aa', 'bb', 'bb'],
                            'ccy': ['USD', 'EUR', 'EUR', 'USD', 'USD', 'USD'],
                            'amt': np.random.random(6)})

test_df:

date        transaction  ccy  amt
2018-12-28  aa           USD  0.323439
2018-12-28  bb           EUR  0.048948
2018-12-29  cc           EUR  0.793263
2018-12-29  aa           USD  0.013865
2018-12-30  bb           USD  0.658571
2018-12-30  bb           USD  0.224951

The following code
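One common way to surface missing group keys is to reindex the grouped result against every combination of keys. The excerpt is cut off before it says which keys were expected, so the sketch below assumes the full product of the observed date, transaction and ccy values. The amounts are fixed, hypothetical numbers (the question uses random ones) so that the result is reproducible.

```python
import pandas as pd

# Same keys as the question; the amounts are fixed placeholders (an assumption)
# because the original uses np.random.random(6).
test_df = pd.DataFrame({
    "date": ["2018-12-28", "2018-12-28", "2018-12-29",
             "2018-12-29", "2018-12-30", "2018-12-30"],
    "transaction": ["aa", "bb", "cc", "aa", "bb", "bb"],
    "ccy": ["USD", "EUR", "EUR", "USD", "USD", "USD"],
    "amt": [0.3, 0.1, 0.8, 0.2, 0.6, 0.4],
})

keys = ["date", "transaction", "ccy"]

# A plain groupby only returns the key combinations that occur in the data.
summed = test_df.groupby(keys)["amt"].sum()

# Build every combination of the observed key values (assumed target index).
full_index = pd.MultiIndex.from_product(
    [test_df[col].unique() for col in keys],
    names=keys,
)

# Reindexing adds the missing combinations with NaN as their value.
result = summed.reindex(full_index)
print(result)
```

Reindexing keeps the aggregation itself untouched and only changes which keys are displayed, so it can be swapped for a smaller index if only some combinations matter.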


why np.std() and pivot_table(aggfunc=np.std) return the different result


Question: I have some code and do not understand why the difference occurs. np.std() defaults to ddof=0 when it is used alone, but why does it switch to ddof=1 automatically when it is passed as an argument in pivot_table(aggfunc=np.std)?

    import numpy as np
    import pandas as pd
    dft = pd.DataFrame({'A': ['one', 'one'], 'B': ['A', 'A'], 'C': ['bar', 'bar'], 'D': [-0.866740402, 1.490732028]})
    np.std(dft['D'])
    # equivalent: np.std([-0.866740402, 1.490732028]) (default ddof=0)
    # the result: 1.178736215
    dft
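The discrepancy is consistent with the two libraries' defaults: NumPy's std uses ddof=0 (population), while pandas' std methods use ddof=1 (sample). The sketch below compares them on the question's two values. It uses groupby rather than pivot_table to keep the comparison small, which is a substitution and not the asker's exact call.

```python
import numpy as np
import pandas as pd

values = np.array([-0.866740402, 1.490732028])

population_std = np.std(values)        # NumPy default: ddof=0
sample_std = np.std(values, ddof=1)    # divide by n - 1 instead of n
pandas_std = pd.Series(values).std()   # pandas default: ddof=1

dft = pd.DataFrame({"A": ["one", "one"], "B": ["A", "A"],
                    "C": ["bar", "bar"], "D": values})
grouped = dft.groupby(["A", "B", "C"])["D"]

grouped_sample = grouped.std()              # ddof=1, like Series.std
grouped_population = grouped.std(ddof=0)    # explicit ddof=0 matches np.std

print(population_std, sample_std, pandas_std)
print(grouped_sample.iloc[0], grouped_population.iloc[0])
```

Passing ddof explicitly on whichever side is being compared removes the ambiguity.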


Dataframe cell to be locked and used for a running balance calculation (follow up question)


Question: (This is a follow-up question to my previous question, which was answered correctly.) Say I have the following dataframe:

    import pandas as pd
    df = pd.DataFrame()
    df['E'] = ('SIT', 'SCLOSE', 'SHODL', 'SHODL', 'SHODL', 'SHODL', 'SHODL', 'SHODL', 'SHODL', 'SCLOSE_BUY', 'BCLOSE_SELL', 'BHODL', 'BHODL', 'BHODL', 'BHODL', 'BHODL', 'BHODL', 'BUY', 'SIT', 'SIT')
    df['F'] = (0.00, 1.00, 10.00, 5.00, 6.00, -6.00, 6.00, 2.00, 10.00, 10.00, -8.00, 33.00, -15.00, 6.00, -1.00, 5.00, 10.00, 0.00, 0.00, 0.00)
    df.loc[19, 'G'] = 100

python: use agg with more than one customized function


Question: I have a data frame like this.

    mydf = pd.DataFrame({'a': [1, 1, 3, 3], 'b': [np.nan, 2, 3, 6], 'c': [1, 3, 3, 9]})

   a    b  c
0  1  NaN  1
1  1  2.0  3
2  3  3.0  3
3  3  6.0  9

I would like to have a resulting dataframe like this.

    myResults = pd.concat([mydf.groupby('a').apply(lambda x: (x.b/x.c).max()),
                           mydf.groupby('a').apply(lambda x: (x.b/x.c).min())], axis=1)
    myResults.columns = ['max', 'min']

   max       min
a
1  0.666667  0.666667
3  1.000000  0.666667

Basically I would like to have the max and min of the ratio of column b and column
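One possible way to get both statistics in a single pass is to compute the ratio first and then aggregate that one Series with a list of functions. This is a sketch of an alternative to the two apply calls above, not necessarily the answer the asker accepted. The names ratio and my_results are introduced here for illustration.

```python
import numpy as np
import pandas as pd

mydf = pd.DataFrame({"a": [1, 1, 3, 3],
                     "b": [np.nan, 2, 3, 6],
                     "c": [1, 3, 3, 9]})

# Compute the row-wise ratio once, outside the groupby.
ratio = (mydf["b"] / mydf["c"]).rename("ratio")

# Group the ratio by column a and apply several aggregations at once.
# NaN ratios are skipped by max and min.
my_results = ratio.groupby(mydf["a"]).agg(["max", "min"])
print(my_results)
```

The list passed to agg can also hold custom callables, for example a lambda, alongside the built-in names.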

Time difference in days based on specific condition in pandas


Question: I have a data frame as shown below.

ID  CONSTRUCTION_DATE  START_DATE  END_DATE    CANCELLED_DATE
1   2016-02-06         2016-02-26  2017-02-26  NaT
1   2016-02-06         2017-03-27  2018-02-26  2017-05-22
1   2016-02-06         2017-08-27  2019-02-26  2017-10-21
1   2016-02-06         2018-07-27  2021-02-26  NaT
2   2016-05-06         2017-03-27  2018-02-26  NaT
2   2016-05-06         2018-08-27  2019-02-26  NaT

The above data has to be ordered by ID and START_DATE. From the above data frame I would like to prepare the dataframe below.

ID D_from_C_to_first_S_D T_D_V_aft_c
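The definitions of the target columns are cut off in the excerpt. As a rough sketch, and assuming that D_from_C_to_first_S_D means the number of days from CONSTRUCTION_DATE to the earliest START_DATE of each ID, a per-group day difference could be computed as follows. That reading of the column name is a guess, and only the two columns it needs are reproduced.

```python
import pandas as pd

# Only the columns needed for this sketch are reproduced from the question.
df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 2, 2],
    "CONSTRUCTION_DATE": ["2016-02-06"] * 4 + ["2016-05-06"] * 2,
    "START_DATE": ["2016-02-26", "2017-03-27", "2017-08-27",
                   "2018-07-27", "2017-03-27", "2018-08-27"],
})
for col in ["CONSTRUCTION_DATE", "START_DATE"]:
    df[col] = pd.to_datetime(df[col])

# Order by ID and START_DATE, as the question requires.
df = df.sort_values(["ID", "START_DATE"])

# Take the first row of each ID, i.e. the one with the earliest start date.
first = df.groupby("ID").first()

# Assumed meaning of D_from_C_to_first_S_D: days from construction to first start.
days_to_first_start = (first["START_DATE"] - first["CONSTRUCTION_DATE"]).dt.days
print(days_to_first_start)
```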

Python 3 pandas.groupby.filter


Question: I am trying to perform a groupby filter that is very similar to the example in this documentation: pandas groupby filter

    >>> df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar',
    ...                           'foo', 'bar'],
    ...                    'B' : [1, 2, 3, 4, 5, 6],
    ...                    'C' : [2.0, 5., 8., 1., 2., 9.]})
    >>> grouped = df.groupby('A')
    >>> grouped.filter(lambda x: x['B'].mean() > 3.)
         A  B    C
    1  bar  2  5.0
    3  bar  4  1.0
    5  bar  6  9.0

I am trying to return a DataFrame that has all 3 columns, but only 2 rows. Those 2 rows contain the minimum
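GroupBy.filter keeps or drops whole groups, so it cannot by itself return one row per group. The sketch below first reproduces the documentation example and then shows one possible way to pick a single row per group with idxmin. The excerpt is cut off before it names which column's minimum is wanted, so column C is a purely illustrative assumption.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar"],
    "B": [1, 2, 3, 4, 5, 6],
    "C": [2.0, 5.0, 8.0, 1.0, 2.0, 9.0],
})

# filter: keep every row of each group whose mean of B exceeds 3.
kept_groups = df.groupby("A").filter(lambda x: x["B"].mean() > 3.0)

# Assumption: "minimum" refers to column C. idxmin returns the index label
# of the smallest C in each group; df.loc then selects those full rows.
min_rows = df.loc[df.groupby("A")["C"].idxmin()]

print(kept_groups)
print(min_rows)
```

With ties, idxmin returns the first occurrence, so each group contributes exactly one row.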

Fill in missing dates of groupby


Question: Imagine I have a dataframe that looks like:

ID  DATE        VALUE
1   31-01-2006  5
1   28-02-2006  5
1   31-05-2006  10
1   30-06-2006  11
2   31-01-2006  5
2   31-02-2006  5
2   31-03-2006  5
2   31-04-2006  5

As you can see, this is panel data with multiple entries on the same date for different IDs. What I want to do is fill in the missing dates for each ID. You can see that for ID "1" there is a jump in months between the second and third entries. I would like a dataframe that looks like:

ID  DATE        VALUE
1   31-01-2006  5
1   28-02
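One way to approach this is to reindex each ID against a complete month-end range. The sketch below uses only the ID 1 rows, because the ID 2 rows in the excerpt contain dates such as 31-02-2006 that cannot be parsed. The desired fill values are cut off in the excerpt, so the inserted rows are left as NaN, and the helper name fill_months is hypothetical.

```python
import pandas as pd

# Only the ID 1 rows are used; see the note above about the ID 2 dates.
df = pd.DataFrame({
    "ID": [1, 1, 1, 1],
    "DATE": ["31-01-2006", "28-02-2006", "31-05-2006", "30-06-2006"],
    "VALUE": [5, 5, 10, 11],
})
df["DATE"] = pd.to_datetime(df["DATE"], format="%d-%m-%Y")


def fill_months(group):
    """Reindex one ID's rows onto every month end between its first and last date."""
    full_range = pd.date_range(group["DATE"].min(), group["DATE"].max(),
                               freq=pd.offsets.MonthEnd())
    return group.set_index("DATE").reindex(full_range).rename_axis("DATE")


pieces = []
for key, group in df.groupby("ID"):
    filled = fill_months(group)
    filled["ID"] = key  # restore the ID on the newly inserted rows
    pieces.append(filled.reset_index())

result = pd.concat(pieces, ignore_index=True)[["ID", "DATE", "VALUE"]]
print(result)
```

If the missing months should carry a value forward instead of staying empty, calling ffill on VALUE inside fill_months would be one option.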