pandas-groupby | 易学教程

subtract 1 from next cumsum if current cumsum more than a particular value - pandas or numpy

Question: I have a data frame as shown below:

B_ID  Session  no_show  cumulative_no_show
1     s1       0.4      0.4
2     s1       0.6      1.0
3     s1       0.2      1.2
4     s1       0.1      1.3
5     s1       0.4      1.7
6     s1       0.2      1.9
7     s1       0.3      2.2
10    s2       0.3      0.3
11    s2       0.4      0.7
12    s2       0.3      1.0
13    s2       0.6      1.6
14    s2       0.2      1.8
15    s2       0.5      2.3

where cumulative_no_show is the cumulative sum of no_show. From the above, I would like to create a new column called u_no_show based on the condition below: whenever cumulative_no_show >= 0.8, subtract 1 from the next cumulative_no_show, and so on.
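One possible reading of this rule is a running total per Session that drops by 1 on the row after it reaches 0.8. The sketch below follows that reading only; the helper name `adjusted_cumsum`, the constant `THRESHOLD`, and the rounding to 10 decimals (added so that float noise does not flip the `>= 0.8` comparison) are assumptions, not part of the original question.

```python
import pandas as pd

df = pd.DataFrame({
    "B_ID": [1, 2, 3, 4, 5, 6, 7, 10, 11, 12, 13, 14, 15],
    "Session": ["s1"] * 7 + ["s2"] * 6,
    "no_show": [0.4, 0.6, 0.2, 0.1, 0.4, 0.2, 0.3,
                0.3, 0.4, 0.3, 0.6, 0.2, 0.5],
})

THRESHOLD = 0.8  # value taken from the question


def adjusted_cumsum(values: pd.Series) -> pd.Series:
    """Running sum that loses 1 on the row after it reaches THRESHOLD (assumed rule)."""
    out = []
    running = 0.0
    for x in values:
        if running >= THRESHOLD:
            running -= 1.0  # previous row hit the threshold, so subtract 1 now
        running = round(running + x, 10)  # rounding is an assumption, see lead-in
        out.append(running)
    return pd.Series(out, index=values.index)


# Plain per-session cumulative sum, as given in the question.
df["cumulative_no_show"] = df.groupby("Session")["no_show"].cumsum()
# Adjusted running total under the assumed reading of the rule.
df["u_no_show"] = df.groupby("Session")["no_show"].transform(adjusted_cumsum)
print(df)
```

The loop is sequential because each value depends on the previous adjusted value, so a plain `cumsum` cannot express it directly.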

pandas filtering using isin function

Question: I have two dataframes as shown below.

df1:

ID  Name
1   Sachin
2   Kholi
3   Dravid

df2:

ID  Run
1   20
2   60
2   10
1   5

From the above I want to filter df1 by only taking unique ids in df2.

Expected output:

ID  Name
3   Dravid

I tried the code below:

    def diff(first, second):
        second = set(second)
        units_in_unit_table = [item for item in first if item not in second]
        return units_in_unit_table

    id_df2 = diff(df2, df1)
    df3 = df1[df1['ID'].isin(id_df2)]

Answer 1: It seems your solution can be simplified by passing the unique values
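The expected output keeps the df1 row whose ID does not occur in df2, so a minimal sketch is to pass df2's unique IDs to `isin` and negate the mask. This is one way to get the shown output, not necessarily the full answer, which is cut off in the excerpt.

```python
import pandas as pd

df1 = pd.DataFrame({"ID": [1, 2, 3], "Name": ["Sachin", "Kholi", "Dravid"]})
df2 = pd.DataFrame({"ID": [1, 2, 2, 1], "Run": [20, 60, 10, 5]})

# IDs present in df2 (duplicates collapsed).
ids_in_df2 = df2["ID"].unique()

# Keep only the df1 rows whose ID is NOT among them.
df3 = df1[~df1["ID"].isin(ids_in_df2)]
print(df3)
```

Dropping the `~` would instead keep the rows whose ID does appear in df2.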

Pandas P&L rollup to the next business day

Question: I'm having a hard time trying to do this efficiently. I have some stocks and daily P&L info in a dataframe. In reality, I have millions of rows of data, so efficiency matters a lot! The dataframe looks like:

Date        Security  P&L
2016-01-01  AAPL      100
2016-01-02  AAPL      200
2016-01-03  AAPL      300
2016-01-04  AAPL
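The excerpt is cut off before the exact requirement, so the following is only a sketch under a stated assumption: "rollup" is taken to mean that P&L dated on a Saturday or Sunday is added to the following Monday. Holidays are ignored, the column name `RollDate` is hypothetical, and only the three complete rows from the table are used.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime(["2016-01-01", "2016-01-02", "2016-01-03"]),
    "Security": ["AAPL", "AAPL", "AAPL"],
    "P&L": [100, 200, 300],
})

# Monday=0 ... Sunday=6; weekend rows need shifting to the next Monday.
dow = df["Date"].dt.dayofweek
days_forward = (7 - dow).where(dow >= 5, 0)  # Sat -> 2, Sun -> 1, weekday -> 0

# Vectorized shift; no Python-level loop, which matters for millions of rows.
df["RollDate"] = df["Date"] + pd.to_timedelta(days_forward, unit="D")

# Sum P&L per security on the rolled date.
rolled = (
    df.groupby(["Security", "RollDate"], as_index=False)["P&L"].sum()
)
print(rolled)
```

With a real holiday calendar, the weekday arithmetic would have to be replaced by a calendar-aware offset.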

Take difference between pivot table columns in Python

Question: I have a dataframe with Week, Campaign, Placement and Count columns. To compare counts per week by Campaign and Placement, I created a pivot table that works great. How do I create a new column with the difference between these 2 weeks (in percentage if possible)?

Code:

    dfPivot = pd.pivot_table(dfPivot, values='Count',
                             index=['Campaign', 'Placement'],
                             columns=['Week'], aggfunc=np.sum)

Current output:

Week                  2019-10-27  2019-11-03
Campaign  Placement
Code A    111111111       4288.0       615.0
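A minimal sketch of one way to add the difference: subtract one week column from the other, then divide by the earlier week for a percentage. The long-form input frame is reconstructed from the single visible output row, reading "Code A" as the Campaign and 111111111 as the Placement, and the column names `diff` and `pct_change` are hypothetical.

```python
import pandas as pd

# Hypothetical long-form data matching the one visible row of the pivot output.
df = pd.DataFrame({
    "Week": ["2019-10-27", "2019-11-03"],
    "Campaign": ["Code A", "Code A"],
    "Placement": [111111111, 111111111],
    "Count": [4288, 615],
})

pivot = pd.pivot_table(
    df, values="Count", index=["Campaign", "Placement"],
    columns="Week", aggfunc="sum",
)

first_week, second_week = "2019-10-27", "2019-11-03"

# Absolute change, then change relative to the earlier week in percent.
pivot["diff"] = pivot[second_week] - pivot[first_week]
pivot["pct_change"] = pivot["diff"] / pivot[first_week] * 100
print(pivot)
```

Selecting the week columns by label rather than by position keeps the calculation correct if more weeks are added to the pivot later.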

Find duplicate rows among different groups with pandas

Question: Consider the following dataframe:

    data_so = {
        'ID': [100, 100, 100, 200, 200, 300, 300, 300],
        'letter': ['A', 'B', 'A', 'C', 'D', 'E', 'D', 'A'],
    }
    df_so = pandas.DataFrame(data_so, columns=['ID', 'letter'])

I want to obtain a new column where all duplicates in different groups are True. All other duplicates in the same group should be False.

What I've tried: I've tried using

    df_so['dup'] = df_so.duplicated(subset=['letter'], keep=False)

but the result is not what I want: The first
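The expected output is cut off in the excerpt, so this sketch rests on one reading of the requirement: a row is flagged True when its letter occurs under more than one distinct ID. Under that reading, both 'A' rows in group 100 are True because 'A' also appears in group 300. The column name `dup` follows the question's own attempt.

```python
import pandas as pd

data_so = {
    "ID": [100, 100, 100, 200, 200, 300, 300, 300],
    "letter": ["A", "B", "A", "C", "D", "E", "D", "A"],
}
df_so = pd.DataFrame(data_so, columns=["ID", "letter"])

# For each letter, count how many distinct IDs it appears under,
# broadcast that count back to every row, and flag counts above 1.
df_so["dup"] = df_so.groupby("letter")["ID"].transform("nunique") > 1
print(df_so)
```

A letter repeated only inside a single ID group has `nunique` equal to 1 and therefore stays False, unlike with `duplicated(keep=False)`.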

function returning pandas dataframe

Question: I was not clear about my issue, so I am reviewing the question. I have a function manipulating a generic dataframe (it removes and renames columns and records):

    def manipulate_df(df_local):
        df_local.rename(columns={'A': 'grouping_column'}, inplace=True)
        df_local.drop('B', axis=1, inplace=True)
        df_local.drop(df.query('grouping_column not in (\'1\', \'0\')').index, inplace=True)
        df_local = df_local.groupby(['grouping_column'])['C'].sum().to_frame().reset_index().copy()
        print("this is what I
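The question is truncated, so the actual problem is not visible here. As a hedged sketch of the pattern the title suggests, the function below performs the same steps without `inplace=True` and returns the grouped frame, leaving the caller's dataframe untouched. The sample data is hypothetical; only the column names 'A', 'B', 'C' and the kept values '1' and '0' come from the excerpt.

```python
import pandas as pd


def manipulate_df(df_local: pd.DataFrame) -> pd.DataFrame:
    """Rename, drop, filter and aggregate, returning a new DataFrame."""
    out = df_local.rename(columns={"A": "grouping_column"}).drop(columns="B")
    # Keep only the records whose grouping value is '1' or '0'.
    out = out[out["grouping_column"].isin(["1", "0"])]
    return out.groupby("grouping_column", as_index=False)["C"].sum()


# Hypothetical input, for illustration only.
df = pd.DataFrame({
    "A": ["1", "0", "2", "1"],
    "B": [9, 9, 9, 9],
    "C": [10, 20, 30, 40],
})

result = manipulate_df(df)
print(result)
print(df.columns.tolist())  # the original frame keeps its columns
```

Rebinding `df_local` inside a function only changes the local name, so the result has to be returned and assigned by the caller.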