pandas-groupby

subtract 1 from next cumsum if current cumsum more than a particular value - pandas or numpy

给你一囗甜甜゛ submitted on 2021-02-11 05:07:38
Question: I have a data frame as shown below:

B_ID  Session  no_show  cumulative_no_show
1     s1       0.4      0.4
2     s1       0.6      1.0
3     s1       0.2      1.2
4     s1       0.1      1.3
5     s1       0.4      1.7
6     s1       0.2      1.9
7     s1       0.3      2.2
10    s2       0.3      0.3
11    s2       0.4      0.7
12    s2       0.3      1.0
13    s2       0.6      1.6
14    s2       0.2      1.8
15    s2       0.5      2.3

where cumulative_no_show is the cumulative sum of no_show within each Session. From the above I would like to create a new column called u_no_show based on the following condition: whenever cumulative_no_show >= 0.8, subtract 1 from the next cumulative_no_show, and so on.
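The excerpt does not show the expected output, so the following is only a sketch under one reading of the condition: keep a running total per Session, and whenever the previous running total has reached the threshold, subtract 1 before adding the next no_show. The helper name `carry_adjusted`, the per-Session restart, and the rounding to 6 decimals (to keep floating-point drift away from the 0.8 comparison) are assumptions, not part of the question.

```python
import pandas as pd

df = pd.DataFrame({
    "B_ID": [1, 2, 3, 4, 5, 6, 7, 10, 11, 12, 13, 14, 15],
    "Session": ["s1"] * 7 + ["s2"] * 6,
    "no_show": [0.4, 0.6, 0.2, 0.1, 0.4, 0.2, 0.3,
                0.3, 0.4, 0.3, 0.6, 0.2, 0.5],
})

def carry_adjusted(no_show, threshold=0.8):
    """Hypothetical helper: running sum that drops by 1 after reaching threshold."""
    out, running = [], 0.0
    for x in no_show:
        if running >= threshold:
            running -= 1  # previous value reached the threshold
        # Rounding is an assumption to avoid float noise near the threshold.
        running = round(running + x, 6)
        out.append(running)
    return pd.Series(out, index=no_show.index)

# Restart the running total for every Session.
df["u_no_show"] = df.groupby("Session")["no_show"].transform(carry_adjusted)
print(df)
```

The dependency of each value on the previous adjusted value is why this sketch uses an explicit loop per group rather than a single vectorized cumsum.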

pandas filtering using isin function

二次信任 submitted on 2021-02-11 04:32:59
Question: I have two dataframes as shown below.

df1:
ID  Name
1   Sachin
2   Kholi
3   Dravid

df2:
ID  Run
1   20
2   60
2   10
1   5

From the above I want to filter df1 by only taking unique ids in df2.

Expected output:
ID  Name
3   Dravid

I tried the code below:

    def diff(first, second):
        second = set(second)
        units_in_unit_table = [item for item in first if item not in second]
        return units_in_unit_table

    id_df2 = diff(df2, df1)
    df3 = df1[df1['ID'].isin(id_df2)]

Answer 1: It seems your solution can be simplified by passing the unique values
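The answer is cut off in this excerpt, so the following is a minimal sketch of one way to get the expected output shown above, not necessarily the answer's exact code. It assumes the goal is to keep the rows of df1 whose ID does not appear in df2, which is what the expected output (only ID 3) implies.

```python
import pandas as pd

df1 = pd.DataFrame({"ID": [1, 2, 3], "Name": ["Sachin", "Kholi", "Dravid"]})
df2 = pd.DataFrame({"ID": [1, 2, 2, 1], "Run": [20, 60, 10, 5]})

# isin() builds a boolean mask of df1 IDs present in df2; ~ inverts it,
# so only IDs that never occur in df2 are kept.
df3 = df1[~df1["ID"].isin(df2["ID"].unique())]
print(df3)
```

Passing `df2["ID"]` directly would also work, since `isin` only tests membership; `unique()` just makes the intent explicit.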

Pandas P&L rollup to the next business day

烂漫一生 submitted on 2021-02-10 17:50:29
Question: I'm having a hard time trying to do this efficiently. I have some stocks and daily P&L info in a dataframe. In reality I have millions of rows of data, so efficiency matters a lot! The dataframe looks like this:

-------------------------------
| Date       | Security | P&L |
-------------------------------
| 2016-01-01 | AAPL     | 100 |
-------------------------------
| 2016-01-02 | AAPL     | 200 |
-------------------------------
| 2016-01-03 | AAPL     | 300 |
-------------------------------
| 2016-01-04 | AAPL
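The question is truncated before it states the exact rollup rule, so this is only a sketch based on the title: it assumes that P&L booked on a weekend should be added to the next business day for the same security. It uses NumPy's vectorized `busday_offset` with the default Monday-to-Friday week and no holiday calendar (both assumptions), and only the three complete rows visible above. The column name `BusinessDate` is made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime(["2016-01-01", "2016-01-02", "2016-01-03"]),
    "Security": ["AAPL", "AAPL", "AAPL"],
    "P&L": [100, 200, 300],
})

# Vectorized roll-forward: business days stay put, weekend dates move
# to the following Monday. No per-row Python loop is involved.
days = df["Date"].to_numpy().astype("datetime64[D]")
df["BusinessDate"] = pd.to_datetime(np.busday_offset(days, 0, roll="forward"))

# Sum P&L per security and rolled date.
rolled = df.groupby(["Security", "BusinessDate"], as_index=False)["P&L"].sum()
print(rolled)
```

A holiday list could be supplied through the `holidays` argument of `np.busday_offset` if the real calendar needs one.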

Take difference between pivot table columns in Python

自闭症网瘾萝莉.ら submitted on 2021-02-10 14:17:34
Question: I have a dataframe with Week, Campaign, Placement and Count columns. To compare counts per week by Campaign and Placement, I created a pivot table, which works great. How do I create a new column with the difference between these 2 weeks (as a percentage if possible)?

Code:

    dfPivot = pd.pivot_table(dfPivot, values='Count',
                             index=['Campaign', 'Placement'],
                             columns=['Week'], aggfunc=np.sum)

Current output:

Week                  2019-10-27  2019-11-03
Campaign  Placement
Code A    111111111   4288.0      615.0
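One way to add the difference, sketched under assumptions: the long-format input below is rebuilt from the single output row visible in the excerpt, and the new column names `Diff` and `Diff %` are made up for illustration. The two week columns are picked by position, assuming the pivot has exactly the two weeks being compared as its last two columns.

```python
import pandas as pd

# Long-format data reconstructed from the one visible pivot row.
df = pd.DataFrame({
    "Week": ["2019-10-27", "2019-11-03"],
    "Campaign": ["Code A", "Code A"],
    "Placement": [111111111, 111111111],
    "Count": [4288.0, 615.0],
})

pivot = pd.pivot_table(df, values="Count",
                       index=["Campaign", "Placement"],
                       columns="Week", aggfunc="sum")

# Take the two week columns before adding any new ones.
prev_week, curr_week = pivot.columns[-2], pivot.columns[-1]

pivot["Diff"] = pivot[curr_week] - pivot[prev_week]
pivot["Diff %"] = pivot["Diff"] / pivot[prev_week] * 100
print(pivot)
```

The percentage here is relative to the earlier week, so a drop from 4288 to 615 shows up as a negative value.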

Find duplicate rows among different groups with pandas

南笙酒味 submitted on 2021-02-10 12:55:45
Question:

Problem

Consider the following dataframe:

    data_so = {
        'ID': [100, 100, 100, 200, 200, 300, 300, 300],
        'letter': ['A', 'B', 'A', 'C', 'D', 'E', 'D', 'A'],
    }
    df_so = pandas.DataFrame(data_so, columns=['ID', 'letter'])

I want to obtain a new column where all duplicates in different groups are True. All other duplicates in the same group should be False.

What I've tried

I've tried using

    df_so['dup'] = df_so.duplicated(subset=['letter'], keep=False)

but the result is not what I want: The first
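The expected output is cut off in this excerpt, so the sketch below rests on one reading of the requirement: a row is flagged True when its letter occurs under more than one distinct ID, regardless of how often it repeats inside a single ID. Counting distinct IDs per letter with `groupby(...).transform('nunique')` expresses that directly.

```python
import pandas as pd

data_so = {
    "ID": [100, 100, 100, 200, 200, 300, 300, 300],
    "letter": ["A", "B", "A", "C", "D", "E", "D", "A"],
}
df_so = pd.DataFrame(data_so, columns=["ID", "letter"])

# For every row, count how many distinct IDs share its letter;
# more than one means the letter is duplicated across groups.
df_so["dup"] = df_so.groupby("letter")["ID"].transform("nunique") > 1
print(df_so)
```

Under this reading, both 'A' rows in ID 100 are True because 'A' also appears in ID 300; a letter repeated only within one ID would stay False.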

function returning pandas dataframe

非 Y 不嫁゛ submitted on 2021-02-10 11:49:42
Question: I was not clear about my issue, so I am revising the question. I have a function that manipulates a generic dataframe (it removes and renames columns and records):

    def manipulate_df(df_local):
        df_local.rename(columns={'A': 'grouping_column'}, inplace = True)
        df_local.drop('B', axis=1, inplace=True)
        df_local.drop(df.query('grouping_column not in (\'1\', \'0\')').index, inplace = True)
        df_local = df_local.groupby(['grouping_column'])['C'].sum().to_frame().reset_index().copy()
        print("this is what I
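The question is truncated, so the asker's exact problem is not visible here. A common issue with code shaped like this is that reassigning `df_local` inside the function only rebinds the local name, so the grouped result never reaches the caller, while the `inplace=True` calls do modify the caller's frame. The following is a sketch of a variant that avoids both effects by returning a new frame; the sample data is made up for illustration.

```python
import pandas as pd

def manipulate_df(df_local):
    # Build new objects instead of mutating the caller's dataframe.
    out = df_local.rename(columns={"A": "grouping_column"}).drop(columns="B")
    # Keep only the records whose grouping value is '1' or '0'.
    out = out[out["grouping_column"].isin(["1", "0"])]
    # Return the aggregated frame so the caller can use it.
    return out.groupby("grouping_column", as_index=False)["C"].sum()

# Hypothetical sample data, not taken from the question.
df = pd.DataFrame({
    "A": ["1", "0", "2", "1"],
    "B": [9, 9, 9, 9],
    "C": [10, 20, 30, 40],
})

result = manipulate_df(df)
print(result)
print(df.columns.tolist())  # the original frame is left untouched
```

The caller then decides whether to overwrite its own variable, for example with `df = manipulate_df(df)`.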