Conditional data imputation in Python

Submitted by 孤者浪人 on 2020-05-17 06:24:48

Question


I am trying to impute values in my dataset conditionally.

Say I have three columns. If Column 1 is 1, then Column 2 and Column 3 should be 0; if Column 1 is 2, then Column 2 and Column 3 should each be their mean (mean()).

I tried an if statement with any(), defining the conditions separately.

However, the values are not imputed row by row according to the conditions: I get either all mean values or all zeroes.

The exact code is below:

if (df['Retention_Term'] == 6):
    df.cl_tot_calls_term_seq_1.replace(999, np.nan,inplace = True)
df['cl_tot_calls_term_seq_1'].fillna(df['cl_tot_calls_term_seq_1'].median(),inplace= True)

This raises:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Answer 1:


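An if statement needs a single True or False, but df['Retention_Term'] == 6 produces one boolean per row, which is why pandas raises the ValueError. Build boolean masks instead and assign through df.loc, so only the matching rows change: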

mask1 = df['Retention_Term']==6
mask2 = df['cl_tot_calls_term_seq_1'] == 999 
df.loc[mask1 & mask2, 'cl_tot_calls_term_seq_1'] = np.nan
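
To see what the masks actually select, here is a minimal sketch on made-up data. Only the two column names come from the question; the values are invented for illustration, and the column is created as float so it can hold NaN.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration; only the column names come from the question.
df = pd.DataFrame({
    "Retention_Term": [6, 6, 3, 6],
    "cl_tot_calls_term_seq_1": [999.0, 12.0, 999.0, 7.0],
})

# Comparing a column to a scalar gives one boolean per row, not a single bool.
mask1 = df["Retention_Term"] == 6

# Using that Series directly in an `if` reproduces the error from the question.
try:
    if mask1:
        pass
    raised = False
except ValueError:
    raised = True

# Combine the conditions element-wise with & and assign only to matching rows.
mask2 = df["cl_tot_calls_term_seq_1"] == 999
df.loc[mask1 & mask2, "cl_tot_calls_term_seq_1"] = np.nan

print(raised)
print(df)
```

Only the first row meets both conditions, so only that 999 becomes NaN. The 999 in the row where Retention_Term is 3 is left untouched.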

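After that, the median fill works. Assigning the result back is more reliable than calling fillna with inplace=True on a selected column, which recent pandas versions warn about:

df['cl_tot_calls_term_seq_1'] = df['cl_tot_calls_term_seq_1'].fillna(df['cl_tot_calls_term_seq_1'].median())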
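
The three-column rule from the start of the question can probably be handled with the same masking idea. The sketch below is only one reading of that rule: the names col1, col2 and col3 are placeholders, the data is invented, missing values are assumed to be NaN already, and "mean" is taken to be the column's mean over its non-missing values.

```python
import numpy as np
import pandas as pd

# Placeholder column names and made-up values; none of this is from the question.
df = pd.DataFrame({
    "col1": [1, 2, 1, 2, 2],
    "col2": [np.nan, np.nan, 5.0, 4.0, 8.0],
    "col3": [np.nan, 3.0, np.nan, np.nan, 9.0],
})

for col in ["col2", "col3"]:
    missing = df[col].isna()
    # Compute the mean before filling anything; mean() skips NaN by default.
    col_mean = df[col].mean()
    # Missing value and col1 == 1: fill with 0.
    df.loc[missing & (df["col1"] == 1), col] = 0.0
    # Missing value and col1 == 2: fill with the column mean.
    df.loc[missing & (df["col1"] == 2), col] = col_mean

print(df)
```

Each fill is restricted to rows that are both missing and match the col1 condition, so existing values are never overwritten and the two rules cannot collapse into all zeroes or all means.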


Source: https://stackoverflow.com/questions/61713051/conditional-data-imputation-in-python
