Question
I have a DataFrame:
import pandas as pd
import numpy as np
x = {'Value': ['Test', 'XXX123', 'XXX456', 'Test']}
df = pd.DataFrame(x)
I want to replace the values starting with XXX with np.nan using a lambda.
I have tried many things with replace, apply and map, and the best I have managed is a boolean result: False, True, True, False.
The following works, but I would like to know a better way to do it. I suspect that apply or replace combined with a lambda is the better approach.
# One .loc call on the DataFrame: boolean row mask plus the column label
df.loc[df.Value.str.startswith('XXX', na=False), 'Value'] = np.nan
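The False, True, True, False result mentioned above is a boolean mask from str.startswith. Because the question asks about replace specifically, here is a minimal sketch using Series.replace with a regular expression. This is a swapped-in alternative, not one of the answers below, and the variable name mask is only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Value': ['Test', 'XXX123', 'XXX456', 'Test']})

# The boolean result the question describes: True where the value starts with 'XXX'.
# na=False makes missing values count as "no match" instead of propagating NaN.
mask = df['Value'].str.startswith('XXX', na=False)
print(mask.tolist())  # [False, True, True, False]

# Series.replace with regex=True: r'^XXX.*' matches the whole of any value
# that starts with 'XXX', so the entire cell is replaced by NaN.
df['Value'] = df['Value'].replace(r'^XXX.*', np.nan, regex=True)
print(df['Value'].isna().tolist())  # [False, True, True, False]
```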
Answer 1:
Use the apply method:
In [80]: x = {'Value': ['Test', 'XXX123', 'XXX456', 'Test']}
In [81]: df = pd.DataFrame(x)
In [82]: df.Value.apply(lambda x: np.nan if x.startswith('XXX') else x)
Out[82]:
0 Test
1 NaN
2 NaN
3 Test
Name: Value, dtype: object
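Two things are worth noting about the apply version, sketched below under an assumed input that adds one missing value. apply returns a new Series, so the result has to be assigned back to change df. And the lambda calls x.startswith on every cell, so a missing value (NaN is a float and has no startswith) would raise AttributeError; an isinstance guard is one way to avoid that.

```python
import numpy as np
import pandas as pd

# Assumed input: the question's data plus one missing value (None).
df = pd.DataFrame({'Value': ['Test', 'XXX123', None, 'XXX456', 'Test']})

# The isinstance guard skips non-strings (None/NaN), which pass through
# unchanged instead of raising AttributeError on .startswith.
df['Value'] = df['Value'].apply(
    lambda x: np.nan if isinstance(x, str) and x.startswith('XXX') else x
)
print(df['Value'].isna().tolist())  # [False, True, True, True, False]
```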
[Figure: performance comparison of apply, where and loc]
Answer 2:
np.where() performs much better here:
df['Value'] = np.where(df['Value'].str.startswith('XXX'), np.nan, df['Value'])
[Figure: performance of np.where vs apply on larger DataFrames]
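A rough way to run this kind of comparison on your own machine is sketched below. The frame size (the question's four values repeated 100,000 times) is an arbitrary assumption, and with_apply, with_where and with_loc are hypothetical helper names. Timings depend on the machine and pandas version, so no numbers are claimed here.

```python
import timeit

import numpy as np
import pandas as pd

# Assumed test size: the question's four values repeated 100,000 times (400,000 rows).
big = pd.DataFrame({'Value': ['Test', 'XXX123', 'XXX456', 'Test'] * 100_000})


def with_apply(s):
    # Answer 1 style: a Python-level lambda called once per element.
    return s.apply(lambda x: np.nan if x.startswith('XXX') else x)


def with_where(s):
    # Answer 2 style: one vectorized prefix test, then a single np.where over the column.
    return pd.Series(np.where(s.str.startswith('XXX', na=False), np.nan, s), index=s.index)


def with_loc(s):
    # Boolean-mask assignment, done on a copy so every run starts from the same data.
    out = s.copy()
    out.loc[out.str.startswith('XXX', na=False)] = np.nan
    return out


for name, fn in [('apply', with_apply), ('np.where', with_where), ('loc', with_loc)]:
    # Average wall-clock time over 3 runs.
    seconds = timeit.timeit(lambda: fn(big['Value']), number=3) / 3
    print(f'{name:>8}: {seconds:.3f} s per run')
```

All three helpers should blank out exactly the same rows; only the speed differs.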
Answer 3:
Using .loc is not necessary. Just write:
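# Chained assignment: works with classic pandas behaviour, but under Copy-on-Write
# (the default in pandas 3.0) it does not modify df
df.Value[df.Value.str.startswith('XXX')] = np.nan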
A lambda would only be necessary if you wanted to compute an expression to substitute. Here, a constant np.nan is enough.
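The distinction in this answer can be sketched as follows: a constant replacement needs no lambda, while a value computed from each element does. Series.mask is a swapped-in technique not used in the answers, and the "drop the XXX prefix" rule is a hypothetical example of a computed replacement.

```python
import pandas as pd

df = pd.DataFrame({'Value': ['Test', 'XXX123', 'XXX456', 'Test']})
prefixed = df['Value'].str.startswith('XXX', na=False)

# Constant replacement: Series.mask returns a copy with NaN wherever the
# condition is True (NaN is its default replacement value), so no lambda is needed.
blanked = df['Value'].mask(prefixed)
print(blanked.isna().tolist())  # [False, True, True, False]

# Computed replacement (hypothetical rule: drop the 'XXX' prefix). The new
# value depends on each element, which is where a lambda earns its place.
stripped = df['Value'].apply(lambda s: s[len('XXX'):] if s.startswith('XXX') else s)
print(stripped.tolist())  # ['Test', '123', '456', 'Test']

# Writing a whole column back through df[...] is a plain column assignment,
# not chained assignment, so it updates df under Copy-on-Write as well.
df['Value'] = blanked
```

For this particular prefix rule, pandas 1.4 and later also provide the vectorized df['Value'].str.removeprefix('XXX').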
Source: https://stackoverflow.com/questions/57614324/replace-values-in-dataframe-column-when-they-start-with-string-using-lambda