Question
I have the following code to replace text in my dataframe - dfMSR.
oldtxts = ['NA', 'na']
newtxt = 'N/A'
for oldtxt in oldtxts:
    if oldtxt in dfMSR.values:
        dfMSR = dfMSR.replace(oldtxt, newtxt, regex=True)
    else:
        print(f"\nNo {oldtxt} in Dataframe")
Is there a better way to replace all case variations without spelling each one out, and without converting all text in the dataframe to upper or lower case? In the code above, if the user wrote 'Na', it wouldn't be replaced because I haven't included it in oldtxts.
edit: sample data and desired output added
dfMSR = pd.DataFrame({'A':['NA','na','O', '', 'N/A'],
'B':['Anna','E','NA', 'Z', 'Na']})
desired output:
     A     B
0  N/A  Anna
1  N/A     E
2    O   N/A
3          Z
4  N/A   N/A
Thanks
Answer 1:
You can use the case parameter of str.replace, since you have mentioned regex=True:
dfMSR.apply(lambda x: x.astype(str).str.replace(r'\bna\b', 'N/A', regex=True,case=False))
Please note that this relies on regex=True: the pattern \bna\b is a regular expression and will not match if it is treated as a literal string.
Output:
import pandas as pd
dfMSR = pd.DataFrame({'A':['NA','na','O', '', 'N/A'],
'B':['Anna','E','NA', 'Z', 'Na']})
dfMSR
     A     B
0   NA  Anna
1   na     E
2    O    NA
3          Z
4  N/A    Na
dfMSR.apply(lambda x: x.astype(str).str.replace(r'\bna\b', 'N/A', regex=True,case=False))
     A     B
0  N/A  Anna
1  N/A     E
2    O   N/A
3          Z
4  N/A   N/A
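A variant that may be worth considering, as a minimal sketch: DataFrame.replace also accepts regular expressions, so an inline (?i) flag can make the match case-insensitive without apply or astype(str). This sketch assumes that only cells consisting entirely of "na" should be replaced (hence ^na$ instead of \bna\b); the name result is just illustrative.

```python
import pandas as pd

# Sample data from the question.
dfMSR = pd.DataFrame({'A': ['NA', 'na', 'O', '', 'N/A'],
                      'B': ['Anna', 'E', 'NA', 'Z', 'Na']})

# (?i) turns on case-insensitive matching; ^ and $ anchor the pattern
# so only cells that are exactly "na" (in any case) are replaced.
# 'Anna' and 'N/A' are left untouched.
result = dfMSR.replace(r'(?i)^na$', 'N/A', regex=True)
print(result)
```

Unlike the astype(str) approach, this leaves non-string cells as they are, which may or may not be what you want.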
Answer 2:
You can chain str.lower() with .replace. Also, you needn't test the if condition, as that check is done implicitly by replace:
dfMSR = dfMSR.apply(lambda x: x.str.lower()).replace(oldtxt, newtxt, regex=True)
In an example case it would look like this:
pd.DataFrame({'A':['NA','na','O'],
'B':['X','E','NA']}).apply(lambda x: x.str.lower()).replace('na','N/A',regex=True)
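To see what this does to the rest of the data, here is a small runnable sketch of the same example. It assumes every column holds strings (the .str accessor fails otherwise); the names df and result are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': ['NA', 'na', 'O'],
                   'B': ['X', 'E', 'NA']})

# Lowercase every cell first, then replace 'na' with 'N/A'.
result = df.apply(lambda x: x.str.lower()).replace('na', 'N/A', regex=True)
print(result)

# Cells that were never 'na' are lowercased as a side effect:
# 'O' becomes 'o', 'X' becomes 'x', 'E' becomes 'e'.
```

Because the pattern is not anchored, it would also match 'na' inside longer words, so this approach fits best when lowercasing the whole dataframe is acceptable.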
Source: https://stackoverflow.com/questions/65507704/python-better-way-to-handle-case-sensitivities-with-df-replace