Question
Given the following dataframe, I need to check the room column:
   id    room
0   1   A-102
1   2     201
2   3    B309
3   4   C·102
4   5  E_1089
The room column should contain only letters, digits, or hyphens; for any other value, the check column should be filled with incorrect.
The expected result looks like this:
   id    room      check
0   1   A-102        NaN
1   2     201        NaN
2   3    B309        NaN
3   4   C·102  incorrect
4   5  E_1089  incorrect
Informally, the syntax could be something like:
df.loc[<filter1> | (<filter2>) | (<filter3>), 'check'] = 'incorrect'
Thanks in advance for your help.
Answer 1:
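Use str.match with an anchored pattern, so that every character must be a letter, digit, or hyphen:
import numpy as np
df['check'] = np.where(df.room.str.match(r'^[a-zA-Z\d\-]*$'), np.nan, 'incorrect')
Or use str.contains with a negated character class, which flags any value containing a disallowed character:
df['check'] = np.where(df.room.str.contains(r'[^a-zA-Z\d\-]'), 'incorrect', np.nan)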
Output:
   id    room      check
0   1   A-102        nan
1   2     201        nan
2   3    B309        nan
3   4   C·102  incorrect
4   5  E_1089  incorrect
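Note that the output shows lowercase nan: when np.where is given a float and a string, NumPy builds a string array, so the "missing" entries are the text 'nan' rather than real missing values, and isna() will not detect them. A minimal sketch of one way to keep real NaN values, assuming the sample data from the question and using Series.where in place of np.where (a swapped-in technique, not part of the original answer):

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "room": ["A-102", "201", "B309", "C·102", "E_1089"],
})

# True where room consists only of letters, digits, or hyphens
valid = df["room"].str.fullmatch(r"[a-zA-Z\d-]*")

# Start from a column of 'incorrect' and keep it only where the room is
# invalid; Series.where fills every other row with a real missing value
df["check"] = pd.Series("incorrect", index=df.index).where(~valid)

print(df)
print(df["check"].isna().tolist())
```

With this variant, df['check'].isna() is True for the three valid rooms, which matches the NaN shown in the expected result.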
To update an existing check column instead, assign through loc. For example:
df.loc[df.room.str.contains(r'[^a-zA-Z\d\-]'), 'check'] = 'incorrect'
# or, safer when room contains NaN (a NaN in the mask would otherwise raise)
# df.loc[df.room.str.contains(r'[^a-zA-Z\d\-]') == True, 'check'] = 'incorrect'
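As an alternative to comparing with True, str.contains accepts an na argument that sets the result for missing values. The sketch below uses hypothetical data (a missing room and a pre-filled check value, neither of which appears in the question) to show that existing check entries are left alone and that missing rooms are not flagged:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one missing room and one existing check value
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "room": ["A-102", np.nan, "C·102", "E_1089"],
    "check": [np.nan, "missing", np.nan, np.nan],
})

# na=False makes missing rooms count as "no disallowed character found",
# so the mask is purely boolean and safe to use with loc
bad = df["room"].str.contains(r"[^a-zA-Z\d-]", na=False)

# Only the flagged rows are overwritten; other check values are kept
df.loc[bad, "check"] = "incorrect"

print(df)
```

Whether a missing room should itself be flagged is a design choice; passing na=True instead would mark those rows as incorrect too.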
Source: https://stackoverflow.com/questions/64674083/filter-rows-of-one-column-which-is-alphabet-numbers-or-hyphen-in-pandas