Filter rows of one column which is alphabet, numbers or hyphen in Pandas

荒凉一梦 submitted on 2021-01-29 00:12:58

Question


Given the following dataframe, I need to validate the room column:

   id    room
0   1   A-102
1   2     201
2   3    B309
3   4   C·102
4   5  E_1089

Valid values in this column contain only digits, letters or hyphens; for any other value, the check column should be filled with incorrect.

The expected result is like this:

   id    room      check
0   1   A-102        NaN
1   2     201        NaN
2   3    B309        NaN
3   4   C·102  incorrect
4   5  E_1089  incorrect

Informally, the syntax could look like this:

df.loc[<filter1> | (<filter2>) | (<filter3>), 'check'] = 'incorrect'

Thanks in advance for your help.


Answer 1:


Use str.match with a pattern anchored at both ends, so that every character must be a letter, digit or hyphen:

df['check'] = np.where(df.room.str.match(r'^[a-zA-Z\d\-]*$'), np.nan, 'incorrect')

Or use str.contains with a negated character class, which flags any row containing a disallowed character:

df['check'] = np.where(df.room.str.contains(r'[^a-zA-Z\d\-]'), 'incorrect', np.nan)

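Output (np.where builds a string array here, so the missing entries are the string 'nan' rather than a real NaN):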

   id    room      check
0   1   A-102        nan
1   2     201        nan
2   3    B309        nan
3   4   C·102  incorrect
4   5  E_1089  incorrect
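
If a real missing value is wanted in check, as in the expected result, one option is to build the column with Series.where instead of np.where. The following is a minimal sketch rather than part of the original answer: it assumes pandas 1.1 or later (for str.fullmatch), rebuilds the sample dataframe by hand, and spells the digits as 0-9 so that only ASCII digits are accepted.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "room": ["A-102", "201", "B309", "C·102", "E_1089"],
})

# fullmatch succeeds only if the whole string consists of allowed characters.
valid = df["room"].str.fullmatch(r"[A-Za-z0-9-]*")

# Start from a column of 'incorrect' and keep it only where the row is invalid;
# Series.where fills every other row with a real missing value.
df["check"] = pd.Series("incorrect", index=df.index).where(~valid)

print(df)
```

With this variant, df['check'].isna() is True for the valid rows, which is not the case for the string 'nan'.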

To update an existing check column instead, assign through loc. For example:

df.loc[df.room.str.contains(r'[^a-zA-Z\d\-]'), 'check'] = 'incorrect'
# or, safer when room contains NaN values (the mask is then NaN for those rows):
# df.loc[df.room.str.contains(r'[^a-zA-Z\d\-]') == True, 'check'] = 'incorrect'
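
As an alternative to comparing the mask with True, str.contains accepts an na argument that sets the result for missing values directly. The sketch below is an illustration under assumptions, not the answerer's code: the three-row dataframe is made up for the example, and the check column is created with object dtype so that a string can be assigned into it.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a missing room value.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "room": ["A-102", np.nan, "E_1089"],
})
# Existing check column, initially all missing (object dtype so it can hold strings).
df["check"] = pd.Series(np.nan, index=df.index, dtype=object)

# na=False makes the mask False for missing rooms, so it is a plain boolean mask.
bad = df["room"].str.contains(r"[^A-Za-z0-9-]", na=False)
df.loc[bad, "check"] = "incorrect"

print(df)
```

Here a missing room is left unflagged; passing na=True instead would mark it as incorrect, so the choice depends on how missing values should be treated.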


Source: https://stackoverflow.com/questions/64674083/filter-rows-of-one-column-which-is-alphabet-numbers-or-hyphen-in-pandas
