Return the unmatched rows from the regex pattern

Submitted by 本小妞迷上赌 on 2019-12-06 08:26:52

You can simply negate your existing Boolean series:

df[~df.Sequence.str.contains(pat)]

This will give you the desired output:

   Sequence  Rating
1  YGEIFEKF       2
3  YLESFYKF       4
5  WPDVIHSF       6

Brief explanation:

df.Sequence.str.contains(pat)

will return a Boolean series:

0     True
1    False
2     True
3    False
4     True
5    False
Name: Sequence, dtype: bool

Negating it using ~ yields

~df.Sequence.str.contains(pat)

0    False
1     True
2    False
3     True
4    False
5     True
Name: Sequence, dtype: bool

which is another Boolean series you can pass to your original dataframe.
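For anyone who wants to run this end to end, here is a minimal sketch. The DataFrame is reconstructed from the outputs shown in this thread, and pat is assumed to be the same pattern quoted in the other answers; adjust both to your own data.

```python
import pandas as pd

# Sample data reconstructed from the outputs shown in this thread (an assumption).
df = pd.DataFrame({
    "Sequence": ["HYHIVQKF", "YGEIFEKF", "TYGGSWKF",
                 "YLESFYKF", "YYNTAVKL", "WPDVIHSF"],
    "Rating": [1, 2, 3, 4, 5, 6],
})

# Pattern assumed to be the one used elsewhere in the thread.
pat = r'\b.[YF]\w+[LFI]\b'

# Boolean series: True where the pattern is found in the row's Sequence.
matched = df.Sequence.str.contains(pat)

# ~ flips every True/False, so indexing keeps only the rows that did NOT match.
unmatched_rows = df[~matched]
print(unmatched_rows)
```

Storing the mask in a variable (matched here) also lets you reuse it for both the matching and the non-matching subsets without running the regex twice.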

You can use ~ for not:

pat = r'\b.[YF]\w+[LFI]\b'
new_df[~new_df.Sequence.str.contains(pat)]

#   Sequence    Rating
#1  YGEIFEKF    2
#3  YLESFYKF    4
#5  WPDVIHSF    6
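One caveat that may matter on real data: if the column contains missing values, str.contains by default reports those rows as missing rather than False, and the resulting mask may not negate or index cleanly. A hedged sketch follows, using a small hypothetical frame (not the asker's data) and the na parameter of str.contains to treat missing values as non-matches.

```python
import pandas as pd

# Hypothetical data with a missing Sequence value, for illustration only.
df_with_gap = pd.DataFrame({
    "Sequence": ["HYHIVQKF", None, "YGEIFEKF"],
    "Rating": [1, 2, 3],
})

pat = r'\b.[YF]\w+[LFI]\b'

# na=False makes rows with a missing Sequence count as "no match",
# so the mask is purely True/False and can be negated with ~.
mask = df_with_gap.Sequence.str.contains(pat, na=False)
unmatched = df_with_gap[~mask]
print(unmatched)
```

With na=False the missing row ends up among the unmatched rows; pass na=True instead if you would rather exclude it from them.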

Psidom's answer is more elegant, but another way to solve this problem is to wrap the regex pattern in a negative lookahead assertion and use match() instead of contains(). Because match() anchors at the start of each string, the lookahead is tested only at position 0:

pat = r'\b.[YF]\w+[LFI]\b'
not_pat = r'(?!{})'.format(pat)

>>> new_df[new_df.Sequence.str.match(pat)]
   Sequence  Rating
0  HYHIVQKF       1
2  TYGGSWKF       3
4  YYNTAVKL       5

>>> new_df[new_df.Sequence.str.match(not_pat)]
   Sequence  Rating
1  YGEIFEKF       2
3  YLESFYKF       4
5  WPDVIHSF       6
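A self-contained version of the lookahead approach, as a sketch: the data is reconstructed from the outputs above, and the column is deliberately created with object dtype on the assumption that matching then goes through Python's re module, which supports lookaheads (other string backends may use a regex engine that does not).

```python
import pandas as pd

# Data reconstructed from the outputs in this thread; object dtype is an
# assumption made so that Python's re engine handles the lookahead.
new_df = pd.DataFrame({
    "Sequence": pd.Series(
        ["HYHIVQKF", "YGEIFEKF", "TYGGSWKF",
         "YLESFYKF", "YYNTAVKL", "WPDVIHSF"],
        dtype=object,
    ),
    "Rating": [1, 2, 3, 4, 5, 6],
})

pat = r'\b.[YF]\w+[LFI]\b'
# (?!...) succeeds only where the wrapped pattern does NOT match at that position.
not_pat = r'(?!{})'.format(pat)

# match() tests at the start of each string, so the lookahead is checked at position 0.
via_lookahead = new_df[new_df.Sequence.str.match(not_pat)]

# The same rows obtained by negating the positive match, for comparison.
via_negation = new_df[~new_df.Sequence.str.match(pat)]

print(via_lookahead)
print(via_lookahead.equals(via_negation))
```

Both selections agree on this data because every pattern match starts at the first character; for patterns that can match later in the string, contains() with ~ is the safer choice.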