问题
Im trying to drop rows in pandas that contains "???", it works for every other value except for "???", I do not know whats the problem.
This is my code (I have tried both types):
df = df[~df["text"].str.contains("?????", na=False)]
df = df[~df["text"].str.contains("?????")]
error that I'm getting:
re.error: nothing to repeat at position 0
It works for every other value except for "????". I have googled it, and looked all over this website but I couldnt find any solutions.
回答1:
The parameter expects a regular expression, hence the error re.error
.
You can either escape the ? inside the expression like this:
df = df[~df["text"].str.contains("\?\?\?\?\?")]
Or set regex=False
as Vorsprung sugested:
df = df[~df["text"].str.contains("?????",regex=False)]
回答2:
let's convert this into running code:
import numpy as np
import pandas as pd
data = {'A': ['abc', 'cxx???xx', '???',], 'B': ['add', 'ddb', 'c', ]}
df = pd.DataFrame.from_dict(data)
df
output:
A B
0 abc add
1 cxx???xx ddb
2 ??? c
with this:
df[df['A'].str.contains('???',regex=False)]
output:
A B
1 cxx???xx ddb
2 ??? c
you need to tell contains()
, that your search string is not a regex.
来源:https://stackoverflow.com/questions/61034172/drop-rows-in-pandas-if-they-contains