Question
I'm trying to remove several words from each value of a column, but nothing is happening.
stop_words = ["and","lang","naman","the","sa","ko","na",
"yan","n","yang","mo","ung","ang","ako","ng",
"ndi","pag","ba","on","un","Me","at","to",
"is","sia","kaya","I","s","sla","dun","po","b","pro"
]
newdata['Verbatim'] = newdata['Verbatim'].replace(stop_words,'', inplace = True)
I'm trying to generate a word cloud from the result of the replacement, but I keep getting the same words (words that don't mean anything but appear very often).
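A likely reason nothing changes: Series.replace with a list matches whole cell values, not words inside a string, and with inplace=True it returns None instead of the modified Series. The minimal demonstration below uses a made-up two-row Series and a shortened stop-word list for illustration:

```python
import pandas as pd

s = pd.Series(["I love my lang", "lang"])

# Series.replace with a list compares each *entire* cell against the list,
# so only the cell that is exactly "lang" is changed; substrings are untouched.
replaced = s.replace(["lang", "I"], "")
print(replaced.tolist())  # ['I love my lang', '']

# With inplace=True the object is modified in place and the method returns None,
# so assigning that return value back to a column stores None, not the result.
t = s.copy()
result = t.replace(["lang"], "", inplace=True)
print(result)  # None
```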
Answer 1:
You can wrap each stop word in word boundaries \b and join them with | (regex OR):
pat = '|'.join(r"\b{}\b".format(x) for x in stop_words)
newdata['Verbatim'] = newdata['Verbatim'].str.replace(pat, '', regex=True)
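A slightly more defensive version of the regex approach is sketched below. It assumes a shortened stop-word list and made-up sample rows. Since pandas 2.0, Series.str.replace treats the pattern as a literal string unless regex=True is passed, so the flag is given explicitly. Each word is passed through re.escape in case the list ever contains regex metacharacters, and the leftover double spaces are collapsed afterwards:

```python
import re
import pandas as pd

stop_words = ["and", "lang", "the", "to", "I"]  # shortened list for illustration

# Escape each word, join with |, and put one pair of word boundaries
# around the whole non-capturing group.
pat = r"\b(?:" + "|".join(re.escape(w) for w in stop_words) + r")\b"

s = pd.Series(["I love my lang", "the boss come to me"])
cleaned = (
    s.str.replace(pat, "", regex=True)        # drop the stop words
     .str.replace(r"\s+", " ", regex=True)    # collapse runs of whitespace
     .str.strip()                             # trim leading/trailing spaces
)
print(cleaned.tolist())  # ['love my', 'boss come me']
```

The matching is still case-sensitive, so "The" or "me" would not be removed by this list.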
Another solution is to split each value into words, drop the stop words, and join the rest back with a space in a lambda function:
stop_words = set(stop_words)
f = lambda x: ' '.join(w for w in x.split() if w not in stop_words)
newdata['Verbatim'] = newdata['Verbatim'].apply(f)
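The split-based filter compares tokens case-sensitively, so "The" or "me" survive even though "the" and "Me" are in the list. If case should be ignored, one possible variant (a sketch; the function name remove_stop_words and the sample rows are hypothetical) lower-cases both sides of the comparison while keeping the original spelling of the words that remain:

```python
import pandas as pd

stop_words = ["and", "lang", "the", "to", "Me", "I"]  # shortened list for illustration
# Lower-case the stop words once so the membership test ignores case.
stop_set = {w.lower() for w in stop_words}

def remove_stop_words(text: str) -> str:
    """Split on whitespace and keep tokens whose lower-cased form is not a stop word."""
    return " ".join(w for w in text.split() if w.lower() not in stop_set)

s = pd.Series(["The boss come to me", "I love my lang"])
out = s.apply(remove_stop_words).tolist()
print(out)  # ['boss come', 'love my']
```

Splitting on whitespace keeps punctuation attached to tokens (for example "lang," is not matched), and missing values would need fillna("") first because a float NaN has no split method.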
Sample:
stop_words = ["and","lang","naman","the","sa","ko","na",
"yan","n","yang","mo","ung","ang","ako","ng",
"ndi","pag","ba","on","un","Me","at","to",
"is","sia","kaya","I","s","sla","dun","po","b","pro"
]
import pandas as pd

newdata = pd.DataFrame({'Verbatim':['I love my lang','the boss come to me']})
pat = '|'.join(r"\b{}\b".format(x) for x in stop_words)
newdata['Verbatim1'] = newdata['Verbatim'].str.replace(pat, '', regex=True)
stop_words = set(stop_words)
f = lambda x: ' '.join(w for w in x.split() if w not in stop_words)
newdata['Verbatim2'] = newdata['Verbatim'].apply(f)
print(newdata)
              Verbatim       Verbatim1     Verbatim2
0       I love my lang        love my        love my
1  the boss come to me   boss come  me  boss come me
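Since the goal is a word cloud, it can help to look at the word frequencies that remain after cleaning, to check that no stop words are still dominating. The sketch below uses collections.Counter on a made-up cleaned column (hypothetical data, in the style of Verbatim2). If the third-party wordcloud package is what is being used, its WordCloud.generate_from_frequencies method accepts a word-to-count mapping such as this Counter:

```python
from collections import Counter

import pandas as pd

# Hypothetical cleaned column, e.g. output of the split-based approach.
cleaned = pd.Series(["love my", "boss come me", "love boss"])

# Count every remaining token across all rows.
freqs = Counter(word for text in cleaned for word in text.split())

# Ties keep the order in which the words were first seen.
print(freqs.most_common(2))  # [('love', 2), ('boss', 2)]
```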
Source: https://stackoverflow.com/questions/55533962/removing-specific-word-in-a-string-in-pandas