There are several countries with numbers and/or parentheses in my list. How do I remove these?
e.g.
'Bolivia (Plurinational State of)' should be 'Bolivia'
Use Series.str.replace with a regular expression. In the pattern, \s* matches optional whitespace before the (, then \(.*\) matches the parentheses together with everything between them, | is the regex OR, and \d+ matches numbers with one or more digits:
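import pandas as pd

df = pd.DataFrame({'a': ['Bolivia (Plurinational State of)', 'Switzerland17']})
# regex=True is required in pandas >= 2.0, where the default changed to regex=False
df['a'] = df['a'].str.replace(r'(\s*\(.*\)|\d+)', '', regex=True)
print(df)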
             a
0      Bolivia
1  Switzerland
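Since the question mentions a plain list rather than a DataFrame, here is a minimal sketch of the same pattern applied with the standard library's `re.sub`, assuming the names are ordinary Python strings. The function name `clean_names`, the constant names, and the sample string `Foo (a) Bar (b)` are hypothetical. It also illustrates one caveat: `.*` is greedy, so when a name contains two parenthesised groups, everything from the first `(` to the last `)` is removed. Using `[^)]*` instead is one way to remove each group separately.

```python
import re

# Same pattern as the pandas answer: optional whitespace + "(...)", OR one or more digits.
GREEDY = re.compile(r"\s*\(.*\)|\d+")
# Alternative (an assumption, not from the answer): [^)]* stops at the first ")",
# so each "(...)" group is matched and removed on its own.
NON_GREEDY = re.compile(r"\s*\([^)]*\)|\d+")


def clean_names(names, pattern=GREEDY):
    """Return a new list with every match of `pattern` removed from each name."""
    return [pattern.sub("", name) for name in names]


countries = ["Bolivia (Plurinational State of)", "Switzerland17"]
print(clean_names(countries))  # ['Bolivia', 'Switzerland']

# Hypothetical string with two parenthesised groups, showing the difference.
sample = ["Foo (a) Bar (b)"]
print(clean_names(sample))              # ['Foo']
print(clean_names(sample, NON_GREEDY))  # ['Foo Bar']
```

The same `[^)]*` variant can be passed to `Series.str.replace` with `regex=True` if you stay in pandas.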