I have a dataframe which looks like this:
   A       B         C
1  red78   square    big235
2  green   circle    small123
3  blue45  triangle  big657
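To remove all non-digit characters from the strings in a pandas column, use str.replace with the \D+ or [^0-9]+ pattern. Pass regex=True, because since pandas 2.0 the pattern is treated as a literal string by default:
dfObject['C'] = dfObject['C'].str.replace(r'\D+', '', regex=True)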
Or, since in Python 3 \D is fully Unicode-aware by default and therefore does not match non-ASCII digits (such as ۱۲۳۴۵۶۷۸۹), which means those digits are left in the string, consider restricting the pattern to ASCII digits:
dfObject['C'] = dfObject['C'].str.replace(r'[^0-9]+', '', regex=True)
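Applied to the sample data from the question, the replacement could look like the sketch below. The frame is rebuilt by hand from the values shown above, and `digits_a` is a hypothetical name used only to show what happens when a value contains no digits at all.

```python
import pandas as pd

# Sample frame rebuilt from the question (index 1..3, columns A, B, C).
dfObject = pd.DataFrame(
    {
        "A": ["red78", "green", "blue45"],
        "B": ["square", "circle", "triangle"],
        "C": ["big235", "small123", "big657"],
    },
    index=[1, 2, 3],
)

# Strip everything that is not an ASCII digit from column C.
# regex=True is needed on pandas 2.0+, where the default is regex=False.
dfObject["C"] = dfObject["C"].str.replace(r"[^0-9]+", "", regex=True)
print(dfObject["C"].tolist())  # ['235', '123', '657']

# A value without any digits ("green") becomes an empty string.
digits_a = dfObject["A"].str.replace(r"[^0-9]+", "", regex=True)
print(digits_a.tolist())  # ['78', '', '45']
```

The result is still a column of strings, so rows such as "green" end up as empty strings rather than missing values.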
The difference between the two patterns is easy to verify with re.sub:
import re
print(re.sub(r'\D+', '', '1۱۲۳۴۵۶۷۸۹0'))  # => 1۱۲۳۴۵۶۷۸۹0
print(re.sub(r'[^0-9]+', '', '1۱۲۳۴۵۶۷۸۹0'))  # => 10
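A possible third option, not part of the answer above, is to keep \D and pass the re.ASCII flag, which restricts \d and \D to ASCII digits. The sketch below uses only the standard library; the variable names are illustrative.

```python
import re

sample = "1۱۲۳۴۵۶۷۸۹0"

# Default (Unicode) matching: the Persian digits count as digits,
# so \D+ finds nothing to remove.
unicode_result = re.sub(r"\D+", "", sample)

# With re.ASCII, \D behaves like [^0-9], so the Persian digits are removed.
ascii_result = re.sub(r"\D+", "", sample, flags=re.ASCII)

print(unicode_result == sample)  # True
print(ascii_result)              # 10
```

In pandas, str.replace also accepts a flags argument, so the same idea should carry over as dfObject['C'].str.replace(r'\D+', '', flags=re.ASCII, regex=True).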