How can I remove all non-numeric characters from all the values in a particular column of a pandas DataFrame?

清酒与你 2020-11-30 05:43

I have a dataframe which looks like this:

     A       B           C
1   red78   square    big235
2   green   circle    small123
3   blue45  triangle  big657         


        
5 answers
  •  醉梦人生
    2020-11-30 06:30

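    To remove all non-digit characters from the strings in a pandas column, use str.replace with the pattern \D+ or [^0-9]+. Pass regex=True explicitly: since pandas 2.0 the pattern is treated as a literal string by default.

    dfObject['C'] = dfObject['C'].str.replace(r'\D+', '', regex=True)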
    

    Note that in Python 3, \D is Unicode-aware by default, so it does not match non-ASCII digits (such as ۱۲۳۴۵۶۷۸۹) and leaves them in place. If only the ASCII digits 0-9 should be kept, use

    dfObject['C'] = dfObject['C'].str.replace(r'[^0-9]+', '', regex=True)
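As a rough end-to-end sketch, the [^0-9]+ pattern can be applied to the sample data from the question. The frame below is rebuilt by hand from the table shown there, and the extra column name C_num is a hypothetical name used only for illustration; the conversion step assumes every value in C still contains at least one digit after cleaning.

```python
import pandas as pd

# Sample frame rebuilt from the table in the question
dfObject = pd.DataFrame(
    {
        "A": ["red78", "green", "blue45"],
        "B": ["square", "circle", "triangle"],
        "C": ["big235", "small123", "big657"],
    },
    index=[1, 2, 3],
)

# Strip everything that is not an ASCII digit; regex=True makes the
# pattern a regular expression rather than a literal string
dfObject["C"] = dfObject["C"].str.replace(r"[^0-9]+", "", regex=True)

# The cleaned values are still strings; convert them if numbers are needed.
# "C_num" is a hypothetical column name for this example.
dfObject["C_num"] = pd.to_numeric(dfObject["C"])

print(dfObject)
```

Column C keeps its string type after the replacement, which is why the numeric conversion is shown as a separate, optional step.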
    

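    The difference is visible with plain re.sub:

    import re
    # \D leaves the non-ASCII digits in place
    print(re.sub(r'\D+', '', '1۱۲۳۴۵۶۷۸۹0'))        # => 1۱۲۳۴۵۶۷۸۹0
    # [^0-9] removes everything except ASCII digits
    print(re.sub(r'[^0-9]+', '', '1۱۲۳۴۵۶۷۸۹0'))    # => 10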
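Another option, not mentioned in the answer above, is to keep \D and pass the re.ASCII flag, which restricts \d to [0-9] so that \D also covers non-ASCII digits. This is a minimal sketch using the same test string; the variable names are only illustrative.

```python
import re

# ASCII "1" and "0" surrounding Persian digits, as in the example above
mixed = "1۱۲۳۴۵۶۷۸۹0"

# With re.ASCII, \d matches only [0-9], so \D matches the Persian digits too
ascii_only = re.sub(r"\D+", "", mixed, flags=re.ASCII)

print(ascii_only)  # => 10
```

pandas' Series.str.replace also takes a flags argument when regex=True, so the same flag should carry over to the column version.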
    
