pandas: to_numeric for multiple columns

刺人心 2020-11-27 03:47

I'm working with the following df:

c.sort_values('2005', ascending=False).head(3)
      GeoName ComponentName     IndustryId IndustryClassification Descri         


        
5 Answers
  •  攒了一身酷
    2020-11-27 04:30

    UPDATE: you don't need to convert the values afterwards. You can do it on the fly when reading the CSV, by declaring the '(NA)' marker as a missing value:

    In [165]: df=pd.read_csv(url, index_col=0, na_values=['(NA)']).fillna(0)
    
    In [166]: df.dtypes
    Out[166]:
    GeoName                    object
    ComponentName              object
    IndustryId                  int64
    IndustryClassification     object
    Description                object
    2004                        int64
    2005                        int64
    2006                        int64
    2007                        int64
    2008                        int64
    2009                        int64
    2010                        int64
    2011                        int64
    2012                        int64
    2013                        int64
    2014                      float64
    dtype: object
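
    To make the read-time approach easier to try, here is a minimal sketch that uses a small inline CSV in place of `url`. The data and column values are made up for illustration and are not the original file. Note that in this toy data every year column contains a missing value, so both parse as float64; the explicit `astype` at the end is one possible way to get integers back after filling, not something the answer above requires.

```python
import io

import pandas as pd

# Hypothetical inline CSV standing in for the real file; "(NA)" marks missing values.
csv_text = """GeoName,2004,2005
Alabama,10,(NA)
Alaska,(NA),7
"""

# na_values tells the parser to treat "(NA)" as missing, so the remaining
# values are parsed as numbers instead of strings.
df = pd.read_csv(io.StringIO(csv_text), index_col=0, na_values=["(NA)"])
print(df.dtypes)

# Replace the missing values with 0, then cast explicitly to integers.
filled = df.fillna(0).astype("int64")
print(filled)
```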
    

    If you need to convert multiple columns to numeric dtypes, use the following technique:

    Sample source DF:

    In [271]: df
    Out[271]:
         id    a  b  c  d  e    f
    0  id_3  AAA  6  3  5  8    1
    1  id_9    3  7  5  7  3  BBB
    2  id_7    4  2  3  5  4    2
    3  id_0    7  3  5  7  9    4
    4  id_0    2  4  6  4  0    2
    
    In [272]: df.dtypes
    Out[272]:
    id    object
    a     object
    b      int64
    c      int64
    d      int64
    e      int64
    f     object
    dtype: object
    

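    Converting selected columns to numeric dtypes (errors='coerce' replaces values that cannot be parsed, such as 'AAA' and 'BBB', with NaN, which is why columns a and f end up as float64):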

    In [273]: cols = df.columns.drop('id')
    
    In [274]: df[cols] = df[cols].apply(pd.to_numeric, errors='coerce')
    
    In [275]: df
    Out[275]:
         id    a  b  c  d  e    f
    0  id_3  NaN  6  3  5  8  1.0
    1  id_9  3.0  7  5  7  3  NaN
    2  id_7  4.0  2  3  5  4  2.0
    3  id_0  7.0  3  5  7  9  4.0
    4  id_0  2.0  4  6  4  0  2.0
    
    In [276]: df.dtypes
    Out[276]:
    id     object
    a     float64
    b       int64
    c       int64
    d       int64
    e       int64
    f     float64
    dtype: object
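
    Because errors='coerce' silently turns bad values into NaN, it can be useful to check how many values were lost before overwriting the columns. The following is a rough sketch of that idea on a cut-down version of the sample DF; the names `converted` and `lost` are only illustrative.

```python
import pandas as pd

# Cut-down version of the sample DF above (columns c, d, e omitted).
df = pd.DataFrame({
    "id": ["id_3", "id_9", "id_7", "id_0", "id_0"],
    "a": ["AAA", "3", "4", "7", "2"],
    "b": [6, 7, 2, 3, 4],
    "f": ["1", "BBB", "2", "4", "2"],
})

cols = df.columns.drop("id")

# Convert into a separate frame first so the originals stay available.
converted = df[cols].apply(pd.to_numeric, errors="coerce")

# A value was "lost" if it was present before and is NaN after conversion.
lost = converted.isna() & df[cols].notna()
print(lost.sum())

# Once the losses look acceptable, write the converted columns back.
df[cols] = converted
```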
    

    PS: if you want to select all string (object) columns, use the following simple trick:

    cols = df.columns[df.dtypes.eq('object')]
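
    As a possible alternative, `DataFrame.select_dtypes` gives the same column set. The sketch below assumes the string columns are stored with object dtype, as in the outputs above, and forces that dtype explicitly in the made-up sample so the comparison holds.

```python
import pandas as pd

# Small made-up frame; the string columns are forced to object dtype.
df = pd.DataFrame({
    "id": pd.Series(["id_3", "id_9"], dtype=object),
    "a": pd.Series(["AAA", "3"], dtype=object),
    "b": [6, 7],
})

# The boolean-mask trick from the answer.
by_mask = df.columns[df.dtypes.eq("object")]

# The same selection via select_dtypes.
by_select = df.select_dtypes(include="object").columns

print(list(by_mask), list(by_select))

# Drop the identifier column before passing the rest to pd.to_numeric.
cols = by_select.drop("id")
```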
    
