Convert entire pandas DataFrame to integers in pandas (0.17.0)

长发绾君心 2020-11-30 05:21

My question is very similar to this one, but I need to convert my entire DataFrame instead of just a Series. The to_numeric function only works on one Series at a time.

3 Answers
  • 2020-11-30 06:11

    All columns convertible

    You can apply the function to all columns:

    df.apply(pd.to_numeric)
    

    Example:

    >>> df = pd.DataFrame({'a': ['1', '2'],
    ...                    'b': ['45.8', '73.9'],
    ...                    'c': [10.5, 3.7]})
    
    >>> df.info()
    <class 'pandas.core.frame.DataFrame'>
    Int64Index: 2 entries, 0 to 1
    Data columns (total 3 columns):
    a    2 non-null object
    b    2 non-null object
    c    2 non-null float64
    dtypes: float64(1), object(2)
    memory usage: 64.0+ bytes
    
    >>> df.apply(pd.to_numeric).info()
    <class 'pandas.core.frame.DataFrame'>
    Int64Index: 2 entries, 0 to 1
    Data columns (total 3 columns):
    a    2 non-null int64
    b    2 non-null float64
    c    2 non-null float64
    dtypes: float64(2), int64(1)
    memory usage: 64.0 bytes
    

    Not all columns convertible

    pd.to_numeric has the keyword argument errors:

    Signature: pd.to_numeric(arg, errors='raise')
    Docstring:
    Convert argument to a numeric type.
    
    Parameters
    ----------
    arg : list, tuple or array of objects, or Series
    errors : {'ignore', 'raise', 'coerce'}, default 'raise'
        - If 'raise', then invalid parsing will raise an exception
        - If 'coerce', then invalid parsing will be set as NaN
        - If 'ignore', then invalid parsing will return the input
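
    To make the 'raise' and 'coerce' modes from the docstring above concrete, here is a minimal sketch on a single Series. The sample values are made up for illustration; only pd.to_numeric and its errors argument come from the answer.

```python
import pandas as pd

# Illustrative data: two parseable strings and one that is not numeric
s = pd.Series(['3', '5', 'Kobe'])

# errors='coerce': the unparseable entry becomes NaN,
# so the result has a float dtype rather than an integer one
coerced = pd.to_numeric(s, errors='coerce')

# errors='raise' (the default): the unparseable entry raises ValueError
try:
    pd.to_numeric(s)
    raised = False
except ValueError:
    raised = True
```

    Note that 'coerce' silently turns bad values into NaN, so it is worth checking coerced.isna() before relying on the result.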
    

    Setting errors='ignore' returns the column unchanged if it cannot be converted into a numeric type.

    As pointed out by Anton Protopopov, the most elegant way is to pass errors='ignore' as a keyword argument to apply(), which forwards it to pd.to_numeric:

    >>> df = pd.DataFrame({'ints': ['3', '5'], 'Words': ['Kobe', 'Bryant']})
    >>> df.apply(pd.to_numeric, errors='ignore').info()
    <class 'pandas.core.frame.DataFrame'>
    Int64Index: 2 entries, 0 to 1
    Data columns (total 2 columns):
    Words    2 non-null object
    ints     2 non-null int64
    dtypes: int64(1), object(1)
    memory usage: 48.0+ bytes
    

    My previously suggested way, using partial from the functools module, is more verbose:

    >>> from functools import partial
    >>> df = pd.DataFrame({'ints': ['3', '5'],
    ...                    'Words': ['Kobe', 'Bryant']})
    >>> df.apply(partial(pd.to_numeric, errors='ignore')).info()
    <class 'pandas.core.frame.DataFrame'>
    Int64Index: 2 entries, 0 to 1
    Data columns (total 2 columns):
    Words    2 non-null object
    ints     2 non-null int64
    dtypes: int64(1), object(1)
    memory usage: 48.0+ bytes
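
    In newer pandas releases (2.2 and later, as far as I know), errors='ignore' is deprecated in favour of catching the exception yourself. The sketch below reproduces the "convert what you can, keep the rest" behaviour without that option. The helper name to_numeric_or_keep is hypothetical and not part of pandas.

```python
import pandas as pd

def to_numeric_or_keep(column):
    """Hypothetical helper: convert a column if possible, else return it unchanged."""
    try:
        return pd.to_numeric(column)
    except (ValueError, TypeError):
        # The column holds values that cannot be parsed as numbers
        return column

df = pd.DataFrame({'ints': ['3', '5'], 'Words': ['Kobe', 'Bryant']})

# apply() calls the helper once per column and builds a new DataFrame
converted = df.apply(to_numeric_or_keep)
print(converted.dtypes)
```

    As with the original approach, a column is converted only if every value in it parses; a single bad value leaves the whole column untouched.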
    
  • 2020-11-30 06:11

    You can use astype() to convert a Series to the desired data type.

    For example: my_str_df = pd.DataFrame({'column_name': ['20', '30', '40']})

    then: my_int_df = my_str_df['column_name'].astype(int) # this column now has an integer dtype
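
    Since the question asks about the entire DataFrame, it may help to note that astype() can also be called on the DataFrame itself, or with a dict to target specific columns. A minimal sketch follows; the column names 'a' and 'b' are placeholders. Unlike to_numeric with errors='coerce', astype(int) raises if any value cannot be parsed as an integer.

```python
import pandas as pd

# Illustrative frame in which every column holds integer-like strings
str_df = pd.DataFrame({'a': ['20', '30', '40'], 'b': ['1', '2', '3']})

# Convert every column at once
int_df = str_df.astype(int)

# Convert only selected columns by passing a {column: dtype} mapping
partly_int_df = str_df.astype({'a': int})
```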

  • 2020-11-30 06:11

    Call apply() with pd.to_numeric and errors='ignore', then assign the result back to the DataFrame. apply() returns a new DataFrame and does not modify df in place:

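    import pandas as pd

    df = pd.DataFrame({'ints': ['3', '5'], 'Words': ['Kobe', 'Bryant']})
    print("Orig: \n", df.dtypes)

    # The result is discarded here, so df keeps its original dtypes
    df.apply(pd.to_numeric, errors='ignore')
    print("\nto_numeric: \n", df.dtypes)

    # Assigning the result back replaces df with the converted frame
    df = df.apply(pd.to_numeric, errors='ignore')
    print("\nto_numeric with assign: \n", df.dtypes)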
    

    Output:

    Orig: 
     ints     object
    Words    object
    dtype: object
    
    to_numeric: 
     ints     object
    Words    object
    dtype: object
    
    to_numeric with assign: 
     ints      int64
    Words    object
    dtype: object
    