Convert float64 column to int64 in Pandas

栀梦 2020-12-23 21:14

I tried to convert a column from data type float64 to int64 using:

df['column name'].astype(int64)

but got an e

4 Answers
  • 2020-12-23 21:55

    Solution for pandas 0.24+, which can convert numeric columns that contain missing values:

    df = pd.DataFrame({'column name':[7500000.0,7500000.0, np.nan]})
    print (df['column name'])
    0    7500000.0
    1    7500000.0
    2          NaN
    Name: column name, dtype: float64
    
    df['column name'] = df['column name'].astype(np.int64)
    

    ValueError: Cannot convert non-finite values (NA or inf) to integer

    #http://pandas.pydata.org/pandas-docs/stable/user_guide/integer_na.html
    df['column name'] = df['column name'].astype('Int64')
    print (df['column name'])
    0    7500000
    1    7500000
    2        NaN
    Name: column name, dtype: Int64
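A minimal sketch of the nullable-integer route shown above, assuming pandas 1.0 or newer (where the missing marker prints as <NA> rather than NaN). The column name and values are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"column name": [7500000.0, 7500000.0, np.nan]})

# Capital-I "Int64" is the pandas nullable extension dtype,
# not numpy's lowercase int64.
converted = df["column name"].astype("Int64")

n_missing = int(converted.isna().sum())  # the NaN survives as a missing value
total = int(converted.sum())             # sum() skips missing values by default
dtype_name = str(converted.dtype)
```

The spelling matters here: 'Int64' keeps the missing entry, while 'int64' raises the ValueError shown earlier.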
    

    I think you need to cast to numpy.int64:

    df['column name'].astype(np.int64)
    

    Sample:

    df = pd.DataFrame({'column name':[7500000.0,7500000.0]})
    print (df['column name'])
    0    7500000.0
    1    7500000.0
    Name: column name, dtype: float64
    
    df['column name'] = df['column name'].astype(np.int64)
    #same as (pd.np was deprecated in pandas 1.0 and later removed; import numpy directly)
    #df['column name'] = df['column name'].astype(pd.np.int64)
    print (df['column name'])
    0    7500000
    1    7500000
    Name: column name, dtype: int64
    

    If the column contains NaNs, first replace them with some integer (e.g. 0) using fillna, because NaN is a float:

    df = pd.DataFrame({'column name':[7500000.0,np.nan]})
    
    df['column name'] = df['column name'].fillna(0).astype(np.int64)
    print (df['column name'])
    0    7500000
    1          0
    Name: column name, dtype: int64
    

    Also check documentation - missing data casting rules

    EDIT:

    Converting the underlying values with NaNs is buggy: no error is raised and the NaN turns into a garbage integer:

    df = pd.DataFrame({'column name':[7500000.0,np.nan]})
    
    df['column name'] = df['column name'].values.astype(np.int64)
    print (df['column name'])
    0                7500000
    1   -9223372036854775808
    Name: column name, dtype: int64
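One way to avoid that silent garbage value is to check for non-finite entries before casting the raw array. This is only a sketch: `to_int64_checked` is a hypothetical helper name, not a pandas function.

```python
import numpy as np
import pandas as pd

def to_int64_checked(series: pd.Series) -> pd.Series:
    """Cast a float Series to int64, refusing NaN/inf instead of producing garbage."""
    values = series.to_numpy(dtype="float64")
    if not np.isfinite(values).all():
        raise ValueError("column contains NaN or inf; fill or drop them first")
    # Note: astype truncates any fractional part toward zero.
    return pd.Series(values.astype(np.int64), index=series.index, name=series.name)

clean = to_int64_checked(pd.Series([7500000.0, 2.0], name="column name"))

try:
    to_int64_checked(pd.Series([7500000.0, np.nan]))
    rejected = False
except ValueError:
    rejected = True
```

Plain `Series.astype(np.int64)` already performs an equivalent check, so a guard like this is only needed when working on `.values` directly.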
    
  • 2020-12-23 22:08

    This seems to be a little buggy in Pandas 0.23.4?

    If there are np.nan values then this will throw an error as expected:

    df['col'] = df['col'].astype(np.int64)
    

    But it doesn't change any values from float to int, as I would expect, if errors='ignore' is used:

    df['col'] = df['col'].astype(np.int64,errors='ignore') 
    

    It worked if I first converted np.nan:

    df['col'] = df['col'].fillna(0).astype(np.int64)
    df['col'] = df['col'].astype(np.int64)
    

    Now I can't figure out how to get null values back in place of the zeroes since this will convert everything back to float again:

    df['col']  = df['col'].replace(0,np.nan)
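A possible way around this, assuming pandas 1.0 or newer: the nullable 'Int64' dtype can hold integers and missing values together, so the nulls never need to be filled. If zeros were already filled in, recording where the missing values were and masking them back also keeps the integer dtype. The column name 'col' and the data are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col": [7500000.0, np.nan, 3.0]})

# Direct route: store integers and keep the missing value.
direct = df["col"].astype("Int64")

# Round trip: remember where the gaps were, fill, cast, then mask them back.
# Using the saved mask (rather than "== 0") leaves genuine zeros untouched.
missing = df["col"].isna()
filled = df["col"].fillna(0).astype("Int64")
restored = filled.mask(missing)
```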
    
  • 2020-12-23 22:09

    Consider using:

    df['column name'].astype('Int64')

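    NaN values are kept as missing values instead of raising an error (displayed as NaN in pandas 0.24 and as <NA> from pandas 1.0).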

  • 2020-12-23 22:10

    You need to pass in the string 'int64':

    >>> import pandas as pd
    >>> df = pd.DataFrame({'a': [1.0, 2.0]})  # some test dataframe
    
    >>> df['a'].astype('int64')
    0    1
    1    2
    Name: a, dtype: int64
    

    There are some alternative ways to specify 64-bit integers:

    >>> df['a'].astype('i8')      # integer with 8 bytes (64 bit)
    0    1
    1    2
    Name: a, dtype: int64
    
    >>> import numpy as np
    >>> df['a'].astype(np.int64)  # native numpy 64 bit integer
    0    1
    1    2
    Name: a, dtype: int64
    

    Or use np.int64 directly on your column (but this returns a NumPy array rather than a Series):

    >>> np.int64(df['a'])
    array([1, 2], dtype=int64)
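To make the equivalence of these spellings concrete, here is a small sketch. It assumes a column without missing values, as in the test dataframe above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0]})

# The three spellings resolve to the same numpy dtype.
same_dtype = np.dtype("int64") == np.dtype("i8") == np.dtype(np.int64)

# astype keeps the pandas Series (index and name included) ...
as_series = df["a"].astype("int64")
# ... while converting through numpy hands back a bare array.
as_array = df["a"].to_numpy().astype("int64")
```

Keeping the result as a Series is usually what you want when assigning back into the dataframe; the bare array loses the index and the column name.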
    