Convert float64 column to int64 in Pandas

栀梦 2020-12-23 21:14

I tried to convert a column from data type float64 to int64 using:

df['column name'].astype(int64)

but got an e

4 answers
  •  [愿得一人]
    2020-12-23 21:55

    Solution for pandas 0.24+ for converting a numeric column with missing values. A plain cast to np.int64 fails, so use the nullable Int64 dtype (capital I) instead:

    df = pd.DataFrame({'column name':[7500000.0,7500000.0, np.nan]})
    print (df['column name'])
    0    7500000.0
    1    7500000.0
    2          NaN
    Name: column name, dtype: float64
    
    df['column name'] = df['column name'].astype(np.int64)
    

    ValueError: Cannot convert non-finite values (NA or inf) to integer

    #http://pandas.pydata.org/pandas-docs/stable/user_guide/integer_na.html
    df['column name'] = df['column name'].astype('Int64')
    print (df['column name'])
    0    7500000
    1    7500000
    2        NaN
    Name: column name, dtype: Int64
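
    To make the behaviour of the nullable dtype more concrete, here is a minimal sketch assuming pandas 0.24 or later. It reuses the sample data above; the variable name `col` is only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'column name': [7500000.0, 7500000.0, np.nan]})
col = df['column name'].astype('Int64')

# The missing entry stays missing instead of forcing the column back to float.
print(col.isna().tolist())   # [False, False, True]
# Reductions skip missing values by default.
print(col.sum())             # 15000000
# The nullable extension dtype is distinct from NumPy's int64.
print(col.dtype)             # Int64
```

    Note the capital "I": 'Int64' is the pandas extension dtype, while 'int64' is the plain NumPy dtype that cannot hold missing values.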
    

    I think you need to cast to numpy.int64:

    df['column name'].astype(np.int64)
    

    Sample:

    df = pd.DataFrame({'column name':[7500000.0,7500000.0]})
    print (df['column name'])
    0    7500000.0
    1    7500000.0
    Name: column name, dtype: float64
    
    df['column name'] = df['column name'].astype(np.int64)
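    # same as the line below, but only on older pandas
    # (pd.np was deprecated in pandas 1.0 and is no longer available in recent releases)
    #df['column name'] = df['column name'].astype(pd.np.int64)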
    print (df['column name'])
    0    7500000
    1    7500000
    Name: column name, dtype: int64
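
    One thing to keep in mind with this cast: astype(np.int64) drops the fractional part rather than rounding. The sample above only has whole numbers, so nothing is lost there. The sketch below uses made-up values to show the difference.

```python
import numpy as np
import pandas as pd

# Illustrative values, not from the question.
s = pd.Series([1.9, -1.9, 2.5])

truncated = s.astype(np.int64)        # fractional part is dropped (toward zero)
rounded = s.round().astype(np.int64)  # round first, then cast

print(truncated.tolist())  # [1, -1, 2]
print(rounded.tolist())    # [2, -2, 2]
```

    Series.round rounds halves to the nearest even number, which is why 2.5 becomes 2 here.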
    

    If the column contains NaNs, replace them with some int (e.g. 0) using fillna before casting, because NaN is a float:

    df = pd.DataFrame({'column name':[7500000.0,np.nan]})
    
    df['column name'] = df['column name'].fillna(0).astype(np.int64)
    print (df['column name'])
    0    7500000
    1          0
    Name: column name, dtype: int64
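
    If 0 is also a legitimate value in the data, filling with it hides which rows were missing. A possible alternative is sketched below, assuming pandas 0.24+. The helper name `to_integer` is hypothetical, not a pandas function.

```python
import numpy as np
import pandas as pd

def to_integer(series):
    """Hypothetical helper: cast a float Series to an integer dtype."""
    if series.isna().any():
        # Keep missing entries as missing instead of inventing a fill value.
        return series.astype('Int64')
    # No missing values, so the plain NumPy dtype is enough.
    return series.astype(np.int64)

clean = to_integer(pd.Series([7500000.0, 7500000.0]))
gappy = to_integer(pd.Series([7500000.0, np.nan]))

print(clean.dtype)  # int64
print(gappy.dtype)  # Int64
```

    Whether a fill value or the nullable dtype is the better choice depends on what the code using the column expects.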
    

    Also check the documentation on missing data casting rules.

    EDIT:

    Converting the underlying NumPy values with NaNs is buggy. No error is raised and the NaN silently becomes a meaningless integer:

    df = pd.DataFrame({'column name':[7500000.0,np.nan]})
    
    df['column name'] = df['column name'].values.astype(np.int64)
    print (df['column name'])
    0                7500000
    1   -9223372036854775808
    Name: column name, dtype: int64
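
    A way to avoid this silent result is to check for non-finite values before casting the raw array. This is a minimal sketch of such a guard, assuming pandas 0.24+ for the fallback. The variable names `values` and `finite` are only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'column name': [7500000.0, np.nan]})
values = df['column name'].values

# np.isfinite is False for NaN and +/-inf, the inputs that cast badly.
finite = np.isfinite(values)
print(finite.tolist())  # [True, False]

if finite.all():
    df['column name'] = values.astype(np.int64)
else:
    # Fall back to the nullable dtype. This handles NaN;
    # infinite values would still raise and need separate treatment.
    df['column name'] = df['column name'].astype('Int64')

print(df['column name'].dtype)  # Int64
```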
    
