Convert Pandas column containing NaNs to dtype `int`

终归单人心 · 2020-11-22 11:18

I read data from a .csv file to a Pandas dataframe as below. For one of the columns, namely id, I want to specify the column type as int. The problem is that the id column contains missing values (NaN), so the cast to int fails.
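For reference, here is a minimal reproduction of the problem (the column name and values are illustrative; the exact exception raised by the cast varies with the pandas version):

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"id": [1.0, 2.0, np.nan]})  # NaN forces the column to float64
    print(df["id"].dtype)   # float64
    df["id"].astype(int)    # raises: non-finite values (NaN) cannot be cast to integer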

17 answers
  •  春和景丽
    2020-11-22 11:55

    You could use .dropna() if it is OK to drop the rows with the NaN values (see the follow-up sketch below).

    df = df.dropna(subset=['id'])
    
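    A minimal sketch of that route: dropping the rows removes the NaNs, but the column itself stays float64, so it still needs an explicit cast afterwards (column name id as in the question).

    df = df.dropna(subset=['id'])
    df['id'] = df['id'].astype(int)  # safe now: no NaN values remain, so the cast succeeds
    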

    Alternatively, use .fillna() and .astype() to replace the NaNs with a value and convert the column to int.
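    For example, a minimal version of that approach (0 is just a placeholder; pick a sentinel that cannot collide with real id values):

    df['id'] = df['id'].fillna(0).astype(int)  # replace NaN with the sentinel, then cast float64 to int
    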

    I ran into this problem when processing a CSV file with large integers, some of which were missing (NaN). Using float as the type was not an option, because I might lose precision.

    My solution was to use str as the intermediate type. You can then convert the strings to int later in the code as needed. I replaced NaN with 0, but you could choose any value.

    df = pd.read_csv(filename, dtype={'id':str})
    df["id"] = df["id"].fillna("0").astype(int)
    

    For illustration, here is an example of how floats may lose precision:

    s = "12345678901234567890"
    f = float(s)
    i = int(f)
    i2 = int(s)
    print(f, i, i2)
    

    And the output is:

    1.2345678901234567e+19 12345678901234567168 12345678901234567890
    
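    The same loss happens inside pandas whenever the column is parsed as float64, which is exactly why the str intermediate helps. A small sketch with made-up inline data (reading with dtype={'id': str} keeps the digits intact):

    import io
    import pandas as pd

    csv = "id,name\n12345678901234567890,a\n,b\n"  # one id is missing
    print(pd.read_csv(io.StringIO(csv))["id"].iloc[0])                      # 1.2345678901234567e+19 (float64, digits lost)
    print(pd.read_csv(io.StringIO(csv), dtype={"id": str})["id"].iloc[0])   # 12345678901234567890
    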
