I read data from a .csv file into a Pandas dataframe as below. For one of the columns, namely id, I want to specify the column type as int. The problem is that the column contains missing values (NaN), so it cannot be read or cast as int directly.
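A minimal sketch of the failing read (the file name is hypothetical; dtype is the read_csv parameter for per-column types):

import pandas as pd

# hypothetical file name standing in for the question's CSV
df = pd.read_csv("data.csv", dtype={"id": int})  # raises ValueError if 'id' has missing values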
You could use .dropna() if it is OK to drop the rows with the NaN values.
df = df.dropna(subset=['id'])
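Note that dropping the rows does not change the column's dtype on its own; a cast is still needed afterwards, for example:

df['id'] = df['id'].astype(int)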
Alternatively, use .fillna() and .astype() to replace the NaN with values and convert them to int.
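For instance, filling the missing ids with 0 (a sketch; any sentinel value works the same way):

df['id'] = df['id'].fillna(0).astype(int)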
I ran into this problem when processing a CSV file with large integers, some of which were missing (NaN). Using float as the type was not an option, because I might lose precision.
My solution was to use str as the intermediate type. Then you can convert the string to int as you please later in the code. I replaced NaN with 0, but you could choose any value.
import pandas as pd

df = pd.read_csv(filename, dtype={'id': str})  # read ids as strings to preserve the full digits
df["id"] = df["id"].fillna("0").astype(int)    # replace missing ids with "0", then convert to int
For illustration, here is an example of how floats may lose precision:
s = "12345678901234567890"  # 20 significant digits, more than a 64-bit float can hold exactly
f = float(s)                # parsing through float rounds the value
i = int(f)                  # converting back does not recover the original
i2 = int(s)                 # parsing the string directly is exact
print(f, i, i2)
And the output is:
1.2345678901234567e+19 12345678901234567168 12345678901234567890

The float result is rounded because a 64-bit float carries only about 15-17 significant decimal digits, so f and i have lost the original value, while i2, parsed directly from the string, is exact.