Pandas: Why is default column type for numeric float?

执笔经年 2021-01-02 10:11

I am using Pandas 0.18.1 with python 2.7.x. I have an empty dataframe that I read first. I see that the types of these columns are object which is OK. When I as

3 Answers

太阳男子 (original poster)

2021-01-02 10:52

It's not possible for Pandas to store NaN values in NumPy-backed integer columns.

This makes float the obvious default choice for numeric data: with an integer default, Pandas would have to change the data type of the entire column as soon as a missing value arose. And missing values arise very often in practice.

As for why this is, it's a restriction inherited from NumPy. Basically, Pandas needs to set aside a particular bit pattern to represent NaN. This is straightforward for floating-point numbers, where the IEEE 754 standard already defines one. It's more awkward and less efficient to do this for a fixed-width integer.
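To see the upcast in action, here is a minimal sketch. The variable names are made up for illustration, and `dtype="int64"` is set explicitly so the starting type does not depend on the platform:

```python
import numpy as np
import pandas as pd

# A plain NumPy integer array has no bit pattern for NaN,
# so assigning NaN into it fails.
arr = np.array([1, 2, 3], dtype="int64")
try:
    arr[0] = np.nan
    numpy_rejected_nan = False
except ValueError:
    numpy_rejected_nan = True
print(numpy_rejected_nan)

# An integer Series with no missing values keeps its integer dtype.
ints = pd.Series([1, 2, 3], dtype="int64")
print(ints.dtype)

# Reindexing to a label that does not exist introduces a missing value,
# and the whole column is converted to float64 to hold the NaN.
upcast = ints.reindex([0, 1, 2, 3])
print(upcast.dtype)
```

The last step is the conversion described above: one missing value is enough to change the type of the entire column.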

Update

Exciting news in pandas 0.24. IntegerArray is an experimental feature but might render my original answer obsolete. So if you're reading this on or after 27 Feb 2019, check out the docs for that feature.
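A minimal sketch of that feature, assuming pandas 0.24 or later (the variable name is hypothetical). Note the capital "I" in `"Int64"`, which selects the nullable extension type rather than NumPy's `int64`:

```python
import pandas as pd

# Request the nullable integer extension dtype explicitly.
nullable = pd.Series([1, 2, None], dtype="Int64")

# The column stays an integer type even though one value is missing.
print(nullable.dtype)
print(nullable.isna().tolist())
```

The nullable type is not the default, so columns created without an explicit dtype still follow the float behaviour described in the original answer.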
