NumPy or Pandas: Keeping array type as integer while having a NaN value

粉色の甜心 2020-11-22 06:05

Is there a preferred way to keep the data type of a NumPy array fixed as int (or int64 or whatever) while still having an element inside it that is NaN?

不知归路 (original poster)

2020-11-22 06:26
Pandas v0.24+

Support for NaN in integer series is available from v0.24 onwards, through the nullable integer dtypes such as "Int64" (note the capital "I"). There's information on this in the v0.24 "What's New" section, and more details under Nullable Integer Data Type.
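A minimal sketch of the nullable integer dtype, assuming pandas v0.24 or later. It builds a series with a missing value that stays integer-typed, converts an already-upcast float series back, and shows that arithmetic keeps the dtype:

```python
import numpy as np
import pandas as pd

# "Int64" (capital "I") is the nullable integer extension dtype; lowercase
# "int64" is the plain NumPy dtype and still cannot hold a missing value.
s = pd.Series([1, 2, 3, np.nan], dtype="Int64")

# An existing float series (e.g. one already upcast by NaN) can be converted too,
# provided its non-missing values are whole numbers.
converted = pd.Series([1.0, 2.0, 3.0, np.nan]).astype("Int64")

# Arithmetic keeps the integer dtype and propagates the missing entry.
doubled = s * 2

print(s.dtype, converted.dtype, doubled.dtype)  # Int64 Int64 Int64
print(s.isna().tolist())  # [False, False, False, True]
```

How the missing entry is displayed depends on the version (NaN in 0.24/0.25, `<NA>` from 1.0), but the dtype stays integer either way.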

Pandas v0.23 and earlier

In general, it's best to work with float series where possible, even when the series is upcast from int to float due to the inclusion of NaN values. Float series keep calculations vectorised in NumPy, whereas the object-dtype alternative falls back to Python-level loops.
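A small sketch of that float workflow: compute on the upcast series, then go back to int once the NaN has been dealt with. The fill value 0 is only an illustrative choice; pick whatever makes sense for your data:

```python
import numpy as np
import pandas as pd

# An int series that picks up a NaN is stored as float64 (the pre-0.24 default).
s = pd.Series([1, 2, 3, np.nan])

# Arithmetic and reductions run on the underlying float64 NumPy array;
# NaN propagates element-wise and is skipped by reductions such as sum/mean.
scaled = s * 10
total = s.sum()
average = s.mean()

# Once the missing value is handled (here: filled with 0, an arbitrary choice),
# the result can be cast back to a plain integer dtype.
as_int = s.fillna(0).astype(np.int64)

print(s.dtype, scaled.tolist(), total, average, as_int.tolist())
```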

The docs do suggest: "One possibility is to use dtype=object arrays instead." For example:
```
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, np.nan])

print(s.astype(object))
# Output as reported in this answer:
# 0      1
# 1      2
# 2      3
# 3    NaN
# dtype: object
```
This may be preferable for cosmetic reasons, e.g. when writing output to a file.
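To make the file-output point concrete, here is a sketch (assuming a reasonably recent pandas; exact formatting may differ between versions) that builds the object series directly with `dtype=object`, so the integers stay Python ints, and compares what `to_csv` writes for it and for the float series. `na_rep="NaN"` is only used so the missing row is visible:

```python
import numpy as np
import pandas as pd

# Built directly with dtype=object, the integers stay Python ints and NaN stays a float.
obj = pd.Series([1, 2, 3, np.nan], dtype=object)
print([type(v).__name__ for v in obj])  # ['int', 'int', 'int', 'float']

# The float64 series is written with a trailing ".0"; the object series is not.
flt = pd.Series([1, 2, 3, np.nan])
float_lines = flt.to_csv(index=False, header=False, na_rep="NaN").splitlines()
obj_lines = obj.to_csv(index=False, header=False, na_rep="NaN").splitlines()

print(float_lines)  # ['1.0', '2.0', '3.0', 'NaN']
print(obj_lines)    # ['1', '2', '3', 'NaN']
```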

Pandas v0.23 and earlier: background

NaN is a floating-point value, so a NumPy integer array has no way to represent it. The docs (as of v0.23) explain why integer series are upcast to float:

In the absence of high performance NA support being built into NumPy from the ground up, the primary casualty is the ability to represent NAs in integer arrays.

This trade-off is made largely for memory and performance reasons, and also so that the resulting Series continues to be “numeric”.
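The NumPy side of the question can be checked directly. This sketch shows that NaN is just a float, that mixing it into integer data yields a float array, and that assigning it into an existing int array raises `ValueError` rather than upcasting:

```python
import numpy as np

# np.nan is an ordinary Python float (an IEEE 754 NaN); int dtypes have no NaN value.
nan_is_float = isinstance(np.nan, float)

# Building an array from ints plus NaN makes NumPy pick a float dtype.
mixed = np.array([1, 2, 3, np.nan])

# Assigning NaN into an existing integer array is rejected instead of upcasting it.
ints = np.array([1, 2, 3], dtype=np.int64)
try:
    ints[0] = np.nan
    rejected = False
except ValueError:
    rejected = True

print(nan_is_float, mixed.dtype, ints.dtype, rejected)  # True float64 int64 True
```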

The docs also provide rules for upcasting due to NaN inclusion:
```
Typeclass   Promotion dtype for storing NAs
floating    no change
object      no change
integer     cast to float64
boolean     cast to object
```
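One quick way to see these promotion rules in action is to reindex a small series onto a label it does not have, which inserts a missing value. This is a sketch; the rules describe the plain NumPy-backed dtypes and should still hold for them in current pandas, while the nullable dtypes from v0.24+ avoid the int-to-float upcast:

```python
import pandas as pd

# One small series per typeclass in the table. The object example passes
# dtype=object explicitly so newer pandas does not infer a dedicated string dtype.
examples = {
    "floating": pd.Series([1.5, 2.5]),
    "object": pd.Series(["a", "b"], dtype=object),
    "integer": pd.Series([1, 2]),
    "boolean": pd.Series([True, False]),
}

# Reindexing to a label that does not exist (2) inserts a missing value,
# which triggers the promotion rules.
promoted = {name: s.reindex([0, 1, 2]).dtype for name, s in examples.items()}

for name, dtype in promoted.items():
    print(f"{name:<10} {examples[name].dtype} -> {dtype}")
```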