Is there a preferred way to keep the data type of a numpy array fixed as int (or int64 or whatever), while still having an element ins
Support for NaN in integer series is available in pandas v0.24 and later. There's information on this in the v0.24 "What's New" section, and more details under Nullable Integer Data Type.
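As a minimal sketch of the nullable integer type, assuming pandas v0.24 or later is installed: requesting the extension dtype "Int64" (capital "I") keeps the values as integers even when one entry is missing. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# "Int64" (capital I) is the nullable integer extension dtype,
# as opposed to NumPy's plain "int64", which cannot hold missing values.
s = pd.Series([1, 2, 3, np.nan], dtype="Int64")

n_missing = int(s.isna().sum())   # the NaN is stored as a missing value
total = int(s.sum())              # missing values are skipped by default
shifted = s + 1                   # arithmetic keeps the nullable integer dtype

print(s.dtype, n_missing, total, shifted.dtype)
```

How the missing entry is displayed differs between pandas versions, so no printed series is shown here.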
In general, it's best to work with float series where possible, even when the series is upcast from int to float due to the inclusion of NaN values. This enables vectorised NumPy-based calculations where Python-level loops would otherwise be used.
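A small sketch of the difference, with illustrative variable names: the float series stays in a numeric dtype after arithmetic, while the same data held as object keeps the object dtype, so each element is handled as an individual Python object.

```python
import numpy as np
import pandas as pd

s_float = pd.Series([1, 2, 3, np.nan])   # upcast to float64 because of the NaN
s_obj = s_float.astype(object)           # same values, stored as Python objects

doubled_float = s_float * 2   # vectorised, result stays float64
doubled_obj = s_obj * 2       # element-wise on Python objects, result stays object

total = s_float.sum()         # NaN is skipped by default (skipna=True)

print(doubled_float.dtype, doubled_obj.dtype, total)
```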
The docs do suggest: "One possibility is to use dtype=object arrays instead." For example:
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, np.nan])
print(s.astype(object))

0      1
1      2
2      3
3    NaN
dtype: object
For cosmetic reasons, e.g. output to a file, this may be preferable.
NaN is considered a float. The docs currently (as of v0.23) specify the reason why integer series are upcast to float:
In the absence of high performance NA support being built into NumPy from the ground up, the primary casualty is the ability to represent NAs in integer arrays.
This trade-off is made largely for memory and performance reasons, and also so that the resulting Series continues to be “numeric”.
The docs also provide rules for upcasting due to NaN inclusion:
Typeclass    Promotion dtype for storing NAs
floating     no change
object       no change
integer      cast to float64
boolean      cast to object
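These promotion rules can be checked with a short sketch. The helper dtype_after_nan is a hypothetical name introduced here for illustration. It builds a series with an explicit dtype, then reindexes to one extra label so that pandas has to insert a missing value.

```python
import pandas as pd

def dtype_after_nan(values, dtype):
    """Return (dtype before, dtype after) a missing value is introduced."""
    s = pd.Series(values, dtype=dtype)
    # Reindexing to a label that does not exist inserts a missing value.
    grown = s.reindex(range(len(s) + 1))
    return str(s.dtype), str(grown.dtype)

promotions = {
    "floating": dtype_after_nan([1.5, 2.5], "float64"),
    "object": dtype_after_nan(["a", "b"], object),
    "integer": dtype_after_nan([1, 2], "int64"),
    "boolean": dtype_after_nan([True, False], "bool"),
}

for typeclass, (before, after) in promotions.items():
    print(typeclass, before, "->", after)
```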