How to change index dtype of pandas DataFrame to int32?


甜味超标

The default dtype of a DataFrame index is int64, and I would like to change it to int32.

I tried changing it with pd.DataFrame.set_index and Num

3 Answers

一个人的身影

2020-12-11 04:11

Every code path I could find coerces the dtype.

For example, pandas.Index.__new__() contains:

if issubclass(data.dtype.type, np.integer):
    from .numeric import Int64Index
    return Int64Index(data, copy=copy, dtype=dtype, name=name)

This allows passing a dtype, but NumericIndex.__new__() contains:

if copy or not is_dtype_equal(data.dtype, cls._default_dtype):
    subarr = np.array(data, dtype=cls._default_dtype, copy=copy)

This converts the data to cls._default_dtype, which is int64 for Int64Index, regardless of the dtype that was passed in.
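A quick way to see whether this coercion applies to your installation is to build an index from int32 data and inspect the result. This is a minimal sketch, not part of the pandas source quoted above; the names `data` and `idx` are only illustrative. On the pandas versions described here (before 2.0) I would expect int64 to be reported. Int64Index was removed in pandas 2.0, and from that release on a plain Index should keep the int32 dtype.

```python
import numpy as np
import pandas as pd

# int32 input data (illustrative values)
data = np.array([1, 2, 3], dtype="int32")

# Build an index from it and see which dtype pandas actually stores.
idx = pd.Index(data)

# Expected: int64 on pandas < 2.0 (coerced), int32 on pandas >= 2.0.
print(pd.__version__, idx.dtype)
```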


花落未央

2020-12-11 04:23

Can someone show a working code to produce pandas index with int32 size?

@PietroBattiston's answer may work, but it's worth explaining why you ordinarily would not want to replace the default RangeIndex with an Int64 / Int32 index.

Storing the rule that defines a range of values (start, stop, step) takes less memory than storing every integer in that range. This should be clear when you compare, for instance, Python's built-in range with NumPy's np.arange. As described in the pd.RangeIndex docs:

RangeIndex is a memory-saving special case of Int64Index limited to representing monotonic ranges. Using RangeIndex may in some instances improve computing speed.
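The memory difference can be measured directly. The sketch below is only an illustration under an assumed size of one million entries (the variable names are mine); it compares Python's range with np.arange, and then a RangeIndex with an index built from a materialized array.

```python
import sys

import numpy as np
import pandas as pd

n = 1_000_000  # assumed size, chosen only for illustration

# range stores start, stop and step; np.arange stores every value.
lazy_range_bytes = sys.getsizeof(range(n))
dense_array_bytes = np.arange(n).nbytes

# The same contrast holds for pandas indexes.
range_index = pd.RangeIndex(n)
dense_index = pd.Index(np.arange(n))  # materialized integer index

range_index_bytes = range_index.memory_usage()
dense_index_bytes = dense_index.memory_usage()

print("range vs np.arange:", lazy_range_bytes, dense_array_bytes)
print("RangeIndex vs dense index:", range_index_bytes, dense_index_bytes)
```

Even an int32 index would still store one value per row, so it should remain far larger than the RangeIndex it replaces.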


不知归路

2020-12-11 04:29

Not sure this is something worth doing in practice, but the following should work:

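import numpy as np
import pandas as pd

# Note: pd.Int64Index exists only in pandas < 2.0 (it was removed in 2.0).
class Int32Index(pd.Int64Index):
    # Override the dtype that NumericIndex.__new__() coerces data to.
    _default_dtype = np.int32

    @property
    def asi8(self):
        # Return the underlying values as stored.
        return self.values

i = Int32Index(np.array([...], dtype='int32'))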

(from here)
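On pandas 2.0 and later the subclass above cannot be defined, because pd.Int64Index no longer exists. A simpler route that should work there is to cast the existing index with astype. This is a hedged sketch rather than the approach from this answer: the DataFrame `df` and its column `a` are made up for illustration, and on pandas older than 2.0 I would expect the result to be coerced back to int64.

```python
import pandas as pd

# Hypothetical example frame; it starts with the default RangeIndex.
df = pd.DataFrame({"a": [10, 20, 30]})

# Cast the index; pandas >= 2.0 should keep int32, older versions coerce to int64.
df.index = df.index.astype("int32")

index_dtype = df.index.dtype
print(pd.__version__, index_dtype)
```

Note that this replaces the RangeIndex with a materialized index, so it gives up the memory saving that RangeIndex provides.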
