How to change index dtype of pandas DataFrame to int32?

甜味超标 2020-12-11 03:47

The default dtype of a DataFrame index is int64, and I would like to change it to int32.

I tried changing it with pd.DataFrame.set_index and Num

3 answers
  • 2020-12-11 04:11

    All of the code paths I could find coerce the dtype.

    For example, in pandas.Index.__new__():

    if issubclass(data.dtype.type, np.integer):
        from .numeric import Int64Index
        return Int64Index(data, copy=copy, dtype=dtype, name=name)
    

    This allows passing a dtype, but in NumericIndex.__new__() we have:

    if copy or not is_dtype_equal(data.dtype, cls._default_dtype):
        subarr = np.array(data, dtype=cls._default_dtype, copy=copy)
    

    This converts the data to the class's default dtype (int64 for Int64Index), so the requested int32 is lost.
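
    To see whether this coercion applies in a given environment, a small check like the one below may help. It is a sketch rather than part of the original answer: the helper name `built_index_dtype` is hypothetical, and the outcome depends on the installed pandas version (releases before 2.0 are expected to coerce to int64 as described above, while 2.0 and later can keep int32).

```python
import numpy as np
import pandas as pd


def built_index_dtype(values, dtype="int32"):
    """Build an Index from data of the requested dtype and return the dtype pandas kept.

    Hypothetical helper, for illustration only.
    """
    data = np.asarray(values, dtype=dtype)
    return pd.Index(data).dtype


requested = np.dtype("int32")
kept = built_index_dtype([10, 20, 30])
print(f"pandas {pd.__version__}: requested {requested}, index dtype is {kept}")
```

    If the printed dtype is int64, the constructor coerced the data along one of the code paths quoted above.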

  • 2020-12-11 04:23

    Can someone show working code to produce a pandas index with int32 size?

    @PietroBattiston's answer may work. But it's worth explaining why you should ordinarily not want to replace the default RangeIndex with an Int64 / Int32 index.

    Storing the rule that defines a range of values (start, stop, step) takes less memory than storing every integer in that range. This should be clear when you compare, for instance, Python's built-in range with NumPy's np.arange. As described in the pd.RangeIndex docs:

    RangeIndex is a memory-saving special case of Int64Index limited to representing monotonic ranges. Using RangeIndex may in some instances improve computing speed.
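
    The memory difference can be measured directly. The following is a minimal sketch, not taken from the original answer: the row count `n` and the variable names are arbitrary choices, and the exact byte counts will vary by platform and pandas version.

```python
import numpy as np
import pandas as pd

n = 1_000_000

# The default index of a new DataFrame is a RangeIndex (start, stop, step only).
df = pd.DataFrame({"a": np.zeros(n)})
lazy_index = df.index

# Materializing the same labels stores one 8-byte integer per row.
stored_index = pd.Index(np.arange(n, dtype="int64"))

lazy_bytes = lazy_index.memory_usage()
stored_bytes = stored_index.memory_usage()
print(type(lazy_index).__name__, lazy_bytes, "bytes")
print(type(stored_index).__name__, stored_bytes, "bytes")
```

    Even an int32 index would need about 4 bytes per row, so it still costs far more than the RangeIndex it replaces.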

  • 2020-12-11 04:29

    Not sure this is something worth doing in practice, but the following should work:

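    import numpy as np
    import pandas as pd

    # Note: pd.Int64Index was removed in pandas 2.0, so this needs an older release.
    class Int32Index(pd.Int64Index):
        _default_dtype = np.int32
    
        @property
        def asi8(self):
            # Return the stored values directly rather than an int64 view.
            return self.values
    
    i = Int32Index(np.array([...], dtype='int32'))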
    

    (from here)
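
    On newer pandas releases a subclass may not be needed at all. The following is a minimal sketch, assuming pandas 2.0 or later, where a plain Index can hold NumPy integer dtypes other than int64; on older versions the same call is expected to return an int64 index instead. The DataFrame contents are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0, 3.0]})

# astype returns a new index; assign it back to replace the default RangeIndex.
df.index = df.index.astype(np.int32)

print(type(df.index).__name__, df.index.dtype)
```

    Note that this trades the compact RangeIndex for a materialized integer index, so it only makes sense when int32 labels are actually required.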
