The default dtype of a DataFrame index is int64, and I would like to change it to int32.
I tried changing it with pd.DataFrame.set_index and Num
All of the code paths I could find coerce the dtype:
Check in pandas.Index.__new__():
    if issubclass(data.dtype.type, np.integer):
        from .numeric import Int64Index
        return Int64Index(data, copy=copy, dtype=dtype, name=name)
This allows passing a dtype, but in NumericIndex.__new__() we have:
    if copy or not is_dtype_equal(data.dtype, cls._default_dtype):
        subarr = np.array(data, dtype=cls._default_dtype, copy=copy)
Which changes the dtype.
Can someone show working code that produces a pandas index with an int32 dtype?
@PietroBattiston's answer may work. But it's worth explaining why you should ordinarily not want to replace the default RangeIndex with an Int64 / Int32 index.
Storing the logic behind a range of values takes less memory than storing each integer in a range. This should be clear when you compare, for instance, Python's built-in range with NumPy's np.arange. As described in the pd.RangeIndex docs:
RangeIndex is a memory-saving special case of Int64Index limited to representing monotonic ranges. Using RangeIndex may in some instances improve computing speed.
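To make the memory argument concrete, here is a rough sketch comparing a lazy range with a materialized array, and a RangeIndex with an integer index built from that array. The size n and the variable names are only illustrative, and the exact byte counts will vary by platform and pandas version.

```python
import sys

import numpy as np
import pandas as pd

n = 1_000_000  # illustrative size, not from the original post

# A Python range stores only start/stop/step, so its size does not grow with n.
lazy = range(n)
range_bytes = sys.getsizeof(lazy)

# np.arange materializes every value: 8 bytes per int64 element.
dense = np.arange(n, dtype=np.int64)
dense_bytes = dense.nbytes

# The same contrast holds for the pandas index types.
range_index = pd.RangeIndex(n)
materialized_index = pd.Index(dense)
range_index_bytes = range_index.memory_usage()
materialized_index_bytes = materialized_index.memory_usage()

print(range_bytes, dense_bytes)
print(range_index_bytes, materialized_index_bytes)
```

Switching a materialized index from int64 to int32 would halve its size at best, while keeping the RangeIndex avoids storing the values at all.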
Not sure this is something worth doing in practice, but the following should work:
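    import numpy as np
    import pandas as pd

    # Relies on pd.Int64Index, which exists only in pandas < 2.0.
    class Int32Index(pd.Int64Index):
        _default_dtype = np.int32

        @property
        def asi8(self):
            return self.values

    i = Int32Index(np.array([...], dtype='int32'))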
(from here)
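As a side note, and assuming pandas 2.0 or later (where, as far as I know, Int64Index was removed and a plain pd.Index can hold any NumPy numeric dtype), the subclass should not be needed at all. A minimal sketch, with made-up values and column name:

```python
import numpy as np
import pandas as pd

# Illustrative data; the values and the column name "x" are assumptions.
values = np.array([10, 20, 30], dtype=np.int32)

# On pandas >= 2.0 the int32 dtype is expected to be preserved;
# on older versions it is coerced to int64, as described in the question.
idx = pd.Index(values)
df = pd.DataFrame({"x": [1.0, 2.0, 3.0]}, index=idx)

print(pd.__version__, df.index.dtype)
```

Checking df.index.dtype on your own installation is the quickest way to see which behavior you get.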