The default dtype of a DataFrame index is int64, and I would like to change it to int32.
I tried changing it with pd.DataFrame.set_index and Num
All of the code paths I could find coerce the dtype:
Check in pandas.Index.__new__()
if issubclass(data.dtype.type, np.integer):
    from .numeric import Int64Index
    return Int64Index(data, copy=copy, dtype=dtype, name=name)
This allows passing a dtype, but in NumericIndex().__new__() we have:
if copy or not is_dtype_equal(data.dtype, cls._default_dtype):
    subarr = np.array(data, dtype=cls._default_dtype, copy=copy)
Which changes the dtype.
Can someone show working code that produces a pandas index with an int32 dtype?
@PietroBattiston's answer may work, but it's worth explaining why you would not ordinarily want to replace the default RangeIndex with an Int64 / Int32 index.
Storing the rule that generates a range of values takes less memory than storing each integer in that range. This should be clear when you compare, for instance, Python's built-in range with NumPy's np.arange. As described in the pd.RangeIndex docs:
RangeIndex is a memory-saving special case of Int64Index limited to representing monotonic ranges. Using RangeIndex may in some instances improve computing speed.
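To make the memory point concrete, here is a minimal sketch that compares the two kinds of index using `memory_usage()`. The size `n` is an arbitrary value picked for illustration, and the exact byte counts will depend on your platform and pandas version.

```python
import numpy as np
import pandas as pd

n = 1_000_000  # arbitrary size, chosen only for illustration

# RangeIndex keeps just start, stop and step.
range_index = pd.RangeIndex(n)

# Building the index from an array stores every integer explicitly.
int_index = pd.Index(np.arange(n))

range_bytes = range_index.memory_usage()
int_bytes = int_index.memory_usage()

print(f"RangeIndex:    {range_bytes} bytes")
print(f"integer Index: {int_bytes} bytes")
```

The RangeIndex figure should stay roughly constant as `n` grows, while the materialized index grows linearly with `n`.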
Not sure this is something worth doing in practice, but the following should work:
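import numpy as np
import pandas as pd

# Note: pd.Int64Index only exists in pandas < 2.0; it was removed in 2.0.
class Int32Index(pd.Int64Index):
    _default_dtype = np.int32

    @property
    def asi8(self):
        return self.values

i = Int32Index(np.array([...], dtype='int32'))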
(from here)
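The subclass above depends on pd.Int64Index, which was removed in pandas 2.0. As a hedged alternative, assuming you are on pandas 2.0 or later, the plain `pd.Index` constructor keeps NumPy integer dtypes, so no subclass should be needed. On older versions the same code is expected to coerce to int64, which is the behaviour described in the question. The sample values and the column name `a` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values, stored as int32.
values = np.array([10, 20, 30], dtype="int32")

# On pandas >= 2.0 this should keep int32; on pandas < 2.0 it is coerced to int64.
idx = pd.Index(values)
print(pd.__version__, idx.dtype)

# Use the index when building a DataFrame and check what dtype it ends up with.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0]}, index=idx)
print(df.index.dtype)
```

Printing the dtype instead of assuming it makes it easy to see which behaviour your installed version has.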