.loc indexing changes type

Submitted by 谁都会走 on 2021-01-27 17:01:27

Question


If I have a pandas.DataFrame with columns of different types (e.g. int64 and float64), getting a single element from the int column with .loc indexing converts the output to float:

import pandas as pd
df_test = pd.DataFrame({'ints':[1,2,3], 'floats': [4.5,5.5,6.5]})

df_test['ints'].dtype
>>> dtype('int64')

df_test.loc[0,'ints']
>>> 1.0

type(df_test.loc[0,'ints'])
>>> numpy.float64

If I use .at for indexing, it doesn't happen:

type(df_test.at[0,'ints'])
>>> numpy.int64

It also doesn't happen when all the columns are int:

df_test = pd.DataFrame({'ints':[1,2,3], 'ints2': [4,5,6]})
df_test.loc[0,'ints']
>>> 1

Is this a consequence of some core properties of pandas indexing? In other words, is it a bug or a feature? :)

Update: Turns out, it is a bug and it is going to be fixed in pandas 0.20.0.


Answer 1:


The issue here is that loc implicitly tries to return a Series for the row first, even though you are selecting a single column and hence a scalar value. A Series has a single dtype, so the values are upcast to a dtype that can hold every value in that row. If you select just that column first and then use loc, no conversion happens:

In [83]:
df_test['ints'].loc[0]

Out[83]:
1

You can see what happens when you don't sub-select:

In [84]:
df_test.loc[0]

Out[84]:
floats    4.5
ints      1.0
Name: 0, dtype: float64

This may be undesirable, and I think there may be a GitHub issue regarding this.

This issue is kind of related.
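
To make the upcasting explanation concrete, here is a minimal sketch that compares the access paths discussed in this thread. It reuses df_test from the question. The other variable names (row, df_ints and so on) are only illustrative, and np.result_type is used here as an assumed stand-in for how the common dtype of a row is chosen. It deliberately does not check df_test.loc[0, 'ints'] itself, since that result depends on the pandas version.

```python
import numpy as np
import pandas as pd

df_test = pd.DataFrame({'ints': [1, 2, 3], 'floats': [4.5, 5.5, 6.5]})

# Selecting a whole row builds a Series, and a Series has one dtype.
# The int64 and float64 values are promoted to float64.
row = df_test.loc[0]
row_dtype = row.dtype
row_int_type = type(row['ints'])

# Selecting the column first keeps the column's own dtype.
col_first_type = type(df_test['ints'].loc[0])

# .at looks up a single cell without building a row Series.
at_type = type(df_test.at[0, 'ints'])

# NumPy's promotion rule for the two column dtypes (illustration only).
common = np.result_type(np.int64, np.float64)

# With only int columns, the row Series stays int64, so nothing is upcast.
df_ints = pd.DataFrame({'ints': [1, 2, 3], 'ints2': [4, 5, 6]})
int_row_dtype = df_ints.loc[0].dtype

print(row_dtype, row_int_type, col_first_type, at_type, common, int_row_dtype)
```

If the integer type matters, selecting the column before the row, or using .at, avoids going through the mixed-dtype row.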



Source: https://stackoverflow.com/questions/43366763/loc-indexing-changes-type
