Why is pandas.melt messing with my dtypes?

风格不统一 submitted on 2021-02-05 08:44:12

Question


I have some pivot code that is failing with the error

pandas.core.base.DataError: No numeric types to aggregate

I have tracked the problem down to an earlier call to pandas.melt.

Here are the dtypes before the melt:

frame.dtypes
user_id                           Int64
feature                          object
seconds_since_start_assigned      Int32
total                           float32
programme_ids                    object
q1                                Int32
q2                                Int32
q3                                Int32
q4                                Int32
q5                                Int32
q6                                Int32
q7                                Int32
q8                                Int32
q9                                Int32
week                              Int32

Now for the melt

frame1 = pd.melt(
     frame,
     id_vars=['user_id', 'week'],
     value_vars=['q1', 'q2', 'q3', 'q4', 'q5', 'q6', 'q7', 'q8', 'q9'],
     var_name='question',
     value_name='score')
frame1.dtypes
user_id     object
week        object
question    object
score       object

Why has the call to melt replaced the Int32 I need for score with object?


Answer 1:


You are using the nullable integer data type (note the capital 'I' in 'Int32'). This is still a fairly new data type, so not all functionality supports it yet. In particular, there is a big warning under the Construction section of the nullable integer documentation: Series cannot infer a nullable integer dtype, though perhaps someday it will:

In the future, we may provide an option for Series to infer a nullable-integer dtype.

We can see this ourselves. Series does not infer the nullable integer dtype, so it falls back to object, the only dtype that can hold both the integers and the nullable integer missing value. pd.array does infer it, though.

import pandas as pd
arr = [1, pd.NA, 4]  # pd.NA is the public missing-value marker for nullable dtypes

pd.Series(arr)
#0       1
#1    <NA>
#2       4
#dtype: object   #  <- Did not infer the type :(

pd.array(arr)
#<IntegerArray>
#[1, <NA>, 4]
#Length: 3, dtype: Int64
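
Since inference is the missing piece, one way around it is to state the dtype rather than rely on inference. The following is a minimal sketch, assuming a pandas version that has nullable integers and pd.NA (1.0 or later); the variable names are only illustrative.

```python
import pandas as pd

arr = [1, pd.NA, 4]

# Passing the dtype explicitly skips inference altogether.
explicit = pd.Series(arr, dtype="Int64")

# An object Series that already exists can be converted after the fact.
converted = pd.Series(arr).astype("Int64")

print(explicit.dtype, converted.dtype)
```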

So when you melt, the values are collected into a Series, pandas cannot infer the nullable dtype, and the column ends up as object. For now, you'll have to explicitly convert back to 'Int32' after the melt.
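
The explicit conversion could look something like the sketch below. The small frame is made-up data that only mimics the question's column names and dtypes, and the astype mapping is an assumption about which dtypes you want back. On pandas versions where melt already preserves the nullable dtypes, the astype call simply leaves them unchanged.

```python
import pandas as pd

# Made-up stand-in for the frame in the question (two questions instead of nine).
frame = pd.DataFrame({
    "user_id": pd.array([1, 2], dtype="Int64"),
    "week": pd.array([1, 1], dtype="Int32"),
    "q1": pd.array([3, None], dtype="Int32"),
    "q2": pd.array([5, 4], dtype="Int32"),
})

melted = pd.melt(
    frame,
    id_vars=["user_id", "week"],
    value_vars=["q1", "q2"],
    var_name="question",
    value_name="score",
)

# Restore the nullable integer dtypes explicitly, one entry per column.
restored = melted.astype({"user_id": "Int64", "week": "Int32", "score": "Int32"})
print(restored.dtypes)
```

With score back to Int32, a later pivot has a numeric column to aggregate again.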



Source: https://stackoverflow.com/questions/63138258/why-is-pandas-melt-messing-with-my-dtypes
