Question
I have some pivot code that is failing with this error:
pandas.core.base.DataError: No numeric types to aggregate
I have tracked the problem down to an earlier call to pandas.melt.
Here are the dtypes before the melt:
frame.dtypes
user_id Int64
feature object
seconds_since_start_assigned Int32
total float32
programme_ids object
q1 Int32
q2 Int32
q3 Int32
q4 Int32
q5 Int32
q6 Int32
q7 Int32
q8 Int32
q9 Int32
week Int32
Now for the melt:
frame1 = pd.melt(
    frame,
    id_vars=['user_id', 'week'],
    value_vars=['q1', 'q2', 'q3', 'q4', 'q5', 'q6', 'q7', 'q8', 'q9'],
    var_name='question',
    value_name='score')
frame1.dtypes
user_id object
week object
question object
score object
Why has the call to melt replaced the Int32 I need for score with object?
Answer 1:
You are using the nullable integer data type (note the capital 'I' in 'Int32'). This is still a fairly new dtype, so not all of the functionality is there yet. In particular, there is a big warning under the Construction section of the documentation: Series cannot infer a nullable integer dtype, though perhaps someday it will:
In the future, we may provide an option for Series to infer a nullable-integer dtype.
We can see this ourselves. A Series will not infer the nullable dtype, so it falls back to object, the only container that can hold both the integers and the missing value. pd.array does infer it, though.
import pandas as pd
arr = [1, pd.NA, 4]  # pd.NA is the public missing-value singleton
pd.Series(arr)
#0 1
#1 <NA>
#2 4
#dtype: object # <- Did not infer the type :(
pd.array(arr)
#<IntegerArray>
#[1, <NA>, 4]
#Length: 3, dtype: Int64
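If inference is the only thing missing, naming the dtype yourself sidesteps it. The following is a minimal sketch, assuming pandas 1.0 or later; the variable names are illustrative and not from the question.

```python
import pandas as pd

arr = [1, pd.NA, 4]

# Passing the dtype explicitly avoids the fallback to object.
explicit = pd.Series(arr, dtype="Int64")

# convert_dtypes() asks pandas to pick nullable dtypes for an
# existing object Series after the fact.
inferred = pd.Series(arr).convert_dtypes()

print(explicit.dtype)
print(inferred.dtype)
```

Both approaches keep the missing value as <NA> rather than turning the column into floats.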
So when you melt, the values end up in a new Series, pandas cannot infer the nullable dtype, and the column is cast to object. For now, you'll have to explicitly convert back to 'Int32'.
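A sketch of that explicit conversion, using a small made-up frame with only two question columns in place of the real data. Whether melt hands back object columns may depend on the pandas version; the astype call is harmless if the dtypes were already preserved.

```python
import pandas as pd

# Toy stand-in for the original frame (values are invented).
frame = pd.DataFrame({
    "user_id": pd.array([1, 2], dtype="Int64"),
    "week": pd.array([1, 1], dtype="Int32"),
    "q1": pd.array([3, None], dtype="Int32"),
    "q2": pd.array([5, 4], dtype="Int32"),
})

frame1 = pd.melt(
    frame,
    id_vars=["user_id", "week"],
    value_vars=["q1", "q2"],
    var_name="question",
    value_name="score",
)

# Cast the affected columns back to their nullable integer dtypes.
frame1 = frame1.astype({"user_id": "Int64", "week": "Int32", "score": "Int32"})

print(frame1.dtypes)
```

With score back to Int32, later aggregation steps see a numeric column again instead of object.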
Source: https://stackoverflow.com/questions/63138258/why-is-pandas-melt-messing-with-my-dtypes