Pandas Concat increases number of rows

落爺英雄遲暮 submitted on 2019-12-23 00:32:04

Question


I'm concatenating two dataframes side by side, so that the columns of one are placed next to the columns of the other. But first I applied a transformation to the initial dataframe:

import pandas as pd
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
real_data = pd.DataFrame(scaler.fit_transform(df[real_columns]), columns=real_columns)

And then I concatenate:

categorial_data  = pd.get_dummies(df[categor_columns], prefix_sep= '__')
train = pd.concat([real_data, categorial_data], axis=1, ignore_index=True)

I don't know why, but the number of rows increased:

print(df.shape, real_data.shape, categorial_data.shape, train.shape)
(1700645, 23) (1700645, 16) (1700645, 130) (1703915, 146)

What happened, and how do I fix the problem?

As you can see, the number of columns in train equals the sum of the columns in real_data and categorial_data.


Answer 1:


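The likely cause is that the two dataframes no longer share the same index. pd.concat with axis=1 aligns rows by index label, not by position. real_data is built from the plain array returned by scaler.fit_transform, so it gets a fresh 0..n-1 index, while categorial_data keeps the index of df. If earlier operations on df (dropping or filtering rows, for example) left it with a different index, every label that appears in only one of the two frames gets its own row, padded with NaN, and the row count grows. Note that ignore_index=True with axis=1 only renumbers the columns; it does not affect row alignment. So calling df = df.reset_index(drop=True) before the transformations will solve your problem.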
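To make the index explanation concrete, here is a minimal sketch on a made-up four-row frame (the toy data and the names misaligned and aligned are assumptions for illustration, not from the question). It reproduces the extra rows and shows one way to avoid them:

```python
import pandas as pd

# Toy stand-in for df: filtering rows leaves a non-contiguous index.
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0], "c": ["a", "b", "a", "b"]})
df = df[df["x"] != 2.0]  # index is now [0, 2, 3]

# A DataFrame built from a bare array gets a fresh index [0, 1, 2],
# just like the output of scaler.fit_transform in the question.
real_data = pd.DataFrame(df[["x"]].to_numpy(), columns=["x"])

# get_dummies keeps the index of df: [0, 2, 3].
categorial_data = pd.get_dummies(df[["c"]], prefix_sep="__")

# concat(axis=1) aligns on the union of labels {0, 1, 2, 3}: 4 rows, not 3.
misaligned = pd.concat([real_data, categorial_data], axis=1)

# Giving both pieces the same 0..n-1 index restores positional alignment.
aligned = pd.concat(
    [real_data, categorial_data.reset_index(drop=True)], axis=1
)

print(misaligned.shape, aligned.shape)
```

An alternative under the same assumption is to pass index=df.index when building real_data, which keeps the original labels instead of discarding them.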




Answer 2:


I solved the problem by using np.hstack, which stacks the two frames by position and ignores their indexes:

train = pd.DataFrame(np.hstack([real_data,categorial_data]))
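One side effect of the hstack approach is that the result has no column names, since NumPy arrays do not carry them. A small sketch of how they could be restored, using made-up toy frames (the data and the columns variable are assumptions for illustration); it is only valid when both frames hold their rows in the same order:

```python
import numpy as np
import pandas as pd

# Toy frames with deliberately different indexes.
real_data = pd.DataFrame({"x": [0.0, 0.5, 1.0]})  # index 0, 1, 2
categorial_data = pd.DataFrame({"c__a": [1, 0, 1]}, index=[0, 2, 3])

# np.hstack works on the underlying arrays, so rows are paired by position
# and the index labels play no role.
columns = list(real_data.columns) + list(categorial_data.columns)
train = pd.DataFrame(
    np.hstack([real_data.to_numpy(), categorial_data.to_numpy()]),
    columns=columns,
)

print(train.shape)
```

Because the values pass through a single NumPy array, all columns end up with one common dtype, which may matter if the frames mix integer and float data.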


Source: https://stackoverflow.com/questions/50368145/pandas-concat-increases-number-of-rows
