Pandas Concat increases number of rows

Question

I'm concatenating two dataframes side by side, so that the columns of one are placed next to the columns of the other. Before that, I applied a transformation to the initial dataframe:

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
real_data = pd.DataFrame(scaler.fit_transform(df[real_columns]), columns=real_columns)

Then I concatenate:

categorial_data = pd.get_dummies(df[categor_columns], prefix_sep='__')
train = pd.concat([real_data, categorial_data], axis=1, ignore_index=True)

I don't know why, but the number of rows increased:

print(df.shape, real_data.shape, categorial_data.shape, train.shape)
(1700645, 23) (1700645, 16) (1700645, 130) (1703915, 146)

What happened, and how can I fix it?

As you can see, the number of columns in train equals the sum of the columns in real_data and categorial_data.

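Answer 1:

The most likely cause is an index mismatch. pd.concat with axis=1 aligns rows by index label, not by position. real_data is built from the NumPy array returned by fit_transform, so it gets a fresh 0..n-1 index, while categorial_data keeps the index of df. If df no longer has a default index (for example, after rows were filtered out), the two sets of labels differ, and concat returns the union of both indexes with NaN in the unmatched cells. Note that ignore_index=True combined with axis=1 renumbers the columns, not the rows. Calling df.reset_index(drop=True) before building the two frames, or categorial_data.reset_index(drop=True) before concatenating, should solve your problem.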
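The effect described in this answer can be reproduced on a small scale. The following is only a sketch on hypothetical toy data: the column names x and c are made up, and to_numpy() stands in for MinMaxScaler.fit_transform, on the assumption that the scaler returns a plain NumPy array with no index.

```python
import pandas as pd

# Hypothetical toy data: filtering leaves a non-default index (0, 2, 3).
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0], "c": ["a", "b", "a", "b"]})
df = df[df["x"] != 2.0]

# Built from a NumPy array, so it gets a fresh RangeIndex (0, 1, 2).
# to_numpy() is a stand-in for scaler.fit_transform(...).
real_data = pd.DataFrame(df[["x"]].to_numpy(), columns=["x"])

# Derived directly from df, so it keeps df's index (0, 2, 3).
categorial_data = pd.get_dummies(df[["c"]], prefix_sep="__")

# The indexes differ, so concat aligns on the union of labels: 0, 1, 2, 3.
misaligned = pd.concat([real_data, categorial_data], axis=1)

# Resetting the index on the frame that kept the old labels restores
# a one-to-one row pairing.
aligned = pd.concat(
    [real_data, categorial_data.reset_index(drop=True)], axis=1
)

print(misaligned.shape, aligned.shape)  # (4, 3) (3, 3)
```

Comparing real_data.index with categorial_data.index before concatenating is a quick way to check whether this is what happened in the real data.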

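Answer 2:

I solved the problem by using np.hstack, which joins the underlying arrays by position and ignores the index. The resulting dataframe has default integer column labels:

import numpy as np

train = pd.DataFrame(np.hstack([real_data, categorial_data]))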
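If the column names are needed afterwards, they can be passed back in explicitly. This is a minimal sketch on hypothetical toy frames (the names x, c__a, and c__b and the index values are assumptions chosen for illustration), not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frames with deliberately different indexes.
real_data = pd.DataFrame({"x": [0.0, 0.5, 1.0]})
categorial_data = pd.DataFrame(
    {"c__a": [1, 1, 0], "c__b": [0, 0, 1]}, index=[0, 2, 3]
)

# np.hstack operates on the underlying arrays, so rows are paired
# by position and the index labels play no role.
stacked = np.hstack([real_data.to_numpy(), categorial_data.to_numpy()])

# Reattach the original column names in the same left-to-right order.
columns = list(real_data.columns) + list(categorial_data.columns)
train = pd.DataFrame(stacked, columns=columns)

print(train.shape)  # (3, 3)
```

One trade-off of this route is that stacking into a single array forces a common dtype, so integer dummy columns end up as floats here.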

Source: https://stackoverflow.com/questions/50368145/pandas-concat-increases-number-of-rows

Tags

python

python-3.x

pandas

concat
