Scaling test data to 0 and 1 using MinMaxScaler

Submitted by こ雲淡風輕ζ on 2019-12-13 06:38:53

Question


Using the MinMaxScaler from sklearn, I scale my data as below.

from sklearn import preprocessing

min_max_scaler = preprocessing.MinMaxScaler()
X_train_scaled = min_max_scaler.fit_transform(features_train)  # fit on the training data only
X_test_scaled = min_max_scaler.transform(features_test)        # reuse the training min/max

However, when printing X_test_scaled.min(), I get some negative values (the values do not all fall between 0 and 1). This is because the lowest value in my test data is lower than in the training data, on which the MinMaxScaler was fitted.

How much does it affect the SVM classifier if the data is not scaled exactly between 0 and 1? Also, is it bad practice to concatenate the train and test data into a single matrix, perform min-max scaling to ensure all values are between 0 and 1, and then separate them again?
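To make the behaviour concrete, here is a minimal reproduction with made-up numbers (not from the question): the scaled test minimum goes negative as soon as the test set contains a value below the training minimum.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy data: the training feature ranges from 10 to 20
features_train = np.array([[10.0], [15.0], [20.0]])
# The test set contains 5.0 (below the training min) and 25.0 (above the max)
features_test = np.array([[5.0], [12.0], [25.0]])

min_max_scaler = MinMaxScaler()
X_train_scaled = min_max_scaler.fit_transform(features_train)  # fit on train only
X_test_scaled = min_max_scaler.transform(features_test)        # reuse train min/max

print(X_train_scaled.min(), X_train_scaled.max())  # 0.0 1.0
print(X_test_scaled.min(), X_test_scaled.max())    # -0.5 1.5
```

The scaler maps x to (x - 10) / (20 - 10), so the out-of-range test values land at -0.5 and 1.5 rather than being clipped to [0, 1].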


Answer 1:


If you can scale all your data in one shot, that would be tidier, because everything the scaler produces would then lie between 0 and 1. But for the SVM algorithm it should make no difference: the scaler only applies an affine rescaling, so the relative differences between points are preserved even when some scaled values come out negative.

In the documentation we can see examples with negative values, so I don't think it has an impact on the result.
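The claim that a constant offset does not change the SVM can be checked empirically (a small sketch, not part of the original answer): the RBF kernel depends only on pairwise distances, so shifting every feature by a constant — which is what a different min-max offset amounts to — leaves the predictions unchanged.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = rng.rand(40, 2)                      # features already in [0, 1]
y = (X[:, 0] > X[:, 1]).astype(int)      # a simple synthetic label
X_new = rng.rand(10, 2)

clf = SVC(kernel="rbf").fit(X, y)
clf_shifted = SVC(kernel="rbf").fit(X - 0.3, y)  # features now partly negative

# Predictions agree because the RBF kernel only sees pairwise distances,
# which a constant shift does not change
same = (clf.predict(X_new) == clf_shifted.predict(X_new - 0.3)).all()
print(same)  # True
```

Note this shift-invariance argument is specific to translation-invariant kernels like RBF; a linear kernel's decision function also shifts consistently, but other preprocessing-sensitive models may behave differently.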




Answer 2:


For this scaling it probably doesn't matter much in practice, but in general you should not use your test data to estimate any parameters of the preprocessing. This can severely bias your results for more complex preprocessing steps.

There is really no reason to concatenate the data here; the SVM will deal with it. If you were using a model that requires positive values and your test data is not guaranteed to be positive, you might consider a strategy other than the MinMaxScaler.



Source: https://stackoverflow.com/questions/30473602/scaling-test-data-to-0-and-1-using-minmaxscaler
