scikit-learn

Receiving KeyError: "None of [Int64Index([ … dtype='int64', length=1323)] are in the [columns]"

≯℡__Kan透↙ submitted on 2020-12-29 05:55:31
Question SUMMARY: When feeding test and train data into a ROC curve plot, I receive the following error: KeyError: "None of [Int64Index([ 0, 1, 2, ... dtype='int64', length=1323)] are in the [columns]". The error seems to say that it doesn't like the format of my data, but the code worked the first time I ran it and I haven't been able to get it to run again. Am I splitting my data incorrectly, or passing incorrectly formatted data into my function? WHAT I'VE TRIED: Read through several StackOverflow
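A common cause of this exact KeyError is indexing a pandas DataFrame with an array of integer positions (for example, indices produced by a cross-validation split) using `df[...]`, which looks labels up in the *columns*, instead of `df.iloc[...]`. A minimal sketch, using a hypothetical DataFrame since the asker's data isn't shown:

```python
import pandas as pd
from sklearn.model_selection import KFold

# Hypothetical stand-in for the asker's feature DataFrame.
df = pd.DataFrame({"f1": range(10), "f2": range(10, 20)})

kf = KFold(n_splits=2)
for train_idx, test_idx in kf.split(df):
    # df[train_idx] would raise:
    #   KeyError: "None of [Int64Index(...)] are in the [columns]"
    # because df[<integer array>] is interpreted as a COLUMN lookup.
    train = df.iloc[train_idx]  # positional row selection works
    test = df.iloc[test_idx]
```

This also explains why such code can "work once and then fail": it runs if the variable happens to be a NumPy array (where integer indexing selects rows) and breaks as soon as it is a DataFrame.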

Working of labelEncoder in sklearn

不羁岁月 submitted on 2020-12-29 04:03:32
Question: Say I have the following input feature: hotel_id = [1, 2, 3, 2, 3]. This is a categorical feature with numeric values. If I give it to the model as-is, the model will treat it as a continuous variable, i.e., 2 > 1. If I apply sklearn's LabelEncoder, I get: hotel_id = [0, 1, 2, 1, 2]. So is this encoded feature considered continuous or categorical? If it is treated as continuous, then what's the use of LabelEncoder? P.S. I know about one-hot encoding, but there are around 100 hotel
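The encoded values are still just integers, so a model will treat them as continuous; `LabelEncoder` is intended for encoding *target* labels, not input features. For a nominal feature like `hotel_id`, one-hot encoding is the usual fix. A minimal sketch contrasting the two:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

hotel_id = [1, 2, 3, 2, 3]

# LabelEncoder maps classes to 0..n-1 but keeps a single ordered column,
# so downstream models still see an implicit 0 < 1 < 2 ordering.
encoded = LabelEncoder().fit_transform(hotel_id)  # [0, 1, 2, 1, 2]

# OneHotEncoder expands the feature into one binary column per category,
# removing the spurious ordering. (.toarray() densifies the sparse output.)
onehot = OneHotEncoder().fit_transform(
    np.array(hotel_id).reshape(-1, 1)
).toarray()
```

With ~100 hotel ids, one-hot encoding adds ~100 sparse columns, which tree-based or linear models generally handle fine; sklearn keeps the result sparse by default precisely for this case.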

Apply StandardScaler to parts of a data set

坚强是说给别人听的谎言 submitted on 2020-12-28 06:54:06
Question: I want to use sklearn's StandardScaler. Is it possible to apply it to some feature columns but not others? For instance, say my data is: data = pd.DataFrame({'Name': [3, 4, 6], 'Age': [18, 92, 98], 'Weight': [68, 59, 49]}), which prints as: Age Name Weight / 0 18 3 68 / 1 92 4 59 / 2 98 6 49. Then col_names = ['Name', 'Age', 'Weight']; features = data[col_names]. I fit and transform the data: scaler = StandardScaler().fit(features.values); features = scaler.transform(features.values); scaled_features = pd.DataFrame(features,
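Yes: either fit the scaler on just the columns you want and assign the result back, or use `ColumnTransformer` with `remainder='passthrough'` so the untouched columns are carried through. A sketch of both options, using the question's own data:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

data = pd.DataFrame({'Name': [3, 4, 6],
                     'Age': [18, 92, 98],
                     'Weight': [68, 59, 49]})

# Option 1: scale only selected columns, assign back in place.
scaled = data.copy()
scaled[['Age', 'Weight']] = StandardScaler().fit_transform(
    scaled[['Age', 'Weight']])

# Option 2: ColumnTransformer scales 'Age' and 'Weight' and passes
# 'Name' through unchanged (handy inside a Pipeline).
ct = ColumnTransformer(
    [('scale', StandardScaler(), ['Age', 'Weight'])],
    remainder='passthrough')
out = ct.fit_transform(data)
```

Note that `ColumnTransformer` reorders the output (transformed columns first, passthrough columns after), so `out`'s columns are Age, Weight, Name.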

Cannot understand sklearn's PolynomialFeatures

回眸只為那壹抹淺笑 submitted on 2020-12-27 19:09:11
Question: I need help with sklearn's PolynomialFeatures. It works fine with one feature, but whenever I add multiple features, the output array also contains values besides the features raised to the powers of the degree. For example, for the array X = np.array([[230.1, 37.8, 69.2]]), when I run X_poly = poly.fit_transform(X) it outputs [[ 1.00000000e+00 2.30100000e+02 3.78000000e+01 6.92000000e+01 5.29460100e+04 8.69778000e+03 1.59229200e+04 1.42884000e+03 2.61576000e+03 4.78864000e+03]]. Here, what is 8
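The extra values are the *interaction terms*: with `degree=2` and features [a, b, c], PolynomialFeatures produces [1, a, b, c, a², ab, ac, b², bc, c²], not just the pure powers. So 8.69778e3 in the question's output is a·b = 230.1 × 37.8. A minimal sketch reproducing this:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[230.1, 37.8, 69.2]])
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)

# Output column order for features [a, b, c] at degree 2:
#   [1, a, b, c, a^2, a*b, a*c, b^2, b*c, c^2]
# so column 5 is the cross term a*b = 230.1 * 37.8 = 8697.78.
```

If only the interaction terms are wanted (no squares), `PolynomialFeatures(degree=2, interaction_only=True)` drops the pure power columns.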