feature-selection

Retain feature names after Scikit Feature Selection

痞子三分冷, submitted on 2020-06-24 08:33:10
Question: Running VarianceThreshold from scikit-learn on a set of data removes a couple of features. I feel I'm doing something simple yet stupid, but I'd like to retain the names of the remaining features. The following code:

    def VarianceThreshold_selector(data):
        selector = VarianceThreshold(.5)
        selector.fit(data)
        selector = (pd.DataFrame(selector.transform(data)))
        return selector

    x = VarianceThreshold_selector(data)
    print(x)

changes the following data (this is just a small subset of the
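One way to keep the column names, sketched here under assumptions: the input is a pandas DataFrame, and the boolean mask returned by the selector's `get_support()` is used to index the original columns. The function name `variance_threshold_selector` and the toy data are hypothetical and are not taken from the question.

```python
import pandas as pd
from sklearn.feature_selection import VarianceThreshold


def variance_threshold_selector(data: pd.DataFrame, threshold: float = 0.5) -> pd.DataFrame:
    """Drop low-variance columns but keep the surviving column names."""
    selector = VarianceThreshold(threshold)
    selector.fit(data)
    # get_support() returns a boolean mask over the input columns:
    # True for every column whose variance exceeds the threshold.
    kept = data.columns[selector.get_support()]
    return data[kept]


# Hypothetical toy data, not the asker's dataset.
toy = pd.DataFrame({
    "constant": [1, 1, 1, 1],    # variance 0
    "low_var": [0, 0, 0, 1],     # variance 0.1875
    "high_var": [1, 5, 9, 13],   # variance 20
})
selected = variance_threshold_selector(toy)
print(list(selected.columns))
```

Indexing the original DataFrame with the mask, rather than wrapping `selector.transform(data)` in a new DataFrame, is what preserves the names: `transform` returns a bare NumPy array, so the labels are lost at that step.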

Retain feature names after Scikit Feature Selection

前提是你, submitted on 2020-06-24 08:33:01
Question: Running VarianceThreshold from scikit-learn on a set of data removes a couple of features. I feel I'm doing something simple yet stupid, but I'd like to retain the names of the remaining features. The following code:

    def VarianceThreshold_selector(data):
        selector = VarianceThreshold(.5)
        selector.fit(data)
        selector = (pd.DataFrame(selector.transform(data)))
        return selector

    x = VarianceThreshold_selector(data)
    print(x)

changes the following data (this is just a small subset of the

Fill missing values (NaN) by regression on other columns

青春壹個敷衍的年華, submitted on 2020-06-17 05:29:03
Question: I've got a dataset containing a lot of missing values (NaN). I want to use linear or multiple linear regression in Python to fill all the missing values. You can find the dataset here: Dataset. I used f_regression(X_train, Y_train) to select which features to use. First I converted df['country'] to dummy variables, then kept the important features, then ran the regression, but the results are not good. I have defined the following functions to select features and fill missing values: def select_features
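The idea of filling NaNs by regressing one column on others can be sketched as follows. This is a minimal illustration, not the asker's code: the helper `impute_by_regression`, its parameters, and the toy data are all hypothetical, and it assumes the predictor columns are numeric and themselves free of NaN on the rows being used.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def impute_by_regression(df, target, predictors):
    """Fill NaNs in `target` with predictions from a linear model on `predictors`."""
    out = df.copy()
    missing = out[target].isna()
    predictors_ok = out[predictors].notna().all(axis=1)
    # Rows usable for fitting: target known and all predictors known.
    complete = ~missing & predictors_ok
    # Rows that can be filled: target missing but all predictors known.
    fillable = missing & predictors_ok
    if complete.sum() == 0 or fillable.sum() == 0:
        return out
    model = LinearRegression()
    model.fit(out.loc[complete, predictors], out.loc[complete, target])
    out.loc[fillable, target] = model.predict(out.loc[fillable, predictors])
    return out


# Hypothetical toy data where y = 2 * x + 1 exactly.
toy = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [3.0, 5.0, np.nan, 9.0, np.nan],
})
filled = impute_by_regression(toy, target="y", predictors=["x"])
print(filled)
```

When several columns have gaps at once, scikit-learn's experimental `IterativeImputer` applies the same regress-and-predict idea column by column; whether either approach gives good results depends on how strongly the columns are actually related.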

Fill missing values (NaN) by regression on other columns

烈酒焚心, submitted on 2020-06-17 05:28:26
Question: I've got a dataset containing a lot of missing values (NaN). I want to use linear or multiple linear regression in Python to fill all the missing values. You can find the dataset here: Dataset. I used f_regression(X_train, Y_train) to select which features to use. First I converted df['country'] to dummy variables, then kept the important features, then ran the regression, but the results are not good. I have defined the following functions to select features and fill missing values: def select_features