feature-selection

Items of feature_columns must be a _FeatureColumn

心已入冬 submitted on 2021-02-20 12:01:13
Question: I am getting this error: ValueError: Items of feature_columns must be a _FeatureColumn. Given (type ): Index(['CreditScore', 'Age', 'Tenure', 'Balance', 'NumOfProducts', 'HasCrCard', 'IsActiveMember', 'EstimatedSalary', 'Exited'], dtype='object'). I am using the TensorFlow library. I want to get prediction results, but I cannot run m.train(input_fn=get_input_fn, steps=5000). Whatever I tried, I got the same error. I used the input functions below, but nothing changed. def input
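
The excerpt is cut off before the input functions, but this error almost always means a pandas Index of column names (e.g. df.columns) was passed where a list of tf.feature_column objects is expected. A minimal sketch of the usual fix, assuming numeric features and the TF estimator API (the estimator m is a hypothetical stand-in, and get_input_fn is the asker's function):

```python
import tensorflow as tf

# Columns taken from the error message; 'Exited' is presumably the label,
# so it is excluded from the feature list.
FEATURES = ['CreditScore', 'Age', 'Tenure', 'Balance', 'NumOfProducts',
            'HasCrCard', 'IsActiveMember', 'EstimatedSalary']

# Wrap each column name in a _FeatureColumn instead of passing df.columns.
feature_columns = [tf.feature_column.numeric_column(name) for name in FEATURES]

m = tf.estimator.LinearClassifier(feature_columns=feature_columns)
# m.train(input_fn=get_input_fn, steps=5000) now receives proper feature columns.
```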

Choosing an sklearn pipeline for classifying user text data

戏子无情 submitted on 2021-02-19 08:15:52
Question: I'm working on a machine learning application in Python (using the sklearn module) and am currently trying to decide on a model for performing inference. A brief description of the problem: given many instances of user data, I'm trying to classify them into various categories based on relative keyword containment. It is supervised, so I have many, many instances of pre-classified data that are already categorized. (Each piece of data is between 2 and 12 or so words.) I am currently trying to
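
The question is truncated, but for short, keyword-driven snippets a bag-of-words pipeline is a common baseline. A sketch under that assumption (the texts and labels below are made up for illustration):

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Hypothetical pre-classified snippets of 2-12 words each.
texts = ["reset my account password", "refund for duplicate charge",
         "cannot log in anymore", "billing statement is wrong"]
labels = ["account", "billing", "account", "billing"]

clf = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),  # unigrams + bigrams
    ("svm", LinearSVC()),  # strong linear baseline for sparse text features
])
clf.fit(texts, labels)
print(clf.predict(["wrong charge on my statement"]))
```

Linear models (LinearSVC, LogisticRegression, MultinomialNB) tend to work well on sparse bag-of-words features and are cheap to cross-validate against each other.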

Why does StandardScaler have different effects with different numbers of features

冷暖自知 submitted on 2021-02-16 15:16:38
Question: I experimented with the breast cancer data from scikit-learn. Using all features, without StandardScaler: cancer = datasets.load_breast_cancer() x = cancer.data y = cancer.target x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42) pla = Perceptron().fit(x_train, y_train) y_pred = pla.predict(x_test) print(accuracy_score(y_test, y_pred)) result 1: 0.9473684210526315 Using all features, with StandardScaler: cancer = datasets.load_breast_cancer() x = cancer
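
The excerpt cuts off mid-code, but the scaled variant presumably mirrors the first one. A sketch of that comparison, with the scaler fitted on the training split only:

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Perceptron
from sklearn.metrics import accuracy_score

cancer = datasets.load_breast_cancer()
x_train, x_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, test_size=0.2, random_state=42)

# Fit StandardScaler on the training data only, then transform both splits.
scaler = StandardScaler().fit(x_train)
pla = Perceptron().fit(scaler.transform(x_train), y_train)
y_pred = pla.predict(scaler.transform(x_test))
print(accuracy_score(y_test, y_pred))
```

The Perceptron's updates depend on raw feature magnitudes, so features on large scales (e.g. mean area vs. smoothness in this dataset) dominate before scaling; how much scaling changes the score therefore depends on which features are included.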

Optimal Feature Selection Technique after PCA?

旧城冷巷雨未停 submitted on 2021-02-10 14:51:50
Question: I'm implementing a classification task with a binary outcome using RandomForestClassifier, and I know the importance of data preprocessing to improve the accuracy score. In particular, my dataset contains more than 100 features and almost 4000 instances, and I want to perform a dimensionality reduction technique in order to avoid overfitting, since there is a high presence of noise in the data. For these tasks I usually use a classical feature selection method (filters, wrappers, feature
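
The question is truncated, but since PCA is named in the title, one straightforward setup is to chain PCA and the forest in a pipeline so the reduction is re-fitted inside each cross-validation fold. A sketch with a synthetic stand-in for the ~4000 x 100 dataset:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic placeholder roughly matching the described shape.
X, y = make_classification(n_samples=4000, n_features=100, random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),      # PCA is scale-sensitive
    ("pca", PCA(n_components=0.95)),  # keep components covering 95% of variance
    ("rf", RandomForestClassifier(random_state=0)),
])
print(cross_val_score(pipe, X, y, cv=5).mean())
```

Note that after PCA the "features" are linear combinations of the originals, so per-feature selection methods and feature importances lose their original interpretation.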

Boruta box plots in R

若如初见. submitted on 2021-02-07 23:59:11
Question: I'm doing variable selection with the Boruta package in R. Boruta gives me the standard series of boxplots in a single graph, which is useful, but given that I have too many predictors, I am hoping to limit the number of boxplots that appear in the Boruta plot, something like the following image. Basically, I want to "zoom" in on the right end of the plot, but have no idea how to do that with the Boruta plot object. Thanks, MR Answer 1: Sounds like a simple question; the solution
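
The rest of the answer is cut off, but the usual route is to bypass plot.Boruta and draw the boxplots yourself from the object's ImpHistory matrix, keeping only the highest-ranked attributes. A sketch on a toy dataset (iris stands in for the real data, and the top-3 cutoff is arbitrary):

```r
library(Boruta)

bor <- Boruta(Species ~ ., data = iris)

# ImpHistory holds one importance value per attribute per Boruta run.
# Drop the shadow columns, rank attributes by median importance, and
# boxplot only the top few -- effectively zooming in on the right end
# of the default Boruta plot.
imp <- bor$ImpHistory[, !colnames(bor$ImpHistory) %in%
                        c("shadowMax", "shadowMean", "shadowMin")]
top <- names(sort(apply(imp, 2, median), decreasing = TRUE))[1:3]
boxplot(imp[, top], ylab = "Importance")
```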

Combining Recursive Feature Elimination and Grid Search in scikit-learn

醉酒当歌 submitted on 2021-02-07 07:09:14
Question: I am trying to combine recursive feature elimination and grid search in scikit-learn. As you can see from the code below (which works), I am able to get the best estimator from a grid search and then pass that estimator to RFECV. However, I would rather do the RFECV first, then the grid search. The problem is that when I pass the selector from RFECV to the grid search, it does not accept it: ValueError: Invalid parameter bootstrap for estimator RFECV. Is it possible to get the selector from
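
The asker's code is not included in the excerpt, but the error itself is scikit-learn's standard message for a misnamed grid parameter: when GridSearchCV wraps RFECV, the inner model's parameters must be addressed through the estimator__ prefix. A sketch of that reversed order (RFECV inside the grid search), with a synthetic dataset and hypothetical grid values:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Wrap the model in RFECV, then grid-search the wrapped object. Parameters
# of the inner forest are reached via the estimator__ prefix; passing
# "bootstrap" directly is what triggers
# "Invalid parameter bootstrap for estimator RFECV".
selector = RFECV(RandomForestClassifier(random_state=0), cv=3)
grid = GridSearchCV(
    selector,
    param_grid={"estimator__n_estimators": [50, 100],
                "estimator__bootstrap": [True, False]},
    cv=3,
)
grid.fit(X, y)
print(grid.best_params_)
```

This refits the feature elimination for every hyperparameter combination, which is expensive but keeps the selection inside the cross-validation loop.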

Feature selection in a document-feature matrix using a chi-squared test

送分小仙女□ submitted on 2021-02-06 12:50:43
Question: I am doing text mining using natural language processing. I used the quanteda package to generate a document-feature matrix (dfm). Now I want to do feature selection using a chi-squared test. I know that many people have already asked this question; however, I couldn't find the relevant code for it. (The answers just gave a brief concept, like this: https://stats.stackexchange.com/questions/93101/how-can-i-perform-a-chi-square-test-to-do-feature-selection-in-r) I learned that I could use
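
The question breaks off here, but one common answer (assuming a reasonably recent quanteda, where the statistics live in the quanteda.textstats companion package) is textstat_keyness, which computes a chi-squared association score for every feature against a target group of documents. A small sketch with made-up documents; the grouping and significance cutoff are illustrative:

```r
library(quanteda)
library(quanteda.textstats)

# Two hypothetical documents; replace with your own dfm and grouping.
txt <- c(pos = "good fast service friendly reply",
         neg = "bad slow service no reply at all")
dfmat <- dfm(tokens(txt))

# Chi-squared keyness of each feature for the "pos" document vs. the rest.
key <- textstat_keyness(dfmat, target = "pos", measure = "chi2")
head(key)                              # columns: feature, chi2, p, counts
selected <- key$feature[key$p < 0.05]  # keep significantly associated features
```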