cross-validation

Why do I get different values with a pipeline and without a pipeline in sklearn in Python

ぃ、小莉子 submitted on 2019-12-02 02:29:12

I am using recursive feature elimination with cross-validation (RFECV) together with GridSearchCV and a RandomForest classifier, both with a pipeline and without one. My code with the pipeline is as follows:

    X = df[my_features_all]
    y = df['gold_standard']

    # get development and testing sets
    x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=0)

    from sklearn.pipeline import Pipeline

    # cross-validation setting
    k_fold = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

    # this is the classifier used for feature selection
    clf_featr_sele = RandomForestClassifier(random_state
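
A likely explanation, illustrated with a minimal sketch on synthetic data (the grid values here are hypothetical, not the asker's): inside a Pipeline, RFECV is refit on every cross-validation fold of the grid search, while outside a pipeline the features are selected once on the full training set, so the two setups score different models.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_selection import RFECV
    from sklearn.model_selection import GridSearchCV, StratifiedKFold
    from sklearn.pipeline import Pipeline

    X, y = make_classification(n_samples=300, n_features=20, random_state=0)
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

    # Inside a Pipeline, RFECV is refit on each training fold of the search.
    pipe = Pipeline([
        ('fs', RFECV(RandomForestClassifier(random_state=0), cv=cv)),
        ('clf', RandomForestClassifier(random_state=0)),
    ])
    grid = GridSearchCV(pipe, {'clf__n_estimators': [50, 100]}, cv=cv)
    grid.fit(X, y)
    print('with pipeline   :', grid.best_score_)

    # Outside a pipeline, features are selected once on all of X, which leaks
    # information into the folds and typically yields a different score.
    X_sel = RFECV(RandomForestClassifier(random_state=0), cv=cv).fit_transform(X, y)
    grid2 = GridSearchCV(RandomForestClassifier(random_state=0),
                         {'n_estimators': [50, 100]}, cv=cv)
    grid2.fit(X_sel, y)
    print('without pipeline:', grid2.best_score_)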

How to access Scikit Learn nested cross-validation scores

ぃ、小莉子 submitted on 2019-12-01 22:27:05

I'm using Python and I would like to use nested cross-validation with scikit-learn. I have found a very good example:

    NUM_TRIALS = 30
    non_nested_scores = np.zeros(NUM_TRIALS)
    nested_scores = np.zeros(NUM_TRIALS)

    # Choose cross-validation techniques for the inner and outer loops,
    # independently of the dataset.
    # E.g "LabelKFold", "LeaveOneOut", "LeaveOneLabelOut", etc.
    inner_cv = KFold(n_splits=4, shuffle=True, random_state=i)
    outer_cv = KFold(n_splits=4, shuffle=True, random_state=i)

    # Non_nested parameter search and scoring
    clf = GridSearchCV(estimator=svr, param_grid=p_grid, cv=inner_cv)
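
For reference, the per-fold nested scores come from running cross_val_score over the GridSearchCV object itself. A minimal sketch on the iris data, with a hypothetical SVC grid:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)
    p_grid = {'C': [1, 10], 'gamma': [0.01, 0.1]}

    inner_cv = KFold(n_splits=4, shuffle=True, random_state=0)
    outer_cv = KFold(n_splits=4, shuffle=True, random_state=0)

    # The grid search is the estimator scored by the outer loop; each outer
    # fold yields one nested score, and their mean is the nested estimate.
    clf = GridSearchCV(SVC(), p_grid, cv=inner_cv)
    fold_scores = cross_val_score(clf, X, y, cv=outer_cv)
    print(fold_scores, fold_scores.mean())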

ValueError: continuous-multioutput is not supported

寵の児 submitted on 2019-12-01 22:07:30

I want to run several regression types (Lasso, Ridge, ElasticNet and SVR), as well as plain linear regression, on a dataset with around 5,000 rows and 6 features, using GridSearchCV for cross-validation. The code is extensive, but here are some critical parts:

    def splitTrainTestAdv(df):
        y = df.iloc[:, -5:]   # last 5 columns
        X = df.iloc[:, :-5]   # except for last 5 columns
        # scaling and sampling
        X = StandardScaler().fit_transform(X)
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.8, random_state=0)
        return X_train, X_test, y_train, y_test

    def performSVR(x_train, y_train, X_test, parameter):
        C =
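
The error itself usually means a 2-D continuous target reached an estimator or scorer that cannot handle it. A minimal sketch on random data of one way around it, wrapping SVR in MultiOutputRegressor and passing an explicit regression scorer (the names and values here are illustrative, not the asker's):

    import numpy as np
    from sklearn.model_selection import GridSearchCV, KFold
    from sklearn.multioutput import MultiOutputRegressor
    from sklearn.svm import SVR

    rng = np.random.RandomState(0)
    X = rng.rand(100, 6)
    y = rng.rand(100, 5)    # 5 continuous targets: "continuous-multioutput"

    # SVR is single-output, so wrap it to fit one model per target, score
    # with an explicit regression metric, and use a plain (non-stratified) KFold.
    grid = GridSearchCV(MultiOutputRegressor(SVR()),
                        {'estimator__C': [1, 10]},
                        scoring='neg_mean_squared_error',
                        cv=KFold(n_splits=3, shuffle=True, random_state=0))
    grid.fit(X, y)
    print(grid.best_params_, grid.best_score_)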

Why xgboost.cv and sklearn.cross_val_score give different results?

会有一股神秘感。 submitted on 2019-12-01 21:25:15

I'm trying to build a classifier on a data set. I first used XGBoost:

    import xgboost as xgb
    import pandas as pd
    import numpy as np

    train = pd.read_csv("train_users_processed_onehot.csv")
    labels = train["Buy"].map({"Y": 1, "N": 0})
    features = train.drop("Buy", axis=1)
    data_dmat = xgb.DMatrix(data=features, label=labels)
    params = {"max_depth": 5, "min_child_weight": 2, "eta": 0.1, "subsamples": 0.9,
              "colsample_bytree": 0.8, "objective": "binary:logistic",
              "eval_metric": "logloss"}
    rounds = 180
    result =
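
Two common sources of the discrepancy are mismatched hyperparameters between the two APIs and different fold assignment; note also that "subsamples" is not a recognized XGBoost key (the parameter is "subsample"), and unknown keys are silently ignored. A minimal sketch on synthetic data of an apples-to-apples comparison (data and seeds are illustrative):

    import xgboost as xgb
    from sklearn.datasets import make_classification
    from sklearn.model_selection import StratifiedKFold, cross_val_score

    X, y = make_classification(n_samples=500, random_state=0)
    params = {"max_depth": 5, "min_child_weight": 2, "eta": 0.1,
              "subsample": 0.9, "colsample_bytree": 0.8,
              "objective": "binary:logistic", "eval_metric": "logloss"}

    cv_result = xgb.cv(params, xgb.DMatrix(X, label=y),
                       num_boost_round=180, nfold=5, seed=0)
    print(cv_result["test-logloss-mean"].iloc[-1])

    # The sklearn wrapper must mirror every parameter (num_boost_round
    # becomes n_estimators); otherwise the two APIs train different models,
    # and the folds themselves still differ between the two runs.
    clf = xgb.XGBClassifier(max_depth=5, min_child_weight=2, learning_rate=0.1,
                            subsample=0.9, colsample_bytree=0.8, n_estimators=180)
    scores = cross_val_score(clf, X, y, scoring="neg_log_loss",
                             cv=StratifiedKFold(n_splits=5, shuffle=True,
                                                random_state=0))
    print(-scores.mean())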

Are the k-fold cross-validation scores from scikit-learn's `cross_val_score` and `GridsearchCV` biased if we include transformers in the pipeline?

无人久伴 submitted on 2019-12-01 18:44:52

Data pre-processors such as StandardScaler should be used to fit_transform the train set and only transform (not fit) the test set. I expect the same fit/transform process to apply to cross-validation for tuning the model. However, I found that cross_val_score and GridSearchCV fit_transform the entire train set with the preprocessor (rather than fit_transform the inner_train set and transform the inner_validation set). I believe this artificially removes the variance from the inner_validation set
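
This is straightforward to test empirically. In a minimal sketch (synthetic data, hypothetical LoggingScaler subclass), a scaler placed inside a Pipeline reports being fit on 80 of 100 rows per fold under 5-fold cross_val_score, i.e. on the inner training folds only:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    class LoggingScaler(StandardScaler):
        # StandardScaler that reports how many rows each fit() sees.
        def fit(self, X, y=None):
            print("scaler fit on", X.shape[0], "rows")
            return super().fit(X, y)

    X, y = make_classification(n_samples=100, random_state=0)
    pipe = make_pipeline(LoggingScaler(), LogisticRegression())
    # Prints "scaler fit on 80 rows" five times: the scaler is fit on the
    # inner training folds only, never on the full training set.
    cross_val_score(pipe, X, y, cv=5)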

XgBoost : The least populated class in y has only 1 members, which is too few

若如初见. submitted on 2019-12-01 17:54:21

I'm using the XGBoost implementation with sklearn for a Kaggle competition. However, I'm getting this warning message:

    $ python Script1.py
    /home/sky/private/virtualenv15.0.1dev/myVE/local/lib/python2.7/site-packages/sklearn/cross_validation.py:516:
    Warning: The least populated class in y has only 1 members, which is too few.
    The minimum number of labels for any class cannot be less than n_folds=3.
    % (min_labels, self.n_folds)), Warning)

According to another question on Stack Overflow: "Check
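
The warning comes from stratified splitting: a class with fewer members than n_folds cannot appear in every fold. A minimal sketch (toy labels, illustrative threshold) of diagnosing the class counts and dropping classes too rare to stratify:

    import numpy as np
    from sklearn.model_selection import StratifiedKFold

    X = np.arange(14).reshape(7, 2)
    y = np.array([0, 0, 0, 1, 1, 1, 2])   # class 2 has a single member

    # Any class with fewer members than n_folds triggers the warning.
    classes, counts = np.unique(y, return_counts=True)
    print(dict(zip(classes, counts)))     # {0: 3, 1: 3, 2: 1}

    # One fix: drop samples whose class is too rare to stratify.
    keep = np.isin(y, classes[counts >= 3])
    for train_idx, test_idx in StratifiedKFold(n_splits=3).split(X[keep], y[keep]):
        print(train_idx, test_idx)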

Fitting in nested cross-validation with cross_val_score with pipeline and GridSearch

房东的猫 submitted on 2019-12-01 12:38:17

I am working in scikit-learn and I am trying to tune my XGBoost model. I attempted nested cross-validation, using a pipeline to rescale the training folds (to avoid data leakage and overfitting), GridSearchCV for parameter tuning, and cross_val_score to get the roc_auc score at the end.

    from imblearn.pipeline import Pipeline
    from sklearn.model_selection import RepeatedKFold
    from sklearn.model_selection import GridSearchCV
    from sklearn.model_selection import cross_val_score
    from xgboost import XGBClassifier

    std_scaling = StandardScaler()
    algo = XGBClassifier()
    steps
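
A minimal sketch of the nested arrangement on synthetic data (grid values and seeds are illustrative, and a plain sklearn Pipeline stands in for the imblearn one): GridSearchCV over the pipeline forms the inner loop, and cross_val_score over that GridSearchCV object forms the outer loop, so the scaler is refit inside every fold.

    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV, RepeatedKFold, cross_val_score
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from xgboost import XGBClassifier

    X, y = make_classification(n_samples=300, random_state=0)

    pipe = Pipeline([('scale', StandardScaler()), ('algo', XGBClassifier())])
    param_grid = {'algo__max_depth': [3, 5], 'algo__n_estimators': [100, 200]}

    inner_cv = RepeatedKFold(n_splits=5, n_repeats=2, random_state=1)
    outer_cv = RepeatedKFold(n_splits=5, n_repeats=2, random_state=2)

    # Inner loop: GridSearchCV tunes the whole pipeline. Outer loop:
    # cross_val_score scores the tuned search object, refitting the scaler
    # (and the search) inside every outer fold.
    grid = GridSearchCV(pipe, param_grid, cv=inner_cv, scoring='roc_auc')
    scores = cross_val_score(grid, X, y, cv=outer_cv, scoring='roc_auc')
    print(scores.mean())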

Example of 10-fold cross-validation with Neural network classification in MATLAB

馋奶兔 submitted on 2019-12-01 11:10:27

I am looking for an example of applying 10-fold cross-validation to a neural network. I need something like the answer to this question: Example of 10-fold SVM classification in MATLAB. I would like to classify all 3 classes, while in that example only two classes were considered. Edit: here is the code I wrote for the iris example:

    load fisheriris                                %# load iris dataset
    k = 10;
    cvFolds = crossvalind('Kfold', species, k);    %# get indices of 10-fold CV
    net = feedforwardnet(10);
    for i = 1:k                                    %# for each fold
        testIdx = (cvFolds == i);                  %# get indices of test instances
        trainIdx = ~testIdx;                       %# get indices training
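
For comparison, a scikit-learn sketch of the same 10-fold scheme (hypothetical layer size; the question itself is about MATLAB's feedforwardnet, so this is not a MATLAB answer):

    from sklearn.datasets import load_iris
    from sklearn.model_selection import StratifiedKFold, cross_val_score
    from sklearn.neural_network import MLPClassifier

    X, y = load_iris(return_X_y=True)   # 3 classes, as in the question
    net = MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000, random_state=0)
    scores = cross_val_score(net, X, y, cv=StratifiedKFold(n_splits=10))
    print(scores.mean())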

Labeling one class for cross validation in libsvm matlab

此生再无相见时 submitted on 2019-12-01 10:47:28

I want to do one-class classification using LibSVM in MATLAB. I want to train the data and use cross-validation, but I don't know how to label the outliers. If, for example, I have this data:

    trainData = [1,1,1; 1,1,2; 1,1,1.5; 1,1.5,1; 20,2,3; 2,20,2; 2,20,5; 20,2,2];
    labelTrainData = [-1 -1 -1 -1 0 0 0 0];

(the first four are examples of the one class, the other four are examples of outliers, just for the cross-validation), and I train the model using this:

    model = svmtrain
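
For reference, a scikit-learn sketch of the same idea (the question itself concerns LibSVM's MATLAB interface): a one-class SVM is fit on the inliers only, and the outliers serve only for evaluation.

    import numpy as np
    from sklearn.svm import OneClassSVM

    inliers = np.array([[1, 1, 1], [1, 1, 2], [1, 1, 1.5], [1, 1.5, 1]])
    outliers = np.array([[20, 2, 3], [2, 20, 2], [2, 20, 5], [20, 2, 2]])

    # Train on the one class only; outliers are held out for evaluation.
    model = OneClassSVM(nu=0.1, gamma='scale').fit(inliers)
    print(model.predict(inliers))    # +1 expected for inliers
    print(model.predict(outliers))   # -1 expected for outliers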