cross-validation

Why do I get different values with a pipeline and without a pipeline in sklearn in Python

ぃ、小莉子 submitted on 2019-12-02 02:29:12

I am using recursive feature elimination with cross-validation (RFECV) together with GridSearchCV and a RandomForest classifier, both with a pipeline and without one. My code with the pipeline is as follows:

    X = df[my_features_all]
    y = df['gold_standard']

    # get development and testing sets
    x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=0)

    from sklearn.pipeline import Pipeline

    # cross-validation setting
    k_fold = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

    # this is the classifier used for feature selection
    clf_featr_sele = RandomForestClassifier(random_state
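
A likely explanation, illustrated with a minimal sketch on synthetic data (the grid values here are hypothetical, not the asker's): inside a Pipeline, RFECV is refit on every cross-validation fold of the grid search, while outside a pipeline the features are selected once on the full training set, so the two setups score different models.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_selection import RFECV
    from sklearn.model_selection import GridSearchCV, StratifiedKFold
    from sklearn.pipeline import Pipeline

    X, y = make_classification(n_samples=300, n_features=20, random_state=0)
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

    # Inside a Pipeline, RFECV is refit on each training fold of the search.
    pipe = Pipeline([
        ('fs', RFECV(RandomForestClassifier(random_state=0), cv=cv)),
        ('clf', RandomForestClassifier(random_state=0)),
    ])
    grid = GridSearchCV(pipe, {'clf__n_estimators': [50, 100]}, cv=cv)
    grid.fit(X, y)
    print('with pipeline   :', grid.best_score_)

    # Outside a pipeline, features are selected once on all of X, which leaks
    # information into the folds and typically yields a different score.
    X_sel = RFECV(RandomForestClassifier(random_state=0), cv=cv).fit_transform(X, y)
    grid2 = GridSearchCV(RandomForestClassifier(random_state=0),
                         {'n_estimators': [50, 100]}, cv=cv)
    grid2.fit(X_sel, y)
    print('without pipeline:', grid2.best_score_)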

How to access Scikit Learn nested cross-validation scores

ぃ、小莉子 submitted on 2019-12-01 22:27:05

I'm using Python and I would like to use nested cross-validation with scikit-learn. I have found a very good example:

    NUM_TRIALS = 30
    non_nested_scores = np.zeros(NUM_TRIALS)
    nested_scores = np.zeros(NUM_TRIALS)

    # Choose cross-validation techniques for the inner and outer loops,
    # independently of the dataset.
    # E.g "LabelKFold", "LeaveOneOut", "LeaveOneLabelOut", etc.
    inner_cv = KFold(n_splits=4, shuffle=True, random_state=i)
    outer_cv = KFold(n_splits=4, shuffle=True, random_state=i)

    # Non_nested parameter search and scoring
    clf = GridSearchCV(estimator=svr, param_grid=p_grid, cv=inner_cv)
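
For reference, the per-fold nested scores come from running cross_val_score over the GridSearchCV object itself. A minimal sketch on the iris data, with a hypothetical SVC grid:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)
    p_grid = {'C': [1, 10], 'gamma': [0.01, 0.1]}

    inner_cv = KFold(n_splits=4, shuffle=True, random_state=0)
    outer_cv = KFold(n_splits=4, shuffle=True, random_state=0)

    # The grid search is the estimator scored by the outer loop; each outer
    # fold yields one nested score, and their mean is the nested estimate.
    clf = GridSearchCV(SVC(), p_grid, cv=inner_cv)
    fold_scores = cross_val_score(clf, X, y, cv=outer_cv)
    print(fold_scores, fold_scores.mean())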

ValueError: continuous-multioutput is not supported

寵の児 submitted on 2019-12-01 22:07:30

I want to run several regression types (Lasso, Ridge, ElasticNet and SVR), as well as plain linear regression, on a dataset with around 5,000 rows and 6 features, using GridSearchCV for cross-validation. The code is extensive, but here are some critical parts:

    def splitTrainTestAdv(df):
        y = df.iloc[:, -5:]   # last 5 columns
        X = df.iloc[:, :-5]   # except for last 5 columns
        # scaling and sampling
        X = StandardScaler().fit_transform(X)
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.8, random_state=0)
        return X_train, X_test, y_train, y_test

    def performSVR(x_train, y_train, X_test, parameter):
        C =
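
The error itself usually means a 2-D continuous target reached an estimator or scorer that cannot handle it. A minimal sketch on random data of one way around it, wrapping SVR in MultiOutputRegressor and passing an explicit regression scorer (the names and values here are illustrative, not the asker's):

    import numpy as np
    from sklearn.model_selection import GridSearchCV, KFold
    from sklearn.multioutput import MultiOutputRegressor
    from sklearn.svm import SVR

    rng = np.random.RandomState(0)
    X = rng.rand(100, 6)
    y = rng.rand(100, 5)    # 5 continuous targets: "continuous-multioutput"

    # SVR is single-output, so wrap it to fit one model per target, score
    # with an explicit regression metric, and use a plain (non-stratified) KFold.
    grid = GridSearchCV(MultiOutputRegressor(SVR()),
                        {'estimator__C': [1, 10]},
                        scoring='neg_mean_squared_error',
                        cv=KFold(n_splits=3, shuffle=True, random_state=0))
    grid.fit(X, y)
    print(grid.best_params_, grid.best_score_)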

Why xgboost.cv and sklearn.cross_val_score give different results?

会有一股神秘感。 submitted on 2019-12-01 21:25:15

I'm trying to build a classifier on a data set. I first used XGBoost:

    import xgboost as xgb
    import pandas as pd
    import numpy as np

    train = pd.read_csv("train_users_processed_onehot.csv")
    labels = train["Buy"].map({"Y": 1, "N": 0})
    features = train.drop("Buy", axis=1)
    data_dmat = xgb.DMatrix(data=features, label=labels)
    params = {"max_depth": 5, "min_child_weight": 2, "eta": 0.1, "subsamples": 0.9,
              "colsample_bytree": 0.8, "objective": "binary:logistic",
              "eval_metric": "logloss"}
    rounds = 180
    result =
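
Two common sources of the discrepancy are mismatched hyperparameters between the two APIs and different fold assignment; note also that "subsamples" is not a recognized XGBoost key (the parameter is "subsample"), and unknown keys are silently ignored. A minimal sketch on synthetic data of an apples-to-apples comparison (data and seeds are illustrative):

    import xgboost as xgb
    from sklearn.datasets import make_classification
    from sklearn.model_selection import StratifiedKFold, cross_val_score

    X, y = make_classification(n_samples=500, random_state=0)
    params = {"max_depth": 5, "min_child_weight": 2, "eta": 0.1,
              "subsample": 0.9, "colsample_bytree": 0.8,
              "objective": "binary:logistic", "eval_metric": "logloss"}

    cv_result = xgb.cv(params, xgb.DMatrix(X, label=y),
                       num_boost_round=180, nfold=5, seed=0)
    print(cv_result["test-logloss-mean"].iloc[-1])

    # The sklearn wrapper must mirror every parameter (num_boost_round
    # becomes n_estimators); otherwise the two APIs train different models,
    # and the folds themselves still differ between the two runs.
    clf = xgb.XGBClassifier(max_depth=5, min_child_weight=2, learning_rate=0.1,
                            subsample=0.9, colsample_bytree=0.8, n_estimators=180)
    scores = cross_val_score(clf, X, y, scoring="neg_log_loss",
                             cv=StratifiedKFold(n_splits=5, shuffle=True,
                                                random_state=0))
    print(-scores.mean())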

Are the k-fold cross-validation scores from scikit-learn's `cross_val_score` and `GridsearchCV` biased if we include transformers in the pipeline?

无人久伴 submitted on 2019-12-01 18:44:52

Data pre-processors such as StandardScaler should be used to fit_transform the train set and only transform (not fit) the test set. I expect the same fit/transform process to apply to cross-validation for tuning the model. However, I found that cross_val_score and GridSearchCV fit_transform the entire train set with the preprocessor (rather than fit_transform the inner_train set and transform the inner_validation set). I believe this artificially removes the variance from the inner_validation set
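
This is straightforward to test empirically. In a minimal sketch (synthetic data, hypothetical LoggingScaler subclass), a scaler placed inside a Pipeline reports being fit on 80 of 100 rows per fold under 5-fold cross_val_score, i.e. on the inner training folds only:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    class LoggingScaler(StandardScaler):
        # StandardScaler that reports how many rows each fit() sees.
        def fit(self, X, y=None):
            print("scaler fit on", X.shape[0], "rows")
            return super().fit(X, y)

    X, y = make_classification(n_samples=100, random_state=0)
    pipe = make_pipeline(LoggingScaler(), LogisticRegression())
    # Prints "scaler fit on 80 rows" five times: the scaler is fit on the
    # inner training folds only, never on the full training set.
    cross_val_score(pipe, X, y, cv=5)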

XgBoost : The least populated class in y has only 1 members, which is too few

若如初见. submitted on 2019-12-01 17:54:21

I'm using the XGBoost implementation with sklearn for a Kaggle competition. However, I'm getting this warning message:

    $ python Script1.py
    /home/sky/private/virtualenv15.0.1dev/myVE/local/lib/python2.7/site-packages/sklearn/cross_validation.py:516:
    Warning: The least populated class in y has only 1 members, which is too few.
    The minimum number of labels for any class cannot be less than n_folds=3.
    % (min_labels, self.n_folds)), Warning)

According to another question on Stack Overflow: "Check
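
The warning comes from stratified splitting: a class with fewer members than n_folds cannot appear in every fold. A minimal sketch (toy labels, illustrative threshold) of diagnosing the class counts and dropping classes too rare to stratify:

    import numpy as np
    from sklearn.model_selection import StratifiedKFold

    X = np.arange(14).reshape(7, 2)
    y = np.array([0, 0, 0, 1, 1, 1, 2])   # class 2 has a single member

    # Any class with fewer members than n_folds triggers the warning.
    classes, counts = np.unique(y, return_counts=True)
    print(dict(zip(classes, counts)))     # {0: 3, 1: 3, 2: 1}

    # One fix: drop samples whose class is too rare to stratify.
    keep = np.isin(y, classes[counts >= 3])
    for train_idx, test_idx in StratifiedKFold(n_splits=3).split(X[keep], y[keep]):
        print(train_idx, test_idx)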

Fitting in nested cross-validation with cross_val_score with pipeline and GridSearch

房东的猫 submitted on 2019-12-01 12:38:17

I am working in scikit-learn and I am trying to tune my XGBoost model. I attempted nested cross-validation, using a pipeline to rescale the training folds (to avoid data leakage and overfitting), GridSearchCV for parameter tuning, and cross_val_score to get the roc_auc score at the end.

    from imblearn.pipeline import Pipeline
    from sklearn.model_selection import RepeatedKFold
    from sklearn.model_selection import GridSearchCV
    from sklearn.model_selection import cross_val_score
    from xgboost import XGBClassifier

    std_scaling = StandardScaler()
    algo = XGBClassifier()
    steps
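
A minimal sketch of the nested arrangement on synthetic data (grid values and seeds are illustrative, and a plain sklearn Pipeline stands in for the imblearn one): GridSearchCV over the pipeline forms the inner loop, and cross_val_score over that GridSearchCV object forms the outer loop, so the scaler is refit inside every fold.

    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV, RepeatedKFold, cross_val_score
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from xgboost import XGBClassifier

    X, y = make_classification(n_samples=300, random_state=0)

    pipe = Pipeline([('scale', StandardScaler()), ('algo', XGBClassifier())])
    param_grid = {'algo__max_depth': [3, 5], 'algo__n_estimators': [100, 200]}

    inner_cv = RepeatedKFold(n_splits=5, n_repeats=2, random_state=1)
    outer_cv = RepeatedKFold(n_splits=5, n_repeats=2, random_state=2)

    # Inner loop: GridSearchCV tunes the whole pipeline. Outer loop:
    # cross_val_score scores the tuned search object, refitting the scaler
    # (and the search) inside every outer fold.
    grid = GridSearchCV(pipe, param_grid, cv=inner_cv, scoring='roc_auc')
    scores = cross_val_score(grid, X, y, cv=outer_cv, scoring='roc_auc')
    print(scores.mean())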

Example of 10-fold cross-validation with Neural network classification in MATLAB

馋奶兔 submitted on 2019-12-01 11:10:27

I am looking for an example of applying 10-fold cross-validation to a neural network. I need something like the answer to this question: Example of 10-fold SVM classification in MATLAB. I would like to classify all 3 classes, while in that example only two classes were considered. Edit: here is the code I wrote for the iris example:

    load fisheriris                                %# load iris dataset
    k = 10;
    cvFolds = crossvalind('Kfold', species, k);    %# get indices of 10-fold CV
    net = feedforwardnet(10);
    for i = 1:k                                    %# for each fold
        testIdx = (cvFolds == i);                  %# get indices of test instances
        trainIdx = ~testIdx;                       %# get indices training
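
For comparison, a scikit-learn sketch of the same 10-fold scheme (hypothetical layer size; the question itself is about MATLAB's feedforwardnet, so this is not a MATLAB answer):

    from sklearn.datasets import load_iris
    from sklearn.model_selection import StratifiedKFold, cross_val_score
    from sklearn.neural_network import MLPClassifier

    X, y = load_iris(return_X_y=True)   # 3 classes, as in the question
    net = MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000, random_state=0)
    scores = cross_val_score(net, X, y, cv=StratifiedKFold(n_splits=10))
    print(scores.mean())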

Labeling one class for cross validation in libsvm matlab

此生再无相见时 submitted on 2019-12-01 10:47:28

I want to do one-class classification using LibSVM in MATLAB. I want to train the data and use cross-validation, but I don't know how to label the outliers. If, for example, I have this data:

    trainData = [1,1,1; 1,1,2; 1,1,1.5; 1,1.5,1; 20,2,3; 2,20,2; 2,20,5; 20,2,2];
    labelTrainData = [-1 -1 -1 -1 0 0 0 0];

(the first four are examples of the one class, the other four are examples of outliers, just for the cross-validation), and I train the model using this:

    model = svmtrain
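
For reference, a scikit-learn sketch of the same idea (the question itself concerns LibSVM's MATLAB interface): a one-class SVM is fit on the inliers only, and the outliers serve only for evaluation.

    import numpy as np
    from sklearn.svm import OneClassSVM

    inliers = np.array([[1, 1, 1], [1, 1, 2], [1, 1, 1.5], [1, 1.5, 1]])
    outliers = np.array([[20, 2, 3], [2, 20, 2], [2, 20, 5], [20, 2, 2]])

    # Train on the one class only; outliers are held out for evaluation.
    model = OneClassSVM(nu=0.1, gamma='scale').fit(inliers)
    print(model.predict(inliers))    # +1 expected for inliers
    print(model.predict(outliers))   # -1 expected for outliers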