cross-validation

Custom cross-validation split in sklearn

我只是一个虾纸丫 submitted on 2019-12-21 03:16:08
Question: I am trying to split a dataset for cross-validation and GridSearch in sklearn. I want to define my own split, but GridSearch only takes the built-in cross-validation methods. However, I can't use the built-in cross-validation methods because I need certain groups of examples to be in the same fold. So, if I have examples: [A1, A2, A3, A4, A5, B1, B2, B3, C1, C2, C3, C4, ..., Z1, Z2, Z3] I want to perform cross-validation such that examples from each group [A, B, C, ...] only exist in one fold, i.e.
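The excerpt cuts off above, but the constraint it describes (keep each group in a single fold) is exactly what sklearn's GroupKFold provides in current versions; this is a minimal sketch, with the toy data, SVC estimator, and parameter grid all assumptions:

import numpy as np
from sklearn.model_selection import GroupKFold, GridSearchCV
from sklearn.svm import SVC

X = np.random.rand(12, 3)                            # toy features (assumption)
y = np.array([0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0])   # toy labels, both classes in each group
groups = ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'D', 'D', 'D']

gkf = GroupKFold(n_splits=3)                         # no group is ever split across folds
search = GridSearchCV(SVC(), param_grid={'C': [0.1, 1, 10]}, cv=gkf)
search.fit(X, y, groups=groups)                      # groups are forwarded to the splitter
print(search.best_params_)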

Classification report with Nested Cross Validation in SKlearn (Average/Individual values)

孤人 submitted on 2019-12-20 20:41:58
Question: Is it possible to get a classification report from cross_val_score through some workaround? I'm using nested cross-validation, and while I can get various scores for a model this way, I would like to see the classification report of the outer loop. Any recommendations?

# Choose cross-validation techniques for the inner and outer loops,
# independently of the dataset.
# E.g. "LabelKFold", "LeaveOneOut", "LeaveOneLabelOut", etc.
inner_cv = KFold(n_splits=4, shuffle=True, random_state=i)
outer_cv =
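The excerpt truncates before any answer; one common workaround is to wrap the tuned estimator in cross_val_predict over the outer CV and feed the pooled out-of-fold predictions to classification_report. A minimal sketch, with the iris data, SVC, and grid all assumptions:

from sklearn.datasets import load_iris
from sklearn.metrics import classification_report
from sklearn.model_selection import KFold, GridSearchCV, cross_val_predict
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
inner_cv = KFold(n_splits=4, shuffle=True, random_state=0)
outer_cv = KFold(n_splits=4, shuffle=True, random_state=1)

# GridSearchCV tunes on the inner folds; cross_val_predict drives the outer loop
clf = GridSearchCV(SVC(), param_grid={'C': [1, 10]}, cv=inner_cv)
y_pred = cross_val_predict(clf, X, y, cv=outer_cv)   # out-of-fold predictions
print(classification_report(y, y_pred))

One caveat: pooling predictions across outer folds yields a single aggregate report rather than per-fold reports, which matches the "average" reading of the question.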

What is the difference between cross_val_score with scoring='roc_auc' and roc_auc_score?

拜拜、爱过 submitted on 2019-12-20 12:28:45
Question: I am confused about the difference between the cross_val_score scoring metric 'roc_auc' and the roc_auc_score that I can just import and call directly. The documentation (http://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter) indicates that specifying scoring='roc_auc' will use sklearn.metrics.roc_auc_score. However, when I implement GridSearchCV or cross_val_score with scoring='roc_auc', I receive very different numbers than when I call roc_auc_score directly.
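The excerpt stops before an answer, but one frequent cause of this gap is that scoring='roc_auc' evaluates continuous scores (decision_function or predict_proba) on held-out folds, whereas a direct call is often made on hard predict() labels, and the two are computed over different data splits anyway. A toy comparison, with the synthetic data and logistic regression as assumptions:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

print(cross_val_score(clf, X, y, cv=5, scoring='roc_auc').mean())  # score-based, cross-validated
print(roc_auc_score(y_te, clf.predict(X_te)))                      # hard labels: a different quantity
print(roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))          # same basis as scoring='roc_auc'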

How to perform k-fold cross validation with tensorflow?

蹲街弑〆低调 submitted on 2019-12-20 09:24:57
Question: I am following the IRIS example of tensorflow. My case now is that I have all the data in a single CSV file, not separated, and I want to apply k-fold cross-validation on that data. I have

data_set = tf.contrib.learn.datasets.base.load_csv(filename="mydata.csv", target_dtype=np.int)

How can I perform k-fold cross-validation on this dataset with a multi-layer neural network, the same as in the IRIS example?

Answer 1: I know this question is old, but in case someone is looking to do something similar, expanding on
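The answer above is truncated. As a rough sketch of the usual pattern, and noting that tf.contrib.learn has long since been removed from TensorFlow, one can draw fold indices with sklearn's KFold and train a fresh tf.keras model per fold; the CSV layout (header row, numeric features, integer label in the last column, three classes as in IRIS) is an assumption:

import numpy as np
import tensorflow as tf
from sklearn.model_selection import KFold

data = np.loadtxt("mydata.csv", delimiter=",", skiprows=1)  # assumes one header row
X, y = data[:, :-1], data[:, -1].astype(int)

scores = []
for train_idx, test_idx in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
    # build a fresh model per fold so no weights leak between folds
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(10, activation="relu"),
        tf.keras.layers.Dense(3, activation="softmax"),     # 3 classes, as in IRIS
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(X[train_idx], y[train_idx], epochs=50, verbose=0)
    scores.append(model.evaluate(X[test_idx], y[test_idx], verbose=0)[1])
print("mean CV accuracy:", np.mean(scores))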

How to implement walk-forward testing in sklearn?

点点圈 submitted on 2019-12-20 08:49:00
Question: In sklearn, GridSearchCV can take a pipeline as a parameter to find the best estimator through cross-validation. However, the usual cross-validation is like this: [figure omitted] To cross-validate time series data, the training and testing data are often split like this: [figure omitted] That is to say, the testing data should always be ahead of the training data. My thought is: write my own k-fold class and pass it to GridSearchCV, so I can enjoy the convenience of a pipeline. The problem is that it seems
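The excerpt breaks off, but sklearn ships this splitting scheme as TimeSeriesSplit: test indices always come after train indices, and the object plugs straight into GridSearchCV alongside a pipeline. A minimal sketch, with the data and parameter grid as toy assumptions:

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X = np.arange(100, dtype=float).reshape(-1, 1)   # toy "time" feature
y = np.sin(X).ravel()

tscv = TimeSeriesSplit(n_splits=5)               # expanding-window, forward-only splits
pipe = make_pipeline(StandardScaler(), Ridge())
search = GridSearchCV(pipe, {'ridge__alpha': [0.1, 1.0, 10.0]}, cv=tscv)
search.fit(X, y)
print(search.best_params_)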

How to perform random forest/cross validation in R

怎甘沉沦 submitted on 2019-12-20 08:44:08
Question: I'm unable to find a way of performing cross-validation on a regression random forest model that I'm trying to produce. So I have a dataset containing 1664 explanatory variables (different chemical properties), with one response variable (retention time). I'm trying to produce a regression random forest model in order to be able to predict the chemical properties of something given its retention time.

ID    RT (seconds)  1_MW    2_AMW  3_Sv   4_Se
4281  38            145.29  5.01   14.76  28.37
4952  40            132.19  6.29   11
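The question targets R, so take this only as a language-neutral illustration of the idea (k-fold cross-validation wrapped around a random forest regressor); the sklearn API, synthetic data, and descriptor count are all assumptions, not the R solution the post asks for:

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))            # stand-in for the chemical descriptors
y = 3 * X[:, 0] + rng.normal(size=200)    # stand-in for the numeric response

rf = RandomForestRegressor(n_estimators=200, random_state=0)
scores = cross_val_score(rf, X, y, cv=10, scoring='r2')  # 10-fold CV, R^2 per fold
print(scores.mean(), scores.std())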

Creating a table with individual trials from a frequency table in R (inverse of table function)

烈酒焚心 submitted on 2019-12-19 06:32:18
Question: I have a frequency table of data in a data.frame in R listing factor levels and counts of successes and failures. I would like to turn it from a frequency table into a list of events, i.e. the opposite of the "table" command. Specifically, I would like to turn this:

factor.A  factor.B  success.count  fail.count
0         1         0              2
1         1         2              1

into this:
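The target layout is cut off above. The question is about R, but the expansion itself (the inverse of a contingency table) is easy to show in any language; here is a minimal pandas sketch of the same idea, with the column names taken from the post and the 0/1 outcome encoding an assumption:

import pandas as pd

freq = pd.DataFrame({'factor.A': [0, 1], 'factor.B': [1, 1],
                     'success.count': [0, 2], 'fail.count': [2, 1]})

rows = []
for _, r in freq.iterrows():
    # one row per individual trial: successes as outcome 1, failures as outcome 0
    rows += [(r['factor.A'], r['factor.B'], 1)] * int(r['success.count'])
    rows += [(r['factor.A'], r['factor.B'], 0)] * int(r['fail.count'])

trials = pd.DataFrame(rows, columns=['factor.A', 'factor.B', 'outcome'])
print(trials)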

R: Cross validation on a dataset with factors

三世轮回 submitted on 2019-12-18 11:56:09
Question: Often I want to run cross-validation on a dataset which contains some factor variables, and after running for a while, the cross-validation routine fails with the error: factor x has new levels Y. For example, using package boot:

library(boot)
d <- data.frame(x=c('A', 'A', 'B', 'B', 'C', 'C'), y=c(1, 2, 3, 4, 5, 6))
m <- glm(y ~ x, data=d)
m.cv <- cv.glm(d, m, K=2) # Sometimes succeeds
m.cv <- cv.glm(d, m, K=2) # Error in model.frame.default(Terms, newdata, na.action = na.action, xlev =

How to use k-fold cross-validation in scikit with a naive Bayes classifier and NLTK

情到浓时终转凉″ submitted on 2019-12-17 22:14:38
Question: I have a small corpus and I want to calculate the accuracy of a naive Bayes classifier using 10-fold cross-validation. How can I do it? Answer 1: Your options are to either set this up yourself or use something like NLTK-Trainer, since NLTK doesn't directly support cross-validation for machine learning algorithms. I'd recommend probably just using another module to do this for you, but if you really want to write your own code you could do something like the following. Supposing you want 10-fold, you
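The answer cuts off here. Continuing its "set this up yourself" idea as a minimal sketch: draw 10 folds of indices with sklearn's KFold and average nltk's accuracy over them; the toy featuresets below are stand-ins for whatever features you actually extract from your corpus:

import nltk
from sklearn.model_selection import KFold

# featuresets: a list of (feature_dict, label) pairs, the format
# nltk.NaiveBayesClassifier.train expects (toy stand-in data here)
featuresets = [({'feat_%d' % (i % 5): True}, 'pos' if i % 2 else 'neg')
               for i in range(100)]

accuracies = []
for train_idx, test_idx in KFold(n_splits=10, shuffle=True, random_state=0).split(featuresets):
    train = [featuresets[i] for i in train_idx]
    test = [featuresets[i] for i in test_idx]
    classifier = nltk.NaiveBayesClassifier.train(train)
    accuracies.append(nltk.classify.accuracy(classifier, test))
print(sum(accuracies) / len(accuracies))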