sklearn KFold: access a single fold instead of a for loop

Submitted by 杀马特。学长 韩版系。学妹 on 2019-12-31 20:54:08

Problem


After using cross_validation.KFold(n, n_folds=folds), I would like to access the training and testing indices of a single fold, instead of iterating over all the folds.

So let's take the example code:

import numpy as np
from sklearn import cross_validation
X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
y = np.array([1, 2, 3, 4])
kf = cross_validation.KFold(4, n_folds=2)

>>> print(kf)  
sklearn.cross_validation.KFold(n=4, n_folds=2, shuffle=False,
                           random_state=None)
>>> for train_index, test_index in kf:
...     print("TRAIN:", train_index, "TEST:", test_index)

I would like to access the first fold in kf like this (instead of using a for loop):

train_index, test_index = kf[0]

This should return just the first fold, but instead I get the error: "TypeError: 'KFold' object does not support indexing"

What I want as output:

>>> train_index, test_index = kf[0]
>>> print("TRAIN:", train_index, "TEST:", test_index)
TRAIN: [2 3] TEST: [0 1]

Link: http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.KFold.html

Question

How do I retrieve the indexes for train and test for only a single fold, without going through the whole for loop?


Answer 1:


You are on the right track. All you need to do now is:

kf = cross_validation.KFold(4, n_folds=2)
mylist = list(kf)
train, test = mylist[0]

kf is an iterable that behaves like a generator: it doesn't compute a train-test split until it is needed. This saves memory, as you are not storing items you don't need. Calling list() on the KFold object forces it to produce all the splits at once, which is what makes them indexable.

Here are two great SO questions that explain what generators are: one and two
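On current scikit-learn releases, where KFold is imported from sklearn.model_selection and the folds come from kf.split(X), the same list trick might look like the sketch below. It assumes the question's four-sample X and two folds; the name folds is only illustrative.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
kf = KFold(n_splits=2)

# split() yields (train_index, test_index) pairs lazily;
# list() materializes every fold so they can be indexed.
folds = list(kf.split(X))

train_index, test_index = folds[0]
print("TRAIN:", train_index, "TEST:", test_index)
# TRAIN: [2 3] TEST: [0 1]
```

Building the list stores the index arrays of every fold, which is negligible here but can add up for large datasets with many folds.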


Edit Nov 2018

The API has changed: as of sklearn 0.20 the cross_validation module is gone, and KFold lives in sklearn.model_selection. An updated example (for py3.6):

from sklearn.model_selection import KFold
import numpy as np

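kf = KFold(n_splits=2)  # two folds, as in the question

X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])

# split() returns a generator of (train indices, test indices);
# next() pulls only the first fold.
train_index, test_index = next(kf.split(X))

In [12]: train_index
Out[12]: array([2, 3])

In [13]: test_index
Out[13]: array([0, 1])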
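next() only reaches the first fold. To pull out a later fold without building the full list, one option is itertools.islice, which skips ahead in the generator. This is a minimal sketch: nth_fold is a hypothetical helper name, not part of scikit-learn, and the two-fold setup is assumed from the question.

```python
from itertools import islice

import numpy as np
from sklearn.model_selection import KFold


def nth_fold(splitter, X, n):
    """Return (train_index, test_index) for fold n (0-based).

    Hypothetical helper: it consumes the generator up to fold n and
    raises StopIteration if n is not smaller than the number of folds.
    """
    return next(islice(splitter.split(X), n, None))


X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
kf = KFold(n_splits=2)

train_index, test_index = nth_fold(kf, X, 1)  # second fold
print("TRAIN:", train_index, "TEST:", test_index)
# TRAIN: [0 1] TEST: [2 3]
```

The earlier folds are still computed and then discarded, so this saves memory rather than time.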



Answer 2:


# Save each fold's train/test subsets in separate lists, then access a given fold by index [i]
from sklearn.model_selection import KFold
import numpy as np
import pandas as pd

kf = KFold(n_splits=4)

X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])

Y = np.array([0,0,0,1])
Y=Y.reshape(4,1)

X=pd.DataFrame(X)
Y=pd.DataFrame(Y)


X_train_base=[]
X_test_base=[]
Y_train_base=[]
Y_test_base=[]

for train_index, test_index in kf.split(X):

    X_train, X_test = X.iloc[train_index,:], X.iloc[test_index,:]
    Y_train, Y_test = Y.iloc[train_index,:], Y.iloc[test_index,:]
    X_train_base.append(X_train)
    X_test_base.append(X_test)
    Y_train_base.append(Y_train)
    Y_test_base.append(Y_test)

print(X_train_base[0])
print(Y_train_base[0])
print(X_train_base[1])
print(Y_train_base[1])
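If only one fold's data is needed, the four lists can be avoided by stopping at the requested fold. The sketch below uses plain NumPy arrays instead of DataFrames; fold_data is a hypothetical helper name, and the X and Y values are the ones from the code above.

```python
import numpy as np
from sklearn.model_selection import KFold


def fold_data(splitter, X, Y, i):
    """Return (X_train, X_test, Y_train, Y_test) for fold i (0-based).

    Hypothetical helper: iterates over the folds and returns as soon as
    fold i is reached, so later folds are never computed.
    """
    for k, (train_index, test_index) in enumerate(splitter.split(X)):
        if k == i:
            return X[train_index], X[test_index], Y[train_index], Y[test_index]
    raise IndexError("fold index out of range")


X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
Y = np.array([0, 0, 0, 1])
kf = KFold(n_splits=4)

X_train, X_test, Y_train, Y_test = fold_data(kf, X, Y, 1)
print(X_test, Y_test)
# [[3 4]] [0]
```

With pandas objects, the same helper would use .iloc[train_index] and .iloc[test_index] in place of the direct indexing.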


Source: https://stackoverflow.com/questions/27380636/sklearn-kfold-acces-single-fold-instead-of-for-loop
