sklearn TimeSeriesSplit Error: KeyError: '[ 0 1 2 …] not in index'

Submitted by 点点圈 on 2019-12-08 09:50:55

Question


I want to use TimeSeriesSplit from sklearn on the following dataframe to predict sum:

So to prepare X and y I do the following:

X = df.drop(['sum'],axis=1)
y = df['sum']

and then feed these two to:

for train_index, test_index in tscv.split(X):
    X_train01, X_test01 = X[train_index], X[test_index]
    y_train01, y_test01 = y[train_index], y[test_index]

by doing so, I get the following error:

KeyError: '[ 0  1  2 ...] not in index'

Here X is a dataframe, and apparently this causes the error, because if I convert X to an array as follows:

X = X.values

Then it will work. However, for later evaluation of the model I need X as a dataframe. Is there any way that I can keep X as a dataframe and feed it to tscv without converting it to an array?


Answer 1:


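As @Jarad rightly said, if you have an updated version of pandas, it will not automatically switch to integer-based indexing as was possible in previous versions. Plain X[train_index] on a DataFrame is treated as a lookup of column labels, which is why the integers are reported as "not in index". You need to explicitly use .iloc for integer-position selection.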

for train_index, test_index in tscv.split(X):
    X_train01, X_test01 = X.iloc[train_index], X.iloc[test_index]
    y_train01, y_test01 = y.iloc[train_index], y.iloc[test_index]

See https://pandas.pydata.org/pandas-docs/stable/indexing.html
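
For reference, here is a minimal self-contained sketch of the same fix. The dataframe is a made-up stand-in (the columns "a" and "b" and the 10 rows are assumptions, since the original data is not shown), and n_splits=3 is an arbitrary choice. It shows that the splits stay pandas objects with .iloc, and that plain bracket indexing on the DataFrame raises KeyError.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import TimeSeriesSplit

# Hypothetical stand-in for the question's dataframe (real columns unknown).
df = pd.DataFrame({"a": np.arange(10), "b": np.arange(10, 20)})
df["sum"] = df["a"] + df["b"]

X = df.drop(["sum"], axis=1)
y = df["sum"]

tscv = TimeSeriesSplit(n_splits=3)  # n_splits chosen arbitrarily for the demo

folds = []
for train_index, test_index in tscv.split(X):
    # .iloc selects rows by integer position, whatever the index labels are.
    X_train01, X_test01 = X.iloc[train_index], X.iloc[test_index]
    y_train01, y_test01 = y.iloc[train_index], y.iloc[test_index]
    folds.append((X_train01, X_test01, y_train01, y_test01))

# Plain [] on a DataFrame looks the integers up as column labels and fails.
try:
    X[train_index]
    plain_indexing_failed = False
except KeyError:
    plain_indexing_failed = True
```

Because .iloc is purely positional, the same loop should also work when the dataframe has a DatetimeIndex or any other non-default index.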



Source: https://stackoverflow.com/questions/51597507/sklearn-timeseriessplit-error-keyerror-0-1-2-not-in-index
