Getting an error “(subscript) logical subscript too long” while training SVM from e1071 package in R

╄→尐↘猪︶ㄣ submitted on 2019-12-05 12:35:54

Question


I am training an SVM on my training data (e1071 package in R). Here is the structure of my data:

> str(train)
'data.frame':   891 obs. of  10 variables:
$ survived: int  0 1 1 1 0 0 0 0 1 1 ...
$ pclass  : int  3 1 3 1 3 3 1 3 3 2 ...
$ name    : Factor w/ 15 levels "capt","col","countess",..: 12 13 9 13 12 12 12 8 13 13 
$ sex     : Factor w/ 2 levels "female","male": 2 1 1 1 2 2 2 2 1 1 ...
$ age     : num  22 38 26 35 35 ...
$ ticket  : Factor w/ 533 levels "110152","110413",..: 516 522 531 50 473 276 86 396 
$ fare    : num  7.25 71.28 7.92 53.1 8.05 ...
$ cabin   : Factor w/ 9 levels "a","b","c","d",..: 9 3 9 3 9 9 5 9 9 9 ...
$ embarked: Factor w/ 4 levels "","C","Q","S": 4 2 4 4 4 3 4 4 4 2 ...
$ family  : int  1 1 0 1 0 0 0 4 2 1 ...

I train the model as follows:

library(e1071)
model1 <- svm(survived~.,data=train, type="C-classification")

No problem here. But when I predict as:

pred <- predict(model1,test)

I get the following error:

Error in newdata[, object$scaled, drop = FALSE] : 
(subscript) logical subscript too long

I also tried removing the "ticket" predictor from both the train and test data, but I still get the same error. What is the problem?


Answer 1:


There might be a difference in the number of levels in one of the factors in the 'test' dataset.

Run str(test) and check that the factor variables have the same levels as the corresponding variables in the 'train' dataset.

For example, the output below shows that my.test$foo has only 4 levels, while my.train$foo has 5:

str(my.train)
'data.frame':   554 obs. of  7 variables:
 ....
 $ foo: Factor w/ 5 levels "C","Q","S","X","Z": 2 2 4 3 4 4 4 4 4 4 ...

str(my.test)
'data.frame':   200 obs. of  7 variables:
 ...
 $ foo: Factor w/ 4 levels "C","Q","S","X": 3 3 3 3 1 3 3 3 3 3 ...
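
The check described in this answer can be sketched in base R as follows. The toy data frames are hypothetical stand-ins that reuse the my.train / my.test names from the output above. The loop re-encodes each mismatched test factor with the training levels, which is one possible fix and not necessarily what the original poster ended up doing. Any test value that does not appear among the training levels would become NA under this approach.

```r
# Hypothetical toy data standing in for my.train / my.test
my.train <- data.frame(foo = factor(c("C", "Q", "S", "X", "Z")))
my.test  <- data.frame(foo = factor(c("S", "C", "X", "Q")))

# Find the factor columns whose level sets differ between the two frames
factor_cols <- names(my.train)[vapply(my.train, is.factor, logical(1))]
same_levels <- vapply(factor_cols, function(col) {
  identical(levels(my.train[[col]]), levels(my.test[[col]]))
}, logical(1))
mismatched <- factor_cols[!same_levels]
print(mismatched)

# Re-encode the mismatched test factors using the training levels
for (col in mismatched) {
  my.test[[col]] <- factor(my.test[[col]], levels = levels(my.train[[col]]))
}
str(my.test)
```

After this step both data frames carry the same level set for foo, so the design matrices built from them have the same number of columns.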



Answer 2:


That's correct. The train data contains 2 blank values for embarked. Because of this, the factor has one extra level for the blanks, and you get this error.

$ Embarked : Factor w/ 4 levels "","C","Q","S": 4 2 4 4 4 3 4 4 4 2 ...

The first level is the blank string "".
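
One way to get rid of the blank level, sketched in base R on a hypothetical toy column. Filling the blanks with the most frequent value is an assumption made here for illustration; the answer itself does not say how the blanks should be handled.

```r
# Hypothetical toy column mimicking embarked with two blank entries
train <- data.frame(embarked = factor(c("S", "C", "", "S", "Q", "", "S")))
print(levels(train$embarked))          # includes the blank level ""

emb <- as.character(train$embarked)
emb[emb == ""] <- NA                   # treat blanks as missing values

# Assumption: impute missing values with the most frequent category
most_common <- names(which.max(table(emb)))   # table() ignores NA by default
emb[is.na(emb)] <- most_common

# Rebuilding the factor drops the now-unused blank level
train$embarked <- factor(emb)
print(levels(train$embarked))
```

The same transformation would have to be applied to the test data so that both sets end up with identical levels.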




Answer 3:


I encountered the same problem today. It turned out that svm in the e1071 package expects each row to be one sample (observation) and each column to be one variable. If your data has samples in columns and variables in rows, this error will occur.
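
If the data happens to be stored the other way round, it can be transposed before training. A minimal sketch in base R; the matrix expr and its row and column names are hypothetical.

```r
# Hypothetical matrix with variables in rows and samples in columns
expr <- matrix(as.numeric(1:12), nrow = 3,
               dimnames = list(c("var1", "var2", "var3"),
                               paste0("sample", 1:4)))

# Transpose so that each row is one sample and each column one variable
x <- as.data.frame(t(expr))
print(dim(x))
print(colnames(x))
```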




Answer 4:


I have been playing with that data set as well. I know this question was asked a long time ago, but one thing you can do is explicitly include only the columns you think will add to the model, like so:

fit <- svm(Survived~Pclass + Sex + Age + SibSp + Parch + Fare + Embarked, data=train)

This eliminated the problem for me by dropping columns that contribute nothing to the model (like the ticket number).
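
With an explicit formula like the one above, it is also easy to check that the test data actually contains every predictor before calling predict. A small sketch in base R; the one-row test frame is hypothetical and deliberately lacks the Embarked column.

```r
# Formula with explicitly chosen predictors, as in the answer above
f <- Survived ~ Pclass + Sex + Age + SibSp + Parch + Fare + Embarked

# All variable names in the formula, minus the response (first element)
predictors <- all.vars(f)[-1]

# Hypothetical test row that is missing the Embarked column
test <- data.frame(Pclass = 3L, Sex = "male", Age = 22,
                   SibSp = 1L, Parch = 0L, Fare = 7.25)

# Predictors the model needs but the test data does not provide
missing_in_test <- setdiff(predictors, names(test))
print(missing_in_test)
```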



Source: https://stackoverflow.com/questions/17109219/getting-an-error-subscript-logical-subscript-too-long-while-training-svm-fro
