R caret unusually slow when tuning SVM with linear kernel

Submitted by 家住魔仙堡 on 2021-02-07 18:31:28

Question


I have observed very strange behavior when tuning SVM parameters with caret. When training a single model without tuning, an SVM with a radial basis kernel takes more time than an SVM with a linear kernel, which is expected. However, when tuning both over the same penalty grid, the linear-kernel SVM takes substantially more time than the radial-basis-kernel SVM (about 130 seconds versus about 2.5 seconds in the timings below). This behavior is easily reproduced on both Windows and Linux with R 3.2 and caret 6.0-47. Does anyone know why tuning the linear SVM takes so much more time than tuning the radial basis kernel SVM?

SVM linear
   user  system elapsed 
   0.51    0.00    0.52 

SVM radial
   user  system elapsed 
   0.85    0.00    0.84 

SVM linear tuning
   user  system elapsed 
 129.98    0.02  130.08 

SVM radial tuning
   user  system elapsed 
   2.44    0.05    2.48 

The toy example code is below:

library(data.table)
library(kernlab)
library(caret)

n <- 1000
p <- 10

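# Random two-class response and p standard-normal predictors
dat <- data.table(y = as.factor(sample(c('p', 'n'), n, replace = TRUE)))
dat[, (paste0('x', 1:p)) := lapply(1:p, function(x) rnorm(n, 0, 1))]
dat <- as.data.frame(dat)

# Estimate the RBF kernel width: average the lower and upper sigest() quantiles
sigmas <- sigest(as.matrix(dat[, -1]), na.action = na.omit, scaled = TRUE)
sigma  <- mean(as.vector(sigmas[-2]))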

cat('\nSVM linear\n')
print(system.time(fit1 <- train(y ~ ., data = dat, method = 'svmLinear', tuneLength = 1,
                                 trControl = trainControl(method = 'cv', number = 3))))

cat('\nSVM radial\n')
print(system.time(fit2 <- train(y ~ ., data = dat, method = 'svmRadial', tuneLength = 1,
                                 trControl = trainControl(method = 'cv', number = 3))))

cat('\nSVM linear tuning\n')
print(system.time(fit3 <- train(y ~ ., data = dat, method = 'svmLinear',
                                 tuneGrid = expand.grid(C = 2 ^ seq(-5, 15, 5)),
                                 trControl = trainControl(method = 'cv', number = 3))))

cat('\nSVM radial tuning\n')
print(system.time(fit4 <- train(y ~ ., data = dat, method = 'svmRadial',
                                 tuneGrid = expand.grid(C = 2 ^ seq(-5, 15, 5), sigma = sigma),
                                 trControl = trainControl(method = 'cv', number = 3))))
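
To make the tuning workload concrete, the penalty grid used above can be expanded by hand. This is a small base-R sketch and not part of the original post: the names `c_grid`, `n_folds`, and `n_fits` are illustrative, and the fit count assumes caret fits one model per grid value per fold plus one final model on the full data.

```r
# Penalty grid from the two tuning calls in the question
c_grid <- 2 ^ seq(-5, 15, 5)
print(c_grid)

# Assumed fit count: one fit per (C value, fold) pair, plus the final refit
n_folds <- 3
n_fits <- length(c_grid) * n_folds + 1
print(n_fits)
```

The grid spans six orders of magnitude, from 2^-5 up to 2^15, so timing each C value in a separate `train` call would show whether a few values account for most of the 130 seconds.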

Answer 1:


After taking a look, I don't believe the issue is with caret, but rather with what's going on behind (way behind) the scenes in kernlab.
As has been stated elsewhere on Stack Overflow, SVM itself is an intensive algorithm: training time grows roughly quadratically, O(n*n), with the number of samples. That alone doesn't account for the difference between the SVM calls. What does seem to be happening, though, is that the time goes into a call to compiled C code at the bottom of a very deep stack ending in SVM > .local > .Call (.Call being R's interface to compiled C code, which is outside my knowledge base). Most of the time, when you see unexpectedly slow performance moving from R to C, it's because of how things are passed. Since you're passing in a matrix, this lends itself further to the assumption that a naming or dimensions issue is causing some extra work on the other end.
If we look at how this code is profiled, the bottleneck becomes pretty clear.

Apologies about the font size; it's a deep stack, and I think the overall shape tells the story more than the individual functions. Feel free to spam Ctrl + below.

The 'SVM linear' run shows a healthy profile with lots of friendly R functions.

[Image: profile of the 'SVM linear' run]

Same deal for 'SVM radial'.

[Image: profile of the 'SVM radial' run]

Once we move to 'SVM radial tuning', we start to see a flatter structure, with the try-call stacks starting to skew, but everything still seems to execute quickly.

[Image: profile of the 'SVM radial tuning' run]

Whoa. 'SVM linear tuning' has a completely different structure, with C calls taking over 100 seconds in some cases.

[Image: profile of the 'SVM linear tuning' run]

So, that being said, it looks like your bottleneck is in the compiled C code from kernlab. Since the package is connecting to libsvm, which seems to be pretty efficient, I can't imagine there is an actual issue with the code being called. Identifying how (a safety-based feature or an input issue from R) and why the problem occurs when moving from one kernel to the other is a job for someone better than I.
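
The answer does not say which profiler produced the images, so the following is only a minimal sketch of how such a profile could be collected with base R's `Rprof` and `summaryRprof`. The function `slow_step` is a hypothetical stand-in workload; the `train(...)` call under investigation would go in its place.

```r
# Hypothetical stand-in workload; replace with the train(...) call to investigate
slow_step <- function(n = 200, reps = 50) {
  for (i in seq_len(reps)) {
    m <- matrix(rnorm(n * n), n, n)
    s <- svd(m)
  }
  invisible(NULL)
}

set.seed(1)
prof_file <- tempfile(fileext = ".out")

Rprof(prof_file, interval = 0.01)  # start sampling the call stack every 10 ms
slow_step()
Rprof(NULL)                        # stop profiling

# Summarise time per function, including time spent in callees
prof <- summaryRprof(prof_file)
print(head(prof$by.total, 10))
```

In a profile of the linear tuning run, the `by.total` table would be the place to check whether `.Call` dominates the elapsed time, as the images described above suggest.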




Answer 2:


I ran into incredibly poor performance of svmRadial on Linux. It turned out that the issue was with using the multicore doMC backend; svmRadial runs fine on a single core. The kernlab functions are the only ones in caret that I've seen exhibit this behaviour. That is one more issue to add for kernlab, in addition to those mentioned by others.
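
One way to test the single-core observation is to tell caret not to use any registered parallel backend. The sketch below is an assumption about how to do that, not something the answer spells out: it relies on the `allowParallel` argument of `trainControl`, and the name `ctrl_seq` is illustrative. The `requireNamespace` check only keeps the snippet from failing when caret is not installed.

```r
# Assumes the caret package is installed; skipped otherwise
if (requireNamespace("caret", quietly = TRUE)) {
  # Same 3-fold CV as in the question, but forced to run sequentially
  ctrl_seq <- caret::trainControl(method = "cv", number = 3,
                                  allowParallel = FALSE)
  print(ctrl_seq$allowParallel)
}
```

Passing `ctrl_seq` as `trControl` in the `train` calls from the question, and comparing timings against a run with doMC registered, would show whether the backend is the cause.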



Source: https://stackoverflow.com/questions/30385347/r-caret-unusually-slow-when-tuning-svm-with-linear-kernel
