I have a data.table in R that I want to use with the caret package.
set.seed(42)
trainingRows<-createDataPartition(DT$variable, p=0.75, list=FALSE)
head(train
Roll your own:
inTrain <- sample(MyDT[, .I], floor(MyDT[, .N] * .75))
Train <- MyDT[inTrain]
Test <- MyDT[-inTrain]
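Or, if you want to keep the caret function, just wrap trainingRows in c(). With list=FALSE, createDataPartition() returns a one-column matrix rather than a vector, and c() flattens it into a plain integer vector that can be used as a row index.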
trainingRows<-createDataPartition(DT$variable, p=0.75, list=FALSE)
Train <- DT[c(trainingRows)]
Test <- DT[c(-trainingRows)]
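To make the c() step concrete, here is a minimal sketch on a made-up table. The toy DT, its columns, and the idx helper are assumptions for illustration only; they are not from the question.

```r
library(data.table)
library(caret)

set.seed(42)
# Hypothetical toy table; `variable` stands in for the outcome column.
DT <- data.table(variable = rnorm(100), x = runif(100))

trainingRows <- createDataPartition(DT$variable, p = 0.75, list = FALSE)
is.matrix(trainingRows)   # list = FALSE gives a one-column matrix

idx <- c(trainingRows)    # c() drops the dim attribute, leaving an integer vector
Train <- DT[idx]
Test  <- DT[-idx]

# Every row ends up in exactly one of the two tables.
nrow(Train) + nrow(Test) == nrow(DT)
```

Storing the flattened index once (idx here) avoids calling c() twice, but that is only a matter of taste.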
===
Edit by Matt
Was going to add a comment, but too long.
I've seen sample(.I,...) being used quite a bit recently. This is inefficient because it has to create the (potentially very long) .I vector which is just 1:nrow(DT). This is such a common case that R's sample() doesn't need you to pass that vector. Just pass the length. sample(nrow(DT)) already returns exactly the same result without having to create .I. See ?sample.
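A quick way to convince yourself of this, in base R only (n is an arbitrary length picked for the illustration):

```r
n <- 10L  # arbitrary length, standing in for nrow(DT)

set.seed(1)
a <- sample(n)      # pass just the length

set.seed(1)
b <- sample(1:n)    # build the full index vector first

identical(a, b)     # same seed, same permutation
```

The same holds when a size is given, e.g. sample(n, 5) versus sample(1:n, 5), as long as n is a single number of at least 1.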
Also, it's better to avoid variable name repetition wherever possible. More background here.
So instead of:
inTrain <- sample(MyDT[, .I], floor(MyDT[, .N] * .75))
I'd do:
inTrain <- MyDT[,sample(.N, floor(.N*.75))]
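Put together as a complete split, that might look like the sketch below. The 20-row MyDT and its columns are hypothetical, made up so the example runs on its own.

```r
library(data.table)

set.seed(42)
MyDT <- data.table(id = 1:20, y = rnorm(20))  # hypothetical toy data

# .N is the row count inside MyDT's scope, so the table is named only once.
inTrain <- MyDT[, sample(.N, floor(.N * .75))]

Train <- MyDT[inTrain]    # 75% of the rows
Test  <- MyDT[-inTrain]   # the remaining rows

c(train = nrow(Train), test = nrow(Test))
```

With 20 rows, floor(20 * .75) is 15, so Train gets 15 rows and Test the other 5.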