Creating a data partition using caret and data.table

Posted by 一笑奈何 on 2020-01-11 05:32:05

Question


I have a data.table in R that I want to use with the caret package:

set.seed(42)
trainingRows<-createDataPartition(DT$variable, p=0.75, list=FALSE)
head(trainingRows) # view the samples of row numbers

However, I am not able to select the rows with data.table. Instead, I had to convert to a data.frame:

DT_df <-as.data.frame(DT)
DT_train<-DT_df[trainingRows,]
dim(DT_train)

The data.table alternative,

DT_train <- DT[.(trainingRows),]

requires the keys to be set.

Is there a better option than converting to a data.frame?


Answer 1:


Roll your own:

inTrain <- sample(MyDT[, .I], floor(MyDT[, .N] * .75))
Train <- MyDT[inTrain]
Test <- MyDT[-inTrain]

Or, with the caret function, you can just wrap trainingRows in c():

trainingRows <- createDataPartition(DT$variable, p=0.75, list=FALSE)
Train <- DT[c(trainingRows)]
Test <- DT[c(-trainingRows)]

===

Edit by Matt
Was going to add a comment, but too long.

I've seen sample(.I, ...) used quite a bit recently. It is inefficient because it has to create the (potentially very long) .I vector, which is just 1:nrow(DT). This case is so common that R's sample() doesn't need you to pass that vector; just pass the length. sample(nrow(DT)) returns exactly the same result without having to create .I. See ?sample.
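As a quick check of that claim, here is a minimal base R sketch (the length 10 and sample size 5 are arbitrary illustration values, not from the original post). With the same seed, passing the length and passing the full index vector should draw the same rows:

```r
# Draw 5 of 10 indices by passing only the length
set.seed(1)
by_length <- sample(10, 5)

# Draw 5 of 10 indices by passing the full index vector
set.seed(1)
by_vector <- sample(1:10, 5)

# Both calls sample the same positions from 1:10
same_draw <- identical(by_length, by_vector)
print(same_draw)
```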

Also, it's better to avoid variable name repetition wherever possible. More background here.

So instead of :

inTrain <- sample(MyDT[, .I], floor(MyDT[, .N] * .75))

I'd do :

inTrain <- MyDT[,sample(.N, floor(.N*.75))]
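Put together, the split might look like the sketch below. The contents of MyDT (100 rows with columns id and x) are made-up toy data for illustration only; the indexing follows the approach described above.

```r
library(data.table)

set.seed(42)
# Hypothetical toy data; any data.table would do
MyDT <- data.table(id = 1:100, x = rnorm(100))

# Sample 75% of the row numbers without materializing .I
inTrain <- MyDT[, sample(.N, floor(.N * 0.75))]

# Positive indices select the training rows, negative ones drop them
Train <- MyDT[inTrain]
Test  <- MyDT[-inTrain]
```

Because inTrain is a plain integer vector, no key needs to be set on MyDT.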



Answer 2:


The reason is that createDataPartition (with list=FALSE) produces an integer matrix, that is, a vector with two dimensions, of which the second can be dropped losslessly.
You can simply reduce the dimension of trainingRows as follows:

DT[trainingRows[,1]]

The c() function from Bruce Pucci's answer drops the dimension too.
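To see the two approaches agree without installing caret, the sketch below uses a hand-built one-column integer matrix as a stand-in for the createDataPartition output (the values and the column name are assumptions for illustration):

```r
# Stand-in for createDataPartition(..., list = FALSE) output:
# a one-column integer matrix (values and column name are illustrative)
trainingRows <- matrix(c(1L, 3L, 4L), ncol = 1,
                       dimnames = list(NULL, "Resample1"))

# Selecting the first column drops the second dimension
by_column <- trainingRows[, 1]

# c() strips the dim attribute and returns a plain vector
by_c <- c(trainingRows)

print(dim(trainingRows))
print(identical(by_column, by_c))
```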

This minor difference from data.frame was spotted a long time ago, and I recently made PR #1275 to fill that gap.



Source: https://stackoverflow.com/questions/32509803/creating-a-data-partition-using-caret-and-data-table
