Speed up this loop to create dummy columns with data.table and set in R [duplicate]

一世执手 提交于 2019-12-01 06:31:30

Here's a different approach that, performs better - on my machine - than the original approach in the question

1) Get unique days except Sunday

Day <- setdiff(dt$Week_Day, "Sunday")

2) Initialize new columns with 0:

dt[, (Day) := 0L]

3) Update with 1s by reference in a loop:

for(x in Day) {
  set(dt, i = which(dt[["Week_Day"]] == x), j = x, value = 1L)
}

Simple performance comparison:

dt1 <- data.table(Week_Day = sample(c("Monday", "Tuesday", "Wednesday",
                              "Thursday", "Friday", "Saturday", "Sunday"), 3e5, TRUE))

dt2 <- copy(dt1)


system.time({
  Day <- setdiff(unique(dt$Week_Day), "Sunday")
  dt1[, (Day) := 0L]
  for(x in Day) {
    set(dt1, i = which(dt1[["Week_Day"]] == x), j = x, value = 1L)
  }
})
#       User      System verstrichen 
#      0.029       0.003       0.032 

system.time({
  Day <- unique(dt$Week_Day)
  for (i in 1:length(Day)) {
    if (Day[i] != "Sunday") {
      dt2[, Day[i] := ifelse(Week_Day == Day[i], 1L, 0L)]
    }
  }
})

#       User      System verstrichen 
#      0.138       0.070       0.210 


all.equal(dt1, dt2)
#[1] TRUE
lmo

Here is one attempt at a speed up:

Day <- unique(dt$Week_Day)
setkey(dt, Week_Day)

# create columns of 0s
dt[, (Day) := 0L]

for (i in seq_along(head(Day, -1))) {
     dt[Day[i], Day[i] := 1L]
}

This implements a couple of the data.table speed ups including binary search in the second chain and the elimination of ifelse with replacement by reference.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!