data.table

filter rows by a function over values of each row, data.table

陌路散爱 posted on 2020-07-20 10:44:11
Question: Switching from data.frame syntax to data.table syntax is still not smooth for me. I thought the following would be trivial, but it is not. What am I doing wrong here?

    > DT = data.table(x=rep(c("a","b","c"),each=3), y=c(1,3,6), v=1:9)
    > DT
       x y v
    1: a 1 1
    2: a 3 2
    3: a 6 3
    4: b 1 4
    5: b 3 5
    6: b 6 6
    7: c 1 7
    8: c 3 8
    9: c 6 9

I want something like this:

    cols = c("y", "v")     # a vector of column names or indexes
    DT[rowSums(cols) > 5]  # Take only rows where
                           # values at columns y and v satisfy a
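The attempt above fails because `rowSums(cols)` receives a character vector of names rather than the columns themselves. A minimal sketch of one way to express the filter, assuming the intended condition is "row sum of the selected columns is greater than 5" (the excerpt is cut off before the condition is fully stated):

```r
library(data.table)

DT <- data.table(x = rep(c("a", "b", "c"), each = 3), y = c(1, 3, 6), v = 1:9)
cols <- c("y", "v")  # columns to test, given by name

# .SDcols restricts .SD to the named columns, so rowSums() sees only y and v.
# The inner expression returns a plain logical vector, one entry per row.
keep <- DT[, rowSums(.SD) > 5, .SDcols = cols]

# Use the logical vector to subset rows.
res <- DT[keep]
print(res)
```

Building the logical vector first keeps the row filter separate from the column selection, so any other row-wise function can be swapped in for `rowSums`.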

Count the months between two dates in a data.table

喜夏-厌秋 posted on 2020-07-19 18:24:31
Question: I have a data.table like the following:

    ID  start_date  end_date
    1   2015.01.01  2016.02.01
    2   2015.06.01  2016.03.01
    3   2016.01.01  2017.01.01

I would like to get the following:

    ID  start_date  end_date    Months_passed
    1   2015.01.01  2016.02.01  13
    2   2015.06.01  2016.03.01  9
    3   2016.01.01  2017.01.01  12

I tried the following code:

    DT[, Months_passed:= length(seq(from = start_date, to = end_date, by='month')) - 1]

but I get the error "Error in seq.Date(from = start_date, to = end_date, by = "month")
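`seq()` expects a single start and a single end date, so it cannot be applied to whole columns at once. A minimal vectorised sketch that avoids `seq()` entirely, assuming the dates are stored as character strings in `YYYY.MM.DD` form (the storage type is not stated in the question) and that only calendar months matter, not the day of the month:

```r
library(data.table)

DT <- data.table(
  ID         = 1:3,
  start_date = c("2015.01.01", "2015.06.01", "2016.01.01"),
  end_date   = c("2016.02.01", "2016.03.01", "2017.01.01")
)

# Assumption: dates arrive as text; parse them into IDate columns.
date_cols <- c("start_date", "end_date")
DT[, (date_cols) := lapply(.SD, as.IDate, format = "%Y.%m.%d"), .SDcols = date_cols]

# Whole calendar months between the two dates, ignoring the day of the month.
DT[, Months_passed := 12L * (year(end_date) - year(start_date)) +
                      (month(end_date) - month(start_date))]
print(DT)
```

If partial months should count differently, the day of the month would need to enter the formula as well.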

R data.table weird value/reference semantics

℡╲_俬逩灬. posted on 2020-07-18 08:59:26
Question: (This is a follow-up question to this.) Check this toy code:

    > x <- data.frame(a = 1:2)
    > foo <- function(z) { setDT(z) ; z[, b:=3:4] ; z }
    > y <- foo(x)
    >
    > class(x)
    [1] "data.table" "data.frame"
    > x
       a
    1: 1
    2: 2

It looks like setDT did change x's class, but the added column was not applied to x. What happened here?

Answer 1: In your function, z is a reference to x until setDT.

    library(data.table)
    foo <- function(z) {print(address(z)); setDT(z); print(address(z))}
    x <- data.frame(a = 1:2)

Wide to long: multiple columns, two timepoints, two groups

别说谁变了你拦得住时间么 posted on 2020-07-18 08:06:09
Question: I have searched and found a number of examples, but so far I have not been able to solve a problem in transforming my data from wide to long. Below is an example of the data:

    set.seed(12345)
    id = 1:100
    age = sample(1:100, 100, replace=TRUE)
    group = sample(1:2, 100, replace=TRUE)
    t0_var1 = sample(1:300, 100, replace=TRUE)
    t2_var1 = sample(1:300, 100, replace=TRUE)
    t0_var2 = sample(1:600, 100, replace=TRUE)
    t2_var2 = sample(1:600, 100, replace=TRUE)
    t0_var3 = sample(1:700, 100, replace=TRUE)
    t2
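One common way to reshape columns named `t0_varN` / `t2_varN` is `melt()` with `patterns()`, which stacks several groups of columns at once. The sketch below is an assumption about the desired result (one row per id and timepoint, one column per variable), since the excerpt ends before the target layout is shown. It uses a reduced data set with two variables, and the names `wide`, `long` and `time` are illustrative:

```r
library(data.table)

set.seed(12345)
n <- 5  # small illustrative sample instead of the 100 rows in the question
wide <- data.table(
  id      = 1:n,
  age     = sample(1:100, n, replace = TRUE),
  group   = sample(1:2, n, replace = TRUE),
  t0_var1 = sample(1:300, n, replace = TRUE),
  t2_var1 = sample(1:300, n, replace = TRUE),
  t0_var2 = sample(1:600, n, replace = TRUE),
  t2_var2 = sample(1:600, n, replace = TRUE)
)

# Each pattern collects the columns of one variable across timepoints.
long <- melt(
  wide,
  id.vars       = c("id", "age", "group"),
  measure.vars  = patterns("_var1$", "_var2$"),
  variable.name = "time",
  value.name    = c("var1", "var2")
)

# melt() numbers the timepoints 1, 2 in column order; relabel them as t0, t2.
long[, time := factor(time, levels = c("1", "2"), labels = c("t0", "t2"))]
print(long)
```

The relabelling step relies on the `t0` columns appearing before the `t2` columns in the wide table, as they do in the question.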

Is R data.table documented to pass by reference as argument?

笑着哭i posted on 2020-07-10 03:10:07
Question: Check this toy code:

    > x <- data.table(a = 1:2)
    > foo <- function(z) { z[, b:=3:4] }
    > y <- foo(x)
    > x[]
       a b
    1: 1 3
    2: 2 4

It seems a data.table is passed by reference. Is this intentional? Is this documented? I read through the docs and couldn't find a mention of this behaviour. I'm not asking about R's documented reference semantics (in := , set*** and some others). I'm asking whether a complete data.table object is supposed to be passed by reference as a function argument. Edit:
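The behaviour in the question can be contrasted with an explicit `copy()`. This is a small sketch of the two cases, not a statement about what the documentation promises; the function names `add_b` and `add_b_copy` are made up for illustration:

```r
library(data.table)

# Modifies its argument in place: := updates the caller's object.
add_b <- function(z) {
  z[, b := 3:4]
  invisible(z)
}

# Works on a private copy, so the caller's object is left untouched.
add_b_copy <- function(z) {
  z <- copy(z)
  z[, b := 3:4]
  z[]
}

x1 <- data.table(a = 1:2)
add_b(x1)
print(names(x1))  # x1 now also has column b

x2 <- data.table(a = 1:2)
y2 <- add_b_copy(x2)
print(names(x2))  # x2 still has only column a
print(names(y2))
```

Calling `copy()` at the top of a function is a simple way to opt out of in-place modification when the caller's table must stay unchanged.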

Rollapply over data.table rows with subset calculations in function

跟風遠走 posted on 2020-07-09 13:09:42
Question: I want to rollapply a function on a data.table, and inside the function I would like to work with the data.table subset, so that the example below works.

    library(zoo)
    library(data.table)
    dt <- data.table(i = 1:100, x = sample(1:10, 100, replace = T), y = sample(1:10, 100, replace = T))
    rollapply(dt, width=10, FUN = function(dt_slice) dt_slice[, mean(x == y)])

Answer 1: You can use rollapply , or sapply / outer , to get a matrix of indices and then apply over that matrix with the operation you want
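The index-matrix idea from the answer can be sketched with `outer()` alone, without zoo. This is one possible reading of the (truncated) answer; the names `width`, `idx` and `roll` are illustrative:

```r
library(data.table)

set.seed(1)
dt <- data.table(i = 1:100,
                 x = sample(1:10, 100, replace = TRUE),
                 y = sample(1:10, 100, replace = TRUE))

width <- 10
n_win <- nrow(dt) - width + 1  # number of complete windows

# Column j of idx holds the row numbers j, j + 1, ..., j + width - 1.
idx <- outer(seq_len(width) - 1L, seq_len(n_win), `+`)

# Apply the data.table expression to each window of rows.
roll <- apply(idx, 2, function(rows) dt[rows, mean(x == y)])
print(head(roll))
```

Because each window is an ordinary row subset of `dt`, any data.table expression can be used inside the function, at the cost of one subset per window.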

Evaluating function arguments to pass to data.table

﹥>﹥吖頭↗ posted on 2020-07-09 13:01:21
Question: I have this piece of code that I'd like to wrap in a function:

    indata <- data.frame(id  = c(1L, 2L, 3L, 4L, 12L, 13L, 14L, 15L),
                         fid = c(NA, 9L, 1L, 1L, 7L, 5L, 5L, 5L),
                         mid = c(0L, NA, 2L, 2L, 6L, 6L, 6L, 8L))
    library(data.table)
    DT <- as.data.table(indata)
    DT[, msib:=.(list(id)), by = mid][
       ,msibs := mapply(setdiff, msib, id)][
       ,fsib := .(list(id)), by = fid][
       ,fsibs := mapply(setdiff, fsib, id)][
       ,siblist := mapply(union, msibs, fsibs)][
       ,c("msib","msibs", "fsib", "fsibs") := NULL]

So far so
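Since the question is cut off before stating what the function should look like, the following is only a sketch of one common pattern: passing column names as strings and resolving them with `get()`, `(name) :=` and `by = c(name)`. It covers just the first grouping step of the chain, and the helper `add_group_ids` and its parameters are hypothetical:

```r
library(data.table)

indata <- data.frame(id  = c(1L, 2L, 3L, 4L, 12L, 13L, 14L, 15L),
                     fid = c(NA, 9L, 1L, 1L, 7L, 5L, 5L, 5L),
                     mid = c(0L, NA, 2L, 2L, 6L, 6L, 6L, 8L))

# Hypothetical helper: for each group of by_col, store all ids of that group
# in a new list column named new_col. Column names are passed as strings.
add_group_ids <- function(DT, by_col, new_col, id_col = "id") {
  DT[, (new_col) := .(list(get(id_col))), by = c(by_col)]
  invisible(DT)
}

DT <- as.data.table(indata)
add_group_ids(DT, by_col = "mid", new_col = "msib")
add_group_ids(DT, by_col = "fid", new_col = "fsib")
print(DT)
```

Wrapping the name in parentheses on the left of `:=` and in `c()` inside `by` makes data.table treat the argument as a column name held in a variable rather than as a literal column.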
