data.table

How to sort a data.table using vector of multiple columns

社会主义新天地 submitted on 2020-07-08 12:40:55
Question: I am pretty new to R and am trying to build a function to compare two data sets. To do that, I need to sort a data.table on multiple columns. I am sure there is help for this somewhere, but I am not sure how to search for it. This is my approach so far:

    DT = data.table(x=rep(c("b","a","c"),each=3), y=c(1,3,6), v=1:9)
    # column vector
    keycol <- c("x","y")
    DT[order(keycol)]
       x y v
    1: b 1 1
    2: b 3 2

Somehow it displays just 2 rows and drops the other records. But if I do this:

    > DT[order(x,y)]
       x y v
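
The two-row result above has a simple cause: `order(keycol)` orders the two strings "x" and "y" themselves and returns `1 2`, so the table is subset to its first two rows. A minimal sketch of one way to sort by the columns those names refer to, without modifying `DT`, is to pass the columns to `order()` through `do.call()`. The names `ord` and `sorted` are illustrative and not part of the question.

```r
library(data.table)

DT <- data.table(x = rep(c("b", "a", "c"), each = 3), y = c(1, 3, 6), v = 1:9)
keycol <- c("x", "y")

# order() on the names themselves just ranks the two strings
print(order(keycol))

# Pull the named columns out as a plain list and hand them to order()
# as separate, unnamed arguments: equivalent to order(DT$x, DT$y)
ord <- do.call(order, unname(as.list(DT)[keycol]))
sorted <- DT[ord]
print(sorted)
```

This returns a new table and leaves `DT` in its original row order.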

Order data.table by a character vector of column names

浪尽此生 submitted on 2020-07-08 05:27:07
Question: I'd like to order a data.table by a variable holding the name of a column. I've tried every combination of eval, get and c without success. I have

    colVar = "someColumnName"

and I'd like to apply this to:

    DT[order(colVar)]

Answer 1: You can use double brackets for data tables:

    library(data.table)
    dtbl <- data.table(x = 1:5, y = 5:1)
    colVar = "y"
    dtbl_sorted <- dtbl[order(dtbl[[colVar]])]
    dtbl_sorted

Answer 2: data.table has special functions for that matter which will modify your data set by reference
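
Answer 2 is cut off before it names the functions it has in mind. One function that fits the description is `setorderv()`, which takes column names as a character vector and reorders the table in place. The sketch below assumes that is the kind of function meant and reuses the example data from Answer 1.

```r
library(data.table)

dtbl <- data.table(x = 1:5, y = 5:1)
colVar <- "y"

# Reorders dtbl itself (no copy is returned); ascending by default
setorderv(dtbl, colVar)
print(dtbl)

# order = -1L sorts the same column in descending order
setorderv(dtbl, colVar, order = -1L)
print(dtbl)
```

Because the table is changed by reference, take a `copy()` first if the original row order still matters.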

data.table | faster row-wise recursive update within group

你。 submitted on 2020-07-04 05:42:43
Question: I have to do the following recursive row-by-row operation to obtain z:

    myfun = function (xb, a, b) {
      z = NULL
      for (t in 1:length(xb)) {
        if (t >= 2) { a[t] = b[t-1] + xb[t] }
        z[t] = rnorm(1, mean = a[t])
        b[t] = a[t] + z[t]
      }
      return(z)
    }

    set.seed(1)
    n_smpl = 1e6
    ni = 5
    id = rep(1:n_smpl, each = ni)
    smpl = data.table(id)
    smpl[, time := 1:.N, by = id]
    a_init = 1; b_init = 1
    smpl[, ':=' (a = a_init, b = b_init)]
    smpl[, xb := (1:.N)*id, by = id]
    smpl[, z := myfun(xb, a, b), by = id]

I would like
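
The question is truncated here, so what follows is only a sketch of one possible tidy-up, not the answer from the original thread. In the recursion above, each step needs only the previous value of b, and the initial b is overwritten before it is read. A hypothetical rewrite, `myfun_prealloc` (a name introduced here), therefore preallocates z and carries two scalars instead of growing and rewriting vectors. Whether this is meaningfully faster at 1e6 groups has not been measured; the example uses a much smaller `n_smpl` and only checks that both versions draw the same values.

```r
library(data.table)

# Original function from the question, kept for comparison
myfun <- function(xb, a, b) {
  z <- NULL
  for (t in 1:length(xb)) {
    if (t >= 2) a[t] <- b[t - 1] + xb[t]
    z[t] <- rnorm(1, mean = a[t])
    b[t] <- a[t] + z[t]
  }
  z
}

# Hypothetical rewrite: a1 is the initial value of a for the group
myfun_prealloc <- function(xb, a1) {
  n <- length(xb)
  z <- numeric(n)      # allocate the result once
  a_t <- a1
  b_prev <- 0          # never read before it is set at t = 1
  for (t in seq_len(n)) {
    if (t >= 2) a_t <- b_prev + xb[t]
    z[t] <- rnorm(1, mean = a_t)
    b_prev <- a_t + z[t]
  }
  z
}

n_smpl <- 1000   # reduced from 1e6 to keep the example quick
ni <- 5
smpl <- data.table(id = rep(1:n_smpl, each = ni))
smpl[, `:=`(a = 1, b = 1)]
smpl[, xb := (1:.N) * id, by = id]

# Same seed before each run so both versions consume the same random stream
set.seed(1)
smpl[, z := myfun(xb, a, b), by = id]
set.seed(1)
smpl[, z2 := myfun_prealloc(xb, a[1]), by = id]

print(all.equal(smpl$z, smpl$z2))
```

The per-group R function call remains the dominant cost in this form, so a larger speed-up would probably need the loop moved out of R.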

Use a character vector in the `by` argument

心不动则不痛 submitted on 2020-07-03 03:24:21
Question: Within the data.table package in R, is there a way to pass a character vector to the by argument of a calculation? Here is an example of the desired output, using mtcars:

    mtcars <- data.table(mtcars)
    ColSelect <- 'cyl' # One Column Option
    mtcars[,.( AveMpg = mean(mpg)), by = .(ColSelect)] # Doesn't work

    # Desired Output
       cyl   AveMpg
    1:   6 19.74286
    2:   4 26.66364
    3:   8 15.10000

I know that this is possible when assigning column names in j by
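
A minimal sketch of one way to get the desired output: `by` accepts a character vector of column names, so the variable can be passed without the `.()` wrapper. Inside `.()`, `ColSelect` is treated as the grouping values themselves rather than as a column name. The name `cars_dt` is introduced here only to avoid overwriting the built-in `mtcars`.

```r
library(data.table)

cars_dt <- as.data.table(mtcars)
ColSelect <- "cyl"

# by takes the character vector of column names directly
ave_mpg <- cars_dt[, .(AveMpg = mean(mpg)), by = ColSelect]
print(ave_mpg)

# The same call works unchanged with several grouping columns
ColSelect2 <- c("cyl", "gear")
print(cars_dt[, .(AveMpg = mean(mpg)), by = ColSelect2])
```

This relies on the variable name not also being a column of the table; if it were, `by` would group on that column instead.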

Shorten nested ifelse

梦想的初衷 submitted on 2020-07-02 20:32:15
Question: Given the following data table, if we would like to compare x1 successively with x2 to x5, the following can be used:

    set.seed(1)
    library(data.table)
    TDT <- data.table(x1 = round(rnorm(100,0.75,0.3),2),
                      x2 = round(rnorm(100,0.75,0.3),2),
                      x3 = round(rnorm(100,0.75,0.3),2),
                      x4 = round(rnorm(100,0.75,0.3),2),
                      x5 = round(rnorm(100,0.75,0.3),2))
    TDT[,compare := ifelse(x1 < x2,1,ifelse(x1 < x3,2,ifelse(x1 < x4,3,ifelse(x1 < x5,4,5))))]

So if x1 < x2, then compare == 1, etc. Now in my
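
The question breaks off before stating the real use case, so the following is a sketch under the assumption that the goal is the same "first column that exceeds x1" logic over an arbitrary list of columns. It builds a logical matrix of the comparisons, appends an always-TRUE fallback column, and lets base R's `max.col()` pick the first TRUE per row. The names `cols` and `compare2` are introduced here for illustration.

```r
library(data.table)

set.seed(1)
TDT <- data.table(x1 = round(rnorm(100, 0.75, 0.3), 2),
                  x2 = round(rnorm(100, 0.75, 0.3), 2),
                  x3 = round(rnorm(100, 0.75, 0.3), 2),
                  x4 = round(rnorm(100, 0.75, 0.3), 2),
                  x5 = round(rnorm(100, 0.75, 0.3), 2))

# Nested version from the question, kept as the reference result
TDT[, compare := ifelse(x1 < x2, 1, ifelse(x1 < x3, 2,
                 ifelse(x1 < x4, 3, ifelse(x1 < x5, 4, 5))))]

# Columns to compare against x1, in priority order
cols <- paste0("x", 2:5)

# Each matrix column is compared element-wise with x1; the extra TRUE
# column is the fallback, so rows with no match get length(cols) + 1
TDT[, compare2 := max.col(cbind(as.matrix(.SD) > x1, TRUE),
                          ties.method = "first"),
    .SDcols = cols]

print(TDT[, all(compare == compare2)])
```

Adding more comparison columns then only means extending `cols`, with no further nesting.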

R data.table filtering on group size

我只是一个虾纸丫 submitted on 2020-06-28 09:03:28
Question: I am trying to find all the records in my data.table for which there is more than one row with value v in field f. For instance, we can use this data:

    dt <- data.table(f1=c(1,2,3,4,5), f2=c(1,1,2,3,3))

If looking for that property in field f2, we'd get (note the absence of the (3,2) tuple)

       f1 f2
    1:  1  1
    2:  2  1
    3:  4  3
    4:  5  3

My first guess was dt[.N>2,list(.N),by=f2], but that actually keeps entries with .N==1.

    dt[.N>2,list(.N),by=f2]
       f2 N
    1:  1 2
    2:  2 1
    3:  3 2

The other easy guess, dt
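
The first guess keeps every group because `.N` in the `i` position is the row count of the whole table (5 here), so the condition is TRUE for all rows before any grouping happens. A minimal sketch of one common alternative is to evaluate `.N` inside `j`, where it is the group size, collect the row numbers of the qualifying groups with `.I`, and subset with them. The names `keep_rows` and `result` are illustrative.

```r
library(data.table)

dt <- data.table(f1 = c(1, 2, 3, 4, 5), f2 = c(1, 1, 2, 3, 3))

# Within each f2 group, .N is the group size and .I holds the row numbers;
# groups of size 1 contribute no row numbers. The unnamed result column is V1.
keep_rows <- dt[, .I[.N > 1L], by = f2]$V1

result <- dt[keep_rows]
print(result)
```

Subsetting by row number keeps the original column order, with f1 before f2.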