data.table

How to find nearest highest value when join data.table

房东的猫 submitted on 2021-02-08 03:29:20
Question: I have the following 2 data tables:

    DT1 <- data.table(A = c(100,50,10), B = c("Good","Ok","Bad"))
    DT1
         A    B
    1: 100 Good
    2:  50   Ok
    3:  10  Bad

and

    DT2 <- data.table(A = c(99,34,5,"",24,86))
    DT2
        A
    1: 99
    2: 34
    3:  5
    4:
    5: 24
    6: 86

What I would like to return when joining DT1 and DT2 is

    DT2
        A    B
    1: 99 Good
    2: 34   Ok
    3:  5  Bad
    4:      NA
    5: 24   Ok
    6: 86 Good

The "roll" option in data.table is only for "nearest" match, so it doesn't work in my case. Is there any way I can do such a lookup with data.table?

Answer 1: The
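One possible approach, offered as a sketch rather than as the answer that is cut off above: a rolling join with `roll = -Inf` rolls each lookup value backward from the next higher key, which seems to match the desired output. It assumes that the empty string in DT2 should become NA and that "nearest highest" means the smallest DT1 value at or above the lookup value. Rows with NA are skipped explicitly so they never take part in the join.

```r
library(data.table)

DT1 <- data.table(A = c(100, 50, 10), B = c("Good", "Ok", "Bad"))
DT2 <- data.table(A = c(99, 34, 5, "", 24, 86))

# The "" entry makes DT2$A character; convert it so both join columns are numeric.
# The empty string becomes NA.
DT2[, A := as.numeric(A)]

# For the non-NA rows, look up B from the next DT1 value at or above A.
# roll = -Inf carries the next observation backward (NOCB).
DT2[!is.na(A), B := DT1[.SD, on = "A", roll = -Inf, x.B]]

print(DT2)
```

Values above the largest key in DT1 would get NA under the default `rollends`; that case does not occur in this data.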

Remove grouping variable for data.table

无人久伴 submitted on 2021-02-07 20:30:24
Question: I'd like to use data.table to do some wrangling and would like my resulting data table to not include the grouping variable. Here's a MWE:

    library("data.table")
    DT <- data.table(x = 1:10, grp = rep(1:2,5))
    DT[, .(mmm = mean(x)), by = grp]

This produces:

       grp mmm
    1:   1   5
    2:   2   6

which is all fine. However, I'd prefer the grp not to be here. This can be fixed by chaining the data.table calls and setting grp := NULL or just throwing the variable away, but can I prevent it in the first call so I
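The chaining route that the question mentions can be sketched as below. This only illustrates dropping the column in a second chained call; it does not show a way to suppress the grouping column inside the first call.

```r
library(data.table)

DT <- data.table(x = 1:10, grp = rep(1:2, 5))

# Aggregate by group, then drop the grouping column in a chained call.
res <- DT[, .(mmm = mean(x)), by = grp][, !"grp"]

print(res)
```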

subsetting a data.table based on a named list

若如初见. submitted on 2021-02-07 12:48:20
Question: I'm trying to subset a given data.table

    DT <- data.table( a = c(1:20), b = (3:4), c = (5:14), d = c(1:4) )

within a function by a parameter which is a named list

    param <- list(a = 1:10, b = 2:3, c = c(5, 7, 10))

I am maybe a bit stuck here, but I certainly do not want to implement something ugly like this, especially since it's not very dynamic.

    DT[(if (!is.null(param$a)) a %in% param$a else TRUE) & (if (!is.null(param$b)) b %in% param$b else TRUE) & (if (!is.null(param$c)) c %in% param$c else
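A more dynamic variant might build one logical vector per list element and combine them with `Reduce`. This is a minimal sketch, assuming every name in `param` is a column of `DT`; the helper name `subset_by_list` is hypothetical, and the columns are written with explicit `rep()` calls in place of the recycling used in the question.

```r
library(data.table)

DT <- data.table(
  a = 1:20,
  b = rep(3:4, 10),
  c = rep(5:14, 2),
  d = rep(1:4, 5)
)
param <- list(a = 1:10, b = 2:3, c = c(5, 7, 10))

# Hypothetical helper: keep the rows where every column named in `param`
# takes one of the values listed for it.
subset_by_list <- function(dt, param) {
  conds <- lapply(names(param), function(nm) dt[[nm]] %in% param[[nm]])
  # Start from all-TRUE so an empty `param` keeps every row.
  keep <- Reduce(`&`, conds, init = rep(TRUE, nrow(dt)))
  dt[keep]
}

res <- subset_by_list(DT, param)
print(res)
```

Elements that are absent from the list simply add no condition, which replaces the `is.null()` checks in the original expression.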

data.table: group-by, sum, name new column, and slice columns in one step

末鹿安然 submitted on 2021-02-07 09:39:47
Question: This seems like it should be easy, but I've never been able to figure out how to do it. Using data.table I want to sum a column, C, by another column A, and just keep those two columns. At the same time, I want to be able to name the new column. My attempts and desired output:

    library(data.table)
    dt <- data.table(A= c('a', 'b', 'b', 'c', 'c'), B=c('19', '20', '21', '22', '23'), C=c(150,250,20,220,130))
    # Desired Output - is there a way to do this in one step using data.table?
    # new.data <-
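A single call with a named expression in `j` and `by` appears to cover all three requirements. In this sketch the column name `sum.C` is an assumption, since the desired output is cut off in the excerpt.

```r
library(data.table)

dt <- data.table(
  A = c("a", "b", "b", "c", "c"),
  B = c("19", "20", "21", "22", "23"),
  C = c(150, 250, 20, 220, 130)
)

# Group by A, sum C, and name the result in one step.
# Only the grouping column and the named result are returned, so B is dropped.
new.data <- dt[, .(sum.C = sum(C)), by = A]

print(new.data)
```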

R: Fuzzy merge using agrep and data.table

隐身守侯 submitted on 2021-02-07 09:33:44
Question: I am trying to merge two data.tables, but due to different spellings of stock names I lose a substantial number of data points. Hence, instead of an exact match, I was looking into a fuzzy merge.

    library("data.table")
    dt1 = data.table(Name = c("ASML HOLDING","ABN AMRO GROUP"), A = c(1,2))
    dt2 = data.table(Name = c("ASML HOLDING NV", "ABN AMRO GROUP"), B = c("p", "q"))

When merging dt1 and dt2 on "Name", ASML HOLDING will be excluded due to the addition of "NV", while the actual data would be accurate
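One way to sketch a fuzzy merge is to resolve each name in dt1 to an approximate match in dt2 with base R's `agrep`, then join on the resolved name. The column `Name2` is a hypothetical helper column, and taking the first hit when several names match is an assumption that real data would probably need to refine.

```r
library(data.table)

dt1 <- data.table(Name = c("ASML HOLDING", "ABN AMRO GROUP"), A = c(1, 2))
dt2 <- data.table(Name = c("ASML HOLDING NV", "ABN AMRO GROUP"), B = c("p", "q"))

# For each name in dt1, take the first approximate match among dt2's names,
# or NA when agrep finds none.
dt1[, Name2 := vapply(Name, function(nm) {
  hit <- agrep(nm, dt2$Name, value = TRUE)
  if (length(hit) > 0) hit[1] else NA_character_
}, character(1), USE.NAMES = FALSE)]

# Update join: copy B from dt2 onto dt1 through the resolved name.
dt1[dt2, on = c(Name2 = "Name"), B := i.B]

print(dt1)
```

`agrep` matches the pattern approximately against substrings, so "ASML HOLDING" is found inside "ASML HOLDING NV". Its `max.distance` argument controls how loose the match may be.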

data.table column order when using lapply and get

爱⌒轻易说出口 submitted on 2021-02-07 06:12:09
Question: Can someone help me understand why the two versions of the lapply operations below, with and without using get(), don't produce the same result? When using get(), the result columns get mixed up.

    dt <- data.table(v1 = c(1,2), v2 = c(3,4), type = c('A', 'B'))
       v1 v2 type
    1:  1  3    A
    2:  2  4    B
    col_in <- c('v2', 'v1')
    col_out <- paste0(col_in, '.new')

Accessing 'type' the hard-coded way

    dt[, (col_out) := lapply(.SD, function(x){x * min(x[type == 'A'])}), .SDcols = col_in]

produces the expected result:
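The excerpt ends before the get() variant, so the sketch below does not reproduce it. It shows one order-independent way to write the same computation when the filter column is only known by name: the values are computed per input column outside of `.SD`, so each output name in `col_out` pairs with its own input in `col_in`. The variable `type_col` is an assumption standing in for whatever the get() call referred to.

```r
library(data.table)

dt <- data.table(v1 = c(1, 2), v2 = c(3, 4), type = c("A", "B"))
col_in <- c("v2", "v1")
col_out <- paste0(col_in, ".new")

# Assumed: the filter column is supplied as a string.
type_col <- "type"
is_A <- dt[[type_col]] == "A"

# Build the new values in the order of col_in, so the i-th result
# belongs to the i-th name in col_out.
new_vals <- lapply(col_in, function(cn) dt[[cn]] * min(dt[[cn]][is_A]))
dt[, (col_out) := new_vals]

print(dt)
```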

Fit model by group using Data.Table package

怎甘沉沦 submitted on 2021-02-06 22:00:37
Question: How can I fit multiple models by group using data.table syntax? I want my output to be a data.frame with columns for each "by group" and one column for each model fit. Currently I am able to do this using the dplyr package, but can't do this in data.table.

    # example data frame
    df <- data.table( id = sample(c("id01", "id02", "id03"), N, TRUE), v1 = sample(5, N, TRUE), v2 = sample(round(runif(100, max = 100), 4), N, TRUE) )
    # equivalent code in dplyr
    group_by(df, id) %>% do( model1= lm(v1 ~v2,
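In data.table, fitted models can be stored in a list column by wrapping each fit in `list()` inside `j`. The sketch below covers only the first model, because the dplyr code is cut off in the excerpt. `N <- 100` and the seed are assumptions, since the excerpt does not define `N`.

```r
library(data.table)

set.seed(1)   # assumed, only to make the example reproducible
N <- 100      # assumed; not defined in the excerpt

df <- data.table(
  id = sample(c("id01", "id02", "id03"), N, TRUE),
  v1 = sample(5, N, TRUE),
  v2 = sample(round(runif(100, max = 100), 4), N, TRUE)
)

# One row per group; each cell of model1 holds the lm object fitted to
# that group's rows, which are available as .SD.
fits <- df[, .(model1 = list(lm(v1 ~ v2, data = .SD))), by = id]

print(fits[, .(id, slope = sapply(model1, function(m) coef(m)[["v2"]]))])
```

Further models could be added as extra list columns in the same `.()` call.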