data.table | 易学教程

How to find nearest highest value when join data.table

Question: I have the following two data tables:

    DT1 <- data.table(A = c(100, 50, 10), B = c("Good", "Ok", "Bad"))
    DT1
         A    B
    1: 100 Good
    2:  50   Ok
    3:  10  Bad

and

    DT2 <- data.table(A = c(99, 34, 5, "", 24, 86))
    DT2
        A
    1: 99
    2: 34
    3:  5
    4:
    5: 24
    6: 86

What I would like to return when joining DT1 and DT2 is:

    DT2
        A    B
    1: 99 Good
    2: 34   Ok
    3:  5  Bad
    4:      NA
    5: 24   Ok
    6: 86 Good

The "roll" option in data.table is only for "nearest" match, so it doesn't work in my case. Is there any way I can do such a lookup with data.table?

Answer 1: The ...
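
A minimal sketch of one possible approach, not necessarily the one the truncated answer goes on to give. It assumes the blank entry is left out so that the join column stays numeric. A rolling join with roll = -Inf carries the next observation backward, so each value in the lookup table picks up the label of the smallest key in DT1 that is greater than or equal to it. The names DT2_num and res are hypothetical.

```r
library(data.table)

DT1 <- data.table(A = c(100, 50, 10), B = c("Good", "Ok", "Bad"))

# Assumption: the "" entry from the question is dropped so that A is numeric.
DT2_num <- data.table(A = c(99, 34, 5, 24, 86))

# roll = -Inf means "next observation carried backward": a value without an
# exact match takes the row of the next higher key in DT1.
res <- DT1[DT2_num, on = "A", roll = -Inf]
print(res)
```

The result keeps the row order of DT2_num, and its A column holds the looked-up values rather than the keys from DT1.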

Remove grouping variable for data.table

Question: I'd like to use data.table to do some wrangling and would like my resulting data table to not include the grouping variable. Here's a MWE:

    library("data.table")
    DT <- data.table(x = 1:10, grp = rep(1:2, 5))
    DT[, .(mmm = mean(x)), by = grp]

This produces:

       grp mmm
    1:   1   5
    2:   2   6

which is all fine. However, I'd prefer grp not to be there. This can be fixed by chaining the data.table calls and setting grp := NULL, or by just throwing the variable away, but can I prevent it in the first call so I ...
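
The chaining workaround mentioned in the question can be sketched as follows. This only illustrates that workaround; it makes no claim about whether the grouping column can be suppressed within a single call. The name res is hypothetical.

```r
library(data.table)

DT <- data.table(x = 1:10, grp = rep(1:2, 5))

# Aggregate by grp, then drop the grouping column in a chained call.
res <- DT[, .(mmm = mean(x)), by = grp][, grp := NULL]

# The trailing [] forces printing after a := assignment.
print(res[])
```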

subsetting a data.table based on a named list

Question: I'm trying to subset a given data.table

    DT <- data.table(
      a = c(1:20),
      b = (3:4),
      c = (5:14),
      d = c(1:4)
    )

within a function by a parameter which is a named list

    param <- list(a = 1:10, b = 2:3, c = c(5, 7, 10))

I may be a bit stuck here, but I certainly do not want to implement something ugly like this, especially since it's not very dynamic.

    DT[(if (!is.null(param$a)) a %in% param$a else TRUE) &
       (if (!is.null(param$b)) b %in% param$b else TRUE) &
       (if (!is.null(param$c)) c %in% param$c else ...
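
One way to make the filter above dynamic, sketched under assumptions: build one logical mask per named entry of the list and combine them with a logical AND. The helper name subset_by_list is hypothetical, and the table is rebuilt with the recycling written out explicitly so the example does not depend on how data.table() recycles shorter vectors.

```r
library(data.table)

# Same values as in the question, with the recycling spelled out.
DT <- data.table(a = 1:20, b = rep(3:4, 10), c = rep(5:14, 2), d = rep(1:4, 5))
param <- list(a = 1:10, b = 2:3, c = c(5, 7, 10))

# Hypothetical helper: keep rows whose value in every named column
# appears in the corresponding list entry; NULL entries are ignored.
subset_by_list <- function(dt, param) {
  param <- Filter(Negate(is.null), param)
  masks <- lapply(names(param), function(nm) dt[[nm]] %in% param[[nm]])
  # Start from an all-TRUE mask so an empty list keeps every row.
  keep <- Reduce(`&`, masks, rep(TRUE, nrow(dt)))
  dt[keep]
}

res <- subset_by_list(DT, param)
print(res)
```

Columns that are not named in the list are left unfiltered, which mirrors the `else TRUE` branches in the original expression.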

data.table: group-by, sum, name new column, and slice columns in one step

Question: This seems like it should be easy, but I've never been able to figure out how to do it. Using data.table, I want to sum a column, C, by another column, A, and keep just those two columns. At the same time, I want to be able to name the new column. My attempts and desired output:

    library(data.table)
    dt <- data.table(A = c('a', 'b', 'b', 'c', 'c'),
                     B = c('19', '20', '21', '22', '23'),
                     C = c(150, 250, 20, 220, 130))
    # Desired Output - is there a way to do this in one step using data.table?
    # new.data <- ...
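
A minimal sketch of grouping, summing, and naming in a single call. The desired output is cut off in the excerpt, so the names new_data and sum_C are hypothetical. Naming the expression inside .() sets the column name, and only the grouping column and the named result are returned.

```r
library(data.table)

dt <- data.table(A = c("a", "b", "b", "c", "c"),
                 B = c("19", "20", "21", "22", "23"),
                 C = c(150, 250, 20, 220, 130))

# Sum C within each value of A; B is not referenced, so it is dropped.
new_data <- dt[, .(sum_C = sum(C)), by = A]
print(new_data)
```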

R: Fuzzy merge using agrep and data.table

Question: I am trying to merge two data.tables, but due to different spellings of stock names I lose a substantial number of data points. Hence, instead of an exact match, I was looking into a fuzzy merge.

    library("data.table")
    dt1 = data.table(Name = c("ASML HOLDING", "ABN AMRO GROUP"), A = c(1, 2))
    dt2 = data.table(Name = c("ASML HOLDING NV", "ABN AMRO GROUP"), B = c("p", "q"))

When merging dt1 and dt2 on "Name", ASML HOLDING will be excluded due to the addition of "NV", while the actual data would be accurate ...

data.table column order when using lapply and get

Question: Can someone help me understand why the two versions of the lapply operation below, with and without get(), don't produce the same result? When using get(), the result columns get mixed up.

    dt <- data.table(v1 = c(1, 2), v2 = c(3, 4), type = c('A', 'B'))
       v1 v2 type
    1:  1  3    A
    2:  2  4    B
    col_in <- c('v2', 'v1')
    col_out <- paste0(col_in, '.new')

Accessing 'type' the hard-coded way

    dt[, (col_out) := lapply(.SD, function(x){x * min(x[type == 'A'])}), .SDcols = col_in]

produces the expected result: ...

Fit model by group using Data.Table package

Question: How can I fit multiple models by group using data.table syntax? I want my output to be a data.frame with columns for each "by group" and one column for each model fit. Currently I am able to do this using the dplyr package, but can't do it in data.table.

    # example data frame
    df <- data.table(
      id = sample(c("id01", "id02", "id03"), N, TRUE),
      v1 = sample(5, N, TRUE),
      v2 = sample(round(runif(100, max = 100), 4), N, TRUE)
    )
    # equivalent code in dplyr
    group_by(df, id) %>% do(model1 = lm(v1 ~ v2, ...
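
A hedged sketch of one way to store a fitted model per group in data.table: wrap each fit in list() so it becomes a single cell of a list column. N is not defined in the excerpt, so N <- 100 and the seed are assumptions, and the names fits and slopes are hypothetical.

```r
library(data.table)

set.seed(1)  # assumption: fixed seed for reproducibility
N <- 100     # assumption: N is not defined in the excerpt

df <- data.table(
  id = sample(c("id01", "id02", "id03"), N, TRUE),
  v1 = sample(5, N, TRUE),
  v2 = sample(round(runif(100, max = 100), 4), N, TRUE)
)

# One row per id; model1 is a list column holding the lm object for that group.
fits <- df[, .(model1 = list(lm(v1 ~ v2, data = .SD))), by = id]

# Pull a summary value out of each stored model.
slopes <- fits[, .(id, slope = sapply(model1, function(m) coef(m)[["v2"]]))]
print(slopes)
```

Further models could be added as extra named list() entries inside the same .() call.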