data.table

Join big data frames in R and filter at the same time

笑着哭i submitted on 2021-01-28 14:30:35
Question:
library(dplyr)
df1 = data.frame(id=1, start=as.Date("2012-07-05"), end=as.Date("2012-07-15"))
df2 = data.frame(id=rep(1,1371), date=seq(as.Date("2012-05-06"), as.Date("2016-02-05"), by="day"))
output = dplyr::inner_join(x=df1, y=df2, by="id") %>% filter(date>=start & date<=end)
I have two data frames, each with about one million rows. I want to join them by id and then filter so that, for each row, the value of the date column lies between the values of start and end. A dplyr::inner_join is not
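This is a sketch of one alternative that uses data.table instead of dplyr. It is not the asker's code. A non-equi join applies the date-range condition during the join itself, so the full id-only join is never built and then filtered. The toy data mirror the question's df1/df2. In j, the x. prefix refers to columns of the table being looked up (dt2), and the i. prefix refers to columns of the i table (dt1).

```r
library(data.table)

# Same toy data as in the question, built as data.tables
dt1 <- data.table(id = 1, start = as.Date("2012-07-05"), end = as.Date("2012-07-15"))
dt2 <- data.table(id = rep(1, 1371),
                  date = seq(as.Date("2012-05-06"), as.Date("2016-02-05"), by = "day"))

# Non-equi join: match on id and keep only dates inside [start, end].
# x.date comes from dt2; i.start and i.end come from dt1.
output <- dt2[dt1,
              on = .(id, date >= start, date <= end),
              .(id, start = i.start, end = i.end, date = x.date),
              nomatch = NULL]   # drop rows of dt1 with no match (inner join)
```

With one million rows on each side, this avoids building the intermediate id-only join, which for a single id would contain every pairing.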

R sum by group if date within date range

浪子不回头ぞ submitted on 2021-01-28 14:26:17
Question: Suppose I have two data frames. The first one includes the "Date" on which a "Name" issues a "Rec" for an "ID", and the "Stop.Date" on which that "Rec" becomes invalid.
df (only a part):
structure(list(Date = structure(c(13236, 13363, 14074, 13199, 14554), class = "Date"), ID = c("AU0000XINAA9", "AU0000XINAA9", "AU0000XINAC5", "AU0000XINAI2", "AU0000XINAJ0"), Name = c("N+1 BREWIN", "N+1 BREWIN", "ARBUTHNOT SECURITIES LTD.", "INVESTEC BANK (UK) PLC", "AWRAQ INVESTMENTS"), Rec = c(1, 2, 2, 2, 1), Stop.Date
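The question is cut off before the second data frame and the exact goal are shown, so this is only a sketch under assumptions. Suppose the goal is to take each (ID, date) pair in a second table and sum Rec over the recommendations still valid on that date, meaning Date <= date <= Stop.Date. All values are hypothetical, as are the queries table and its qdate column. A data.table non-equi join with by = .EACHI produces one sum per query row.

```r
library(data.table)

# Hypothetical recommendations; the question's real Stop.Date values are truncated
recs <- data.table(
  ID        = c("A", "A", "B"),
  Date      = as.Date(c("2020-01-01", "2020-01-10", "2020-01-05")),
  Stop.Date = as.Date(c("2020-01-20", "2020-01-15", "2020-01-31")),
  Rec       = c(1, 2, 2)
)

# Hypothetical second table: the ID/date pairs to evaluate
queries <- data.table(
  ID    = c("A", "A", "B"),
  qdate = as.Date(c("2020-01-05", "2020-01-12", "2020-02-10"))
)

# For each query row, sum Rec over recs of the same ID that are valid on qdate.
# An unmatched query sees Rec = NA, so na.rm = TRUE turns its sum into 0.
valid_sum <- recs[queries,
                  on = .(ID, Date <= qdate, Stop.Date >= qdate),
                  .(rec_sum = sum(Rec, na.rm = TRUE)),
                  by = .EACHI]

# by = .EACHI keeps the row order of queries, so the sums can be attached directly
queries[, rec_sum := valid_sum$rec_sum]
```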

efficient conditional cross join in data table

笑着哭i submitted on 2021-01-28 14:11:42
Question: EDITED (Sorry, I modified the code; it had an error that prevented reproduction.) I am trying to merge efficiently with a condition. At the moment I do a cross join (which I want to preserve), except that I have one condition on a subset of the columns. Cross join function (from here):
CJ.table.1 <- function(X,Y) setkey(X[,c(k=1,.SD)],k)[Y[,c(k=1,.SD)],allow.cartesian=TRUE][,k:=NULL]
set.seed(1)
#generate data
x = data.table(t=rep(1:10,2), z=sample(1:10,20,replace=T))
x2 = data.table
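The question's x2 and its actual condition are truncated, so this sketch uses a hypothetical x2 and a hypothetical condition, t2 <= t. It compares the question's approach (cross join, then filter) with a data.table non-equi join. The non-equi join returns the same pairs without first materializing every combination.

```r
library(data.table)

# Cross join helper from the question: add a constant key k to both tables,
# join on it with allow.cartesian = TRUE, then drop k
CJ.table.1 <- function(X, Y)
  setkey(X[, c(k = 1, .SD)], k)[Y[, c(k = 1, .SD)], allow.cartesian = TRUE][, k := NULL]

set.seed(1)
x  <- data.table(t = rep(1:10, 2), z = sample(1:10, 20, replace = TRUE))
# Hypothetical second table: the question's x2 definition is truncated
x2 <- data.table(t2 = 1:5, w = letters[1:5])

# Baseline: full cross join (20 x 5 = 100 rows), then the hypothetical condition
filtered <- CJ.table.1(x, x2)[t2 <= t]

# Same pairs via a non-equi join; allow.cartesian is needed because the
# result has more rows than nrow(x) + nrow(x2)
cond <- x2[x,
           on = .(t2 <= t),
           .(t = i.t, z = i.z, t2 = x.t2, w = x.w),
           allow.cartesian = TRUE,
           nomatch = NULL]
```

If the real condition only applies to some columns, one option is to cross join the unconstrained part with CJ.table.1 and use the non-equi join only for the constrained pair.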

Max by Group with Condition for a data.table

感情迁移 submitted on 2021-01-28 14:01:06
Question: I have data like this:
library(data.table)
group <- c("a","a","a","b","b","b")
cond <- c("N","Y","N","Y","Y","N")
value <- c(2,1,3,4,2,5)
dt <- data.table(group, cond, value)
group cond value
a     N    2
a     Y    1
a     N    3
b     Y    4
b     Y    2
b     N    5
I would like to return the max value where cond is Y, repeated for the entire group. Something like this:
group cond value max
a     N    2     1
a     Y    1     1
a     N    3     1
b     Y    4     4
b     Y    2     4
b     N    5     4
I've tried adding an ifelse condition to a grouped max; however, I end up just returning the no condition
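This is a minimal sketch of a common data.table idiom, not code from the thread. Instead of an ifelse, it subsets value to the rows where cond == "Y" inside the grouped j. The resulting length-1 maximum is then recycled to every row of the group by :=.

```r
library(data.table)

group <- c("a", "a", "a", "b", "b", "b")
cond  <- c("N", "Y", "N", "Y", "Y", "N")
value <- c(2, 1, 3, 4, 2, 5)
dt <- data.table(group, cond, value)

# Per group: keep only values with cond == "Y", take their max, and
# assign that single number to all rows of the group
dt[, max := max(value[cond == "Y"]), by = group]
```

If a group has no "Y" rows, max() of an empty vector returns -Inf with a warning. Wrapping the expression with a check such as if (any(cond == "Y")) ... else NA_real_ avoids that.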

Unlist column in data frame with listed

霸气de小男生 submitted on 2021-01-28 12:53:26
Question: I have a list with multiple levels, and I would like to convert the data level into a data frame in which the variable chr is collapsed into single strings.
myList <- list(total_reach = list(4), data = list(list(reach = 2, chr = list("A", "B", "C"), nr = 3, company = "Company A"), list(reach = 2, chr = list("A", "B", "C"), nr = 3, company = "Company B")))
I would like to transform this into a data frame that looks like this:
  reach     chr nr   company
1     2 A, B, C  3 Company A
2     2 A, B, C  3 Company B
Using
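This is a sketch of one way to do it with data.table, assuming every element of myList$data has the same four fields. First collapse each record's chr list into one comma-separated string, then stack the records with rbindlist(), which accepts a list of plain lists.

```r
library(data.table)

myList <- list(total_reach = list(4),
               data = list(list(reach = 2, chr = list("A", "B", "C"), nr = 3, company = "Company A"),
                           list(reach = 2, chr = list("A", "B", "C"), nr = 3, company = "Company B")))

# Collapse chr inside each record so every field is length 1
records <- lapply(myList$data, function(rec) {
  rec$chr <- paste(unlist(rec$chr), collapse = ", ")
  rec
})

# Stack the records (one row each), then convert to a plain data.frame by reference
out <- rbindlist(records)
setDF(out)
```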

R data.table - new column with ':=' and keep existing column

为君一笑 submitted on 2021-01-28 11:44:04
Question: Is it possible to create a new column and keep (a few) existing columns in the same statement? For example, create column "x" and then keep only the "x" and "mpg" columns:
dt <- data.table(mtcars)
dt[,x:=mpg]
dt[,.(x,mpg)]
Answer 1: If you want to do the replacement by reference using :=, then you can do
dt[, x:=mpg][, setdiff(colnames(dt), c('x', 'mpg')) := NULL]
Answer 2: If you need it in a single step, then instead of using := to modify the original dataset, specify it with = inside list() or .():
dt[,.(x = mpg, mpg)]
Or
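The two answers differ in what happens to dt. This sketch runs both side by side to make that visible. Answer 2 builds a new two-column table and leaves dt untouched. Answer 1 deletes columns from the table in place, so the sketch works on a copy() in case the full table is still needed afterwards. Using copy() here is an addition, not part of either answer.

```r
library(data.table)

dt <- data.table(mtcars)

# Answer 2: returns a new table with x and mpg; dt keeps all 11 columns
res <- dt[, .(x = mpg, mpg)]

# Answer 1: by reference. Work on a copy, because the second step
# permanently drops every column except x and mpg
dt_ref <- copy(dt)
dt_ref[, x := mpg][, setdiff(colnames(dt_ref), c("x", "mpg")) := NULL]
```

Note the column order. res lists x first, because that is the order given in .(). In dt_ref, x comes after mpg, because := appends new columns at the end.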

Check if column contains value from a list and assign that value to new column

◇◆丶佛笑我妖孽 submitted on 2021-01-28 09:49:11
Question: I have a list of patterns to find. I also have a data.table in which I want to check whether each value contains any of the patterns and, if so, assign that pattern to a new column:
library(data.table)
library(stringr)
base_patters <- c("pat1","pat2","pat3")
transformations <- data.table(mynames = c("HI_pat1_jo","A2_a4_pat1_LN","pat3_LN"))
for (patt in base_patters) {
  transformations[stringr::str_detect(transformations[, mynames], patt), pattern := patt]
}
I have solved it (as you can see) with a for
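This is a sketch of a loop-free alternative using only base R regex functions. It assumes the patterns are plain strings without regex metacharacters; otherwise they would need escaping first. All patterns are combined into one alternation regex, and the first match in each name is extracted. This can differ from the loop when several patterns occur in the same name: this version returns the leftmost match, while the loop keeps whichever matching pattern comes last in base_patters.

```r
library(data.table)

base_patters <- c("pat1", "pat2", "pat3")
transformations <- data.table(mynames = c("HI_pat1_jo", "A2_a4_pat1_LN", "pat3_LN"))

# One alternation regex ("pat1|pat2|pat3") instead of one pass per pattern
re <- paste(base_patters, collapse = "|")

# Rows without any pattern stay NA; matching rows get the first match in the string
transformations[, pattern := NA_character_]
transformations[grepl(re, mynames), pattern := regmatches(mynames, regexpr(re, mynames))]
```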
