data.table | 易学教程

Join big data frames in R and filter at the same time

Question:

    library(dplyr)

    df1 <- data.frame(id = 1, start = as.Date("2012-07-05"), end = as.Date("2012-07-15"))
    # one row per day from 2012-05-06 to 2016-02-05 (1371 days); seq() avoids calling
    # as.Date() on a bare number, which needs an explicit origin in older R versions
    df2 <- data.frame(id = rep(1, 1371),
                      date = seq(as.Date("2012-05-06"), as.Date("2016-02-05"), by = "day"))
    output <- dplyr::inner_join(x = df1, y = df2, by = "id") %>% filter(date >= start & date <= end)

I have two data frames, each with about one million rows, and I want to join them by id and then filter so that, for each row, the value of column date lies between the values of start and end. A dplyr::inner_join is not
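
Since this page collects data.table questions, one hedged option is a data.table non-equi join, which applies the range condition during the join instead of first building every id-level match and filtering afterwards. This is a minimal sketch on the question's toy data (converted to data.tables named dt1/dt2, names chosen here); it assumes a reasonably recent data.table with non-equi join support, and it has not been benchmarked on the asker's million-row tables.

```r
library(data.table)

dt1 <- data.table(id = 1, start = as.Date("2012-07-05"), end = as.Date("2012-07-15"))
dt2 <- data.table(id = rep(1, 1371),
                  date = seq(as.Date("2012-05-06"), as.Date("2016-02-05"), by = "day"))

# Non-equi join: for each row of dt1, take the dt2 rows with the same id
# whose date lies in [start, end]; nomatch = 0L drops dt1 rows without matches.
# In a non-equi join the output join column `date` would carry i's bounds,
# so x.date is used to keep the actual dates from dt2.
output <- dt2[dt1,
              on = .(id, date >= start, date <= end),
              .(id = x.id, date = x.date, start = i.start, end = i.end),
              nomatch = 0L]
output
```

The join never materialises the full cross product of rows sharing an id, which is what makes join-then-filter expensive when one id matches many rows.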

R sum by group if date within date range

Question: Suppose I have two data frames. The first one includes the "Date" on which a "Name" issues a "Rec" for an "ID", and the "Stop.Date" on which that "Rec" becomes invalid.

df (only a part):

    structure(list(Date = structure(c(13236, 13363, 14074, 13199, 14554), class = "Date"),
                   ID = c("AU0000XINAA9", "AU0000XINAA9", "AU0000XINAC5", "AU0000XINAI2", "AU0000XINAJ0"),
                   Name = c("N+1 BREWIN", "N+1 BREWIN", "ARBUTHNOT SECURITIES LTD.", "INVESTEC BANK (UK) PLC", "AWRAQ INVESTMENTS"),
                   Rec = c(1, 2, 2, 2, 1),
                   Stop.Date
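
The question is cut off before the second data frame is shown, so the sketch below is built entirely on hypothetical data: a small `recs` table using the column names from the question (ID, Date, Stop.Date, Rec) and an invented `queries` table of (ID, qdate) pairs. Under those assumptions, one way to sum Rec over the recommendations that are valid on each query date is a data.table non-equi join with `by = .EACHI`.

```r
library(data.table)

# Hypothetical recommendation data using the column names from the question
recs <- data.table(
  ID        = c("A", "A", "B"),
  Date      = as.Date(c("2020-01-01", "2020-02-01", "2020-01-15")),
  Stop.Date = as.Date(c("2020-03-01", "2020-04-01", "2020-02-15")),
  Rec       = c(1, 2, 2)
)
# Hypothetical second table: one row per (ID, date) to evaluate
queries <- data.table(
  ID    = c("A", "A", "B"),
  qdate = as.Date(c("2020-01-10", "2020-02-10", "2020-03-01"))
)

# For each query row, match recs rows with the same ID whose validity window
# [Date, Stop.Date] contains qdate, then sum Rec per query row (by = .EACHI).
# Query rows with no valid recommendation get NA (default nomatch = NA).
res <- recs[queries,
            on = .(ID, Date <= qdate, Stop.Date >= qdate),
            .(total_rec = sum(Rec)),
            by = .EACHI]
res
```

In `on =`, the left side of each comparison is a column of `recs` and the right side a column of `queries`.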

Efficient conditional cross join in data.table

Question: EDITED (Sorry, I modified the code; it had an error that prevented reproduction.) I am trying to merge efficiently with a condition. My current approach is a cross join (which I want to preserve), except that one condition applies to a subset of the columns.

Cross join function (from here):

    CJ.table.1 <- function(X, Y)
      setkey(X[, c(k = 1, .SD)], k)[Y[, c(k = 1, .SD)], allow.cartesian = TRUE][, k := NULL]

    set.seed(1)
    # generate data
    x = data.table(t = rep(1:10, 2), z = sample(1:10, 20, replace = TRUE))
    x2 = data.table
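
The question's actual condition and its x2 are cut off, so this sketch assumes a hypothetical x2 with columns t2 and z2 and a hypothetical condition (keep pairs where z >= z2). It contrasts the asker's approach, a full cross join via CJ.table.1 followed by a filter, with a data.table non-equi join that applies the condition while joining, and checks that both produce the same rows.

```r
library(data.table)

# Cross join helper from the question: add a constant key k to both tables,
# join on it (every row matches every row), then drop k.
CJ.table.1 <- function(X, Y)
  setkey(X[, c(k = 1, .SD)], k)[Y[, c(k = 1, .SD)], allow.cartesian = TRUE][, k := NULL]

set.seed(1)
x  <- data.table(t = rep(1:10, 2), z = sample(1:10, 20, replace = TRUE))
# Hypothetical second table (the question's x2 is truncated)
x2 <- data.table(t2 = 1:5, z2 = c(3L, 5L, 7L, 2L, 9L))

# 1) Cross join everything, then filter on the hypothetical condition z >= z2
slow <- CJ.table.1(x, x2)[z >= z2]

# 2) Non-equi join: only pairs satisfying z >= z2 are ever produced.
#    x.z keeps x's own z values (the join column would otherwise show z2).
fast <- x[x2, on = .(z >= z2),
          .(t = x.t, z = x.z, t2 = i.t2, z2 = i.z2),
          nomatch = 0L, allow.cartesian = TRUE]
```

Both return the columns t, z, t2, z2; the second avoids building the nrow(x) * nrow(x2) intermediate table.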

Max by Group with Condition for a data.table

Question: I have data like this:

    library(data.table)
    group <- c("a","a","a","b","b","b")
    cond <- c("N","Y","N","Y","Y","N")
    value <- c(2,1,3,4,2,5)
    dt <- data.table(group, cond, value)

    group cond value
        a    N     2
        a    Y     1
        a    N     3
        b    Y     4
        b    Y     2
        b    N     5

For each group, I would like to return the maximum of value over the rows where cond is Y, repeated on every row of the group. Something like this:

    group cond value max
        a    N     2   1
        a    Y     1   1
        a    N     3   1
        b    Y     4   4
        b    Y     2   4
        b    N     5   4

I've tried adding an ifelse condition to a grouped max; however, I end up just returning the no condition
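
One way to get this, sketched here rather than taken from an answer on the page, is to subset value by the condition inside a grouped `:=` assignment, so every row of a group receives the maximum over that group's "Y" rows.

```r
library(data.table)

dt <- data.table(group = c("a", "a", "a", "b", "b", "b"),
                 cond  = c("N", "Y", "N", "Y", "Y", "N"),
                 value = c(2, 1, 3, 4, 2, 5))

# For each group, take the max of value over rows where cond == "Y" and
# assign it to every row of that group by reference.
dt[, max := max(value[cond == "Y"]), by = group]
dt
```

If a group has no "Y" rows, max() of an empty vector returns -Inf with a warning; `if (any(cond == "Y")) max(value[cond == "Y"]) else NA_real_` is one way to return NA instead.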

Unlist column in data frame with listed

Question: I have a multi-level list, and I would like to convert its data level into a data frame in which the variable chr is collapsed into a single string.

    myList <- list(total_reach = list(4),
                   data = list(list(reach = 2, chr = list("A", "B", "C"), nr = 3, company = "Company A"),
                               list(reach = 2, chr = list("A", "B", "C"), nr = 3, company = "Company B")))

I would like to transform this into a data frame that looks like this:

      reach     chr nr   company
    1     2 A, B, C  3 Company A
    2     2 A, B, C  3 Company B

Using
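
A minimal sketch using data.table::rbindlist (the question's own attempt is cut off after "Using"): build one row per element of myList$data and collapse chr with paste(..., collapse = ", ").

```r
library(data.table)

myList <- list(total_reach = list(4),
               data = list(list(reach = 2, chr = list("A", "B", "C"), nr = 3, company = "Company A"),
                           list(reach = 2, chr = list("A", "B", "C"), nr = 3, company = "Company B")))

# One list per record; chr (itself a list) is flattened and collapsed into one string
rows <- lapply(myList$data, function(d)
  list(reach   = d$reach,
       chr     = paste(unlist(d$chr), collapse = ", "),
       nr      = d$nr,
       company = d$company))

res <- rbindlist(rows)
res
```

`as.data.frame(res)` turns the result into a plain data frame if a data.table is not wanted.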

R data.table - new column with ':=' and keep existing column

Question: Is it possible to create a new column and keep (a few) existing columns in the same statement? For example, create column "x" and then keep only the "x" and "mpg" columns:

    dt <- data.table(mtcars)
    dt[, x := mpg]
    dt[, .(x, mpg)]

Answer 1: If you want to do the replacement by reference using :=, you can chain a second step that deletes every other column:

    dt[, x := mpg][, setdiff(colnames(dt), c('x', 'mpg')) := NULL]

Answer 2: If we need it in a single step, then instead of using := to modify the original dataset, specify the column with = inside list( or .(:

    dt[, .(x = mpg, mpg)]

Or
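
The difference between the two answers can be checked directly: answer 2 returns a new two-column table and leaves dt alone, while answer 1 changes dt in place. A minimal sketch (the name res is chosen here):

```r
library(data.table)

dt <- data.table(mtcars)

# Answer 2: j builds a new table; dt still has all of its mtcars columns
res <- dt[, .(x = mpg, mpg)]

# Answer 1: add x by reference, then delete every other column by reference
dt[, x := mpg][, setdiff(colnames(dt), c("x", "mpg")) := NULL]
names(dt)
```

Answer 1 avoids copying the data, which matters for large tables; answer 2 is the choice when the original table must stay intact.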

Check if column contains value from a list and assign that value to new column

Question: I have a vector of patterns to find. I also have a data.table, and for each value I want to check whether it contains any of the patterns and, if so, assign that pattern to a new column:

    library(data.table)
    library(stringr)
    base_patters <- c("pat1","pat2","pat3")
    transformations <- data.table(mynames = c("HI_pat1_jo","A2_a4_pat1_LN","pat3_LN"))
    for (patt in base_patters) {
      transformations[stringr::str_detect(transformations[, mynames], patt), pattern := patt]
    }

I have solved it (as you see) with a for
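
A vectorised alternative to the loop, sketched here and not taken from the page: combine the patterns into one regular expression and let stringr::str_extract pull out the match for every row at once. This assumes the patterns are plain text with no regex metacharacters. One behavioural difference: when a name contains several patterns, str_extract returns the leftmost match, whereas the loop keeps the last matching pattern in base_patters.

```r
library(data.table)
library(stringr)

base_patters <- c("pat1", "pat2", "pat3")
transformations <- data.table(mynames = c("HI_pat1_jo", "A2_a4_pat1_LN", "pat3_LN"))

# "pat1|pat2|pat3" matches any of the patterns; rows with no match get NA
rx <- paste(base_patters, collapse = "|")
transformations[, pattern := str_extract(mynames, rx)]
transformations
```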
