data.table

R data.table package and complex values

Submitted by 纵饮孤独 on 2020-01-05 13:02:43
Question: I'm new to the data.table package and so far it's incredible! With one hitch... data.table does not seem to like complex numbers. For example, the code:

DT <- data.table(x = as.complex(1:5))
DT[1]

produces the error:

Error in `[.data.table`(DT, 1) : Unknown column type 'complex'

I've searched high and low, and unless I am being a colossal idiot I can't find any information on this, except for an obscure GitHub edit. Is this just a current limitation of the data.table package, or is it a
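
The question is cut off above, but one common workaround (a sketch, assuming an older data.table that rejects complex columns) is to store the real and imaginary parts as two plain numeric columns and recombine them only when needed:

```r
library(data.table)

z <- as.complex(1:5)

# Plain numeric columns, which every data.table version handles.
DT <- data.table(re = Re(z), im = Im(z))

# Recombine into a complex vector on demand:
z2 <- DT[, complex(real = re, imaginary = im)]
identical(z2, z)
```

Newer data.table releases appear to accept complex columns directly, so checking packageVersion("data.table") before resorting to this split is worthwhile.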

detecting sequence by group and compute new variable for the subset

Submitted by 我们两清 on 2020-01-05 10:32:35
Question: I need to detect a sequence by group in a data.frame and compute a new variable. Consider the following data.frame:

df1 <- data.frame(
  ID      = c(1,1,1,1,1,1,1,2,2,2,3,3,3,3),
  seqs    = c(1,2,3,4,5,6,7,1,2,3,1,2,3,4),
  count   = c(2,1,3,1,1,2,3,1,2,1,3,1,4,1),
  product = c("A", "B", "C", "C", "A,B", "A,B,C", "D", "A", "B", "A", "A", "A,B,C", "D", "D"),
  stock   = c("A", "A,B", "A,B,C", "A,B,C", "A,B,C", "A,B,C", "A,B,C,D", "A", "A,B", "A,B", "A", "A,B,C", "A,B,C,D", "A,B,C,D")
)
df1
> df1
   ID seqs
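
The excerpt is truncated, so the exact target variable is unknown; one common reading of "detecting a sequence by group", flagging where a per-group counter restarts or skips, can be sketched with a grouped cumsum (run is a hypothetical column name, and the data here is a small stand-in):

```r
library(data.table)

df1 <- data.table(
  ID   = c(1, 1, 1, 2, 2, 2),
  seqs = c(1, 2, 3, 1, 2, 4)
)

# Start a new run whenever seqs does not advance by exactly 1
# within the same ID; cumsum turns the break flags into run ids.
df1[, run := cumsum(c(TRUE, diff(seqs) != 1L)), by = ID]
df1
```

For ID 2 the jump from 2 to 4 opens a second run, so run becomes 1, 1, 2 in that group.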

R - Compare 2 matrices to find which rows aren't in both

Submitted by ∥☆過路亽.° on 2020-01-05 08:21:23
Question: I have two large matrices in R of differing sizes, 371 x 1502 (A) and 371 x 1207 (B). All of matrix B is included in A; A also contains many other rows mixed in. I am looking for a way to create a new matrix, C, which contains all the rows of A not found in B. I am sure there is a way to do this using data.tables and keys, but I can't for the life of me figure it out. Example data:

a = t(matrix(c(1,2,3,4,5,6,7,8,9), nrow = 3))
b = t(matrix(c(1,2,3,7,8,9), nrow = 3))

Any help is appreciated,
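
A sketch of one data.table approach: convert both matrices to data.tables and anti-join on every column, so C keeps only the rows of a with no exact match in b:

```r
library(data.table)

a <- t(matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3))
b <- t(matrix(c(1, 2, 3, 7, 8, 9), nrow = 3))

A <- as.data.table(a)
B <- as.data.table(b)

# Anti-join: rows of A that have no match in B, matching on all
# columns. fsetdiff(A, B) gives the same result for tables with
# identical column names.
C <- A[!B, on = names(A)]
as.matrix(C)
```

Only the row (4, 5, 6) survives here, since (1, 2, 3) and (7, 8, 9) appear in both matrices.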

Checking for a factor change by group

Submitted by 我怕爱的太早我们不能终老 on 2020-01-05 07:23:31
Question: I have a data.table as follows:

library(data.table)
DT <- fread(
"Event_Type country year
A NLD 2005
B NLD 2004
A GBR 2006
B GBR 2003
A GRC 2002
A GRC 2007",
header = TRUE)

From this post, I know I can see if there is a change in event type as follows:

ind <- with(DT, c(FALSE, Event_Type[-1L] != Event_Type[-length(Event_Type)]) & Event_Type != 'NULL')
DT$switch <- ifelse(ind, 1, '')

But I would like to be able to do this by group as well, in this case the country. How can I do this?

Answer 1: If
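
The answer above is cut off; one standard way to do this per group is shift() inside by=, which compares each row with the previous row of the same country (a sketch, not necessarily the truncated answer's exact code):

```r
library(data.table)

DT <- data.table(
  Event_Type = c("A", "B", "A", "B", "A", "A"),
  country    = c("NLD", "NLD", "GBR", "GBR", "GRC", "GRC"),
  year       = c(2005, 2004, 2006, 2003, 2002, 2007)
)

# shift() lags within each group; the first row of a group has no
# predecessor, so the comparison is NA there and we zero it out.
DT[, switch := as.integer(Event_Type != shift(Event_Type)), by = country]
DT[is.na(switch), switch := 0L]
DT
```

NLD and GBR each flip from A to B (switch = 1 on their second rows), while GRC stays on A throughout.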

Fast way to find min in groups after excluding observations using R

Submitted by 故事扮演 on 2020-01-05 04:41:27
Question: I need to do something similar to the below on a very large data set (with many groups), and I read somewhere that using .SD is slow. Is there any faster way to perform the following operation? To be more precise, I need to create a new column that contains the min value for each group after having excluded a subset of observations in that group (something similar to MINIFS in Excel).

library(data.table)
dt <- data.table(valid = c(0,1,1,0,1), a = c(1,1,2,3,4), groups = c("A", "A", "A", "B", "B"))
dt
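
A sketch of a .SD-free approach: subset the group's vector directly inside j, which computes the conditional minimum and broadcasts it to every row of the group (min_valid is a hypothetical column name):

```r
library(data.table)

dt <- data.table(
  valid  = c(0, 1, 1, 0, 1),
  a      = c(1, 1, 2, 3, 4),
  groups = c("A", "A", "A", "B", "B")
)

# Group minimum over valid rows only, like Excel's MINIFS.
# Caveat: a group with no valid rows yields Inf with a warning.
dt[, min_valid := min(a[valid == 1]), by = groups]
dt
```

Group A gets min(1, 2) = 1 (its valid == 0 row is excluded) and group B gets 4.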

how can I eliminate a loop over a datatable? [closed]

Submitted by Deadly on 2020-01-05 04:24:05
Question: It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center. Closed 6 years ago.

I have two data.tables as shown below:

N = 10
A.DT <- data.table(a1 = c(rnorm(N,0,1)), a2 = NA)
B.DT <- data.table(b1 = c(rnorm(N,0,1)), b2 = 1:N)
setkey(A.DT,a1)
setkey(B.DT,b1)

I tried to change my previous

data.table reference semantics: memory usage of iterating through all columns

Submitted by 这一生的挚爱 on 2020-01-05 03:57:06
Question: When iterating through all columns of an R data.table using reference semantics, which makes more sense from a memory-usage standpoint:

(1) dt[, (all_cols) := lapply(.SD, my_fun)]

or

(2) lapply(colnames(dt), function(col) dt[, (col) := my_fun(dt[[col]])])[[1]]

My question is: in (2), I am forcing data.table to overwrite dt on a column-by-column basis, so I would assume I need extra memory on the order of one column's size. Is this also the case for (1)? Or is all of lapply(.SD, my_fun) evaluated
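
As a runnable side-by-side sketch of the two patterns (with a for loop standing in for the lapply-over-colnames wrapper, and my_fun = sqrt as a placeholder), both produce identical tables; the open question above is only about the peak memory of each:

```r
library(data.table)

dt <- data.table(x = c(1, 4, 9), y = c(16, 25, 36))
my_fun <- sqrt
all_cols <- copy(names(dt))

# (1) one := call: lapply(.SD, my_fun) builds the full list of
# replacement columns before assignment.
dt1 <- copy(dt)
dt1[, (all_cols) := lapply(.SD, my_fun)]

# (2) column by column: each iteration materialises only one
# replacement column at a time.
dt2 <- copy(dt)
for (col in all_cols) dt2[, (col) := my_fun(dt2[[col]])]

all.equal(dt1, dt2)
```

Note the copy(names(dt)) guard: := updates by reference, so iterating over an un-copied names(dt) while replacing columns can be fragile.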

Melting/Splitting a row into two rows, using two column values in the original row, leaving the rest intact

Submitted by 心不动则不痛 on 2020-01-05 03:48:06
Question: I have a data.table as follows:

DT <- fread(
"ID country year Event_A Event_B
4 NLD 2002 0 1
5 NLD 2002 0 1
6 NLD 2006 1 1
7 NLD 2006 1 0
8 NLD 2006 1 1
9 GBR 2002 0 1
10 GBR 2002 0 0
11 GBR 2002 0 1
12 GBR 2006 1 1
13 GBR 2006 1 1",
header = TRUE)

I want to cast the event columns over the rows without summing them, creating new rows. I tried:

meltedsessions <- melt(Exp, id.vars = -c("Event_A", "Event_B"), measure.vars = c("Event_A", "Event_B"))

I need to specify id.vars as a negative because
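
A sketch of the melt call with an explicit (positive) id.vars specification, which sidesteps the negative-index issue entirely; each input row becomes one output row per event column, with values carried over rather than summed (column names taken from the printed table above, on a two-row stand-in):

```r
library(data.table)

DT <- data.table(
  ID = c(4L, 5L), country = "NLD", year = 2002L,
  Event_A = c(0L, 0L), Event_B = c(1L, 1L)
)

# Two measure columns -> each ID contributes two rows.
melted <- melt(
  DT,
  id.vars       = c("ID", "country", "year"),
  measure.vars  = c("Event_A", "Event_B"),
  variable.name = "Event"
)
melted
```

In melt.data.table, naming measure.vars explicitly also means id.vars could be omitted, since the remaining columns are used as ids by default (with a message).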

How to extract first n rows per group and calculate function using that subset?

Submitted by 让人想犯罪 __ on 2020-01-05 03:32:23
Question: My question is very similar to this one: How to extract the first n rows per group?

dt
         date age     name       val
1: 2000-01-01   3   Andrew  93.73546
2: 2000-01-01   4      Ben 101.83643
3: 2000-01-01   5  Charlie  91.64371
4: 2000-01-02   6     Adam 115.95281
5: 2000-01-02   7      Bob 103.29508
6: 2000-01-02   8 Campbell  91.79532

We have a dt, and I've added an extra column named val. First, we want to extract the first n rows within each group. The solutions from the link provided are:

dt[, .SD[1:2], by=date]   # where 1:2 is the
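
Two hedged sketches of the follow-up step, computing a function on the first-n subset (mean is a stand-in for the intended function, on simplified data): either chain a second grouped step after the .SD subset, or take head(val, n) directly inside j, which skips building the intermediate subset:

```r
library(data.table)

dt <- data.table(
  date = rep(c("2000-01-01", "2000-01-02"), each = 3),
  val  = c(93.7, 101.8, 91.6, 116.0, 103.3, 91.8)
)

# (a) extract first 2 rows per date, then summarise the subset:
res <- dt[, .SD[1:2], by = date][, .(m = mean(val)), by = date]

# (b) same result in one grouped step, no intermediate table:
res2 <- dt[, .(m = mean(head(val, 2L))), by = date]
```

Variant (b) tends to be preferable on large data, since .SD[1:2] materialises a per-group subset that head(val, n) avoids.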