data.table

R data.table package and complex values

Submitted by 纵饮孤独 on 2020-01-05 13:02:43
Question: I'm new to the data.table package and so far it's incredible! With one hitch... data.table does not seem to like complex numbers. For example, the code:

DT <- data.table(x = as.complex(1:5))
DT[1]

produces the error:

Error in `[.data.table`(DT, 1) : Unknown column type 'complex'

I've searched high and low, and unless I am being a colossal idiot I can't find any information on this, except for an obscure GitHub edit. Is this just a current limitation of the data.table package, or is it a
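
The question is cut off above, but one common workaround (a sketch, assuming an older data.table that rejects complex columns) is to store the real and imaginary parts as two plain numeric columns and recombine them only when needed:

```r
library(data.table)

z <- as.complex(1:5)

# Plain numeric columns, which every data.table version handles.
DT <- data.table(re = Re(z), im = Im(z))

# Recombine into a complex vector on demand:
z2 <- DT[, complex(real = re, imaginary = im)]
identical(z2, z)
```

Newer data.table releases appear to accept complex columns directly, so checking packageVersion("data.table") before resorting to this split is worthwhile.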

detecting sequence by group and compute new variable for the subset

Submitted by 我们两清 on 2020-01-05 10:32:35
Question: I need to detect a sequence by group in a data.frame and compute a new variable. Consider the following data.frame:

df1 <- data.frame(
  ID      = c(1,1,1,1,1,1,1,2,2,2,3,3,3,3),
  seqs    = c(1,2,3,4,5,6,7,1,2,3,1,2,3,4),
  count   = c(2,1,3,1,1,2,3,1,2,1,3,1,4,1),
  product = c("A", "B", "C", "C", "A,B", "A,B,C", "D", "A", "B", "A", "A", "A,B,C", "D", "D"),
  stock   = c("A", "A,B", "A,B,C", "A,B,C", "A,B,C", "A,B,C", "A,B,C,D", "A", "A,B", "A,B", "A", "A,B,C", "A,B,C,D", "A,B,C,D")
)
df1
> df1
   ID seqs
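
The excerpt is truncated, so the exact target variable is unknown; one common reading of "detecting a sequence by group", flagging where a per-group counter restarts or skips, can be sketched with a grouped cumsum (run is a hypothetical column name, and the data here is a small stand-in):

```r
library(data.table)

df1 <- data.table(
  ID   = c(1, 1, 1, 2, 2, 2),
  seqs = c(1, 2, 3, 1, 2, 4)
)

# Start a new run whenever seqs does not advance by exactly 1
# within the same ID; cumsum turns the break flags into run ids.
df1[, run := cumsum(c(TRUE, diff(seqs) != 1L)), by = ID]
df1
```

For ID 2 the jump from 2 to 4 opens a second run, so run becomes 1, 1, 2 in that group.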

R - Compare 2 matrices to find which rows aren't in both

Submitted by ∥☆過路亽.° on 2020-01-05 08:21:23
Question: I have two large matrices in R of differing sizes, 371 x 1502 (A) and 371 x 1207 (B). All of matrix B is included in A; A also contains many other rows mixed in. I am looking for a way to create a new matrix, C, which contains all the rows of A not found in B. I am sure there is a way to do this using data.tables and keys, but I can't for the life of me figure it out. Example data:

a = t(matrix(c(1,2,3,4,5,6,7,8,9), nrow = 3))
b = t(matrix(c(1,2,3,7,8,9), nrow = 3))

Any help is appreciated,
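
A sketch of one data.table approach: convert both matrices to data.tables and anti-join on every column, so C keeps only the rows of a with no exact match in b:

```r
library(data.table)

a <- t(matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3))
b <- t(matrix(c(1, 2, 3, 7, 8, 9), nrow = 3))

A <- as.data.table(a)
B <- as.data.table(b)

# Anti-join: rows of A that have no match in B, matching on all
# columns. fsetdiff(A, B) gives the same result for tables with
# identical column names.
C <- A[!B, on = names(A)]
as.matrix(C)
```

Only the row (4, 5, 6) survives here, since (1, 2, 3) and (7, 8, 9) appear in both matrices.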

Checking for a factor change by group

Submitted by 我怕爱的太早我们不能终老 on 2020-01-05 07:23:31
Question: I have a data.table as follows:

library(data.table)
DT <- fread(
"Event_Type country year
A NLD 2005
B NLD 2004
A GBR 2006
B GBR 2003
A GRC 2002
A GRC 2007",
header = TRUE)

From this post, I know I can see if there is a change in event type as follows:

ind <- with(DT, c(FALSE, Event_Type[-1L] != Event_Type[-length(Event_Type)]) & Event_Type != 'NULL')
DT$switch <- ifelse(ind, 1, '')

But I would like to be able to do this by group as well, in this case the country. How can I do this?

Answer 1: If
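
The answer above is cut off; one standard way to do this per group is shift() inside by=, which compares each row with the previous row of the same country (a sketch, not necessarily the truncated answer's exact code):

```r
library(data.table)

DT <- data.table(
  Event_Type = c("A", "B", "A", "B", "A", "A"),
  country    = c("NLD", "NLD", "GBR", "GBR", "GRC", "GRC"),
  year       = c(2005, 2004, 2006, 2003, 2002, 2007)
)

# shift() lags within each group; the first row of a group has no
# predecessor, so the comparison is NA there and we zero it out.
DT[, switch := as.integer(Event_Type != shift(Event_Type)), by = country]
DT[is.na(switch), switch := 0L]
DT
```

NLD and GBR each flip from A to B (switch = 1 on their second rows), while GRC stays on A throughout.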

Fast way to find min in groups after excluding observations using R

Submitted by 故事扮演 on 2020-01-05 04:41:27
Question: I need to do something similar to the below on a very large data set (with many groups), and I read somewhere that using .SD is slow. Is there any faster way to perform the following operation? To be more precise, I need to create a new column that contains the min value for each group after having excluded a subset of observations in that group (something similar to MINIFS in Excel).

library(data.table)
dt <- data.table(valid = c(0,1,1,0,1), a = c(1,1,2,3,4), groups = c("A", "A", "A", "B", "B"))
dt
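
A sketch of a .SD-free approach: subset the group's vector directly inside j, which computes the conditional minimum and broadcasts it to every row of the group (min_valid is a hypothetical column name):

```r
library(data.table)

dt <- data.table(
  valid  = c(0, 1, 1, 0, 1),
  a      = c(1, 1, 2, 3, 4),
  groups = c("A", "A", "A", "B", "B")
)

# Group minimum over valid rows only, like Excel's MINIFS.
# Caveat: a group with no valid rows yields Inf with a warning.
dt[, min_valid := min(a[valid == 1]), by = groups]
dt
```

Group A gets min(1, 2) = 1 (its valid == 0 row is excluded) and group B gets 4.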

how can I eliminate a loop over a datatable? [closed]

Submitted by Deadly on 2020-01-05 04:24:05
Question: It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center. Closed 6 years ago.

I have two data.tables as shown below:

N = 10
A.DT <- data.table(a1 = c(rnorm(N,0,1)), a2 = NA)
B.DT <- data.table(b1 = c(rnorm(N,0,1)), b2 = 1:N)
setkey(A.DT,a1)
setkey(B.DT,b1)

I tried to change my previous

data.table reference semantics: memory usage of iterating through all columns

Submitted by 这一生的挚爱 on 2020-01-05 03:57:06
Question: When iterating through all columns of an R data.table using reference semantics, which makes more sense from a memory-usage standpoint:

(1) dt[, (all_cols) := lapply(.SD, my_fun)]

or

(2) lapply(colnames(dt), function(col) dt[, (col) := my_fun(dt[[col]])])[[1]]

My question is: in (2), I am forcing data.table to overwrite dt on a column-by-column basis, so I would assume I need extra memory on the order of one column's size. Is this also the case for (1)? Or is all of lapply(.SD, my_fun) evaluated
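
As a runnable side-by-side sketch of the two patterns (with a for loop standing in for the lapply-over-colnames wrapper, and my_fun = sqrt as a placeholder), both produce identical tables; the open question above is only about the peak memory of each:

```r
library(data.table)

dt <- data.table(x = c(1, 4, 9), y = c(16, 25, 36))
my_fun <- sqrt
all_cols <- copy(names(dt))

# (1) one := call: lapply(.SD, my_fun) builds the full list of
# replacement columns before assignment.
dt1 <- copy(dt)
dt1[, (all_cols) := lapply(.SD, my_fun)]

# (2) column by column: each iteration materialises only one
# replacement column at a time.
dt2 <- copy(dt)
for (col in all_cols) dt2[, (col) := my_fun(dt2[[col]])]

all.equal(dt1, dt2)
```

Note the copy(names(dt)) guard: := updates by reference, so iterating over an un-copied names(dt) while replacing columns can be fragile.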

Melting/Splitting a row into two rows, using two column values in the original row, leaving the rest intact

Submitted by 心不动则不痛 on 2020-01-05 03:48:06
Question: I have a data.table as follows:

DT <- fread(
"ID country year Event_A Event_B
4 NLD 2002 0 1
5 NLD 2002 0 1
6 NLD 2006 1 1
7 NLD 2006 1 0
8 NLD 2006 1 1
9 GBR 2002 0 1
10 GBR 2002 0 0
11 GBR 2002 0 1
12 GBR 2006 1 1
13 GBR 2006 1 1",
header = TRUE)

I want to cast the event columns over the rows without summing them, creating new rows. I tried:

meltedsessions <- melt(Exp, id.vars = -c("Event_A", "Event_B"), measure.vars = c("Event_A", "Event_B"))

I need to specify id.vars as a negative because
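
A sketch of the melt call with an explicit (positive) id.vars specification, which sidesteps the negative-index issue entirely; each input row becomes one output row per event column, with values carried over rather than summed (column names taken from the printed table above, on a two-row stand-in):

```r
library(data.table)

DT <- data.table(
  ID = c(4L, 5L), country = "NLD", year = 2002L,
  Event_A = c(0L, 0L), Event_B = c(1L, 1L)
)

# Two measure columns -> each ID contributes two rows.
melted <- melt(
  DT,
  id.vars       = c("ID", "country", "year"),
  measure.vars  = c("Event_A", "Event_B"),
  variable.name = "Event"
)
melted
```

In melt.data.table, naming measure.vars explicitly also means id.vars could be omitted, since the remaining columns are used as ids by default (with a message).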

How to extract first n rows per group and calculate function using that subset?

Submitted by 让人想犯罪 __ on 2020-01-05 03:32:23
Question: My question is very similar to this one: How to extract the first n rows per group?

dt
         date age     name       val
1: 2000-01-01   3   Andrew  93.73546
2: 2000-01-01   4      Ben 101.83643
3: 2000-01-01   5  Charlie  91.64371
4: 2000-01-02   6     Adam 115.95281
5: 2000-01-02   7      Bob 103.29508
6: 2000-01-02   8 Campbell  91.79532

We have a dt, and I've added an extra column named val. First, we want to extract the first n rows within each group. The solutions from the link provided are:

dt[, .SD[1:2], by=date]   # where 1:2 is the
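
Two hedged sketches of the follow-up step, computing a function on the first-n subset (mean is a stand-in for the intended function, on simplified data): either chain a second grouped step after the .SD subset, or take head(val, n) directly inside j, which skips building the intermediate subset:

```r
library(data.table)

dt <- data.table(
  date = rep(c("2000-01-01", "2000-01-02"), each = 3),
  val  = c(93.7, 101.8, 91.6, 116.0, 103.3, 91.8)
)

# (a) extract first 2 rows per date, then summarise the subset:
res <- dt[, .SD[1:2], by = date][, .(m = mean(val)), by = date]

# (b) same result in one grouped step, no intermediate table:
res2 <- dt[, .(m = mean(head(val, 2L))), by = date]
```

Variant (b) tends to be preferable on large data, since .SD[1:2] materialises a per-group subset that head(val, n) avoids.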