data.table | 易学教程

Don't want original data.table to be modified when passed to a function

Question: I am a fan of data.table, and of writing reusable functions for all current and future needs. Here's a challenge I ran into while working on the answer to this problem: Best way to plot automatically all data.table columns using ggplot2. We pass a data.table to a function for plotting, and then the original data.table gets modified, even though we made a copy of it to prevent that. Here's a simple piece of code to illustrate:

    plotYofX <- function(.dt,x,y) {
      dt <- .dt
      dt[, (c(x,y)) := lapply(.SD, function …
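
In data.table, `dt <- .dt` does not duplicate the table. Both names point to the same object, and `:=` modifies that object by reference, so the caller's table changes too. The usual remedy is `copy()`. Below is a minimal sketch; the helper `toNumericCopy` and its sample data are hypothetical and stand in for the plotting function in the question.

```r
library(data.table)

# Hypothetical helper: converts the given columns to numeric
# without touching the caller's table.
toNumericCopy <- function(.dt, cols) {
  dt <- copy(.dt)  # deep copy; `dt <- .dt` would share the same object
  dt[, (cols) := lapply(.SD, as.numeric), .SDcols = cols]
  dt[]             # trailing [] makes the result print normally at the console
}

orig <- data.table(x = c("1", "2"), y = c("3", "4"))
res <- toNumericCopy(orig, c("x", "y"))

str(orig$x)  # still character
str(res$x)   # numeric
```

`copy()` duplicates the whole table, so its memory cost grows with the table's size. For a plotting helper that only runs occasionally, this is usually acceptable.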

Joining tables based on different column names

Question: I was watching a video[1] by Greg Reda about Pandas to see what Pandas can do and how it compares with data.table. I was surprised to learn how difficult it was to join tables in data.table. If you watch the video, specifically from 49:00 to 52:00, you can see that Pandas allows you to join tables based on different column names and to choose different suffixes for the left and right tables. I understand that setkey is used for optimization purposes[2] and understand how to join tables using same …
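
Current data.table versions can do this without `setkey`. The following sketch uses two made-up tables, `orders` and `customers`, which are hypothetical. `merge()` accepts `by.x`/`by.y` together with a `suffixes` argument, similar to the Pandas call in the video. The `on=` argument of `[` (available since data.table 1.9.8) also joins on differently named columns directly.

```r
library(data.table)

# Hypothetical example tables whose key columns have different names
orders    <- data.table(cust_id = c(1, 2, 2), amount = c(10, 20, 30))
customers <- data.table(id = c(1, 2), amount = c(100, 200))

# 1) merge(): name the key column of each table and choose the suffixes
#    appended to the clashing non-key column `amount`
m <- merge(orders, customers, by.x = "cust_id", by.y = "id",
           suffixes = c("_order", "_customer"))

# 2) x[i, on = ]: in c(id = "cust_id") the left name is the column in x
#    (customers) and the right one is the column in i (orders);
#    i's clashing column comes back prefixed as `i.amount`
joined <- customers[orders, on = c(id = "cust_id")]
```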

Calculate the difference between consecutive, grouped columns in a data.table

Question: My data is structured as follows:

    DT <- data.table(Id=c(1,2,3,4,5), Va1=c(3,13,NA,NA,NA), Va2=c(4,40,NA,NA,4), Va3=c(5,34,NA,7,84), Va4=c(2,23,NA,63,9), Vb1=c(8,45,1,7,0), Vb2=c(0,35,0,7,6), Vb3=c(63,0,0,0,5), Vc1=c(2,5,0,0,4))
    > DT
       Id Va1 Va2 Va3 Va4 Vb1 Vb2 Vb3 Vc1
    1:  1   3   4   5   2   8   0  63   2
    2:  2  13  40  34  23  45  35   0   5
    3:  3  NA  NA  NA  NA   1   0   0   0
    4:  4  NA  NA   7  63   7   7   0   0
    5:  5  NA   4  84   9   0   6   5   4

Additionally, I have a reference list that references all the column groups:

    reference <- list(g.1=c(2,3,4,5) …
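
The rest of the `reference` list and the expected output are cut off. The following is therefore only a sketch under one assumption: "difference between consecutive columns" means each column minus the one before it within a group. It uses group `g.1` (columns 2 to 5, i.e. Va1 to Va4) from the question. The `_minus_` naming scheme is hypothetical.

```r
library(data.table)

DT <- data.table(Id=c(1,2,3,4,5), Va1=c(3,13,NA,NA,NA), Va2=c(4,40,NA,NA,4),
                 Va3=c(5,34,NA,7,84), Va4=c(2,23,NA,63,9), Vb1=c(8,45,1,7,0),
                 Vb2=c(0,35,0,7,6), Vb3=c(63,0,0,0,5), Vc1=c(2,5,0,0,4))

g1   <- names(DT)[c(2, 3, 4, 5)]         # group g.1 from the question: Va1..Va4
cur  <- g1[-1]                           # Va2, Va3, Va4
prev <- g1[-length(g1)]                  # Va1, Va2, Va3
newCols <- paste0(cur, "_minus_", prev)  # hypothetical naming scheme

# Each new column is the current column minus its predecessor;
# an NA in either operand yields NA
vals <- Map(function(a, b) DT[[a]] - DT[[b]], cur, prev)
DT[, (newCols) := vals]
```

For several groups, the same steps could be repeated for each element of `reference`.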

fread from data.table package when column names include spaces and special characters?

Question: I have a csv file where column names include spaces and special characters. fread imports them with quotes; how can I change this behaviour? One reason is that I have column names starting with a space, and I don't know how to handle them. Any pointers would be helpful.

Edit: An example.

    > packageVersion("data.table")
    [1] ‘1.8.8’
    p2p <- fread("p2p.csv", header = TRUE, stringsAsFactors=FALSE)
    > head(p2p[,list(Principal remaining)])
    Error: unexpected symbol in "head(p2p[,list(Principal
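
There are two common workarounds: refer to non-syntactic names with backticks, or rename the columns once after reading. The sketch below uses a small inline CSV invented for illustration instead of the original `p2p.csv`. `fread(text = ...)` needs a reasonably recent data.table. Recent versions of `fread` also have a `check.names` argument that applies `make.names()` at read time.

```r
library(data.table)

# Invented CSV: quoted headers containing spaces, a leading space and special characters
csv <- 'id," Principal remaining","Rate (%)"\n1,100,5\n2,200,6'
p2p <- fread(text = csv)

# Option 1: keep the names and quote non-syntactic ones with backticks
p2p[, .(`Rate (%)`)]

# Option 2: trim surrounding whitespace and make the names syntactic once
setnames(p2p, make.names(trimws(names(p2p))))
head(p2p[, .(Principal.remaining)])
```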

merge-like scenario with two data.tables

Question: I have two data frames (actually data.tables).

    set.seed(123)
    dt1 <- data.table(P=rep(letters[1:3],c(4,2,3)),X=sample(9))
    dt1
       P X
    1: a 3
    2: a 7
    3: a 9
    4: a 6
    5: b 5
    6: b 1
    7: c 2
    8: c 8
    9: c 4

and:

    dt2 <- data.table(P=rep(letters[1:5],length=10),D=c("X","Y","Z","G","F"))
    dt2
        P D
     1: a X
     2: b Y
     3: c Z
     4: d G
     5: e F
     6: a X
     7: b Y
     8: c Z
     9: d G
    10: e F

Now I want to add a new column to dt1 containing column "D" of dt2, matched where P has the same value in dt1 and dt2. It should look like this:

    dt_new
       P X D
    1: a …
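
One way to do this is an update join, which adds `D` to `dt1` by reference. The sketch below is a suggestion, not an accepted answer. Because `dt2` repeats each value of `P`, it is de-duplicated with `unique()` first so that every `P` maps to exactly one row. The `X` values are typed in from the printout instead of using `sample(9)`, because `sample()` output differs between R versions.

```r
library(data.table)

dt1 <- data.table(P = rep(letters[1:3], c(4, 2, 3)),
                  X = c(3, 7, 9, 6, 5, 1, 2, 8, 4))
dt2 <- data.table(P = rep(letters[1:5], length.out = 10),
                  D = rep(c("X", "Y", "Z", "G", "F"), length.out = 10))

# Update join: for each row of dt1, look up P in the de-duplicated dt2
# and copy its D column (referred to as i.D) into a new column D of dt1
dt1[unique(dt2), on = "P", D := i.D]
dt1
```

Rows of `dt1` without a match in `dt2` would get `NA` in `D`.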

Ranged/Filtered Cross Join with R data.table

Question: I want to cross-join two data tables without evaluating the full cross join, using a ranging criterion in the process. In essence, I would like CJ with a filtering/ranging expression. Can someone suggest a high-performance approach that avoids the full cross join? See the test example below, which does the job with the evil full cross join.

    library(data.table)
    # Test data.
    dt1 <- data.table(id1=1:10, D=2*(1:10), key="id1")
    dt2 <- data.table(id2=21:23, D1=c(5, 7, 10), D2=c(9, 12, 16), key="id2")
    # Desired …
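
The desired output is cut off, so this sketch assumes the goal is every (`id1`, `id2`) pair with `D1 <= D <= D2`. Non-equi joins, available since data.table 1.9.8, express this directly without materialising the full cross join. The `x.` and `i.` prefixes select columns from `dt1` and `dt2` respectively.

```r
library(data.table)

dt1 <- data.table(id1 = 1:10, D = 2 * (1:10), key = "id1")
dt2 <- data.table(id2 = 21:23, D1 = c(5, 7, 10), D2 = c(9, 12, 16), key = "id2")

# For each row of dt2, find the rows of dt1 whose D lies in [D1, D2];
# nomatch = NULL drops dt2 rows with no match instead of returning NA rows
res <- dt1[dt2, .(id1 = x.id1, id2 = i.id2, D = x.D),
           on = .(D >= D1, D <= D2), nomatch = NULL]
res
```

The `x.D` prefix matters here. In a non-equi join, the unprefixed join column `D` in `j` can hold the bounds taken from `dt2` rather than `dt1`'s values. If a result has more rows than `nrow(dt1) + nrow(dt2)`, data.table will ask for `allow.cartesian = TRUE`.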

Metaprogramming with ggplot2

Question: I've been trying to cut down on the amount of copying and pasting required to make a large number of charts with slightly different functions / slices of the data. Here is a simplified example of what I am trying to do:

    test <- data.table(a=c("x","y"), b=seq(1,3), c=rnorm(18))
    fixedSlices <- function(input, rowfacet, colfacet, metric){
      calc <- substitute(metric)
      bygroup<-c(rowfacet,colfacet)
      aggregates <- input[,eval(calc),by=bygroup]
      ggplot(aggregates) + geom_point(stat="identity") + aes(x="

Extracting unique rows from a data table in R [duplicate]

Question: This question already has answers here: Filtering out duplicated/non-unique rows in data.table (4 answers). Closed 2 years ago.

I'm migrating from data frames and matrices to data tables, but I haven't found a solution for extracting the unique rows from a data table. I presume there's something I'm missing about the [,J] notation, though I've not yet found an answer in the FAQ and intro vignettes. How can I extract the unique rows without converting back to data frames? Here is an example: …
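
data.table provides its own `unique()` and `duplicated()` methods, so there is no need to convert back to a data.frame. The minimal sketch below uses made-up data. Since data.table 1.9.8, `unique()` compares all columns by default, and the `by` argument restricts the comparison to selected columns.

```r
library(data.table)

DT <- data.table(a = c(1, 1, 2, 2), b = c("x", "x", "y", "z"))

unique(DT)              # rows that are unique across all columns
DT[!duplicated(DT)]     # equivalent, via a logical row filter
unique(DT, by = "a")    # first row for each distinct value of a
```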