data.table | 易学教程

Mutating multiple columns dynamically while conditioning on specific rows

Question: I know there are several similar questions here, but none of them seems to address the precise issue I'm having.

    set.seed(4)
    df = data.frame(
      Key = c("A", "B", "A", "D", "A"),
      Val1 = rnorm(5),
      Val2 = runif(5),
      Val3 = 1:5
    )

I want to zero out the values of the value columns in the rows where Key == "A". The column names are selected with grep:

    cols = grep("Val", names(df), value = TRUE)

Normally, to achieve this, I would use data.table like this:

    library(data.table)
    …
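The data.table snippet is cut off in this listing. Below is a minimal sketch of one common data.table idiom for this, assuming data.table is installed: restrict the rows in i, name the columns in .SDcols, and assign by reference with (cols) :=. It is an illustration, not necessarily the asker's or an answerer's code.

```r
library(data.table)

set.seed(4)
df <- data.frame(
  Key  = c("A", "B", "A", "D", "A"),
  Val1 = rnorm(5),
  Val2 = runif(5),
  Val3 = 1:5
)
cols <- grep("Val", names(df), value = TRUE)

dt <- as.data.table(df)
# i selects the rows (Key == "A"); .SDcols = cols limits .SD to the value
# columns; (cols) := writes the results back by reference.
# x * 0L keeps each column's type (double stays double, integer stays integer).
dt[Key == "A", (cols) := lapply(.SD, function(x) x * 0L), .SDcols = cols]

# Base-R equivalent on the data.frame (Val3 becomes double here)
df[df$Key == "A", cols] <- 0
```

Multiplying by 0L rather than assigning a literal 0 is just one way to avoid type coercion warnings on the integer column Val3.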

R data.table fread - read column as Date

Question: I would like to use fread from data.table to read a file that has a column of dates in "YYYY-MM-DD" format. By default, fread reads the column as chr (character). However, I would like the column to be of class Date, the same as I would get by applying as.Date. I have tried

    dt[, starttime.date := as.Date(starttime.date)]

but it takes a very long time to run (I have approx. 43 million rows).

Answer 1: Using the fasttime package, as suggested in the fread documentation, is approximately 100x faster than as…
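The rest of the answer is cut off here. As a different technique from the fasttime approach the answer mentions, the sketch below parses each distinct date string only once and maps the result back with match(). It should help when date values repeat heavily, which is common in a column of 43 million dates. The helper name as_date_unique and the sample data are hypothetical.

```r
library(data.table)

# Hypothetical sample standing in for the large file: many repeated dates
dt <- data.table(
  starttime.date = rep(c("2015-01-01", "2015-01-02", "2015-02-28"), times = 4)
)

# Hypothetical helper: parse each distinct string once, then map the parsed
# dates back onto every row by position with match().
as_date_unique <- function(x, format = "%Y-%m-%d") {
  ux <- unique(x)
  as.Date(ux, format = format)[match(x, ux)]
}

# Replace the character column with a Date column by reference
dt[, starttime.date := as_date_unique(starttime.date)]
```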

Subset where there are at least five consecutive years in a data.frame column

Question: I have a data.frame / data.table in R as follows:

    df <- data.frame(
      ID = c(rep("A", 20)),
      year = c(1968, 1971, 1972, 1973, 1974, 1976, 1978, 1980, 1982, 1984,
               1985, 1986, 1987, 1988, 1990, 1991, 1992, 1993, 1994, 1995)
    )

I'd like to subset df so that it keeps only the entries that belong to a run of at least five consecutive years. In this example there are two such periods (1984:1988 and 1990:1995). How can I do this in R?

Answer 1: A compact solution using diff and cumsum:

    setDT(df)[, grp := …
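The answer's code is truncated. Below is a minimal sketch of the diff/cumsum idea it names, assuming the rows can be sorted by year within each ID. The run identifier grp increases whenever the step to the previous year is not exactly 1, and only runs with at least five rows are kept.

```r
library(data.table)

df <- data.frame(
  ID = rep("A", 20),
  year = c(1968, 1971, 1972, 1973, 1974, 1976, 1978, 1980, 1982, 1984,
           1985, 1986, 1987, 1988, 1990, 1991, 1992, 1993, 1994, 1995)
)

dt <- as.data.table(df)
setorder(dt, ID, year)
# A new run starts whenever the step from the previous year is not exactly 1;
# cumsum() turns those break points into run identifiers within each ID.
dt[, grp := cumsum(c(1, diff(year) != 1)), by = ID]
# Keep runs of at least five consecutive years; returning NULL drops a run.
res <- dt[, if (.N >= 5) .SD, by = .(ID, grp)][, grp := NULL]
```

The 1971:1974 run has only four years, so it is dropped along with the isolated years.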

R Data.Table Compare Groups Simultaneously

Question:

    library(data.table)
    data = data.table("LABEL1" = c(1,1,1,2,2,2), "LABEL3" = c(1,2,3,1,2,3),
                      "CAT" = runif(6), "FOX" = runif(6), "DOG" = runif(6),
                      "MOUSE" = runif(6), "BIRD" = runif(6))

I wish to run t-tests for the variables CAT:BIRD; these are proportions. I want to compare these groups:

LABEL1=1 & LABEL3=2 gets compared to LABEL1=1 & LABEL3=1
LABEL1=1 & LABEL3=3 gets compared to LABEL1=1 & LABEL3=1
LABEL1=2 & LABEL3=2 gets compared to LABEL1=2 & LABEL3=1
LABEL1=2 & LABEL3=3 gets compared to LABEL1=2 & …
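The question is cut off before any answer. One possible sketch, assuming Welch two-sample t-tests (the t.test() default) are what is wanted: reshape to long format with melt() and, within each LABEL1 and variable, compare each LABEL3 level with LABEL3 == 1. The toy data is hypothetical, with five rows per cell, because the question's table has one row per cell and t.test() needs more than one observation per group.

```r
library(data.table)

set.seed(1)
# Hypothetical toy data: 5 rows per LABEL1/LABEL3 cell
data <- data.table(
  LABEL1 = rep(c(1, 2), each = 15),
  LABEL3 = rep(rep(c(1, 2, 3), each = 5), times = 2),
  CAT = runif(30), FOX = runif(30), DOG = runif(30),
  MOUSE = runif(30), BIRD = runif(30)
)

# Long format: one row per (LABEL1, LABEL3, variable, value)
long <- melt(data, id.vars = c("LABEL1", "LABEL3"),
             variable.name = "variable", value.name = "value")

# Within each LABEL1 and variable, test every LABEL3 level against LABEL3 == 1
results <- long[, {
  ref <- value[LABEL3 == 1]
  lv  <- setdiff(sort(unique(LABEL3)), 1)
  list(LABEL3  = lv,
       p.value = vapply(lv, function(k) t.test(value[LABEL3 == k], ref)$p.value,
                        numeric(1)))
}, by = .(LABEL1, variable)]
```

Each row of results is one comparison (for example LABEL1 = 1, variable CAT, LABEL3 = 2 vs 1). With many comparisons, a multiple-testing adjustment such as p.adjust() may be worth considering.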

R Configure Data With Data.Table

Question:

    data = data.frame("Student" = c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5),
                      "Grade" = c(5,6,7,3,4,5,4,5,6,8,9,10,2,3,4),
                      "Pass" = c(NA,0,1,0,1,1,0,1,0,0,NA,NA,0,0,0),
                      "NEWPass" = c(0,0,1,0,1,1,0,1,1,0,0,0,0,0,0),
                      "GradeNEWPass" = c(7,7,7,4,4,4,5,5,5,10,10,10,4,4,4),
                      "GradeBeforeNEWPass" = c(6,6,6,3,3,3,4,4,4,10,10,10,4,4,4))

I have a data.frame called data. It has the columns Student, Grade and Pass. I wish to create the following. NEWPass: take Pass and, for every Student, fill in NA values with the previous value. If the first …
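The description of the rules is cut off, so the sketch below infers them from the expected NEWPass, GradeNEWPass and GradeBeforeNEWPass columns in the example data; treat those rules as assumptions. It uses data.table's nafill() and fcoalesce(), which require a reasonably recent data.table version.

```r
library(data.table)

data <- data.frame(
  Student = c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5),
  Grade   = c(5,6,7,3,4,5,4,5,6,8,9,10,2,3,4),
  Pass    = c(NA,0,1,0,1,1,0,1,0,0,NA,NA,0,0,0),
  NEWPass = c(0,0,1,0,1,1,0,1,1,0,0,0,0,0,0),
  GradeNEWPass       = c(7,7,7,4,4,4,5,5,5,10,10,10,4,4,4),
  GradeBeforeNEWPass = c(6,6,6,3,3,3,4,4,4,10,10,10,4,4,4)
)

dt <- as.data.table(data)[, .(Student, Grade, Pass)]

# Assumed rule: carry Pass forward within Student, treat a leading NA as 0,
# and keep 1 once a student has passed (cummax).
dt[, NEWPass := cummax(fcoalesce(nafill(Pass, type = "locf"), 0)), by = Student]

# Assumed rule: grade at the first pass and the grade on the row before it;
# students who never pass get their highest grade in both columns.
dt[, c("GradeNEWPass", "GradeBeforeNEWPass") := {
  k <- which(NEWPass == 1)[1]  # row of the first pass; NA if none
  if (is.na(k)) {
    list(max(Grade), max(Grade))
  } else {
    # NA_real_ when the first pass is the student's first row (not in the example)
    list(Grade[k], if (k > 1) Grade[k - 1] else NA_real_)
  }
}, by = Student]
```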

R Summarize Collapsed Data.Table

Question: I have data such as this:

    data = data.table(
      "School" = c(1,1,1,1,1,1,0,1,0,0,1,1,1,0,1,0,1,1,1,1,1,0,0,1,0,1,1,1,1,1,1,0,1,0,1,0),
      "Grade"  = c(0,1,1,1,0,0,0,1,1,1,0,1,1,0,0,1,1,1,0,0,1,1,0,1,0,0,1,0,1,1,0,0,0,0,1,0),
      "CAT"    = c(1,0,1,1,0,1,0,1,1,0,1,0,0,1,0,1,0,0,0,0,0,0,1,0,0,1,1,0,0,1,1,0,1,1,1,1),
      "FOX"    = c(1,1,0,1,1,1,1,1,0,0,0,1,1,1,0,0,1,1,1,1,1,1,1,0,1,1,0,0,1,0,0,1,0,0,1,0),
      "DOG"    = c(0,0,0,1,0,0,1,0,0,1,0,1,1,1,0,1,1,0,0,1,1,0,0,1,0,1,1,0,1,0,1,1,1,0,1,1))

and wish to achieve a new data table such …
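The desired output is cut off in this listing, so the sketch below is not a reconstruction of it. It only shows the general data.table by-group summary pattern: for each School and Grade combination, count the rows and average CAT, FOX and DOG (for 0/1 columns the mean is the share of 1s).

```r
library(data.table)

data <- data.table(
  School = c(1,1,1,1,1,1,0,1,0,0,1,1,1,0,1,0,1,1,1,1,1,0,0,1,0,1,1,1,1,1,1,0,1,0,1,0),
  Grade  = c(0,1,1,1,0,0,0,1,1,1,0,1,1,0,0,1,1,1,0,0,1,1,0,1,0,0,1,0,1,1,0,0,0,0,1,0),
  CAT    = c(1,0,1,1,0,1,0,1,1,0,1,0,0,1,0,1,0,0,0,0,0,0,1,0,0,1,1,0,0,1,1,0,1,1,1,1),
  FOX    = c(1,1,0,1,1,1,1,1,0,0,0,1,1,1,0,0,1,1,1,1,1,1,1,0,1,1,0,0,1,0,0,1,0,0,1,0),
  DOG    = c(0,0,0,1,0,0,1,0,0,1,0,1,1,1,0,1,1,0,0,1,1,0,0,1,0,1,1,0,1,0,1,1,1,0,1,1)
)

# One row per School/Grade group: row count plus the mean of each 0/1 column
summary_dt <- data[, c(list(N = .N), lapply(.SD, mean)),
                   by = .(School, Grade), .SDcols = c("CAT", "FOX", "DOG")]
```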

How to construct an edge list from a list of visited places (efficiently)?

Question: My original data.table consists of three columns: site, observation_number and id. For example, the following are all the observations for id = z:

    |site|observation_number|id|
    |a   |1                 |z |
    |b   |2                 |z |
    |c   |3                 |z |

This means that ID z has traveled from a to b to c. There is no fixed number of sites per id. I wish to transform the data into an edge list like this:

    |from|to|id|
    |a   |b |z |
    |b   |c |z |

Mock data:

    sox <- data.table(site = c('a','b','c','a','c','c','a','d','e'),
                      obsnum = c(1,2,3,1,2,1,2,3,4),
                      …
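The mock data is truncated here. Below is a minimal sketch of one way to build the edge list, assuming visits are ordered by obsnum within each id: pair each site with the next site of the same id using shift(type = "lead"). The id values in the sample are hypothetical, since that column is cut off in the question.

```r
library(data.table)

# site and obsnum follow the question's mock data; the id values are hypothetical
sox <- data.table(
  site   = c("a", "b", "c", "a", "c", "c", "a", "d", "e"),
  obsnum = c(1, 2, 3, 1, 2, 1, 2, 3, 4),
  id     = c("z", "z", "z", "y", "y", "x", "x", "x", "x")
)

setorder(sox, id, obsnum)
# Pair each visit with the next visit of the same id; the last visit has no
# successor (NA) and is dropped.
edges <- sox[, .(from = site, to = shift(site, type = "lead")), by = id][!is.na(to)]
setcolorder(edges, c("from", "to", "id"))
```

Because shift() runs within each id group, edges never connect the last site of one id to the first site of another.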

complex data.table subset and vectorised manipulation

Question: OK, I have a complex function built using data.frames, and in trying to speed it up I've turned to data.table. I'm totally new to this, so I'm quite befuddled. Anyhow, I've made a much simpler toy example of what I want to do, but I cannot work out how to translate it into data.table format. Here is the example in data.frame form:

    rows <- 10
    data1 <- data.frame(
      id = 1:rows,
      a = seq(0.2, 0.55, length.out = rows),
      b = seq(0.35, 0.7, length.out = rows),
      c = seq(0.4, 0.83, length.out = rows),
      d …
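The toy example is cut off before the manipulation itself, so the operation below is purely hypothetical. It only illustrates how a typical data.frame pattern (select rows by a condition, then overwrite a column in those rows) translates to data.table's DT[i, col := value] form, which modifies the table by reference.

```r
library(data.table)

rows <- 10
# Only the columns visible in the question (id, a, b, c); column d is cut off
data1 <- data.frame(
  id = 1:rows,
  a = seq(0.2, 0.55, length.out = rows),
  b = seq(0.35, 0.7, length.out = rows),
  c = seq(0.4, 0.83, length.out = rows)
)

# Hypothetical operation, data.frame style: where a > 0.4, set b to mean(a, c)
df_version <- data1
sel <- df_version$a > 0.4
df_version$b[sel] <- (df_version$a[sel] + df_version$c[sel]) / 2

# Same hypothetical operation, data.table style: i filters, := updates in place
dt_version <- as.data.table(data1)
dt_version[a > 0.4, b := (a + c) / 2]
```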