data.table

R plyr, data.table: apply over certain columns of a data.frame

Submitted by 柔情痞子 on 2020-01-22 20:58:05

Question: I am looking for ways to speed up my code. I am looking into the apply/ply methods as well as data.table. Unfortunately, I am running into problems. Here is a small data sample:

    ids1 <- c(1, 1, 1, 1, 2, 2, 2, 2)
    ids2 <- c(1, 2, 3, 4, 1, 2, 3, 4)
    chars1 <- c("aa", " bb ", "__cc__", "dd ", "__ee", NA, NA, "n/a")
    chars2 <- c("vv", "_ ww_", " xx ", "yy__", " zz", NA, "n/a", "n/a")
    data <- data.frame(col1 = ids1, col2 = ids2, col3 = chars1, col4 = chars2,
                       stringsAsFactors = FALSE)

Here is a …
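The excerpt is truncated, so the exact goal is unknown; assuming it is the usual task of trimming blanks/underscores and normalizing "n/a" to NA in the character columns, a data.table sketch that applies one cleaning function over a chosen set of columns looks like this (the `clean` helper and the choice of columns are assumptions):

```r
library(data.table)

ids1 <- c(1, 1, 1, 1, 2, 2, 2, 2)
ids2 <- c(1, 2, 3, 4, 1, 2, 3, 4)
chars1 <- c("aa", " bb ", "__cc__", "dd ", "__ee", NA, NA, "n/a")
chars2 <- c("vv", "_ ww_", " xx ", "yy__", " zz", NA, "n/a", "n/a")
dt <- data.table(col1 = ids1, col2 = ids2, col3 = chars1, col4 = chars2)

clean <- function(x) {
  x <- gsub("^[ _]+|[ _]+$", "", x)  # strip leading/trailing blanks and underscores
  x[x == "n/a"] <- NA                # treat "n/a" as missing
  x
}

cols <- c("col3", "col4")
dt[, (cols) := lapply(.SD, clean), .SDcols = cols]  # update the columns by reference
```

The `lapply(.SD, ...)` with `.SDcols` idiom avoids an explicit loop over columns and updates the table in place.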

Improve performance of data.table date+time pasting?

Submitted by 删除回忆录丶 on 2020-01-22 19:31:19

Question: I am not sure that I can ask this question here; let me know if I should do it somewhere else. I have a data.table with 1e6 rows having this structure:

               V1       V2     V3
    1: 03/09/2011 08:05:40 1145.0
    2: 03/09/2011 08:06:01 1207.3
    3: 03/09/2011 08:06:17 1198.8
    4: 03/09/2011 08:06:20 1158.4
    5: 03/09/2011 08:06:40 1112.2
    6: 03/09/2011 08:06:59 1199.3

I am combining the V1 and V2 variables into a single datetime variable, using this code:

    system.time(DT[, `:=`(index = as.POSIXct(paste(V1, V2), format='%d/%m/ …
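One common speed-up for this pattern, sketched below on the six rows shown: timestamps in large logs usually repeat, so parse each distinct date-time string once and map the parsed values back with `match()`. The gain depends entirely on how much duplication the real data has, and the `tz = "UTC"` choice is an assumption.

```r
library(data.table)

DT <- data.table(
  V1 = rep("03/09/2011", 6),
  V2 = c("08:05:40", "08:06:01", "08:06:17", "08:06:20", "08:06:40", "08:06:59"),
  V3 = c(1145.0, 1207.3, 1198.8, 1158.4, 1112.2, 1199.3)
)

s <- paste(DT$V1, DT$V2)   # full timestamp strings, one per row
u <- unique(s)             # parse each distinct string only once
DT[, index := as.POSIXct(u, format = "%d/%m/%Y %H:%M:%S", tz = "UTC")[match(s, u)]]
```

`as.POSIXct` is the expensive call here; reducing the number of strings it sees is usually worth far more than micro-optimizing the `paste`.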

What's the higher-performance alternative to for-loops for subsetting data by group-id?

Submitted by ╄→尐↘猪︶ㄣ on 2020-01-22 14:38:12

Question: A recurring analysis paradigm I encounter in my research is the need to subset based on all the different group-id values, performing a statistical analysis on each group in turn and putting the results in an output matrix for further processing/summarizing. How I typically do this in R is something like the following:

    data.mat <- read.csv("...")
    groupids <- unique(data.mat$ID)  # Assume there are then 100 unique groups
    results <- matrix(rep("NA", 300), ncol = 3, nrow = 100)
    for (i in 1:100) {
      tempmat <- …
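The excerpt cuts off before the per-group statistics, so the ones below (count, mean, standard deviation) are placeholders. The general replacement for the subset-in-a-loop pattern is data.table's grouped aggregation, which computes all groups in one call and returns the results already collected in a table:

```r
library(data.table)

set.seed(1)
dt <- data.table(ID = rep(1:100, each = 10), value = rnorm(1000))

# One row per group; no explicit subsetting loop and no pre-allocated
# result matrix. Substitute the real per-group statistics for these.
results <- dt[, .(n = .N, mean = mean(value), sd = sd(value)), by = ID]
```

Keeping the results numeric in a data.table also avoids the character coercion that a `matrix(rep("NA", ...))` accumulator causes.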

Row maximum in data.table

Submitted by 感情迁移 on 2020-01-22 14:37:23

Question: I have a dataset of 8,000,000 rows with 100 columns in a data.table, where each column is a count. I need to find the maximum count in each row and which column this maximum is in. I can quickly get which column has the maximum value for each row using

    dt <- dt[, maxCol := which.max(.SD), by = pmxid]

but trying to get the actual maximum value using

    dt <- dt[, nmax := max(.SD), by = pmxid]

is incredibly slow. I ran it for nearly 20 minutes and only 200,000 row maximums had been calculated. Finding the …
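The by-row grouping is slow because it calls `max()` once per row. A sketch of the usual vectorised alternative, on a small stand-in table: `pmax()` takes whole columns at once, and `max.col()` gives the index of each row maximum the same way (the three count columns here are placeholders for the real 100).

```r
library(data.table)

dt <- data.table(pmxid = 1:5,
                 a = c(3, 9, 1, 4, 7),
                 b = c(8, 2, 6, 4, 0),
                 c = c(5, 5, 5, 9, 1))
cols <- c("a", "b", "c")

# pmax() is vectorised across entire columns, so one call covers every row.
dt[, nmax := do.call(pmax, .SD), .SDcols = cols]

# max.col() returns the column index of each row maximum in one pass.
dt[, maxCol := max.col(as.matrix(.SD), ties.method = "first"), .SDcols = cols]
```

Both calls scale with the number of columns rather than the number of rows, which is the right trade for 8,000,000 rows by 100 columns.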

Get the last row of a previous group in data.table

Submitted by 别来无恙 on 2020-01-22 09:50:29

Question: This is what my data table looks like:

    library(data.table)
    dt <- fread('
    Product Group LastProductOfPriorGroup
    A       1     NA
    B       1     NA
    C       2     B
    D       2     B
    E       2     B
    F       3     E
    G       3     E
    ')

The LastProductOfPriorGroup column is my desired column. I am trying to fetch the product from the last row of the prior group. So in the first two rows there are no prior groups, and therefore it is NA. In the third row, the product in the last row of the prior group (group 1) is B. I am trying to accomplish this with dt[, LastGroupProduct := shift …
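One way to sketch this (rebuilding the example table directly rather than via `fread`): take the last product of each group, `shift()` that one-row-per-group result down by one, then join it back onto the original table by Group.

```r
library(data.table)

dt <- data.table(Product = LETTERS[1:7],
                 Group   = c(1, 1, 2, 2, 2, 3, 3))

# Last product per group, then shift down one group so each group
# sees the last product of the *prior* group (NA for the first group).
last_by_group <- dt[, .(last = Product[.N]), by = Group]
last_by_group[, prior := shift(last)]

# Update-join: attach the shifted value to every row of its group.
dt[last_by_group, LastProductOfPriorGroup := i.prior, on = "Group"]
```

Shifting the grouped summary, rather than shifting inside `by =`, is what makes the value cross group boundaries correctly.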

Backward replacement of NAs in time series only to a limited number of observations

Submitted by 时光总嘲笑我的痴心妄想 on 2020-01-22 02:16:13

Question: In a data.table I want to perform a forward and backward gap-filling procedure over a period of 3 days in both directions.

    # Example data:
    library(data.table)
    library(zoo)
    dt <- data.table(Value = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.1359223,
                               NA, NA, NA, NA, 0.0000000, 0.0000000, 0.0000000,
                               0.0000000, 0.0000000, NA))
    > dt
            Value
     1:        NA
     2:        NA
     3:        NA
     4:        NA
     5:        NA
     6:        NA
     7:        NA
     8:        NA
     9:        NA
    10: 0.1359223
    11:        NA
    12:        NA
    13:        NA
    14:        NA
    15: 0.0000000
    16: 0.0000000
    17: 0.0000000
    18: 0.0000000
    19: 0 …
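A base-R sketch of one possible reading of the task: fill NA runs from the nearest observation, but only up to n positions in each direction (here n = 3, assuming one row per day). Note that `zoo::na.locf(x, maxgap = n)` is close but not identical: it fills only gaps that are *entirely* no longer than n, rather than the first n positions of a longer gap.

```r
locf <- function(x) {                      # last observation carried forward
  i <- cumsum(!is.na(x))
  c(NA, x[!is.na(x)])[i + 1]
}

fill_forward_n <- function(x, n) {
  run   <- cumsum(!is.na(x))               # run id following each observation
  steps <- stats::ave(seq_along(x), run, FUN = seq_along) - 1  # distance to it
  out <- locf(x)
  out[steps > n] <- NA                     # undo fills more than n steps away
  out
}

fill_both_n <- function(x, n) {
  fwd <- fill_forward_n(x, n)
  bwd <- rev(fill_forward_n(rev(x), n))    # backward fill = forward on reversed
  ifelse(is.na(fwd), bwd, fwd)             # forward fill takes precedence on overlap
}

x   <- c(rep(NA, 9), 0.1359223, rep(NA, 4), rep(0, 5), NA)
res <- fill_both_n(x, 3)
```

On the example vector this fills rows 7–9 and 11–13 from the 0.1359223 observation, row 14 and row 20 from the zeros, and leaves rows 1–6 NA because they are more than three steps from any observation.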

Is There a Neater/Simpler Way to Write This data.table R Code?

Submitted by 前提是你 on 2020-01-21 18:53:06

Question: The STRATUM values in the OECD data are very long; for simplicity I would like to shorten them to shorter, more precise names, as in the code below.

    pisaMas[, `:=`(SchoolType = c(ifelse(STRATUM == "National Secondary School", "Public",
                           ifelse(STRATUM == "Religious School", "Religious",
                           ifelse(STRATUM == "MOE Technical School", "Technical", 0)))))]
    pisaMas[, table(SchoolType)]

I would like to know if there is a simpler way to do this using the data.table package.

Answer 1: Current …
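One tidy replacement for the nested `ifelse` chain is data.table's `fcase()` (available in recent data.table versions). A sketch on a stand-in table; note the original's fall-through value 0 would be coerced to the string "0" anyway, so an explicit string default is used here:

```r
library(data.table)

pisaMas <- data.table(STRATUM = c("National Secondary School",
                                  "Religious School",
                                  "MOE Technical School",
                                  "Something Else"))

# fcase() evaluates condition/value pairs in order; `default` replaces
# the innermost ifelse() fall-through.
pisaMas[, SchoolType := fcase(
  STRATUM == "National Secondary School", "Public",
  STRATUM == "Religious School",          "Religious",
  STRATUM == "MOE Technical School",      "Technical",
  default = "Other"
)]
```

A lookup-table join (`on = "STRATUM"`) would also work and scales better if the mapping grows large.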
