data.table

Fast reading and combining several files using data.table (with fread)

好久不见. submitted on 2020-04-05 07:32:07
Question: I have several different txt files with the same structure. Now I want to read them into R using fread, and then union them into a bigger dataset.

    ## First put all file names into a list
    library(data.table)
    all.files <- list.files(path = "C:/Users",pattern = ".txt")
    ## Read data using fread
    readdata <- function(fn){
        dt_temp <- fread(fn, sep=",")
        keycols <- c("ID", "date")
        setkeyv(dt_temp,keycols) # Notice there's a "v" after setkey with multiple keys
        return(dt_temp)
    }
    # then using
    mylist <-
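A minimal sketch of one common way to do this, not necessarily what the truncated post went on to use: read every file with `fread` and stack the results with `rbindlist`. The temporary folder, file names and `value` column are made up for the demonstration; only the `ID` and `date` key columns come from the question.

```r
library(data.table)

# Hypothetical demo data: two small comma-separated .txt files in a temp folder
demo_dir <- file.path(tempdir(), "fread_demo")
dir.create(demo_dir, showWarnings = FALSE)
fwrite(data.table(ID = 1:2, date = c("2020-01-01", "2020-01-02"), value = c(0.5, 1.5)),
       file.path(demo_dir, "a.txt"))
fwrite(data.table(ID = 3:4, date = c("2020-01-03", "2020-01-04"), value = c(2.5, 3.5)),
       file.path(demo_dir, "b.txt"))

# full.names = TRUE returns paths that fread can open from any working directory;
# "\\.txt$" matches only names that end in .txt
all.files <- list.files(path = demo_dir, pattern = "\\.txt$", full.names = TRUE)

# Read every file, then stack the tables in one step
combined <- rbindlist(lapply(all.files, fread, sep = ","), use.names = TRUE)

# Key once on the combined table instead of once per file
setkeyv(combined, c("ID", "date"))
```

Setting the key after combining avoids sorting each file separately, since `rbindlist` does not carry keys over to its result.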

Process optimisation of code within dopar

江枫思渺然 submitted on 2020-03-26 04:03:54
Question: I am trying to optimize my code to run glms multiple times, and I would like to leverage parallelization, either with foreach or some other more efficient way. As you can see, the for loop takes about 800 secs to run 270000 glms, while foreach with dopar unintuitively takes forever (it either crashes or I force it to stop after a couple of hours). Thanks for your help. Jinesh

    library(data.table)
    library(parallel)
    library(doParallel)
    library(foreach)
    scen_bin <- expand.grid(n = c(10, 20, 30),
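One frequent cause of this pattern is dispatching each tiny glm as its own parallel task, so that communication overhead dominates the work. The sketch below illustrates chunking, that is, sending one large batch per worker. It uses base R's `parallel::parLapply` rather than the `foreach`/`%dopar%` setup from the question, and `fit_one`, the simulated data and the worker count are hypothetical stand-ins, since the original scenario grid is truncated.

```r
library(parallel)

# Hypothetical stand-in for one scenario: simulate data, fit a small
# binomial glm, and return the estimated slope
fit_one <- function(seed) {
  set.seed(seed)
  x <- rnorm(30)
  y <- rbinom(30, 1, plogis(0.5 * x))
  unname(coef(glm(y ~ x, family = binomial))[2])
}

seeds <- 1:200
n_workers <- 2  # assumption: adjust to the machine

# Split the task ids into one chunk per worker, so each worker
# receives a single large job instead of many tiny ones
chunks <- split(seeds, cut(seq_along(seeds), n_workers, labels = FALSE))

cl <- makeCluster(n_workers)
res_chunks <- parLapply(cl, chunks,
                        function(ids, f) vapply(ids, f, numeric(1)),
                        f = fit_one)
stopCluster(cl)

slopes <- unlist(res_chunks, use.names = FALSE)
```

Whether this helps in the original case depends on details not visible in the truncated post, such as how much data each task copies to the workers.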

grouping table by multiple factors and spreading it from long format to wide - the data.table way in R

感情迁移 submitted on 2020-03-25 13:41:36
Question: As an example I will be using the mtcars data available in R:

    data(mtcars)
    setDT(mtcars)

Let's say I want to group the data by three variables, namely carb, cyl, and gear. I have done this as follows. However, I am sure there is a better way, as this is quite repetitive.

    newDTcars <- mtcars [, mtcars[, mtcars[, .N , by = carb], by = cyl], by= gear]

Secondly, I would like to have the data in a wide format, where there is a separate column for every gear level. For illustration purpose I have
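A sketch of how both steps are commonly written in data.table: a single `by` with several columns for the counts, then `dcast` to spread the gear levels into columns. The exact wide layout the poster wanted is cut off, so the `cyl + carb ~ gear` formula is an assumption.

```r
library(data.table)

cars <- as.data.table(mtcars)  # a copy, so the built-in data set stays untouched

# One grouped count instead of three nested calls
counts <- cars[, .N, by = .(gear, cyl, carb)]

# Long to wide: one column per gear level, 0 where a combination does not occur
wide <- dcast(counts, cyl + carb ~ gear, value.var = "N", fill = 0L)
```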

Alternative to (m)get in data.table functions

余生长醉 submitted on 2020-03-22 09:06:22
Question: Let's say I have the following data.table and would like to get the output below by referring to variables stored in a vector:

    dt <- data.table(a = rep(1, 3), b = rep(2, 3))
    x <- 'a'
    y <- 'b'
    dt[, .(sum(get(x)), mean(get(y)))]
       V1 V2
    1:  3  2

Cool, it works. But now I'd like to make a function, and then do something like:

    foo <- function(arg1, arg2) {
        dt[, .(sum(get(arg1)), mean(get(arg2)))]
    }
    foo(x, y)

Realizing it works, I'd like to avoid calling all those gets, and do something like:

    foo <-
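One possible way to avoid `get()`, sketched here under assumptions since the poster's own attempt is truncated: restrict `.SD` to the requested columns with `.SDcols` and index it by name. Passing `dt` as an argument is a change from the question's `foo`, which reads `dt` from the enclosing environment.

```r
library(data.table)

dt <- data.table(a = rep(1, 3), b = rep(2, 3))

# .SD holds only the columns named in .SDcols, so they can be
# picked out by name without calling get()
foo <- function(dt, arg1, arg2) {
  dt[, .(V1 = sum(.SD[[arg1]]), V2 = mean(.SD[[arg2]])),
     .SDcols = unique(c(arg1, arg2))]
}

res <- foo(dt, "a", "b")
```

With the example data, `res` has `V1 = 3` and `V2 = 2`, matching the output shown in the question.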

Data.table: Apply function over groups with reference to set value in each group. Pass resulting columns into a function

会有一股神秘感。 submitted on 2020-03-20 06:08:00
Question: I have data in a long format which will be grouped by geographies. I want to calculate the difference in each group between one of the variables of interest and all the other variables of interest. I could not figure out how to do this efficiently in a single data.table statement, so I did a workaround which also introduced some new errors along the way (I fixed those with more workarounds, but help here would also be appreciated!). I then want to pass the resulting columns into a ggplot
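A rough sketch of the grouped difference described above, in a single data.table statement. The post's data is not shown, so the column names `geo`, `variable` and `value`, the reference level `"baseline"`, and the numbers are all hypothetical.

```r
library(data.table)

# Hypothetical long-format data: one row per geography and variable
long <- data.table(
  geo      = rep(c("North", "South"), each = 3),
  variable = rep(c("baseline", "x1", "x2"), times = 2),
  value    = c(10, 12, 15, 20, 18, 25)
)

ref <- "baseline"  # assumed reference variable; must occur exactly once per group

# Within each geography, subtract the reference value from every row
long[, diff_from_ref := value - value[variable == ref], by = geo]

# Keep only the comparison rows
result <- long[variable != ref]
```

The `result` table stays in long format, which is the shape ggplot2 usually expects.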

Recode a variable using data.table package

南楼画角 submitted on 2020-03-17 02:57:25
Question: If I want to recode a variable in R using data.table, what is the syntax? I saw some answers but didn't find them appropriate. E.g. if I have a variable called gender, I want to recode gender 0 to unknown, 1 to male, 2 to female. Here is how I tried:

    Name <- c("John", "Tina", "Dave", "Casper")
    Gender <- c(1, 2, 2, 0)
    trips <- cbind.data.frame(Name, Gender)
    trips[, gender = ifelse(gender == 0, "Unkown", gender == 1, "Male", gender == 2, "Female" )]

but I get an error.

Answer 1: Once you have a data
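The answer is cut off, so the following is only a sketch of one way to do the recoding, not a reconstruction of it. It builds `trips` as a data.table directly, uses the column name `Gender` as it was actually created, and maps the codes through `factor()` labels, assigning by reference with `:=`.

```r
library(data.table)

trips <- data.table(
  Name   = c("John", "Tina", "Dave", "Casper"),
  Gender = c(1, 2, 2, 0)
)

# Map each code to its label, then overwrite the column by reference
trips[, Gender := as.character(
  factor(Gender, levels = c(0, 1, 2), labels = c("Unknown", "Male", "Female"))
)]
```

The attempt in the question fails for several reasons: `trips` is a plain data.frame, `ifelse()` takes only one test with a yes and a no value, and the column is named `Gender`, not `gender`.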

How do I make my for loop properly calculate means over time?

大憨熊 submitted on 2020-03-16 08:46:53
Question: I have data on all the NCAA basketball games that have occurred since 2003. I am trying to implement a for loop that will calculate the average of a number of stats for each team at a point in time. Here is my for loop:

    library(data.table)
    roll_season_team_stats <- NULL
    for (i in 0:max(stats_DT$DayNum)) {
        stats <- stats_DT[DayNum < i]
        roll_stats <- dcast(stats_DT, TeamID+Season~.,fun=mean,na.rm=T,value.var = c('FGM', 'FGA', 'FGM3', 'FGA3', 'FTM', 'FTA', 'OR', 'DR', 'TO'))
        roll_stats$DayNum <-
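Note that the loop filters into `stats` but then passes the unfiltered `stats_DT` to `dcast`, which would give the same means on every iteration. As an alternative to looping over days, the sketch below computes a running mean of earlier games per team and season with `cumsum` and `shift`. It assumes at most one game per team per day, covers only `FGM`, and uses made-up numbers; the column names follow the question.

```r
library(data.table)

# Hypothetical miniature of stats_DT with a single stat column
stats_DT <- data.table(
  Season = 2003L,
  TeamID = rep(c(1101L, 1102L), each = 3),
  DayNum = c(10L, 14L, 20L, 11L, 14L, 21L),
  FGM    = c(20, 30, 25, 18, 22, 26)
)

setorder(stats_DT, Season, TeamID, DayNum)

# Running mean up to each game, lagged by one row so that a game's
# own result is excluded (mirrors the DayNum < i filter in the loop)
stats_DT[, FGM_avg_before := shift(cumsum(FGM) / seq_len(.N)),
         by = .(Season, TeamID)]
```

Each team's first game of a season gets `NA`, since no earlier games exist to average.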

Dummyfication of a column/variable [duplicate]

久未见 submitted on 2020-03-15 05:57:28
Question: This question already has answers here: Generate a dummy-variable (16 answers). Closed 2 years ago.

I'm designing a neural network in R. For that I have to prepare my data and have imported a table. For example:

           time hour Money day
    1: 20000616    1  9.35   5
    2: 20000616    2  6.22   5
    3: 20000616    3 10.65   5
    4: 20000616    4 11.42   5
    5: 20000616    5 10.12   5
    6: 20000616    6  7.32   5

Now I need a dummyfication. My final table should look like this:

           time Money day 1 2 3 4 5 6
    1: 20000616  9.35   5 1 0 0 0 0 0
    2: 20000616
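A minimal sketch of one way to build such indicator columns with data.table, using the six example rows from the question. It adds one 0/1 column per hour with `set()` and then drops `hour`; this is an illustration, not the answer from the linked duplicate.

```r
library(data.table)

dt <- data.table(
  time  = 20000616L,
  hour  = 1:6,
  Money = c(9.35, 6.22, 10.65, 11.42, 10.12, 7.32),
  day   = 5L
)

# Add one indicator column per distinct hour, named after the hour value
for (h in sort(unique(dt$hour))) {
  set(dt, j = as.character(h), value = as.integer(dt$hour == h))
}

# Drop the original column now that it is encoded in the indicators
dt[, hour := NULL]
```

Because the new columns have numeric names, they need backticks or `[[ ]]` to be referenced, for example `dt[["1"]]`.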