data.table

Increase for loop efficiency using vectorization in R

﹥>﹥吖頭↗ submitted on 2020-01-25 22:56:13
Question: I am still a relatively new user to R but am attempting to vectorize my for loops in order to increase computational speed. Currently, I need to perform two subsets on the data.frame/data.table df (only named for this post): 1) subset by group id and 2) subset by each time interval interval for that particular id. I set up a nested for loop because I need to perform a homogeneity of variance test between each subset of data and a control, using levene.test in the lawstat package. The test
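One way to drop the nested loops is to let data.table's grouped by run the test once per (id, interval) subset. A minimal sketch of that idea, assuming hypothetical column names id, interval and value and a separate control vector (none of these names come from the truncated question above):

library(data.table)
library(lawstat)

set.seed(1)
df <- data.table(id       = rep(1:3, each = 40),
                 interval = rep(rep(1:2, each = 20), times = 3),
                 value    = rnorm(120))
control <- rnorm(20)  # hypothetical control sample

# one Levene test per (id, interval) subset, pooling the subset with the control
res <- df[, {
  y <- c(value, control)
  g <- factor(rep(c("subset", "control"), c(.N, length(control))))
  list(p.value = levene.test(y, g)$p.value)
}, by = .(id, interval)]
res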

Bootstrapping multiple columns in data.table in a scalable fashion R

拜拜、爱过 submitted on 2020-01-25 20:49:05
Question: This is a follow-up question to this one. In the original question the OP wanted to perform a bootstrap on two fixed columns, x1 and x2:

set.seed(1000)
data <- as.data.table(list(x1 = runif(200), x2 = runif(200), group = runif(200) > 0.5))
stat <- function(x, i) {x[i, c(m1 = mean(x1), m2 = mean(x2))]}
data[, list(list(boot(.SD, stat, R = 10))), by = group]$V1

However, I think this problem can be nicely extended to handle any number of columns by treating them as groups. For instance, let's
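For what it is worth, one way to make the statistic scale to any number of columns is to take column means of .SD rather than naming x1 and x2 explicitly. A hedged sketch along those lines (x3 is an extra column added purely for illustration):

library(data.table)
library(boot)

set.seed(1000)
data <- as.data.table(list(x1 = runif(200), x2 = runif(200),
                           x3 = runif(200), group = runif(200) > 0.5))

# statistic(data, indices): column means of whatever columns .SD carries
stat <- function(d, i) colMeans(d[i])

boots <- data[, list(list(boot(.SD, stat, R = 10))), by = group]$V1
boots[[1]]$t0  # the statistic on the original data for the first group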

How to remove duplicated values in uneven columns of a data.table?

此生再无相见时 submitted on 2020-01-25 07:59:26
Question: I want to remove duplicated values in each column of an uneven data.table. For instance, if the original data is (the real data table has many columns and rows):

dt <- data.table(A = c("5p", "3p", "3p", "6y", NA),
                 B = c("1c", "4r", "1c", NA, NA),
                 C = c("4f", "5", "5", "5", "4m"))

> dt
      A    B  C
1:   5p   1c 4f
2:   3p   4r  5
3:   3p   1c  5
4:   6y <NA>  5
5: <NA> <NA> 4m

after removal of the duplicated values in each column it should look like this:

A  B  C
5p 1c 4f
3p 4r 5
NA NA NA
6y NA NA
NA NA 4m

I am trying a
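A minimal sketch of one way to get the desired output: replace within-column repeats with NA in place, keeping the first occurrence of each value (this is an illustration, not the OP's own attempt):

library(data.table)

dt <- data.table(A = c("5p", "3p", "3p", "6y", NA),
                 B = c("1c", "4r", "1c", NA, NA),
                 C = c("4f", "5", "5", "5", "4m"))

# for every column, blank out values that already appeared earlier in that column
dt[, (names(dt)) := lapply(.SD, function(x) replace(x, duplicated(x), NA))]
dt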

Subset data.table based on value in column of type list

放肆的年华 submitted on 2020-01-25 03:15:36
Question: I currently have a data.table with one column of type list. This list can contain different values, NULL among other possible values. I tried to subset the data.table to keep only the rows for which this column has the value NULL. Behold... my attempts below (for the example I named the column "ColofTypeList"):

DT[is.null(ColofTypeList)]

It returns an empty data.table. Then I tried:

DT[ColofTypeList == NULL]

It returns the following error (I expected an error): Error in
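For reference, is.null() tests the whole list column at once rather than each element, which is why the first attempt returns an empty data.table. A hedged sketch of the element-wise test, on made-up example data:

library(data.table)

DT <- data.table(id = 1:4)
DT[, ColofTypeList := list(list(NULL, 1:3, NULL, "a"))]

# test each list element individually
DT[vapply(ColofTypeList, is.null, logical(1))]
# or, since NULL elements have length 0 (note: so do empty vectors)
DT[lengths(ColofTypeList) == 0]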

Conditional merge, based on an event happening between two panel observations

懵懂的女人 submitted on 2020-01-24 23:58:52
Question: I have a panel dataset, Panel, and a dataset with a list of events, Events. In the panel dataset, an equal panelID shows that two observations belong together.

panelID = c(1:50)
year = c(2001:2010)
country = c("NLD", "GRC", "GBR")
n <- 2
library(data.table)
set.seed(123)
Panel <- data.table(panelID = rep(sample(panelID), each = n),
                    country = rep(sample(country, length(panelID), replace = T), each = n),
                    year = c(replicate(length(panelID), sample(year, n))),
                    some_NA = sample(0:5, 6), some_NA
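The question is cut off here, but the title suggests flagging panels where an event falls between the two observation years. A hedged sketch of one possible approach using a non-equi join; the Events table and its columns (country, event_year) are invented for illustration and need not match the OP's data:

library(data.table)

panelID <- 1:50; year <- 2001:2010; country <- c("NLD", "GRC", "GBR"); n <- 2
set.seed(123)
Panel <- data.table(panelID = rep(sample(panelID), each = n),
                    country = rep(sample(country, length(panelID), replace = TRUE), each = n),
                    year    = c(replicate(length(panelID), sample(year, n))))
Events <- data.table(country = sample(country, 10, replace = TRUE),
                     event_year = sample(year, 10, replace = TRUE))

# year range of each panel pair, then keep events that fall inside it
rng <- Panel[, .(country = country[1], start = min(year), end = max(year)), by = panelID]
hit <- Events[rng, on = .(country, event_year >= start, event_year <= end),
              nomatch = 0L][, unique(panelID)]
Panel[, event_between := panelID %in% hit]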

How to automatically interpolate values for one data frame based on another lookup table/data frame?

霸气de小男生 submitted on 2020-01-24 18:00:22
Question: I have one data frame and one lookup table. What I want is to compare df_dat$value with df_lookup$threshold. If the value falls into a threshold range, then create a new column transfer in df_dat so that its values are linearly interpolated from the transfer column in df_lookup.

library(dplyr)
df_lookup <- tribble(
  ~threshold, ~transfer,
  0, 0,
  100, 15,
  200, 35
)
df_lookup
#> # A tibble: 3 x 2
#>   threshold transfer
#>       <dbl>    <dbl>
#> 1         0        0
#> 2       100       15
#> 3       200       35
df_dat <- tribble( ~date, ~value
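A minimal sketch of one way to do the interpolation with base R's approx(), which maps value onto the threshold/transfer pairs; the df_dat values below are invented because the snippet is cut off before they appear:

library(dplyr)
library(tibble)

df_lookup <- tribble(
  ~threshold, ~transfer,
  0, 0,
  100, 15,
  200, 35
)
# illustrative data; the real df_dat is truncated above
df_dat <- tibble(date = as.Date("2020-01-01") + 0:2,
                 value = c(50, 120, 200))

df_dat %>%
  mutate(transfer = approx(df_lookup$threshold, df_lookup$transfer, xout = value)$y)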

R script: While using the shift function in data.table - error: (list) object cannot be coerced to type 'double'

*爱你&永不变心* submitted on 2020-01-24 17:09:08
Question: I have a data.table:

set.seed(1)
dat <- data.table(Shift = c(0, 0, 0, 1, 2, 1, 1), Value = rnorm(7), I.Value = rnorm(7))
dat
Shift      Value    I.Value
    0 -0.6264538  0.7383247
    0  0.1836433  0.5757814
    0 -0.8356286 -0.3053884
    1  1.5952808  1.5117812
    2  0.3295078  0.3898432
    1 -0.8204684 -0.6212406
    1  0.4874291 -2.2146999

I want the new column to be shift(Value, Shift, fill = 0). Hence the result should be:

Shift      Value   I.Value  new.value new.I.value
    0 -0.6264538 0.7383247 -0.6264538   0.7383247
    0  0.1836433 0.5757814 0
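The error arises because shift() returns a list when n is a vector (one full shifted copy per lag), not a per-row lag. A hedged sketch of one way to get a per-row lag by direct indexing (not necessarily the accepted answer):

library(data.table)

set.seed(1)
dat <- data.table(Shift = c(0, 0, 0, 1, 2, 1, 1),
                  Value = rnorm(7), I.Value = rnorm(7))

# row i takes the value from row i - Shift[i]; indices before row 1 get the fill value 0
dat[, idx := .I - Shift]
dat[, `:=`(new.value   = fifelse(idx >= 1, Value[pmax(idx, 1L)],   0),
           new.I.value = fifelse(idx >= 1, I.Value[pmax(idx, 1L)], 0))]
dat[, idx := NULL]
dat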

Best way to manipulate strings in a big data.table

痞子三分冷 submitted on 2020-01-23 09:19:25
Question: I have a 67MM-row data.table with people's names and surnames separated by spaces. I just need to create a new column for each word. Here is a small subset of the data:

n <- structure(list(Subscription_Id = c("13.855.231.846.091.000", "11.156.048.529.090.800",
  "24.940.584.090.830", "242.753.039.111.124", "27.843.782.090.830", "13.773.513.145.090.800",
  "25.691.374.090.830", "12.236.174.155.090.900", "252.027.904.121.210", "11.136.991.054.110.100"),
  Account_Desc = c("AGUAYO CARLA", "LEIVA
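One memory-friendly option for this kind of split is data.table::tstrsplit(), which splits the column once and hands back one vector per word. A minimal sketch on made-up names, since the snippet above is cut off mid-vector:

library(data.table)

n <- data.table(Account_Desc = c("AGUAYO CARLA", "PEREZ JUAN", "DE LA CRUZ ANA"))

# split on the space; names with fewer words are padded with NA on the right
n[, paste0("word", 1:4) := tstrsplit(Account_Desc, " ", fixed = TRUE)]
n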

Pass argument to data.table aggregation function

早过忘川 submitted on 2020-01-22 21:22:49
Question: I have a function that calculates a weighted mean of a variable and groups it by time period using the data.table aggregation syntax. However, I want to provide the name of the weighting column programmatically. Is there a way to accomplish this while still using the traditional data.table syntax? The function wtmean1 below demonstrates the idea of what I want to do (but it produces an error). The function wtmean2 works and is inspired by the data.table FAQ, but it's more cumbersome to pass
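A hedged sketch of one common pattern for this: look the columns up by name with get() inside j, so the plain data.table syntax is kept. The function body and column names below are illustrative, not the OP's wtmean1/wtmean2:

library(data.table)

wt_mean_by <- function(dt, valcol, wtcol) {
  # valcol and wtcol are character column names, resolved against dt with get()
  dt[, .(wmean = weighted.mean(get(valcol), w = get(wtcol))), by = period]
}

set.seed(42)
dt <- data.table(period = rep(1:2, each = 3),
                 x = rnorm(6), w1 = runif(6), w2 = runif(6))
wt_mean_by(dt, "x", "w1")
wt_mean_by(dt, "x", "w2")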