data.table

Increase for loop efficiency using vectorization in R

﹥>﹥吖頭↗ submitted on 2020-01-25 22:56:13
Question: I am still a relatively new user to R but am attempting to vectorize my for loops in order to increase computational speed. Currently, I need to perform two subsets on the data.frame/data.table df (only named for this post): 1) subset by group id and 2) subset by each time interval interval for that particular id. I set up a nested for loop because I need to perform a homogeneity of variance test between each subset of data and a control, using levene.test in the lawstat package. The test
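One way to drop the nested loops is to let data.table's grouped by run the test once per (id, interval) subset. A minimal sketch of that idea, assuming hypothetical column names id, interval and value and a separate control vector (none of these names come from the truncated question above):

library(data.table)
library(lawstat)

set.seed(1)
df <- data.table(id       = rep(1:3, each = 40),
                 interval = rep(rep(1:2, each = 20), times = 3),
                 value    = rnorm(120))
control <- rnorm(20)  # hypothetical control sample

# one Levene test per (id, interval) subset, pooling the subset with the control
res <- df[, {
  y <- c(value, control)
  g <- factor(rep(c("subset", "control"), c(.N, length(control))))
  list(p.value = levene.test(y, g)$p.value)
}, by = .(id, interval)]
res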

Bootstrapping multiple columns in data.table in a scalable fashion R

拜拜、爱过 submitted on 2020-01-25 20:49:05
Question: This is a follow-up question to this one. In the original question the OP wanted to perform a bootstrap on two fixed columns, x1 and x2:

set.seed(1000)
data <- as.data.table(list(x1 = runif(200), x2 = runif(200), group = runif(200) > 0.5))
stat <- function(x, i) {x[i, c(m1 = mean(x1), m2 = mean(x2))]}
data[, list(list(boot(.SD, stat, R = 10))), by = group]$V1

However, I think this problem can be nicely extended to handle any number of columns by treating them as groups. For instance, let's
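For what it is worth, one way to make the statistic scale to any number of columns is to take column means of .SD rather than naming x1 and x2 explicitly. A hedged sketch along those lines (x3 is an extra column added purely for illustration):

library(data.table)
library(boot)

set.seed(1000)
data <- as.data.table(list(x1 = runif(200), x2 = runif(200),
                           x3 = runif(200), group = runif(200) > 0.5))

# statistic(data, indices): column means of whatever columns .SD carries
stat <- function(d, i) colMeans(d[i])

boots <- data[, list(list(boot(.SD, stat, R = 10))), by = group]$V1
boots[[1]]$t0  # the statistic on the original data for the first group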

How to remove duplicated values in uneven columns of a data.table?

此生再无相见时 submitted on 2020-01-25 07:59:26
Question: I want to remove duplicated values in each column of an uneven data.table. For instance, if the original data is (the real data table has many columns and rows):

dt <- data.table(A = c("5p", "3p", "3p", "6y", NA),
                 B = c("1c", "4r", "1c", NA, NA),
                 C = c("4f", "5", "5", "5", "4m"))

> dt
      A    B  C
1:   5p   1c 4f
2:   3p   4r  5
3:   3p   1c  5
4:   6y <NA>  5
5: <NA> <NA> 4m

after removal of the duplicated values in each column it should look like this:

A  B  C
5p 1c 4f
3p 4r 5
NA NA NA
6y NA NA
NA NA 4m

I am trying a
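A minimal sketch of one way to get the desired output: replace within-column repeats with NA in place, keeping the first occurrence of each value (this is an illustration, not the OP's own attempt):

library(data.table)

dt <- data.table(A = c("5p", "3p", "3p", "6y", NA),
                 B = c("1c", "4r", "1c", NA, NA),
                 C = c("4f", "5", "5", "5", "4m"))

# for every column, blank out values that already appeared earlier in that column
dt[, (names(dt)) := lapply(.SD, function(x) replace(x, duplicated(x), NA))]
dt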

Subset data.table based on value in column of type list

放肆的年华 submitted on 2020-01-25 03:15:36
Question: I currently have a data.table with one column of type list. This list can contain different values, NULL among other possible values. I tried to subset the data.table to keep only the rows for which this column has the value NULL. Behold... my attempts below (for the example I named the column "ColofTypeList"):

DT[is.null(ColofTypeList)]

It returns an empty data.table. Then I tried:

DT[ColofTypeList == NULL]

It returns the following error (I expected an error): Error in
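For reference, is.null() tests the whole list column at once rather than each element, which is why the first attempt returns an empty data.table. A hedged sketch of the element-wise test, on made-up example data:

library(data.table)

DT <- data.table(id = 1:4)
DT[, ColofTypeList := list(list(NULL, 1:3, NULL, "a"))]

# test each list element individually
DT[vapply(ColofTypeList, is.null, logical(1))]
# or, since NULL elements have length 0 (note: so do empty vectors)
DT[lengths(ColofTypeList) == 0]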

Conditional merge, based on an event happening between two panel observations

懵懂的女人 submitted on 2020-01-24 23:58:52
Question: I have a panel dataset, Panel, and a dataset with a list of events, Events. In the panel dataset, an equal panelID shows that two observations belong together.

panelID = c(1:50)
year = c(2001:2010)
country = c("NLD", "GRC", "GBR")
n <- 2
library(data.table)
set.seed(123)
Panel <- data.table(panelID = rep(sample(panelID), each = n),
                    country = rep(sample(country, length(panelID), replace = T), each = n),
                    year = c(replicate(length(panelID), sample(year, n))),
                    some_NA = sample(0:5, 6), some_NA
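The question is cut off here, but the title suggests flagging panels where an event falls between the two observation years. A hedged sketch of one possible approach using a non-equi join; the Events table and its columns (country, event_year) are invented for illustration and need not match the OP's data:

library(data.table)

panelID <- 1:50; year <- 2001:2010; country <- c("NLD", "GRC", "GBR"); n <- 2
set.seed(123)
Panel <- data.table(panelID = rep(sample(panelID), each = n),
                    country = rep(sample(country, length(panelID), replace = TRUE), each = n),
                    year    = c(replicate(length(panelID), sample(year, n))))
Events <- data.table(country = sample(country, 10, replace = TRUE),
                     event_year = sample(year, 10, replace = TRUE))

# year range of each panel pair, then keep events that fall inside it
rng <- Panel[, .(country = country[1], start = min(year), end = max(year)), by = panelID]
hit <- Events[rng, on = .(country, event_year >= start, event_year <= end),
              nomatch = 0L][, unique(panelID)]
Panel[, event_between := panelID %in% hit]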

How to automatically interpolate values for one data frame based on another lookup table/data frame?

霸气de小男生 submitted on 2020-01-24 18:00:22
Question: I have one data frame and one lookup table. What I want is to compare df_dat$value with df_lookup$threshold. If the value falls into a threshold range, then create a new column transfer in df_dat so that its values are linearly interpolated from the transfer column in df_lookup.

library(dplyr)
df_lookup <- tribble(
  ~threshold, ~transfer,
  0, 0,
  100, 15,
  200, 35
)
df_lookup
#> # A tibble: 3 x 2
#>   threshold transfer
#>       <dbl>    <dbl>
#> 1         0        0
#> 2       100       15
#> 3       200       35
df_dat <- tribble( ~date, ~value
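A minimal sketch of one way to do the interpolation with base R's approx(), which maps value onto the threshold/transfer pairs; the df_dat values below are invented because the snippet is cut off before they appear:

library(dplyr)
library(tibble)

df_lookup <- tribble(
  ~threshold, ~transfer,
  0, 0,
  100, 15,
  200, 35
)
# illustrative data; the real df_dat is truncated above
df_dat <- tibble(date = as.Date("2020-01-01") + 0:2,
                 value = c(50, 120, 200))

df_dat %>%
  mutate(transfer = approx(df_lookup$threshold, df_lookup$transfer, xout = value)$y)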

R script: While using the shift function in data.table - error: (list) object cannot be coerced to type 'double'

*爱你&永不变心* submitted on 2020-01-24 17:09:08
Question: I have a data.table:

set.seed(1)
dat <- data.table(Shift = c(0, 0, 0, 1, 2, 1, 1), Value = rnorm(7), I.Value = rnorm(7))
dat
Shift      Value    I.Value
    0 -0.6264538  0.7383247
    0  0.1836433  0.5757814
    0 -0.8356286 -0.3053884
    1  1.5952808  1.5117812
    2  0.3295078  0.3898432
    1 -0.8204684 -0.6212406
    1  0.4874291 -2.2146999

I want the new column to be shift(Value, Shift, fill = 0). Hence the result should be:

Shift      Value   I.Value  new.value new.I.value
    0 -0.6264538 0.7383247 -0.6264538   0.7383247
    0  0.1836433 0.5757814 0
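The error arises because shift() returns a list when n is a vector (one full shifted copy per lag), not a per-row lag. A hedged sketch of one way to get a per-row lag by direct indexing (not necessarily the accepted answer):

library(data.table)

set.seed(1)
dat <- data.table(Shift = c(0, 0, 0, 1, 2, 1, 1),
                  Value = rnorm(7), I.Value = rnorm(7))

# row i takes the value from row i - Shift[i]; indices before row 1 get the fill value 0
dat[, idx := .I - Shift]
dat[, `:=`(new.value   = fifelse(idx >= 1, Value[pmax(idx, 1L)],   0),
           new.I.value = fifelse(idx >= 1, I.Value[pmax(idx, 1L)], 0))]
dat[, idx := NULL]
dat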

Best way to manipulate strings in a big data.table

痞子三分冷 submitted on 2020-01-23 09:19:25
Question: I have a 67MM-row data.table with people's names and surnames separated by spaces. I just need to create a new column for each word. Here is a small subset of the data:

n <- structure(list(Subscription_Id = c("13.855.231.846.091.000", "11.156.048.529.090.800",
  "24.940.584.090.830", "242.753.039.111.124", "27.843.782.090.830", "13.773.513.145.090.800",
  "25.691.374.090.830", "12.236.174.155.090.900", "252.027.904.121.210", "11.136.991.054.110.100"),
  Account_Desc = c("AGUAYO CARLA", "LEIVA
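One memory-friendly option for this kind of split is data.table::tstrsplit(), which splits the column once and hands back one vector per word. A minimal sketch on made-up names, since the snippet above is cut off mid-vector:

library(data.table)

n <- data.table(Account_Desc = c("AGUAYO CARLA", "PEREZ JUAN", "DE LA CRUZ ANA"))

# split on the space; names with fewer words are padded with NA on the right
n[, paste0("word", 1:4) := tstrsplit(Account_Desc, " ", fixed = TRUE)]
n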

Pass argument to data.table aggregation function

早过忘川 submitted on 2020-01-22 21:22:49
Question: I have a function that calculates a weighted mean of a variable and groups it by time period using the data.table aggregation syntax. However, I want to provide the name of the weighting column programmatically. Is there a way to accomplish this while still using the traditional data.table syntax? The function wtmean1 below demonstrates the idea of what I want to do (but it produces an error). The function wtmean2 works and is inspired by the data.table FAQ, but it's more cumbersome to pass
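A hedged sketch of one common pattern for this: look the columns up by name with get() inside j, so the plain data.table syntax is kept. The function body and column names below are illustrative, not the OP's wtmean1/wtmean2:

library(data.table)

wt_mean_by <- function(dt, valcol, wtcol) {
  # valcol and wtcol are character column names, resolved against dt with get()
  dt[, .(wmean = weighted.mean(get(valcol), w = get(wtcol))), by = period]
}

set.seed(42)
dt <- data.table(period = rep(1:2, each = 3),
                 x = rnorm(6), w1 = runif(6), w2 = runif(6))
wt_mean_by(dt, "x", "w1")
wt_mean_by(dt, "x", "w2")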