data.table

Iterate through data tables

匆匆过客 submitted on 2020-01-07 03:45:09
Question: I have 3 tables:

    tbl.1 <- data.table("A" = runif(5), "B" = runif(5))
    tbl.2 <- data.table("A" = runif(5), "B" = runif(5))
    tbl.3 <- data.table("A" = runif(5), "B" = runif(5))

I would like to iterate through the tables with a loop such as:

    for (i in 1:3) {
      # Open tbl.i
      # Do something
    }

How can this be done? I can put the tables in a list and iterate through the list, which works OK. However, I am trying to keep the tables as unique objects for various reasons. Thanks.

Answer 1: If you don't want to
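One way to loop over separately named tables without keeping them in a list is to look each one up by name with `get()`. The following is only a sketch, not the truncated answer above; the added column `C` is a hypothetical placeholder for "do something".

```r
library(data.table)

set.seed(42)
tbl.1 <- data.table(A = runif(5), B = runif(5))
tbl.2 <- data.table(A = runif(5), B = runif(5))
tbl.3 <- data.table(A = runif(5), B = runif(5))

for (i in 1:3) {
  # Look the object up by its name; no copy is made here
  tbl <- get(paste0("tbl.", i))
  # `:=` updates by reference, so tbl.1, tbl.2 and tbl.3 themselves gain
  # the (hypothetical) column C
  tbl[, C := A + B]
}

names(tbl.1)
```

Because `:=` works by reference, the original objects are changed in place and nothing needs to be assigned back.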

Aggregate column intervals into new columns in data.table

蓝咒 submitted on 2020-01-06 23:48:34
Question: I would like to aggregate a data.table based on intervals of a column (time). The idea is that each interval should become a separate column with a different name in the output. I've seen a similar question on SO but I couldn't get my head around the problem. Help?

Reproducible example:

    library(data.table)
    # sample data
    set.seed(1L)
    dt <- data.table(
      id = sample(LETTERS, 50, replace = TRUE),
      time = sample(60, 50, replace = TRUE),
      points = sample(1000, 50, replace = TRUE))
    # simple summary by `id`
    dt[, .
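A possible approach, sketched under assumptions: bin `time` with `cut()` and then reshape with `dcast()` so that each interval becomes its own column. The interval boundaries (0, 20, 40, 60), the column labels, and the choice of `sum` as the aggregate are all assumptions, since the question excerpt does not state them.

```r
library(data.table)

set.seed(1L)
dt <- data.table(
  id = sample(LETTERS, 50, replace = TRUE),
  time = sample(60, 50, replace = TRUE),
  points = sample(1000, 50, replace = TRUE))

# Assumed intervals: (0,20], (20,40], (40,60]
dt[, interval := cut(time, breaks = c(0, 20, 40, 60),
                     labels = c("t_0_20", "t_20_40", "t_40_60"))]

# One row per id, one column per interval, summing points within each cell
wide <- dcast(dt, id ~ interval, value.var = "points", fun.aggregate = sum)
wide
```

Cells with no observations are filled by applying `sum` to an empty vector, which gives 0.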

Manipulate char vectors inside a data.table object in R

空扰寡人 submitted on 2020-01-06 19:34:11
Question: I'm still a bit new to using data.table and understanding all its subtleties. I've looked in the docs and at other examples on SO but couldn't find what I want, so please help! I have a data.table which is basically a char vector (each entry being a sentence):

    DT = c("I love you", "she loves me")
    DT = as.data.table(DT)
    colnames(DT) <- "text"
    setkey(DT, text)
    # > DT
    #            text
    # 1:   I love you
    # 2: she loves me

What I'd like to do is to be able to perform some basic string operations inside the DT object
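Since the excerpt stops before naming the operations wanted, here is a small sketch of how vectorised base R string functions can be applied to a character column with `:=`. The new column names (`n_chars`, `n_words`, `upper`, `has_love`) are made up for illustration.

```r
library(data.table)

DT <- data.table(text = c("I love you", "she loves me"))

DT[, n_chars  := nchar(text)]                                  # characters per sentence
DT[, n_words  := lengths(strsplit(text, " ", fixed = TRUE))]   # words per sentence
DT[, upper    := toupper(text)]                                # upper-cased copy
DT[, has_love := grepl("love", text, fixed = TRUE)]            # substring test
DT
```

Any function that takes a character vector and returns a vector of the same length can be used the same way.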

R data.table - categorical values in one column to binary values in multiple columns [duplicate]

久未见 submitted on 2020-01-06 16:20:42
Question: This question already has answers here: How to programmatically create binary columns based on a categorical variable in data.table? (3 answers). Closed 3 years ago.

I have one data.table with 2 columns, ID and X, where X contains categorical values (a, b, c):

    ID X
    1  a
    2  c
    3  b
    4  c

I would like to transform X into 3 binary columns where the column names are a, b and c:

    ID a b c
    1  1 0 0
    2  0 0 1
    3  0 1 0
    4  0 0 1

What would be a good way to do this? Thank you!

Answer 1: Using dcast from data.table , dcast
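The answer above is cut off, so the exact call it used is unknown. One common way to get indicator columns with `dcast()` is to count occurrences per cell with `fun.aggregate = length`, sketched here on the data from the question.

```r
library(data.table)

DT <- data.table(ID = 1:4, X = c("a", "c", "b", "c"))

# Each (ID, X) pair occurs at most once here, so the count is 0 or 1
wide <- dcast(DT, ID ~ X, value.var = "X", fun.aggregate = length)
wide
```

If an ID could carry the same category more than once, the counts would exceed 1 and would need capping to stay binary.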

Exponential curve fitting with nls using data.table groups

不羁的心 submitted on 2020-01-06 07:14:24
Question: I'd like to fit exponential curves to groups 1 and 2 in the data table shown below and obtain a new column containing the residual standard error corresponding to each group. The exponential curve should follow y = a*exp(b*x) + c.

    ## Example data table
    DT <- data.table(
      x = c(1,2,3,4,5,6,7,8,1,2,3,4,5,6,7,8),
      y = c(15.4,16,16.4,17.7,20,23,27,35,25.4,26,26.4,27.7,30,33,37,45),
      groups = c(1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2))

However, I only know how to fit nls curves and obtain the residual standard error
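A rough sketch of fitting per group: wrap `nls()` in a small helper and call it inside `j` with `by = groups`. The helper name `fit_rse` and the starting values are guesses made for this illustration; `nls()` is sensitive to starting values, so the helper returns `NA` instead of stopping if a fit fails to converge.

```r
library(data.table)

DT <- data.table(
  x = rep(1:8, 2),
  y = c(15.4, 16, 16.4, 17.7, 20, 23, 27, 35,
        25.4, 26, 26.4, 27.7, 30, 33, 37, 45),
  groups = rep(1:2, each = 8))

# Hypothetical helper: fit y = a*exp(b*x) + c and return the residual
# standard error, or NA if nls() cannot fit the group
fit_rse <- function(x, y) {
  tryCatch({
    fit <- nls(y ~ a * exp(b * x) + c,
               start = list(a = 0.5, b = 0.5, c = min(y) - 1))  # guessed starts
    summary(fit)$sigma
  }, error = function(e) NA_real_)
}

# The single value returned per group is recycled to every row of that group
DT[, rse := fit_rse(x, y), by = groups]
DT
```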

Linear regression loop with data.table; “Error in data.table column or argument (nr) is NULL”

孤街浪徒 submitted on 2020-01-06 05:27:14
Question: As my dataset is cumbersomely large, I would like to automate some procedures. I found this link, which proposes a linear regression loop, which for the dataset mtcars is as follows:

    data.table(mtcars)[, .(MyFits = lapply(.SD, function(x) if(is.numeric(x)) summary(lm(mpg ~ x)))), .SDcols = -1]

I have tried to apply this to my own dataset with limited success. I do get the output, but there is a problem. The result for some of the fits is NULL, so when I try to do the suggested operation Fits[
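The `if (is.numeric(x))` guard in the snippet returns `NULL` for non-numeric columns, which is the likely source of the NULL entries. A sketch of one way to handle that, on `mtcars` only: keep the fits in a plain named list, drop the `NULL`s with `Filter()`, and then collect a few statistics into a table. The names `fits` and `results` and the chosen statistics are illustrative.

```r
library(data.table)

dt <- as.data.table(mtcars)
predictors <- setdiff(names(dt), "mpg")

# One simple regression of mpg on each predictor; NULL for non-numeric columns
fits <- lapply(predictors, function(v) {
  if (!is.numeric(dt[[v]])) return(NULL)
  summary(lm(reformulate(v, response = "mpg"), data = dt))
})
names(fits) <- predictors

# Drop the NULL entries before extracting anything from the summaries
fits <- Filter(Negate(is.null), fits)

results <- rbindlist(lapply(names(fits), function(v) {
  data.table(predictor = v,
             slope     = fits[[v]]$coefficients[2, "Estimate"],
             r_squared = fits[[v]]$r.squared)
}))
results
```

In `mtcars` every column is numeric, so nothing is dropped here; the filter only matters for data with character or factor columns.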

Changing value when multiple rows/columns combined do not meet a requirement

安稳与你 submitted on 2020-01-06 04:08:05
Question: I'm relatively new to R and working on a project with millions of rows, so I made this example. I've got a matrix with three different rows of data. If the combination of [,1][,2][Farm] has fewer than two observations in total, the [Farm] value of that row gets changed to q99999. This way they fall in the same group for later analysis.

    A <- matrix(c(1,1,2,3,4,5,5), ncol = 7)
    B <- matrix(c(T,T,F,T,F,T,T), ncol = 7)
    C <- matrix(c("Req","Req","Req","fd","as","f","bla"), ncol = 7)
    AB <- rbind.fill.matrix
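A sketch of how this could look in data.table, assuming the three vectors are arranged as columns `A`, `B` and `Farm` with one observation per row (the excerpt's final layout is cut off, so this arrangement is an assumption): count rows per combination with `.N`, then overwrite `Farm` where the count is below two.

```r
library(data.table)

DT <- data.table(
  A    = c(1, 1, 2, 3, 4, 5, 5),
  B    = c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE),
  Farm = c("Req", "Req", "Req", "fd", "as", "f", "bla"))

DT[, n := .N, by = .(A, B, Farm)]   # observations per (A, B, Farm) combination
DT[n < 2, Farm := "q99999"]         # pool rare combinations into one group
DT[, n := NULL]                     # drop the helper column
DT
```

Both steps update by reference, which helps when the table has millions of rows.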

Is there a way to efficiently count column values in A falling within ranges in B using data.table?

时光怂恿深爱的人放手 submitted on 2020-01-05 19:26:20
Question: I have created some code to handle the following task:

    ref = read.table(header=TRUE, text="
    user event
    1441 120120102
    1441 120120888
    1443 120122122
    1445 120124452
    1445 120123525
    1446 120123463", stringsAsFactors=FALSE)

    data = read.table(header=TRUE, text="
    user event1 event2
    1440 120123432 120156756
    1441 120128523 120156545
    1441 120123333 120146444
    1441 120122344 120122355", stringsAsFactors=FALSE)

What I have here is a function (credit to user Carlos Cinelli) that will allow me to go line by
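One data.table option for this kind of range counting is a non-equi join with `by = .EACHI`, which returns a count per range row without looping line by line. This is a sketch under assumptions: the small tables `ref` and `rng` below are made-up stand-ins, and it assumes events are matched within the same `user`, which the truncated excerpt does not confirm.

```r
library(data.table)

# Made-up stand-ins for the question's `ref` and `data` tables
ref <- data.table(user  = c(1L, 1L, 2L, 3L),
                  event = c(10L, 25L, 15L, 40L))
rng <- data.table(user   = c(1L, 1L, 2L, 4L),
                  event1 = c(5L, 20L, 16L, 0L),
                  event2 = c(30L, 22L, 20L, 100L))

# For each row of rng, count ref events of the same user with
# event1 <= event <= event2; rows without a match get a count of 0
res <- ref[rng, on = .(user, event >= event1, event <= event2),
           .N, by = .EACHI]

rng[, n_events := res$N]
rng
```

The join result has one row per row of `rng`, in the same order, so the counts can be attached directly.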

R / data.table() merge on named subset of another data.table

喜欢而已 submitted on 2020-01-05 15:06:32
Question: I'm trying to put together several files and need to do a bunch of merges on column names that are created inside a loop. I can do this fine using data.frame() but am having issues using similar code with a data.table():

    library(data.table)
    df1 <- data.frame(id = 1:20, col1 = runif(20))
    df2 <- data.frame(id = 1:20, col1 = runif(20))
    newColNum <- 5
    newColName <- paste('col', newColNum, sep = '')
    df1[, newColName] <- runif(20)
    df2 <- merge(df2, df1[, c('id', newColName)], by = 'id', all.x = T) #
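For comparison, a sketch of the same steps with data.table objects. The main differences are that a column whose name is stored in a variable is created with `(newColName) := ...`, and that selecting columns by a character vector needs `with = FALSE` (or the `..cols` prefix in recent versions). The names `dt1`, `dt2` and `cols` are chosen for this illustration.

```r
library(data.table)

set.seed(1)
dt1 <- data.table(id = 1:20, col1 = runif(20))
dt2 <- data.table(id = 1:20, col1 = runif(20))

newColNum  <- 5
newColName <- paste0("col", newColNum)

# Parentheses make := use the value of newColName as the column name
dt1[, (newColName) := runif(20)]

# Select columns by a character vector, then merge on id
cols <- c("id", newColName)
dt2 <- merge(dt2, dt1[, cols, with = FALSE], by = "id", all.x = TRUE)
names(dt2)
```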