data.table

Fit model by group using Data.Table package

喜你入骨 submitted on 2021-02-06 22:00:18
Question: How can I fit multiple models by group using data.table syntax? I want my output to be a data.frame with columns for each "by group" and one column for each model fit. Currently I can do this with the dplyr package, but not with data.table.

    # example data frame
    df <- data.table(
      id = sample(c("id01", "id02", "id03"), N, TRUE),
      v1 = sample(5, N, TRUE),
      v2 = sample(round(runif(100, max = 100), 4), N, TRUE)
    )
    # equivalent code in dplyr
    group_by(df, id) %>% do( model1= lm(v1 ~v2,
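One common way to get this result in data.table is to wrap each fitted model in `list()` inside `j`, so the models end up in list columns with one row per group. The sketch below is only an illustration under stated assumptions: `N <- 100` is assumed because the excerpt does not show its value, and `model2` is a hypothetical second model, since the original dplyr call is cut off.

```r
library(data.table)

set.seed(1)
N <- 100  # assumption: N is not defined in the excerpt

df <- data.table(
  id = sample(c("id01", "id02", "id03"), N, TRUE),
  v1 = sample(5, N, TRUE),
  v2 = sample(round(runif(100, max = 100), 4), N, TRUE)
)

# One row per id. Each lm object is wrapped in list() so that it is
# stored as a single element of a list column.
fits <- df[, .(
  model1 = list(lm(v1 ~ v2, data = .SD)),
  model2 = list(lm(v2 ~ v1, data = .SD))  # hypothetical second model
), by = id]

# Pull one number out of each stored model, for example the slope of model1.
slopes <- fits[, .(id, slope = sapply(model1, function(m) coef(m)[["v2"]]))]
```

The result `fits` is a data.table; `as.data.frame(fits)` converts it if a plain data.frame is required.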

Fit model by group using Data.Table package

↘锁芯ラ submitted on 2021-02-06 21:56:17
Question: How can I fit multiple models by group using data.table syntax? I want my output to be a data.frame with columns for each "by group" and one column for each model fit. Currently I can do this with the dplyr package, but not with data.table.

    # example data frame
    df <- data.table(
      id = sample(c("id01", "id02", "id03"), N, TRUE),
      v1 = sample(5, N, TRUE),
      v2 = sample(round(runif(100, max = 100), 4), N, TRUE)
    )
    # equivalent code in dplyr
    group_by(df, id) %>% do( model1= lm(v1 ~v2,

Conditional (inequality) join in data.table

不想你离开。 submitted on 2021-02-06 10:47:45
Question: I'm trying to figure out how to do a conditional join on two data.tables. I've written a sqldf conditional join that gives me the circuits whose start or finish times fall within the other table's start/finish times.

    sqldf("select dt2.start, dt2.finish, dt2.counts, dt1.id, dt1.circuit
           from dt2 left join dt1 on (
             (dt2.start >= dt1.start and dt2.start < dt1.finish) or
             (dt2.finish >= dt1.start and dt2.finish < dt1.finish)
           )")

This gives me the correct result, but it's too slow for my large-ish data
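A non-equi join in data.table can express each half of that condition. Because the `on` argument combines its conditions with AND, the OR in the SQL is handled here by running two joins and taking their union. This is a sketch on made-up data (the original `dt1` and `dt2` are not shown in the excerpt), and it assumes the rows of `dt2` are unique.

```r
library(data.table)

# Made-up example tables with the column names used in the SQL query.
dt1 <- data.table(id = 1:3, circuit = c("a", "b", "c"),
                  start = c(0L, 10L, 20L), finish = c(10L, 20L, 30L))
dt2 <- data.table(start = c(5L, 18L, 40L), finish = c(8L, 22L, 45L),
                  counts = c(1L, 2L, 3L))

# Condition 1: dt2.start falls inside [dt1.start, dt1.finish).
# In on = .(), the left-hand name is a column of dt1, the right-hand one of dt2.
m1 <- dt1[dt2, on = .(start <= start, finish > start), nomatch = NULL,
          .(start = i.start, finish = i.finish, counts = i.counts,
            id = x.id, circuit = x.circuit)]

# Condition 2: dt2.finish falls inside [dt1.start, dt1.finish).
m2 <- dt1[dt2, on = .(start <= finish, finish > finish), nomatch = NULL,
          .(start = i.start, finish = i.finish, counts = i.counts,
            id = x.id, circuit = x.circuit)]

# OR of the two conditions: union of the matches, without duplicates.
matched <- unique(rbind(m1, m2))

# Left-join back to dt2 so that rows with no match are kept with NA id/circuit.
result <- matched[dt2, on = .(start, finish, counts)]
```

The `x.` and `i.` prefixes make it explicit which table each output column comes from, which avoids confusion about the join columns in non-equi joins.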

R data.table Multiple Conditions Join

早过忘川 submitted on 2021-02-06 09:04:11
Question: I've devised a solution to look up values from multiple columns of two separate data tables and add a new column based on calculations of their values (multiple conditional comparisons). Code below. It uses a data.table join while calculating values from both tables. However, the tables aren't joined on the columns I'm comparing, so I suspect I may not be getting the speed advantages inherent to data.tables that I've read so much about and am excited about tapping into.
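When the comparisons themselves can be written as join conditions, an update join with `on = .()` lets data.table do the matching instead of evaluating the comparisons row by row. The asker's code is not included in the excerpt, so the tables `orders` and `tiers` and all of their columns below are hypothetical and serve only to illustrate the pattern.

```r
library(data.table)

# Hypothetical tables: each order gets the rate of the tier whose
# region matches and whose [lo, hi) range contains the amount.
orders <- data.table(order_id = 1:4,
                     region = c("N", "N", "S", "S"),
                     amount = c(50L, 150L, 80L, 300L))
tiers <- data.table(region = c("N", "N", "S", "S"),
                    lo = c(0L, 100L, 0L, 100L),
                    hi = c(100L, 1000L, 100L, 1000L),
                    rate = c(0.10, 0.20, 0.15, 0.25))

# Update join: one equality condition plus two inequality conditions.
# i.rate is the rate column of tiers for the matching row.
orders[tiers, on = .(region, amount >= lo, amount < hi), rate := i.rate]

# The calculation on the looked-up value is done after the join.
orders[, fee := amount * rate]
```

Splitting the lookup (`rate := i.rate`) from the calculation (`fee := amount * rate`) keeps the join step simple and leaves the original `amount` column untouched.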

R data.table Multiple Conditions Join

拜拜、爱过 submitted on 2021-02-06 09:03:51
Question: I've devised a solution to look up values from multiple columns of two separate data tables and add a new column based on calculations of their values (multiple conditional comparisons). Code below. It uses a data.table join while calculating values from both tables. However, the tables aren't joined on the columns I'm comparing, so I suspect I may not be getting the speed advantages inherent to data.tables that I've read so much about and am excited about tapping into.

How to cut a vector or column into intervals in R [duplicate]

放肆的年华 submitted on 2021-02-05 11:46:39
Question: This question already has answers here: Convert continuous numeric values to discrete categories defined by intervals (2 answers). Closed 1 year ago.

I have the following column in a data frame, where consecutive rows differ by 0.012 s:

    Time
    0
    0.012
    0.024
    0.036
    0.048
    0.060
    0.072
    0.084
    0.096
    0.108

I want to create intervals that start at the beginning and increase by 0.030, that is, a time window every 0.03 s, to be used later in a group by.

Answer 1: You can try findInterval like
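A minimal sketch of the `findInterval` idea on the values listed in the question. Converting to integer milliseconds first is an added assumption of this sketch, not part of the original answer; it avoids floating-point surprises when a time such as 0.060 lands exactly on a break.

```r
# The ten time stamps from the question, 0.012 s apart.
time <- c(0, 0.012, 0.024, 0.036, 0.048, 0.060, 0.072, 0.084, 0.096, 0.108)

# Work in whole milliseconds so that boundary comparisons are exact.
time_ms <- as.integer(round(time * 1000))

# Breaks every 30 ms, extended past the largest value.
breaks_ms <- seq(0L, 120L, by = 30L)

# Index of the 30 ms window each time stamp falls into (left-closed intervals).
window <- findInterval(time_ms, breaks_ms)

# The same grouping as a labelled factor, if readable labels are preferred.
window_label <- cut(time_ms, breaks = breaks_ms, right = FALSE)
```

Either `window` or `window_label` can then be used as the grouping variable.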

R data.table - sample by group with different sampling proportion

那年仲夏 submitted on 2021-02-05 11:14:03
Question: I would like to efficiently draw a random sample by group from a data.table, but it should be possible to sample a different proportion from each group. If I wanted to sample the same fraction sampling_fraction from each group, I could take inspiration from this question and its related answer and do something like:

    DT = data.table(a = sample(1:2), b = sample(1:1000,20))
    group_sampler <- function(data, group_col, sample_fraction){
      # this function samples sample_fraction <0,1> from each group in the data.table
      #
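One way to vary the proportion per group is to keep the fractions in a small lookup table, join it onto the data, and then sample within each group. This is a sketch under stated assumptions: the lookup table `fractions`, its column `frac`, and the example data are hypothetical and do not come from the question.

```r
library(data.table)

set.seed(42)

# Example data: group 1 has 8 rows, group 2 has 20 rows.
DT <- data.table(a = rep(1:2, times = c(8, 20)), b = 1:28)

# Hypothetical lookup table: the fraction to sample from each group.
fractions <- data.table(a = 1:2, frac = c(0.5, 0.25))

# Attach each group's fraction, then sample round(.N * frac) rows per group.
# frac is constant within a group, so its first value is used.
sampled <- DT[fractions, on = "a"][
  , .SD[sample(.N, round(.N * frac[1]))], by = a]

# Drop the helper column once sampling is done.
sampled[, frac := NULL]
```

Here group 1 contributes 4 rows and group 2 contributes 5 rows; which rows are drawn depends on the seed.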

R data.table - sample by group with different sampling proportion

旧巷老猫 submitted on 2021-02-05 11:13:02
Question: I would like to efficiently draw a random sample by group from a data.table, but it should be possible to sample a different proportion from each group. If I wanted to sample the same fraction sampling_fraction from each group, I could take inspiration from this question and its related answer and do something like:

    DT = data.table(a = sample(1:2), b = sample(1:1000,20))
    group_sampler <- function(data, group_col, sample_fraction){
      # this function samples sample_fraction <0,1> from each group in the data.table
      #

Apply a rolling function by group in r (zoo, data.table)

∥☆過路亽.° submitted on 2021-02-05 11:09:59
Question: I am having trouble doing something fairly simple: applying a rolling function (standard deviation) by group in a data.table. When I use rollapply inside a data.table grouped by some column, data.table recycles the observations, as noted in the warning message below. I would like to get NAs for the observations that fall outside the window instead of recycled standard deviations. This is my approach so far using iris and a rolling window of size 2, aligned to the right:

    library
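The recycling described here typically happens when the rolling function returns fewer values than the group has rows. A sketch of one way around it, assuming a data.table version that provides `frollapply`: that function pads the incomplete windows with `NA` by default and aligns to the right, so each group gets a result of the same length as its input. This swaps in data.table's own rolling function rather than reproducing the asker's zoo code, which is cut off in the excerpt.

```r
library(data.table)

dt <- as.data.table(iris)

# Rolling standard deviation over a window of 2 rows within each species.
# The first row of every group has no complete window and is set to NA.
dt[, roll_sd := frollapply(Sepal.Length, 2, sd), by = Species]
```

With zoo, the equivalent inside `j` would be `zoo::rollapplyr(Sepal.Length, 2, sd, fill = NA)`, where `fill = NA` is what keeps the output as long as the group.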

Apply a rolling function by group in r (zoo, data.table)

六月ゝ 毕业季﹏ submitted on 2021-02-05 11:07:19
Question: I am having trouble doing something fairly simple: applying a rolling function (standard deviation) by group in a data.table. When I use rollapply inside a data.table grouped by some column, data.table recycles the observations, as noted in the warning message below. I would like to get NAs for the observations that fall outside the window instead of recycled standard deviations. This is my approach so far using iris and a rolling window of size 2, aligned to the right:

    library