data.table

Conditional keyed join/update _and_ update a flag column for matches

我的未来我决定 submitted on 2020-01-03 19:01:10
Question: This is very similar to the question @DavidArenburg asked about conditional keyed joins, with an additional bugbear that I can't seem to suss out. Basically, in addition to a conditional join, I want to define a flag recording the step of the matching process at which the match occurred; my problem is that I can only get the flag set for all values, not just the matched values. Here's what I hope is a minimal working example:

    DT = data.table( name = c("Joe", "Joe", "Jim", "Carol", "Joe",

Find number of occurrences of modal value for a group using data.table [R]

怎甘沉沦 submitted on 2020-01-03 16:52:52
Question: I've been using the excellent answer here to find the mode for groups with data.table. However, I'd also like to find the number of occurrences of the modal value of x for each group of variable y. How can I do this? Edit: there is a faster way to find the mode than in the answer linked above. I can't find the answer I got it from (please edit and link if you do), but it uses this function (and finds multiple modes if they exist):

    MultipleMode <- function(x) { ux <- unique(x) tab <- tabulate
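
One way to get the count asked about here, as a minimal sketch rather than the linked answer's method: tabulate the values of x within each group of y and keep the largest count. The toy table `DT` and the result column `n_mode` are hypothetical names introduced only for illustration.

```r
library(data.table)

# Hypothetical toy data: x is the value, y is the grouping variable
DT <- data.table(
  y = c("a", "a", "a", "b", "b", "b", "b"),
  x = c(1, 1, 2, 3, 4, 4, 4)
)

# For each group of y, count how often each distinct x occurs
# and keep the largest count, i.e. the frequency of the modal value
mode_counts <- DT[, .(n_mode = max(tabulate(match(x, unique(x))))), by = y]

print(mode_counts)
```

If several values tie for the mode within a group, this still returns the single shared count.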

Merging all column by reference in a data.table

蓝咒 submitted on 2020-01-03 16:40:10
Question: I would like to merge two data.tables by reference without having to write out every variable I want to merge. Here is a simple example of what I need:

    set.seed(20170711)
    (a <- data.table(v_key=seq(1, 5), key="v_key"))
    #    v_key
    # 1:     1
    # 2:     2
    # 3:     3
    # 4:     4
    # 5:     5
    a_backup <- copy(a)
    (b <- data.table(v_key=seq(1, 5), v1=runif(5), v2=runif(5), v3=runif(5), key="v_key"))
    #    v_key          v1        v2          v3
    # 1:     1 0.141804303 0.1311052 0.354798849
    # 2:     2 0.425955903 0.3635612 0.950234261
    # 3:     3 0.001070379 0

Using fread() to select rows and columns, the way read.csv.sql() does

懵懂的女人 submitted on 2020-01-03 13:36:14
Question: I know fread is relatively new, but it really gives great performance improvements. What I want to know is: can you select rows and columns from the file that you are reading, a bit like what read.csv.sql does? I know that the select option of fread lets you choose which columns to read, but what about reading only the rows that satisfy certain criteria? For example, can something like the following be implemented using fread?

    read.csv.sql(file, sql = "select V2,V4,V7,V8,V9, V10 from file where

Lock or protect a data.table in R

前提是你 submitted on 2020-01-03 10:56:30
Question: Are there one or more ways to lock or protect a data.table such that it can no longer be modified in place? Say we have a data.table:

    dt <- data.table(id = 1, val="foo")
    dt
    #    id val
    # 1:  1 foo

Can I then modify dt to get the following behavior afterwards?

    dt[, val:="bar"]  # error or warning
    dt
    #    id val
    # 1:  1 foo  ## unmodified

Context: This came up because I author a small R package at work that uses data.table extensively. It has some data.tables in it (translation tables) which, if accidentally

R: Calculate moving maximum slope by week accounting for factors

亡梦爱人 提交于 2020-01-03 10:54:52
问题 I have a data.frame that includes heating degree day (HDD) below. structure(list(WinterID = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,

Subsetting data.table by not head(key(DT),m), using binary search not vector scan

拜拜、爱过 submitted on 2020-01-03 08:55:29
Question: If I specify n columns as the key of a data.table, I'm aware that I can join to fewer columns than are defined in that key, as long as I join to the head of key(DT). For example, for n=2:

    X = data.table(A=rep(1:5, each=2), B=rep(1:2, each=5), key=c('A','B'))
    X
        A B
     1: 1 1
     2: 1 1
     3: 2 1
     4: 2 1
     5: 3 1
     6: 3 2
     7: 4 2
     8: 4 2
     9: 5 2
    10: 5 2
    X[J(3)]
       A B
    1: 3 1
    2: 3 2

There I only joined to the first column of the 2-column key of DT. I know I can join to both columns of the key like this:

    X[J(3,1)]

Sum multiple columns [duplicate]

↘锁芯ラ submitted on 2020-01-03 05:31:28
Question: This question already has an answer here: Summarizing multiple columns with data.table (1 answer). Closed 10 months ago.

I am trying to write a function that will sum the column(s) in the data frame according to the values in the first two columns. For example, I have a matrix M:

    Crs gr P_7 P_8
    38  1  3   16
    38  1  12  45
    38  1  9   28
    40  2  3   9
    40  2  14  29
    40  1  4   3
    40  2  8   2

I want to sum the columns according to column 1 (Crs) first and then column 2 (gr). The result will be:

    Crs gr P_7 P_8
    38  1  24  89
    40  2  25  40
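
The grouped sum described here can be sketched with data.table's `.SD` idiom. This is an illustration under the assumption that the data is held in a data.table rather than a base matrix; the result name `res` is hypothetical, and the values are copied from the example above.

```r
library(data.table)

# Data copied from the example in the question
M <- data.table(
  Crs = c(38, 38, 38, 40, 40, 40, 40),
  gr  = c(1, 1, 1, 2, 2, 1, 2),
  P_7 = c(3, 12, 9, 3, 14, 4, 8),
  P_8 = c(16, 45, 28, 9, 29, 3, 2)
)

# Sum every non-grouping column within each (Crs, gr) combination;
# .SD holds the columns that are not named in `by`
res <- M[, lapply(.SD, sum), by = .(Crs, gr)]

print(res)
```

Because `.SD` covers all columns not listed in `by`, the same call works unchanged if further P_ columns are added.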

R - pass fixed columns to lapply function in data.table

我怕爱的太早我们不能终老 submitted on 2020-01-03 04:54:06
Question: I have a data.table with columns p1, p2, ... which contain percentages. I want to compute the quantiles for each column given a reference variable val. Conceptually, this is like:

    quantile(val, p1, type = 4, na.rm = T)
    quantile(val, p2, type = 4, na.rm = T)
    ...

My attempt at using data.table is as follows:

    fun <- function(x, y) quantile(y, x, type = 4, na.rm = T)
    dt[, c('q1', 'q2') := lapply(.SD, fun), .SDcols = c('p1', 'p2'), by = grp]

where grp is some grouping variable. However, I am