data.table

Speed up import of fixed width format table in R

浪尽此生 submitted on 2020-01-15 04:45:07

Question: I'm importing a table from a fixed-width format .txt file in R. The table has about 100 columns and 200,000 lines (a few sample lines below):

11111 2008 7 31 21 2008 8 1 21 3 4 6 18 4 7 0 12 0 0 0 0 0 1 0 0 0 0 0 0 0 5 0 0 7 5 0 1 0 2 0 0 0 0 0 0 2 0 0 0.0 5 14.9 0 14.9 0 14.0 0 16.5 0 14.9 0 15.6 0 15.3 0 0 15.6 0 15.6 0 17.6 0 16.1 0 17.10 0 1 97 0 0.60 0 1 15.1 0 986.6 0 1002.9 0 7 0 0.2 0
11111 2008 8 1 0 2008 8 1 0 4 7 6 18 4 98 0 1 9 0 0 0 2 0 1 0 0 0 0 0 0 0 5 0 0 7 0 0 0 1 0 2 0 260 0 1 0
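A minimal sketch of one common speed-up, assuming the readr package is available: base read.fwf() is slow because it re-parses every field in R, while readr::read_fwf() parses in compiled code. The file, widths, and column names below are hypothetical stand-ins for the real layout:

```r
library(readr)

# Write two hypothetical fixed-width records to a temp file for illustration.
tmp <- tempfile(fileext = ".txt")
writeLines(c("1111120080731", "1111120080801"), tmp)

# Four hypothetical fields: station id (5 chars), year (4), month (2), day (2).
dt <- read_fwf(tmp,
               fwf_widths(c(5L, 4L, 2L, 2L),
                          col_names = c("id", "year", "month", "day")),
               col_types = "iiii")
dt
```

Because the sample rows above are also whitespace-separated, data.table::fread() may be faster still and avoids specifying widths at all.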

R data.table: Count Occurrences Prior to Current Measurement

泪湿孤枕 submitted on 2020-01-15 03:27:27

Question: I have a set of measurements taken over a period of days. The number of measurements is typically 4. The range of values that can be captured in any measurement is 1-5 (in real life, given the test set, the range could be as high as 100 or as low as 20). I want to count, per day, how many of each value occurred prior to the current day. Let me explain with some sample data:

# test data creation
d1 = list(as.Date("2013-5-4"), 4, 2)
d2 = list(as.Date("2013-5-9"), 2, 5)
d3 = list(as
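A hedged sketch of one way to do this count, reconstructing the sample as one row per (date, value) measurement — the column names day and val are assumptions, not the question's:

```r
library(data.table)

# Hypothetical reconstruction of the sample: one measurement per row.
dt <- data.table(
  day = as.Date(c("2013-05-04", "2013-05-04", "2013-05-09", "2013-05-09")),
  val = c(4, 2, 2, 5)
)
setorder(dt, day)

# For each row, count how often its value appeared on strictly earlier days.
dt[, prior_count := sapply(seq_len(.N),
                           function(i) sum(val[day < day[i]] == val[i]))]
dt
```

Here the value 2 measured on 2013-05-09 gets a prior count of 1, because 2 was measured once on 2013-05-04; everything else gets 0.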

“no such index at level 1” error for a specific scenario when trying to use data.table programmatically

蹲街弑〆低调 submitted on 2020-01-14 16:34:55

Question: The Problem: I wrote a function to use data.table programmatically. The function is as follows:

transformVariables4 <- function(df_1n_data, c_1n_variablesToTransform,
                                c_1n_newVariableNames, f_01_functionToTransform, ...) {
  for (i in 1:length(c_1n_variablesToTransform)) {
    df_1n_data[, c(c_1n_newVariableNames[i]) := list(forceAndCall(
      n = 1, FUN = f_01_functionToTransform,
      df_1n_data[[c_1n_variablesToTransform[i]]], ...))]
  }
  return(df_1n_data)
}

The function works fine for this scenario:

df <- data
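Whatever triggers the error, a more robust way to assign computed columns in a loop is data.table::set(), which bypasses the `[.data.table`/`:=` parsing machinery entirely. A simplified sketch — the function and argument names here are illustrative, not the original's:

```r
library(data.table)

# Assign f(column) to a new column by reference, one name pair at a time.
transform_vars <- function(dt, cols_in, cols_out, f, ...) {
  for (i in seq_along(cols_in)) {
    set(dt, j = cols_out[i], value = f(dt[[cols_in[i]]], ...))
  }
  dt[]   # trailing [] so the result prints at the top level
}

df <- data.table(x = 1:3)
transform_vars(df, "x", "x_sq", function(v) v^2)
```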

Why do I need to wrap `get` in a dummy function within a J `lapply` call?

三世轮回 submitted on 2020-01-14 14:30:30

Question: I'm looking to process columns by criteria like class, or by common pattern matching via grep. My first attempt did not work:

require(data.table)
test.table <- data.table(a = 1:10, ab = 1:10, b = 101:110)
## this does not work and hangs on my machine
test.table[, lapply(names(test.table)[grep("a", names(test.table))], get)]

Ricardo Saporta notes in an answer that you can use this construct, but you have to wrap get in a dummy function:

## this works
test.table[,lapply(names(test
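Whatever the scoping subtlety behind the hang, current data.table idiom avoids get() here altogether. Two equivalent sketches (the second needs a reasonably recent data.table for patterns() inside .SDcols):

```r
library(data.table)
test.table <- data.table(a = 1:10, ab = 1:10, b = 101:110)

# 1) mget() looks the matched names up as columns of test.table
res1 <- test.table[, mget(grep("a", names(test.table), value = TRUE))]

# 2) .SDcols restricts .SD to the columns whose names match the pattern
res2 <- test.table[, .SD, .SDcols = patterns("a")]
```

Both return a data.table with just the columns a and ab.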

Get value of last non-NA row per column in data.table

孤街浪徒 submitted on 2020-01-14 14:21:07

Question: I have a data.table where each column represents a time series, and I want to grab the last non-NA value per time series, in a column-ordered manner. In my particular use case my data looks like this:

 a   b   c
 1   2   5
 1 -17   9
NA  11   4
NA  57  NA
63  NA  NA

So out of this I would like to extract:

 a  b  c
63 57  4

How can I accomplish this? So far I only see answers addressing the converse situation of extracting the last non-NA per row rather than per column.

Answer 1: If the dataset is data.table, loop through the
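One column-wise sketch: within j, loop over .SD, drop each column's NAs, and keep the final remaining value (this assumes every column has at least one non-NA entry):

```r
library(data.table)

# The sample data from the question.
dt <- data.table(
  a = c(1, 1, NA, NA, 63),
  b = c(2, -17, 11, 57, NA),
  c = c(5, 9, 4, NA, NA)
)

# For each column: drop NAs, then take the last surviving value.
last_non_na <- dt[, lapply(.SD, function(x) tail(x[!is.na(x)], 1L))]
last_non_na
```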

R data.table roll=“nearest” not actually nearest

余生长醉 submitted on 2020-01-14 09:56:27

Question: Given the following data.tables, I'm surprised to see the 5.9 index matching with 5 rather than 6. I don't quite understand what's going on.

dat <- data.table(index = c(4.3, 5.9, 1.2), datval = runif(3) + 10, datstuff = "test")
reference <- data.table(index = 1:10, refjunk = "junk", refval = runif(10))
dat[, dat_index := index]
reference[dat, roll = "nearest", on = "index"]

I would expect to see 3 rows with the index==6 row in reference being matched with the index==5.9 row in dat, at least for my
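A plausible explanation — an assumption worth verifying against the real data: reference$index is integer (1:10) while dat$index is double, and coercion of the join key truncates 5.9 before the roll happens, making it an exact match with 5. Storing both keys as double restores the expected nearest-match behaviour. A deterministic sketch:

```r
library(data.table)

dat <- data.table(index = c(4.3, 5.9, 1.2))
# as.numeric() keeps the key double; refval encodes each row for easy checking.
reference <- data.table(index = as.numeric(1:10), refval = (1:10) * 10)

res <- reference[dat, roll = "nearest", on = "index"]
res$refval   # which reference row each dat row matched
```

With a double key, 4.3 matches 4, 5.9 matches 6, and 1.2 matches 1.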

Select row from data.table with min value

让人想犯罪 __ submitted on 2020-01-14 09:44:15

Question: I have a data.table and I need to compute some new value on it and select the row with the min value.

tb <- data.table(g_id = c(1, 1, 1, 2, 2, 2, 3),
                 item_no = c(24, 25, 26, 27, 28, 29, 30),
                 time_no = c(100, 110, 120, 130, 140, 160, 160),
                 key = "g_id")
#    g_id item_no time_no
# 1:    1      24     100
# 2:    1      25     110
# 3:    1      26     120
# 4:    2      27     130
# 5:    2      28     140
# 6:    2      29     160
# 7:    3      30     160

ts <- 118
gId <- 2
tb[.(gId), list(item_no, tdiff = {z = abs(time_no - ts)})]
#    g_id item_no tdiff
# 1:    2      27    12
# 2:    2      28    22
# 3:    2      29    42

And now I
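Picking up where the question leaves off, one sketch of the final step: subset to the group, then index with which.min() on the computed distance:

```r
library(data.table)

tb <- data.table(g_id = c(1, 1, 1, 2, 2, 2, 3),
                 item_no = c(24, 25, 26, 27, 28, 29, 30),
                 time_no = c(100, 110, 120, 130, 140, 160, 160),
                 key = "g_id")
ts <- 118
gId <- 2

# Keyed subset to g_id == 2, then keep the row with the smallest |time_no - ts|.
res <- tb[.(gId)][which.min(abs(time_no - ts))]
res
```

In group 2 the distances are 12, 22, and 42, so the row with item_no == 27 is returned.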

How to merge lists of vectors based on one vector belonging to another vector?

两盒软妹~` submitted on 2020-01-14 09:38:08

Question: In R, I have two data frames that contain list columns:

d1 <- data.table(group_id1 = 1:4)
d1$Cat_grouped <- list(letters[1:2], letters[3:2], letters[3:6], letters[11:12])

And:

d_grouped <- data.table(group_id2 = 1:4)
d_grouped$Cat_grouped <- list(letters[1:5], letters[6:10], letters[1:2], letters[1])

I would like to merge these two data.tables based on the vectors in d1$Cat_grouped being contained in the vectors in d_grouped$Cat_grouped. To be more precise, there could be two matching criteria: a)
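For full containment, one brute-force sketch: cross every row pair and keep the pairs where d1's vector is a subset of d_grouped's. This is fine at this size; unlisting into a long table and joining would scale better.

```r
library(data.table)

d1 <- data.table(group_id1 = 1:4)
d1$Cat_grouped <- list(letters[1:2], letters[3:2], letters[3:6], letters[11:12])
d_grouped <- data.table(group_id2 = 1:4)
d_grouped$Cat_grouped <- list(letters[1:5], letters[6:10], letters[1:2], letters[1])

# All row pairs, flagged by whether d1's vector is contained in d_grouped's.
pairs <- CJ(ii = seq_len(nrow(d1)), jj = seq_len(nrow(d_grouped)))
pairs[, contained := mapply(
  function(a, b) all(d1$Cat_grouped[[a]] %in% d_grouped$Cat_grouped[[b]]),
  ii, jj)]
matches <- pairs[contained == TRUE]
matches
```

Three pairs survive: d1 row 1 is contained in d_grouped rows 1 and 3, and d1 row 2 in d_grouped row 1.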