data.table

Speed up import of fixed width format table in R

浪尽此生 submitted on 2020-01-15 04:45:07

Question: I'm importing a table from a fixed-width format .txt file in R. The table has about 100 columns and 200,000 lines (a few sample lines below):

11111 2008 7 31 21 2008 8 1 21 3 4 6 18 4 7 0 12 0 0 0 0 0 1 0 0 0 0 0 0 0 5 0 0 7 5 0 1 0 2 0 0 0 0 0 0 2 0 0 0.0 5 14.9 0 14.9 0 14.0 0 16.5 0 14.9 0 15.6 0 15.3 0 0 15.6 0 15.6 0 17.6 0 16.1 0 17.10 0 1 97 0 0.60 0 1 15.1 0 986.6 0 1002.9 0 7 0 0.2 0
11111 2008 8 1 0 2008 8 1 0 4 7 6 18 4 98 0 1 9 0 0 0 2 0 1 0 0 0 0 0 0 0 5 0 0 7 0 0 0 1 0 2 0 260 0 1 0
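A minimal sketch of one common speed-up, assuming the readr package is available: base read.fwf() is slow because it re-parses every field in R, while readr::read_fwf() parses in compiled code. The file, widths, and column names below are hypothetical stand-ins for the real layout:

```r
library(readr)

# Write two hypothetical fixed-width records to a temp file for illustration.
tmp <- tempfile(fileext = ".txt")
writeLines(c("1111120080731", "1111120080801"), tmp)

# Four hypothetical fields: station id (5 chars), year (4), month (2), day (2).
dt <- read_fwf(tmp,
               fwf_widths(c(5L, 4L, 2L, 2L),
                          col_names = c("id", "year", "month", "day")),
               col_types = "iiii")
dt
```

Because the sample rows above are also whitespace-separated, data.table::fread() may be faster still and avoids specifying widths at all.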

R data.table: Count Occurrences Prior to Current Measurement

泪湿孤枕 submitted on 2020-01-15 03:27:27

Question: I have a set of measurements taken over a period of days. The number of measurements is typically 4. The range of values that can be captured in any measurement is 1-5 (in real life, given the test set, the range could be as high as 100 or as low as 20). I want to count, per day, how many of each value occurred prior to the current day. Let me explain with some sample data:

# test data creation
d1 = list(as.Date("2013-5-4"), 4, 2)
d2 = list(as.Date("2013-5-9"), 2, 5)
d3 = list(as
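A hedged sketch of one way to do this count, reconstructing the sample as one row per (date, value) measurement — the column names day and val are assumptions, not the question's:

```r
library(data.table)

# Hypothetical reconstruction of the sample: one measurement per row.
dt <- data.table(
  day = as.Date(c("2013-05-04", "2013-05-04", "2013-05-09", "2013-05-09")),
  val = c(4, 2, 2, 5)
)
setorder(dt, day)

# For each row, count how often its value appeared on strictly earlier days.
dt[, prior_count := sapply(seq_len(.N),
                           function(i) sum(val[day < day[i]] == val[i]))]
dt
```

Here the value 2 measured on 2013-05-09 gets a prior count of 1, because 2 was measured once on 2013-05-04; everything else gets 0.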

“no such index at level 1” error for a specific scenario when trying to use data.table programmatically

蹲街弑〆低调 submitted on 2020-01-14 16:34:55

Question: The Problem: I wrote a function to use data.table programmatically. The function is as follows:

transformVariables4 <- function(df_1n_data, c_1n_variablesToTransform,
                                c_1n_newVariableNames, f_01_functionToTransform, ...) {
  for (i in 1:length(c_1n_variablesToTransform)) {
    df_1n_data[, c(c_1n_newVariableNames[i]) := list(forceAndCall(
      n = 1, FUN = f_01_functionToTransform,
      df_1n_data[[c_1n_variablesToTransform[i]]], ...))]
  }
  return(df_1n_data)
}

The function works fine for this scenario:

df <- data
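Whatever triggers the error, a more robust way to assign computed columns in a loop is data.table::set(), which bypasses the `[.data.table`/`:=` parsing machinery entirely. A simplified sketch — the function and argument names here are illustrative, not the original's:

```r
library(data.table)

# Assign f(column) to a new column by reference, one name pair at a time.
transform_vars <- function(dt, cols_in, cols_out, f, ...) {
  for (i in seq_along(cols_in)) {
    set(dt, j = cols_out[i], value = f(dt[[cols_in[i]]], ...))
  }
  dt[]   # trailing [] so the result prints at the top level
}

df <- data.table(x = 1:3)
transform_vars(df, "x", "x_sq", function(v) v^2)
```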

Why do I need to wrap `get` in a dummy function within a J `lapply` call?

三世轮回 submitted on 2020-01-14 14:30:30

Question: I'm looking to process columns by criteria like class, or by common pattern matching via grep. My first attempt did not work:

require(data.table)
test.table <- data.table(a = 1:10, ab = 1:10, b = 101:110)
## this does not work and hangs on my machine
test.table[, lapply(names(test.table)[grep("a", names(test.table))], get)]

Ricardo Saporta notes in an answer that you can use this construct, but you have to wrap get in a dummy function:

## this works
test.table[,lapply(names(test
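Whatever the scoping subtlety behind the hang, current data.table idiom avoids get() here altogether. Two equivalent sketches (the second needs a reasonably recent data.table for patterns() inside .SDcols):

```r
library(data.table)
test.table <- data.table(a = 1:10, ab = 1:10, b = 101:110)

# 1) mget() looks the matched names up as columns of test.table
res1 <- test.table[, mget(grep("a", names(test.table), value = TRUE))]

# 2) .SDcols restricts .SD to the columns whose names match the pattern
res2 <- test.table[, .SD, .SDcols = patterns("a")]
```

Both return a data.table with just the columns a and ab.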

Get value of last non-NA row per column in data.table

孤街浪徒 submitted on 2020-01-14 14:21:07

Question: I have a data.table where each column represents a time series, and I want to grab the last non-NA value per time series, in a column-ordered manner. In my particular use case my data looks like this:

 a   b   c
 1   2   5
 1 -17   9
NA  11   4
NA  57  NA
63  NA  NA

So out of this I would like to extract:

 a  b  c
63 57  4

How can I accomplish this? So far I only see answers addressing the converse situation of extracting the last non-NA per row rather than per column.

Answer 1: If the dataset is data.table, loop through the
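One column-wise sketch: within j, loop over .SD, drop each column's NAs, and keep the final remaining value (this assumes every column has at least one non-NA entry):

```r
library(data.table)

# The sample data from the question.
dt <- data.table(
  a = c(1, 1, NA, NA, 63),
  b = c(2, -17, 11, 57, NA),
  c = c(5, 9, 4, NA, NA)
)

# For each column: drop NAs, then take the last surviving value.
last_non_na <- dt[, lapply(.SD, function(x) tail(x[!is.na(x)], 1L))]
last_non_na
```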

R data.table roll=“nearest” not actually nearest

余生长醉 submitted on 2020-01-14 09:56:27

Question: Given the following data.tables, I'm surprised to see the 5.9 index matching with 5 rather than 6. I don't quite understand what's going on.

dat <- data.table(index = c(4.3, 5.9, 1.2), datval = runif(3) + 10, datstuff = "test")
reference <- data.table(index = 1:10, refjunk = "junk", refval = runif(10))
dat[, dat_index := index]
reference[dat, roll = "nearest", on = "index"]

I would expect to see 3 rows with the index==6 row in reference being matched with the index==5.9 row in dat, at least for my
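A plausible explanation — an assumption worth verifying against the real data: reference$index is integer (1:10) while dat$index is double, and coercion of the join key truncates 5.9 before the roll happens, making it an exact match with 5. Storing both keys as double restores the expected nearest-match behaviour. A deterministic sketch:

```r
library(data.table)

dat <- data.table(index = c(4.3, 5.9, 1.2))
# as.numeric() keeps the key double; refval encodes each row for easy checking.
reference <- data.table(index = as.numeric(1:10), refval = (1:10) * 10)

res <- reference[dat, roll = "nearest", on = "index"]
res$refval   # which reference row each dat row matched
```

With a double key, 4.3 matches 4, 5.9 matches 6, and 1.2 matches 1.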

Select row from data.table with min value

让人想犯罪 __ submitted on 2020-01-14 09:44:15

Question: I have a data.table and I need to compute some new value on it and select the row with the min value.

tb <- data.table(g_id = c(1, 1, 1, 2, 2, 2, 3),
                 item_no = c(24, 25, 26, 27, 28, 29, 30),
                 time_no = c(100, 110, 120, 130, 140, 160, 160),
                 key = "g_id")
#    g_id item_no time_no
# 1:    1      24     100
# 2:    1      25     110
# 3:    1      26     120
# 4:    2      27     130
# 5:    2      28     140
# 6:    2      29     160
# 7:    3      30     160

ts <- 118
gId <- 2
tb[.(gId), list(item_no, tdiff = {z = abs(time_no - ts)})]
#    g_id item_no tdiff
# 1:    2      27    12
# 2:    2      28    22
# 3:    2      29    42

And now I
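Picking up where the question leaves off, one sketch of the final step: subset to the group, then index with which.min() on the computed distance:

```r
library(data.table)

tb <- data.table(g_id = c(1, 1, 1, 2, 2, 2, 3),
                 item_no = c(24, 25, 26, 27, 28, 29, 30),
                 time_no = c(100, 110, 120, 130, 140, 160, 160),
                 key = "g_id")
ts <- 118
gId <- 2

# Keyed subset to g_id == 2, then keep the row with the smallest |time_no - ts|.
res <- tb[.(gId)][which.min(abs(time_no - ts))]
res
```

In group 2 the distances are 12, 22, and 42, so the row with item_no == 27 is returned.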

How to merge lists of vectors based on one vector belonging to another vector?

两盒软妹~` submitted on 2020-01-14 09:38:08

Question: In R, I have two data frames that contain list columns:

d1 <- data.table(group_id1 = 1:4)
d1$Cat_grouped <- list(letters[1:2], letters[3:2], letters[3:6], letters[11:12])

And:

d_grouped <- data.table(group_id2 = 1:4)
d_grouped$Cat_grouped <- list(letters[1:5], letters[6:10], letters[1:2], letters[1])

I would like to merge these two data.tables based on the vectors in d1$Cat_grouped being contained in the vectors in d_grouped$Cat_grouped. To be more precise, there could be two matching criteria: a)
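For full containment, one brute-force sketch: cross every row pair and keep the pairs where d1's vector is a subset of d_grouped's. This is fine at this size; unlisting into a long table and joining would scale better.

```r
library(data.table)

d1 <- data.table(group_id1 = 1:4)
d1$Cat_grouped <- list(letters[1:2], letters[3:2], letters[3:6], letters[11:12])
d_grouped <- data.table(group_id2 = 1:4)
d_grouped$Cat_grouped <- list(letters[1:5], letters[6:10], letters[1:2], letters[1])

# All row pairs, flagged by whether d1's vector is contained in d_grouped's.
pairs <- CJ(ii = seq_len(nrow(d1)), jj = seq_len(nrow(d_grouped)))
pairs[, contained := mapply(
  function(a, b) all(d1$Cat_grouped[[a]] %in% d_grouped$Cat_grouped[[b]]),
  ii, jj)]
matches <- pairs[contained == TRUE]
matches
```

Three pairs survive: d1 row 1 is contained in d_grouped rows 1 and 3, and d1 row 2 in d_grouped row 1.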