data.table

Memory profiling with data.table

别等时光非礼了梦想. Submitted on 2021-01-27 04:48:20
Question: What is the correct way to profile memory in R code that contains calls to data.table functions? Let's say I want to determine the maximum memory usage during an expression. This reference indicates that Rprofmem may not be the right choice: https://cran.r-project.org/web/packages/profmem/vignettes/profmem.html

    All memory allocations that are done via the native allocVector3() part of R's native API are logged, which means that nearly all memory allocations are logged. Any objects allocated

R - Rolling sum of two columns in data.table

故事扮演 Submitted on 2021-01-25 07:31:16
Question: I have a data.table as follows:

    dt = data.table(
      date = seq(as.Date("2015-12-01"), as.Date("2015-12-10"), by = "days"),
      v1 = c(seq(1, 9), 20),
      v2 = c(5, rep(NA, 9))
    )
    dt
              date v1 v2
     1: 2015-12-01  1  5
     2: 2015-12-02  2 NA
     3: 2015-12-03  3 NA
     4: 2015-12-04  4 NA
     5: 2015-12-05  5 NA
     6: 2015-12-06  6 NA
     7: 2015-12-07  7 NA
     8: 2015-12-08  8 NA
     9: 2015-12-09  9 NA
    10: 2015-12-10 20 NA

Question 1: I want to add the current row value of v1 to the previous row value of v2, so the output looks like the following

How to read when delimiter is space and missing values are blank?

夙愿已清 Submitted on 2021-01-24 06:54:56
Question: I have a space-delimited file in which some columns are blank, so we end up with multiple consecutive spaces, and fread fails with an error, while read.table works fine. See this example:

    library(data.table)
    # R version 3.4.2 (2017-09-28)
    # data.table_1.10.4-3
    fread("A B C D
    1 2 3
    4 5 6 7", sep = " ", header = TRUE)
    Error in fread("A B C D\n1 2 3\n4 5 6 7") :
      Expected sep (' ') but new line, EOF (or other non printing character) ends field 2 when detecting types from point 0: 1 2 3

    read.table(text ="A B C D 1 2 3 4 5

fast melt large 2d matrix to 3 column data.table

╄→гoц情女王★ Submitted on 2021-01-24 05:41:14
Question: I have a large matrix, num [1:62410, 1:48010], and I want it as a long-format data.table, e.g.

       Var1 Var2     value
    1:    1    1 -4227.786
    2:    2    1 -4211.908
    3:    3    1 -4197.034
    4:    4    1 -4183.645
    5:    5    1 -4171.692
    6:    6    1 -4161.634

Minimal example:

    m = matrix(1:5, nrow = 1000, ncol = 1000)
    x = data.table(reshape2::melt(m))

Ideally I'd want the column names x, y and value at the same time. Previously I've been using data.table(melt(mymatrix)). But judging by the warnings that reshape2::melt is deprecated, this is
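One possible approach, sketched here as an assumption rather than as the answer the truncated question goes on to receive, is to build the three columns directly from the matrix dimensions. This relies only on the fact that R stores matrices in column-major order, so as.vector(m) walks down each column in turn. The small 2 x 3 matrix and the name `long` are illustrative only.

```r
library(data.table)

# Small illustrative matrix; the question's matrix is far larger.
m <- matrix(1:6, nrow = 2, ncol = 3)

# as.vector(m) is column-major, so the row index varies fastest
# and the column index changes once per full column.
long <- data.table(
  x     = rep(seq_len(nrow(m)), times = ncol(m)),  # row index
  y     = rep(seq_len(ncol(m)), each  = nrow(m)),  # column index
  value = as.vector(m)
)

print(long)
```

This names the columns x, y and value in one step and avoids reshape2 entirely; whether it is fast enough for a matrix of the size in the question would need to be measured.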

Easier way to calculate conditioned proportions with R data.table?

烈酒焚心 Submitted on 2021-01-23 08:20:48
Question: Let's say we have this toy example:

    library(data.table)
    temp <- data.table(first = c("A", "A", "A", "A", "B", "C", "C"),
                       sec = c("X", "X", "X", "Y", "X", "Z", "Z"),
                       stringsAsFactors = TRUE)

    first sec
    A     X
    A     X
    A     X
    A     Y
    B     X
    C     Z
    C     Z

I would like to get a third column stating the proportion of times each combination occurs among the occurrences of its value in the first column. I managed to do it with data.table in the following way:

    temp[, N1 := .N, by = .(first, sec)]
    temp[, N2 := .N, by = first]
    temp[, prop := N1 / N2]
    temp[,c("N1","N2")
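A shorter variant, offered as a sketch and not as the accepted answer to this question, reuses a single column so that no helper columns have to be created and dropped. The count is stored as a double from the start, because the second step assigns a fractional value to the same column.

```r
library(data.table)

temp <- data.table(first = c("A", "A", "A", "A", "B", "C", "C"),
                   sec   = c("X", "X", "X", "Y", "X", "Z", "Z"),
                   stringsAsFactors = TRUE)

# Step 1: count of each (first, sec) combination, stored as numeric.
temp[, prop := as.numeric(.N), by = .(first, sec)]

# Step 2: divide by the number of rows sharing the same `first`.
temp[, prop := prop / .N, by = first]

print(temp)
```

For the toy data, the A/X rows get 0.75, the A/Y row gets 0.25, and the B and C rows get 1.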

R data.table: how to go from tibble to data.table to tibble back?

和自甴很熟 Submitted on 2021-01-22 05:24:10
Question: I mainly use tables in the tibble format from tidyverse, but for some steps I use the data.table package. What is the best way of converting a data.table back to a tibble? I understand that data.table has the clever setDT and setDF functions, which convert from data.frame to data.table (and vice versa) by reference, i.e. without making a copy. But what if I want to convert back to a tibble? Am I copying the data using as_tibble on the data.frame resulting from setDT()

Fuzzy merging in R - seeking help to improve my code

一个人想着一个人 Submitted on 2021-01-20 19:53:24
Question: Inspired by the experimental fuzzy_join function from the statar package, I wrote a function myself that combines exact and fuzzy (string-distance) matching. The merging job I have to do is quite big (resulting in multiple string distance matrices with a little less than one billion cells), and I had the impression that the fuzzy_join function is not written very efficiently (with regard to memory usage) and that the parallelization is implemented in a weird manner (the computation of the
