data.table

When using `data.table`'s DT[i, j, by], is it possible to set the column types beforehand?

给你一囗甜甜゛ submitted on 2021-01-27 13:29:01
Question: I'm trying to calculate the correlation between two variables for multiple different groups (e.g. DT[, cor.test(var1, var2), group] ). This works great whenever I use cor.test(var1, var2, method = 'pearson') but throws an error when I use cor.test(var1, var2, method = 'spearman') . library(data.table) DT <- as.data.table(iris) # works perfectly DT[,cor.test(Sepal.Length,Sepal.Width, method = 'pearson'), Species] # Species statistic parameter p.value estimate null.value # 1: setosa 7.680738
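
The excerpt does not show the error itself, so the following is only a sketch under one plausible assumption: that the components of the `htest` object differ between methods (as far as I know, `parameter` is NULL for Spearman), so `j` cannot be turned into uniform columns. Extracting just the scalar fields and coercing them to doubles sidesteps that. The `exact = FALSE` argument is my addition to avoid tie warnings on `iris`.

```r
library(data.table)

DT <- as.data.table(iris)

# Keep only scalar fields and coerce them to plain doubles, so every
# group returns columns of the same type and length.
res <- DT[, {
  ct <- cor.test(Sepal.Length, Sepal.Width, method = "spearman", exact = FALSE)
  .(statistic = as.numeric(ct$statistic),
    p.value   = as.numeric(ct$p.value),
    estimate  = as.numeric(ct$estimate))
}, by = Species]

print(res)
```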

Why do I get an “Invalid .internal.selfref detected” warning (but no output) even if I am not using list(), key<-, names<-, or attr<-?

ε祈祈猫儿з submitted on 2021-01-27 13:08:53
Question: In a new user-created function I would like to do some data.table transformations; in particular, I would like to create a new column with the ':=' command. Assume I want to make a new column called Sex that capitalizes the first letter of the column df$sex in my example data.frame df. The output of my prepare function should be a data.table with the same name as before but with the additional "capitalised" column. I tried several ways to loop over the data.table. However I always get the following warning
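
A minimal sketch of one way such a function could look, assuming the input is a plain data.frame with a character column `sex` (the sample data and the body of `prepare` are hypothetical, since the excerpt is cut off before the original code). Converting with `as.data.table()` inside the function gives `:=` a properly allocated data.table to work on; no loop is needed because the string functions are vectorised.

```r
library(data.table)

# Hypothetical example data; the original df is not shown in the excerpt.
df <- data.frame(sex = c("male", "female", "female"), stringsAsFactors = FALSE)

prepare <- function(x) {
  dt <- as.data.table(x)  # makes a data.table copy of the input
  # Upper-case the first letter and keep the rest of each string unchanged.
  dt[, Sex := paste0(toupper(substring(sex, 1, 1)), substring(sex, 2))]
  dt[]                    # return the modified table
}

df <- prepare(df)         # reuse the old name, as the question asks
print(df)
```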

Conditionally Select Rows within a Group with Data.Table

╄→尐↘猪︶ㄣ submitted on 2021-01-27 12:23:21
Question: I am looking for solutions using data.table. I have a data.table with the following columns: data <- data.frame(GROUP=c(3,3,4,4,5,6), YEAR=c(1979,1985,1999,2011,2012,1994), NAME=c("S","A","J","L","G","A")) data <- as.data.table(data) Data.table: GROUP YEAR NAME 3 1979 Smith 3 1985 Anderson 4 1999 James 4 2011 Liam 5 2012 George 6 1994 Adams For each group we want to select one row using the following rule: If there is a year > 2000, select the row with minimum year above 2000. If there not a
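
The stated rule can be sketched per group as below. The excerpt is truncated before the fallback rule, so the `else` branch (take the row with the latest year) is purely an assumption for illustration and should be replaced with whatever the full question specifies.

```r
library(data.table)

data <- data.table(GROUP = c(3, 3, 4, 4, 5, 6),
                   YEAR  = c(1979, 1985, 1999, 2011, 2012, 1994),
                   NAME  = c("S", "A", "J", "L", "G", "A"))

res <- data[, {
  above <- YEAR > 2000
  if (any(above)) {
    .SD[above][which.min(YEAR)]  # earliest year after 2000
  } else {
    .SD[which.max(YEAR)]         # ASSUMED fallback: latest year in the group
  }
}, by = GROUP]

print(res)
```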

R data.table fwrite to fread space delimiter and empties

为君一笑 submitted on 2021-01-27 11:21:06
Question: I am having problems using fread with " " as delimiter and interspersed blank values. For example, this: dt <- data.table(1:5,1:5,1:5) #make a simple table dt[3,"V2" := NA] #add a blank in the middle to illustrate the problem fwrite(dt, file = "dt.csv", sep = " ") #save to file dt <- fread("dt.csv", sep = " ") #try to retrieve The fread fails with: "Stopped early on line 4. Expected 3 fields but found 2." The problem seems to be that with the NA value in the middle column, fwrite gives value
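
One possible workaround, sketched here under the assumption that the empty field is what confuses the space-delimited parse: ask `fwrite` to write missing values as an explicit token through its `na` argument, so every line keeps three visible fields. A temporary file stands in for "dt.csv".

```r
library(data.table)

dt <- data.table(1:5, 1:5, 1:5)  # simple table, columns V1..V3
dt[3, "V2" := NA]                # blank in the middle column

f <- tempfile(fileext = ".csv")  # temporary stand-in for "dt.csv"

# Write NA as the literal string "NA" instead of an empty field.
fwrite(dt, file = f, sep = " ", na = "NA")

dt2 <- fread(f, sep = " ")       # "NA" is read back as a missing value
print(dt2)
```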

R - indices of matching values of two data.tables

孤人 submitted on 2021-01-27 10:40:29
Question: This is my first post at StackOverflow. I am a relative newbie in programming and am trying to work with data.table in R, given its reputation for speed. I have a very large data.table, named "Actions", with 5 columns and potentially several million rows. The column names are k1, k2, i, l1 and l2. I have another data.table, with the unique values of Actions in columns k1 and k2, named "States". For every row in Actions, I would like to find the unique index for columns 4 and 5, matching with
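
The excerpt stops before saying what columns 4 and 5 are matched against. Assuming the goal is the row number in States whose (k1, k2) equals each row's (l1, l2), a join with `which = TRUE` returns those indices directly. The toy data below is hypothetical.

```r
library(data.table)

# Hypothetical toy data with the column names from the question.
Actions <- data.table(k1 = c(1, 1, 2), k2 = c(1, 2, 1), i = 1:3,
                      l1 = c(1, 2, 1), l2 = c(2, 1, 1))
States <- unique(Actions[, .(k1, k2)])

# For each row of Actions, the row number of States where
# (k1, k2) matches (l1, l2); NA if there is no match.
idx <- States[Actions, on = .(k1 = l1, k2 = l2), which = TRUE]
print(idx)
```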

Dealing with Byte Order Mark (BOM) in R [duplicate]

早过忘川 submitted on 2021-01-27 07:42:28
Question: This question already has answers here: Read a UTF-8 text file with BOM (2 answers). Closed 4 years ago. Sometimes a Byte Order Mark (BOM) is present at the beginning of a .CSV file. The symbol is not visible when you open the file using Notepad or Excel; however, when you read the file in R using various methods, you will see different symbols in the name of the first column. Here is an example: a sample csv file with a BOM at the beginning. ID,title,clean_title,clean_title_id 1,0 - 0,,0 2,"""0 - 1
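
A small base R sketch of the usual remedy: R's connections accept the encoding name "UTF-8-BOM" for reading, which drops the mark if it is present. The file written here is a made-up two-column sample, not the data from the question.

```r
# Build a small CSV that starts with the UTF-8 byte order mark (EF BB BF).
f <- tempfile(fileext = ".csv")
bom <- as.raw(c(0xEF, 0xBB, 0xBF))
writeBin(c(bom, charToRaw("ID,title\n1,a\n2,b\n")), f)

# "UTF-8-BOM" tells the reader to strip the mark before parsing.
x <- read.csv(f, fileEncoding = "UTF-8-BOM")
print(names(x))
```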

Making function with data.table in R

只愿长相守 submitted on 2021-01-27 07:12:43
Question: I am learning to write functions with the data.table library. After some experiments, I used get() to convert a variable to an object. I would like to know whether there are other ways to achieve this. library(data.table) DT <- data.table( V1=rep(letters[1:3],5), V2=c(2:16) ) Test1 <- DT[,.((V2-sd(V2))/(max(V2)-min(V2))), by=.(V1)] # for comparison Norma <- function(dataset, Vari, group_by){ dataset[, .((get(Vari)-sd(get(Vari)))/(max(get(Vari))-min(get(Vari)))), by=.(get(group_by)) ] } Test2 <- Norma(DT,"V2
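
One alternative to `get()`, sketched here rather than offered as the accepted answer: pass the column name through `.SDcols` and the grouping column as a character `by`. The result column keeps the name of the input column instead of the automatic V1, which is the only intended difference from Test1. The second argument of the call, `"V1"`, is my guess at the truncated original.

```r
library(data.table)

DT <- data.table(V1 = rep(letters[1:3], 5), V2 = c(2:16))

# Reference result from the question.
Test1 <- DT[, .((V2 - sd(V2)) / (max(V2) - min(V2))), by = .(V1)]

# .SDcols selects the column by name; by accepts a character vector.
Norma <- function(dataset, Vari, group_by) {
  dataset[, lapply(.SD, function(x) (x - sd(x)) / (max(x) - min(x))),
          by = group_by, .SDcols = Vari]
}

Test2 <- Norma(DT, "V2", "V1")
print(all.equal(Test1[[2]], Test2[[2]]))
```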

How to group rows in a range and consider a 3rd column?

我们两清 submitted on 2021-01-27 06:22:30
Question: I have a genetic dataset where I want to group genetic variants/rows that are physically close together in the genome. I want to group genes that are within ranges from certain spots in the genome per chromosome ( chrom ). My 'spots' dataset is of positions that variants/rows need to be within a range of and looks like: chrom low high 1 500 1700 1 19500 20600 5 400 1500 My low and high columns are the ranges that I want to see if any rows in my next dataset fall into, with also accounting
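
The second dataset is not visible in the excerpt, so the `variants` table and its `pos` column below are hypothetical. Under that assumption, a non-equi update join is one way to tag each variant with the spot whose [low, high] range on the same chromosome contains it.

```r
library(data.table)

spots <- data.table(chrom = c(1, 1, 5),
                    low   = c(500, 19500, 400),
                    high  = c(1700, 20600, 1500))
spots[, spot_id := .I]  # label each range by its row number

# HYPOTHETICAL variants table; the real one is cut off in the excerpt.
variants <- data.table(chrom = c(1, 1, 5, 5),
                       pos   = c(600, 19000, 450, 2000))

# Same chromosome and low <= pos <= high; unmatched variants stay NA.
variants[spots, on = .(chrom, pos >= low, pos <= high), spot_id := i.spot_id]
print(variants)
```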

Memory profiling with data.table

╄→尐↘猪︶ㄣ submitted on 2021-01-27 04:49:02
Question: What is the correct way to profile memory in R code that contains calls to data.table functions? Let's say I want to determine the maximum memory usage during an expression. This reference indicates that Rprofmem may not be the right choice: https://cran.r-project.org/web/packages/profmem/vignettes/profmem.html All memory allocations that are done via the native allocVector3() part of R's native API are logged, which means that nearly all memory allocations are logged. Any objects allocated
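
As a rough sketch only: base R's `gc(reset = TRUE)` resets the "max used" statistics, and a later `gc()` call reports the peak since that reset. This sees only memory managed by R's own allocator, so it may share the limitation the question raises for allocations made outside it. The workload below is a made-up example.

```r
library(data.table)

invisible(gc(reset = TRUE))  # reset the "max used" counters

# Made-up workload standing in for the expression of interest.
DT <- data.table(x = runif(1e6))
DT[, y := x * 2]

mem <- gc()
# The last column is the "max used" size in Mb, one row each for
# Ncells and Vcells; their sum approximates the peak since the reset.
peak_mb <- sum(mem[, ncol(mem)])
print(peak_mb)
```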