data.table

split character columns and get names of field in string

荒凉一梦 submitted on 2021-02-02 09:17:06
Question: I need to split a column that contains information into several columns. I'd use tstrsplit, but the same kind of information is not in the same order across rows, and I need to extract the name of each new column from within the variable. Important to know: there can be many pieces of information (fields to become new variables) and I don't know all of them, so I don't want a "field by field" solution. Below is an example of what I have: library(data.table) myDT <- structure(list(chr = c("chr1",
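The excerpt above is cut off before the sample data, so the actual field format is unknown. As a minimal sketch only, assuming a hypothetical column `info` that holds `name=value` pairs separated by `;`, one way to avoid a field-by-field solution is to reshape to long format and cast back to wide. The table `demoDT` and its column names are invented for illustration.

```r
library(data.table)

# Hypothetical data: the real format is truncated in the excerpt above.
demoDT <- data.table(
  chr  = c("chr1", "chr2"),
  info = c("AC=1;AF=0.5;DP=10", "DP=7;AC=2")  # fields in varying order
)
demoDT[, row_id := .I]

# One row per "name=value" piece, whatever order the pieces come in.
long <- demoDT[, .(field = unlist(strsplit(info, ";", fixed = TRUE))),
               by = .(row_id, chr)]

# Separate the field name from its value.
long[, c("name", "value") := tstrsplit(field, "=", fixed = TRUE)]

# Back to wide: one column per field name found in the data.
wide <- dcast(long, row_id + chr ~ name, value.var = "value")
print(wide)
```

Fields that are absent in a row come out as NA, and all values stay character until converted explicitly.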

Read csv file with selected rows using data.table's fread

喜你入骨 submitted on 2021-01-29 17:36:01
Question: I was going through an earlier post, "Quickest way to read a subset of rows of a CSV". One way to select a subset of the data is write.csv(iris,"iris.csv") fread("shuf -n 5 iris.csv") However, I was wondering whether I can pass some SQL query instead of the top 5 rows, e.g. import only those rows that have V6 = versicolor. Is there any way to do this using the fread function? Answer 1: This worked for me on Windows (the Unix alternative is grep): write.csv(iris,"iris.csv") fread(cmd = paste('findstr', 'versicolor', 'iris
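The answer above is truncated, so the following is only a sketch of the Unix variant it mentions, assuming `grep` is available on the PATH. It writes the file with `fwrite` rather than `write.csv` (an assumption made here to avoid a row-name column) and restores the column names by hand, because the header line does not contain "versicolor" and is filtered out along with the other rows.

```r
library(data.table)

# Write iris to a temporary CSV (no row-name column with fwrite).
csv_path <- tempfile(fileext = ".csv")
fwrite(iris, csv_path)

# Let grep pre-filter the lines; fread only parses what grep prints.
# The header is dropped by grep, so read without one.
versicolor <- fread(cmd = paste("grep versicolor", shQuote(csv_path)),
                    header = FALSE)

# Restore the original column names.
setnames(versicolor, names(iris))
print(head(versicolor))
```

Note that grep matches the text anywhere on the line, not in a specific column, so this is a plain text filter rather than a real SQL-style condition.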

Aggregate results by date intervals in R

依然范特西╮ submitted on 2021-01-29 08:27:58
Question: I'm using R and I have my data in data.table objects. My data has the format ID, Date1, Date2, Row. For each ID I can have more than one entry, and the two dates define a time interval. I want to be able to aggregate all the entries by ID and overlapping time intervals. I know how to do it with for loops and such, but I wonder if there is a better way. Example: data = data.table( id = c(1,1,1,2,2,3,3), Row = c(1,2,3,4,5,6,7), Date1 = c("2018-01-01", "2018-01-05", "2018-01-21", "2018-01
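The example data above is cut off, so here is a loop-free sketch on invented values that follow the stated format (id, Row, Date1, Date2). The idea, offered as one possible approach rather than the original answer, is to sort by start date within each id and open a new group whenever an interval starts after the latest end date seen so far.

```r
library(data.table)

# Hypothetical data in the format described above; the dates are invented.
intervals <- data.table(
  id    = c(1, 1, 1, 2, 2),
  Row   = 1:5,
  Date1 = as.Date(c("2018-01-01", "2018-01-05", "2018-01-21",
                    "2018-01-01", "2018-01-15")),
  Date2 = as.Date(c("2018-01-10", "2018-01-15", "2018-01-25",
                    "2018-01-10", "2018-01-20"))
)
setorder(intervals, id, Date1)

# Latest end date among the earlier rows of the same id.
intervals[, prev_end := shift(cummax(as.numeric(Date2)), fill = -Inf), by = id]

# A new group starts whenever the interval begins after everything before it.
intervals[, grp := cumsum(as.numeric(Date1) > prev_end), by = id]

# Collapse each group of overlapping intervals into one row.
merged <- intervals[, .(Date1 = min(Date1), Date2 = max(Date2), n_rows = .N),
                    by = .(id, grp)]
print(merged)
```

Intervals that merely touch (one ends on the day the next starts) are treated as overlapping here; change `>` to `>=` if they should stay separate.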

How to iterate through all combinations of columns and apply function by group in R?

喜夏-厌秋 submitted on 2021-01-29 08:02:55
Question: I have the following data.table named dt: set.seed(1) dt <- data.table(expand.grid(c("a","b"),1:2,1:2,c("M","N","O","P","Q"))) dt$perf <- rnorm(nrow(dt),0,.01) colnames(dt) <- c("ticker","par1","par2","row_names","perf") My goal is to iterate through all combinations of par1 and par2 by row_names and pick the one that maximizes cumprod(mean(perf)+1)-1 . Let's look at the data so this makes more sense visually. dt[order(row_names,ticker,par1,par2)] ticker par1 par2 row_names perf 1: a 1 1 M 0

Add ordered ID for each group by date

早过忘川 submitted on 2021-01-29 07:56:47
Question: I want to add an ordered ID (by date) to each group in a data frame. I can do this using dplyr (R - add column that counts sequentially within groups but repeats for duplicates): # Example data date <- rep(c("2016-10-06 11:56:00","2016-10-05 11:56:00","2016-10-05 11:56:00","2016-10-07 11:56:00"),2) date <- as.POSIXct(date) group <- c(rep("A",4), rep("B",4)) df <- data.frame(group, date) # dplyr - dense_rank df2 <- df %>% group_by(group) %>% mutate(m.test=dense_rank(date)) group date m.test
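For comparison with the dplyr `dense_rank` call above, a data.table sketch could use `frank` with `ties.method = "dense"`, which ranks within each group and repeats the rank for duplicate dates. The time zone is fixed to UTC here as an assumption, only to keep the example reproducible.

```r
library(data.table)

# Same example data as in the question, with an explicit time zone.
date <- as.POSIXct(rep(c("2016-10-06 11:56:00", "2016-10-05 11:56:00",
                         "2016-10-05 11:56:00", "2016-10-07 11:56:00"), 2),
                   tz = "UTC")
group <- c(rep("A", 4), rep("B", 4))
rankDT <- data.table(group, date)

# Dense rank of date within each group; tied dates share the same rank.
rankDT[, m.test := frank(date, ties.method = "dense"), by = group]
print(rankDT)
```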

Find nearest preceding and following dates between data frames

倾然丶 夕夏残阳落幕 submitted on 2021-01-29 03:06:36
Question: I have the following two data frames: df1 <- data.frame(ID = c("A","A","B","B","C","D","D","D","E"), Date = as.POSIXct(c("2018-04-12 08:56:00","2018-04-13 11:03:00","2018-04-14 14:30:00","2018-04-15 03:10:00","2018-04-16 07:28:00","2018-04-17 11:17:00","2018-04-17 14:21:00","2018-04-18 09:56:00","2018-05-02 07:49:00"))) df2 <- data.frame(ID = c("A","A","A","B","C","D","D","D","D","D","E"), Date = as.POSIXct(c("2018-04-10 07:11:00","2018-04-11 18:59:00","2018-04-12 12:37:00","2018-04-15 01:43
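The second data frame is truncated above, so the sketch below uses two small invented tables (`events` and `refs`, both hypothetical names) rather than the original data. It illustrates one common data.table approach to this kind of lookup, rolling joins: `roll = Inf` carries the last earlier observation forward, and `roll = -Inf` carries the next later observation backward.

```r
library(data.table)

# Hypothetical stand-ins for the two data frames; all values are invented.
events <- data.table(
  ID   = c("A", "A", "B"),
  Date = as.POSIXct(c("2018-04-12 08:00:00", "2018-04-13 11:00:00",
                      "2018-04-14 14:00:00"), tz = "UTC")
)
refs <- data.table(
  ID   = c("A", "A", "A", "B"),
  Date = as.POSIXct(c("2018-04-10 07:00:00", "2018-04-11 19:00:00",
                      "2018-04-12 12:00:00", "2018-04-15 02:00:00"), tz = "UTC")
)

# Keep a copy of the reference date, since the join column itself is
# overwritten by the event's date in the join result.
refs[, ref_date := Date]

# Nearest preceding reference date per event (NA if there is none).
events[, prev_date := refs[events, on = .(ID, Date), roll = Inf, ref_date]]

# Nearest following reference date per event (NA if there is none).
events[, next_date := refs[events, on = .(ID, Date), roll = -Inf, ref_date]]
print(events)
```

An exact match on the date counts as both preceding and following in this sketch.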

Passing function argument to data.table i

巧了我就是萌 submitted on 2021-01-28 19:10:20
Question: Say we have a data.table myDT <- data.table(id = c("a", "a", "b", "b", "c"), value = 1:5) setkey(myDT, id) I'd like to create a function fun <- function(id) { ... } such that if foo <- rep("b", 6) then fun(foo) # I want this to return 3 4 Basically, I want to pass id[[1]] from the execution environment to the i argument of myDT . I'm having a really hard time accessing the correct environment here and am looking for some help. Changing the name of the function argument is not an option. Answer 1:
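The answer is cut off above, so this is not the original solution, only one possible sketch. The function argument keeps its name `id`, and its first element is copied into a local variable whose name (`key_value`, chosen here) does not clash with any column of `myDT`, so the keyed lookup in `i` cannot pick up the `id` column by mistake.

```r
library(data.table)

myDT <- data.table(id = c("a", "a", "b", "b", "c"), value = 1:5)
setkey(myDT, id)

fun <- function(id) {
  # Copy the first element to a name that is not a column of myDT.
  key_value <- id[[1]]
  # Keyed join on the local value; return the matching values.
  myDT[.(key_value), value]
}

foo <- rep("b", 6)
fun(foo)
```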

Update data.table based on multiple columns and conditions

做~自己de王妃 submitted on 2021-01-28 17:35:13
Question: This is a follow-up question to "Efficient way to subset data.table based on value in any of selected columns". Sample data: I have a data.table with 5 p-columns, indicating a type (type1 or type2 or NA ). I also have 5 r-columns, indicating a score (1-10, or NA ). library(data.table) set.seed(123) v <- c( "type1", "type2", NA_character_ ) v2 <- c( 1:10, rep( NA_integer_, 10 ) ) DT <- data.table( id = 1:100, p1 = sample(v, 100, replace = TRUE ), p2 = sample(v, 100, replace = TRUE ), p3