data.table | 易学教程

split character columns and get names of field in string

Question: I need to split a column that holds several pieces of information into separate columns. I would use tstrsplit, but the same kind of information does not appear in the same order in every row, and the name of each new column has to be extracted from within the variable itself. Important to know: there can be many pieces of information (fields that become new variables) and I don't know all of them, so I don't want a field-by-field solution. Below is an example of what I have: library(data.table) myDT <- structure(list(chr = c("chr1",
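One possible approach, sketched under assumptions because the example data above is cut off: suppose the packed column is called info (a hypothetical name) and holds name=value pairs separated by semicolons, in any order. Reshaping to long format and casting back to wide then creates one column per field name without listing the fields in advance.

```r
library(data.table)

# Hypothetical data: the column name `info` and the "name=value;..." layout
# are assumptions, since the original example is truncated.
myDT <- data.table(
  chr  = c("chr1", "chr2"),
  info = c("gene=A;type=exon", "type=intron;gene=B;score=5")
)

# Row identifier so every original row stays distinct after reshaping
myDT[, row_id := .I]

# One row per name=value pair
long <- myDT[, .(pair = unlist(strsplit(info, ";", fixed = TRUE))),
             by = .(row_id, chr)]

# Split each pair into the field name and its value
long[, c("name", "value") := tstrsplit(pair, "=", fixed = TRUE)]

# One column per field name; fields missing from a row become NA
wide <- dcast(long, row_id + chr ~ name, value.var = "value")
```

All extracted values are character at this point; converting numeric fields would be a separate step.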

Read csv file with selected rows using data.table's fread

Question: I was going through an earlier post, "Quickest way to read a subset of rows of a CSV". One way to select a subset of the data is: write.csv(iris,"iris.csv") fread("shuf -n 5 iris.csv") However, I was wondering whether I can pass something like an SQL query instead of sampling 5 rows, e.g. import only those rows that have V6 = versicolor. Is there any way to do this with the fread function?

Answer 1: This worked for me on Windows (the Unix alternative is grep): write.csv(iris,"iris.csv") fread(cmd = paste('findstr', 'versicolor', 'iris
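The answer filters rows with a shell command before fread parses them. A minimal sketch of that idea follows, together with a portable fallback that filters after reading. The temporary file path is an assumption for illustration, and the grep branch only runs where grep is found on the PATH.

```r
library(data.table)

# Assumption: a temporary file stands in for iris.csv
csv_path <- tempfile(fileext = ".csv")
write.csv(iris, csv_path)

# Portable fallback: read everything, then filter in R
versicolor <- fread(csv_path)[Species == "versicolor"]

# Shell pre-filter: only matching lines reach fread. The header line does not
# match, so the columns get default names V1..V6 (V1 holds the row names
# written by write.csv).
if (nzchar(Sys.which("grep"))) {
  versicolor_cmd <- fread(
    cmd = paste("grep", "versicolor", shQuote(csv_path)),
    header = FALSE
  )
}
```

The pre-filter avoids loading unwanted rows into memory, at the cost of losing the header and matching the text anywhere on the line rather than in one specific column.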

Aggregate results by date intervals in R

Question: I'm using R and my data is stored in data.table objects. The data has the format ID, Date1, Date2, Row. Each ID can have more than one entry, and the two dates define a time interval. I want to aggregate all entries by id and by overlapping time intervals. I know how to do it with for loops and such, but I wonder whether there is a better way. Example: data = data.table( id = c(1,1,1,2,2,3,3), Row = c(1,2,3,4,5,6,7), Date1 = c("2018-01-01", "2018-01-05", "2018-01-21", "2018-01
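A loop-free sketch of one common technique for this kind of problem: sort by start date, then start a new group whenever an interval begins after all earlier intervals of the same id have ended. The data below is hypothetical (the Date2 values in particular are assumptions, since the original example is truncated), and the table is named dat rather than data.

```r
library(data.table)

# Hypothetical data: the Date2 values are assumptions
dat <- data.table(
  id    = c(1, 1, 1, 2),
  Row   = 1:4,
  Date1 = as.Date(c("2018-01-01", "2018-01-05", "2018-01-21", "2018-01-03")),
  Date2 = as.Date(c("2018-01-10", "2018-01-12", "2018-01-25", "2018-01-04"))
)

setorder(dat, id, Date1)

# A new group starts whenever an interval begins after every earlier
# interval of the same id has ended
dat[, grp := cumsum(
  as.numeric(Date1) > shift(cummax(as.numeric(Date2)), fill = -Inf)
), by = id]

# Collapse each group of overlapping intervals into one row
merged <- dat[, .(start = min(Date1), end = max(Date2), n_rows = .N),
              by = .(id, grp)]
```

With the strict comparison used here, an interval that starts on the day the previous one ends is treated as overlapping; using >= instead would keep such intervals separate.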

How to iterate through all combinations of columns and apply function by group in R?

Question: I have the following data.table named dt: set.seed(1) dt <- data.table(expand.grid(c("a","b"),1:2,1:2,c("M","N","O","P","Q"))) dt$perf <- rnorm(nrow(dt),0,.01) colnames(dt) <- c("ticker","par1","par2","row_names","perf") My goal is to iterate through all combinations of par1 and par2 by row_names and pick the one that maximizes cumprod(mean(perf)+1)-1. Let's look at the data so that this makes more sense visually: dt[order(row_names,ticker,par1,par2)] ticker par1 par2 row_names perf 1: a 1 1 M 0

Add ordered ID for each group by date

Question: I want to add an ordered ID (by date) to each group in a data frame. I can do this using dplyr (see "R - add column that counts sequentially within groups but repeats for duplicates"): # Example data date <- rep(c("2016-10-06 11:56:00","2016-10-05 11:56:00","2016-10-05 11:56:00","2016-10-07 11:56:00"),2) date <- as.POSIXct(date) group <- c(rep("A",4), rep("B",4)) df <- data.frame(group, date) # dplyr - dense_rank df2 <- df %>% group_by(group) %>% mutate(m.test=dense_rank(date)) group date m.test
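A data.table counterpart to the dplyr snippet can be sketched with frank() and ties.method = "dense", which ranks within each group and gives tied dates the same rank. The explicit tz = "UTC" is an assumption added so that the example behaves the same on any machine.

```r
library(data.table)

# Example data from the question, with an explicit time zone (an assumption)
date <- rep(c("2016-10-06 11:56:00", "2016-10-05 11:56:00",
              "2016-10-05 11:56:00", "2016-10-07 11:56:00"), 2)
DT <- data.table(
  group = c(rep("A", 4), rep("B", 4)),
  date  = as.POSIXct(date, tz = "UTC")
)

# Dense rank of date within each group: ties share a rank and no ranks are skipped
DT[, m.test := frank(date, ties.method = "dense"), by = group]
```

The column is added by reference, so no copy of DT is made.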

Find nearest preceding and following dates between data frames

Question: I have the following two data frames: df1 <- data.frame(ID = c("A","A","B","B","C","D","D","D","E"), Date = as.POSIXct(c("2018-04-12 08:56:00","2018-04-13 11:03:00","2018-04-14 14:30:00","2018-04-15 03:10:00","2018-04-16 07:28:00","2018-04-17 11:17:00","2018-04-17 14:21:00","2018-04-18 09:56:00","2018-05-02 07:49:00"))) df2 <- data.frame(ID = c("A","A","A","B","C","D","D","D","D","D","E"), Date = as.POSIXct(c("2018-04-10 07:11:00","2018-04-11 18:59:00","2018-04-12 12:37:00","2018-04-15 01:43
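One way to approach this with data.table is a pair of rolling joins, one in each direction. The sketch below uses a small hypothetical subset of the dates shown above (the full df2 is truncated); the names dt1, dt2, Date2, prev_vec and next_vec, and tz = "UTC", are assumptions for illustration.

```r
library(data.table)

# Small hypothetical subset; tz = "UTC" is an assumption
dt1 <- data.table(
  ID   = c("A", "B"),
  Date = as.POSIXct(c("2018-04-12 08:56:00", "2018-04-14 14:30:00"), tz = "UTC")
)
dt2 <- data.table(
  ID   = c("A", "A", "A", "B"),
  Date = as.POSIXct(c("2018-04-10 07:11:00", "2018-04-11 18:59:00",
                      "2018-04-12 12:37:00", "2018-04-15 01:43:00"), tz = "UTC")
)

# Keep a copy of dt2's date, because the join column reports dt1's values
dt2[, Date2 := Date]

# roll = Inf rolls the last earlier dt2 date forward (nearest preceding);
# roll = -Inf rolls the next later dt2 date backward (nearest following)
prev_vec <- dt2[dt1, Date2, on = .(ID, Date), roll = Inf]
next_vec <- dt2[dt1, Date2, on = .(ID, Date), roll = -Inf]

dt1[, `:=`(prev_date = prev_vec, next_date = next_vec)]
```

An exact match on Date counts in both directions, and rows without a preceding or following date within the same ID get NA.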

Passing function argument to data.table i

Question: Say we have a data.table: myDT <- data.table(id = c("a", "a", "b", "b", "c"), value = 1:5) setkey(myDT, id) I'd like to create a function fun <- function(id) { ... } such that, if foo <- rep("b", 6), then fun(foo) # I want this to return 3 4 Basically, I want to pass id[[1]] from the execution environment to the i argument of myDT. I'm having a really hard time accessing the correct environment here and am looking for some help. Changing the name of the function argument is not an option.

Answer 1:
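The answer text is cut off, so the following is only one possible workaround, not necessarily the one originally posted. Because column names normally take precedence inside myDT[...], the sketch copies the argument into a local variable whose name (first_id, a hypothetical choice) does not clash with any column, while the function argument itself stays named id.

```r
library(data.table)

myDT <- data.table(id = c("a", "a", "b", "b", "c"), value = 1:5)
setkey(myDT, id)

fun <- function(id) {
  # Inside myDT[...], `id` would normally resolve to the column, so copy the
  # argument to a name that is not a column (first_id is a hypothetical name)
  first_id <- id[[1]]
  # Keyed join on the first element of the argument, returning `value`
  myDT[.(first_id), value]
}

foo <- rep("b", 6)
result <- fun(foo)
```

The local copy sidesteps the scoping question entirely, at the cost of one extra assignment.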

Update data.table based on multiple columns and conditions

Question: This is a follow-up question to "Efficient way to subset data.table based on value in any of selected columns". Sample data: I have a data.table with 5 p-columns, each indicating a type (type1, type2 or NA). I also have 5 r-columns, each indicating a score (1-10, or NA). library(data.table) set.seed(123) v <- c( "type1", "type2", NA_character_ ) v2 <- c( 1:10, rep( NA_integer_, 10 ) ) DT <- data.table( id = 1:100, p1 = sample(v, 100, replace = TRUE ), p2 = sample(v, 100, replace = TRUE ), p3