data.table

Joining data.table with by argument

徘徊边缘 submitted on 2021-02-19 04:02:12
Problem: I have two data.tables, dx and dy:

    dx <- data.table(a = c(1,1,1,1,2,2), b = 3:8)
    dy <- data.table(a = c(1,1,2), c = 7:9)

I want to join dy to each row of dx; below is the desired output:

    data.table(plyr::ddply(dx, c("a", "b"), function(d) merge(d, dy, by = "a")))
        a b c
     1: 1 3 7
     2: 1 3 8
     3: 1 4 7
     4: 1 4 8
     5: 1 5 7
     6: 1 5 8
     7: 1 6 7
     8: 1 6 8
     9: 2 7 9
    10: 2 8 9

However, I failed to produce this output using only operations inside the [] of data.table, or merge. I have tried merge(dx, dy, by = "a", all =
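A minimal sketch of one way to get this with data.table's own join syntax, assuming the many-to-many match is intended (which is why allow.cartesian is needed):

    library(data.table)
    dx <- data.table(a = c(1,1,1,1,2,2), b = 3:8)
    dy <- data.table(a = c(1,1,2), c = 7:9)
    # joining dx into dy pairs every dx row with every dy row sharing the same "a",
    # reproducing the ddply/merge output above (column order aside)
    dy[dx, on = "a", allow.cartesian = TRUE]

The same result should come from merge(dx, dy, by = "a", allow.cartesian = TRUE).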

Data.Table: Aggregate by every two weeks

|▌冷眼眸甩不掉的悲伤 submitted on 2021-02-19 03:43:10
Problem: So let's take the following data.table. It has dates and a column of numbers. I'd like to get the week of each date and then aggregate (sum) over each two-week period.

    Date <- as.Date(c("1980-01-01", "1980-01-02", "1981-01-05", "1981-01-05",
                      "1982-01-08", "1982-01-15", "1980-01-16", "1980-01-17",
                      "1981-01-18", "1981-01-22", "1982-01-24", "1982-01-26"))
    Runoff <- c(2, 1, 0.1, 3, 2, 5, 1.5, 0.5, 0.3, 2, 1.5, 4)
    DT <- data.table(Date, Runoff)
    DT

So from the date, I can easily get the year and week. DT[
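A minimal sketch of the two-week aggregation, assuming weeks should be paired within each year; year() and week() are data.table's own date helpers:

    library(data.table)
    # derive the year, then collapse week pairs (1-2, 3-4, ...) into a fortnight index
    DT[, `:=`(Year = year(Date), Fortnight = ceiling(week(Date) / 2))]
    # sum the runoff within each year/fortnight group
    DT[, .(Runoff = sum(Runoff)), by = .(Year, Fortnight)]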

Replace NA when last and next non-NA values are equal

非 Y 不嫁゛ submitted on 2021-02-19 02:42:48
Problem: I have a sample table with some, but not all, NA values that need to be replaced.

    > dat
       id message index
    1   1    <NA>     1
    2   1     foo     2
    3   1     foo     3
    4   1    <NA>     4
    5   1     foo     5
    6   1    <NA>     6
    7   2    <NA>     1
    8   2     baz     2
    9   2    <NA>     3
    10  2     baz     4
    11  2     baz     5
    12  2     baz     6
    13  3     bar     1
    14  3    <NA>     2
    15  3    <NA>     3
    16  3     bar     4
    17  3    <NA>     5
    18  3     bar     6
    19  3    <NA>     7
    20  3     qux     8

My objective is to replace the NA values that are surrounded by the same "message", using the first appearance of the message (the least index value) and the last appearance
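A minimal sketch of one approach, assuming zoo is available for the directional fills (fifelse() needs data.table >= 1.12.3): an NA is filled only when the nearest non-NA value before it and after it, within the same id, agree:

    library(data.table)
    library(zoo)
    setDT(dat)
    dat[, message := {
      fwd <- na.locf(message, na.rm = FALSE)                   # last non-NA carried forward
      bwd <- na.locf(message, na.rm = FALSE, fromLast = TRUE)  # next non-NA carried back
      # fill only where both neighbours exist and match; leave every other NA as-is
      fifelse(is.na(message) & !is.na(fwd) & !is.na(bwd) & fwd == bwd, fwd, message)
    }, by = id]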

How to convert an ambiguous datetime column in data.table without using strptime?

ぃ、小莉子 submitted on 2021-02-19 02:18:12
Problem: My data.table has a column with an "ambiguous" datetime format: "12/1/2016 15:30". How can I convert this datetime to a format R recognizes, inside a data.table, without using strptime() and getting the warning message about initially converting to POSIXlt? The process works, but the warning makes me think there is another way. My data.table:

    my_dates <- c("12/1/2016 15:30", "12/1/2016 15:31", "12/1/2016 15:32")
    this <- c("a", "b", "c")
    that <- c(1, 2, 3)
    my_table <- data.table(my_dates, this, that)
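A minimal sketch, assuming the strings are month/day/year: as.POSIXct() with an explicit format still parses via strptime() internally, but it returns POSIXct directly, so no POSIXlt column ever lands in the table and data.table raises no warning:

    library(data.table)
    my_table[, my_dates := as.POSIXct(my_dates, format = "%m/%d/%Y %H:%M", tz = "UTC")]

lubridate::mdy_hm(my_dates) is a common alternative that spares you the format string.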

How to count matches between a vector and dataframe of sequence coordinates?

梦想与她 submitted on 2021-02-18 18:55:42
Problem: Given a data.table with start and end coordinates for sequences of integers:

    set.seed(1)
    df1 <- data.table(
      START = c(seq(1, 10000000, 10), seq(1, 10000000, 10), seq(1, 10000000, 10)),
      END   = c(seq(10, 10000000, 10), seq(10, 10000000, 10), seq(10, 10000000, 10))
    )

And a vector of integers:

    vec1 <- sample(1:100000, 10000)

How can I count the number of integers in vec1 that are within the start and end coordinates of each sequence in df1? I am currently using a for loop: COUNT <- rep(NA, nrow(df1)
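A minimal vectorised sketch using foverlaps(), treating each integer as a zero-width interval (nomatch = NULL assumes a recent data.table; older versions want nomatch = 0L):

    library(data.table)
    # each integer becomes a point interval so it can overlap-join against df1
    dvec <- data.table(start = vec1, end = vec1)
    setkey(df1, START, END)                # foverlaps() requires a keyed y table
    ov <- foverlaps(dvec, df1, by.x = c("start", "end"),
                    type = "within", nomatch = NULL)
    # count the hits per interval and write them back, zero-filling the misses
    counts <- ov[, .N, by = .(START, END)]
    df1[, COUNT := 0L][counts, COUNT := i.N, on = .(START, END)]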

Method to operate on each row of data.table without using apply function

可紊 submitted on 2021-02-18 13:53:21
Problem: I wrote a simple function below:

    mcs <- function(v) {
      ifelse(sum((diff(sort(v)) > 6) > 0), NA, sd(v))
    }

It is supposed to take a vector, sort it, and then check whether any successive difference is greater than 6. It returns NA if there is a difference greater than 6, and the standard deviation if there is not. I would like to apply this function across all rows of a data.table (choosing only certain columns) and then append the return value for each row as a new column entry to
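A minimal sketch of the row-wise call without apply(), assuming hypothetical column names x1..x3; grouping by row number makes .SD a one-row table per group:

    library(data.table)
    cols <- c("x1", "x2", "x3")   # hypothetical: the columns mcs() should see
    # one group per row, so unlist(.SD) hands mcs() that row's values as a plain vector
    DT[, mcs_out := mcs(unlist(.SD)), by = seq_len(nrow(DT)), .SDcols = cols]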

Is there a faster way than fread() to read big data?

£可爱£侵袭症+ submitted on 2021-02-18 11:27:28
Problem: Hi, first of all, I already searched Stack Overflow and Google and found posts such as this one: Quickly reading very large tables as dataframes. While those are helpful and well answered, I'm looking for more information. I am looking for the best way to read/import "big" data that can go up to 50-60 GB. I am currently using the fread() function from data.table, and it is the fastest function I know of at the moment. The PC/server I work on has a good CPU (workstation) and 32 GB of RAM, but
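Not a faster reader, but a minimal sketch of the fread() knobs that usually matter at this scale (the file name and column names are placeholders):

    library(data.table)
    dt <- fread("bigfile.csv",                      # hypothetical input file
                nThread = parallel::detectCores(),  # fread parses in parallel
                select  = c("col1", "col2"))        # read only the columns you need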

R Data table - how to use previous row value within group [duplicate]

只愿长相守 submitted on 2021-02-17 05:40:36
Problem: This question already has answers here: How to create a lag variable within each group? (5 answers). Closed 5 years ago.

I wish to calculate the difference between the current row and the previous row, by group.

    x = data.table(a = c(15, 25, 10, 12), b = c(1, 1, 2, 2))
    > x
        a b
    1: 15 1
    2: 25 1
    3: 10 2
    4: 12 2
    > x[, c := a - c(NA, a[.I-1]), by = b]
    Warning messages:
    1: In a - c(NA, a[.I - 1]) :
      longer object length is not a multiple of shorter object length
    2: In `[.data.table`(x, , `:=`(c, a - c(NA, a[.I
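A minimal sketch using data.table's shift(), the idiomatic per-group lag; the first row of each group gets NA:

    library(data.table)
    x <- data.table(a = c(15, 25, 10, 12), b = c(1, 1, 2, 2))
    # shift(a) is the previous value of a within each group defined by b
    x[, c := a - shift(a), by = b]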

R, data.table, group by column *numbers* AND sum a column

不打扰是莪最后的温柔 submitted on 2021-02-16 20:24:08
Problem: Let's say I have the following data.table:

    > DT
    #        A B C D E          N
    #     1: J t X D N 0.07898388
    #     2: U z U L A 0.46906049
    #     3: H a Z F S 0.50826435
    #    ---
    #  9998: X b R L X 0.49879990
    #  9999: Z r U J J 0.63233668
    # 10000: C b M K U 0.47796539

Now I need to group by a pair of columns and calculate sum(N). That's easy to do when you know the column names in advance:

    > DT[, sum(N), by = .(A, B)]
    #        A B        V1
    #   1:   J t  6.556897
    #   2:   U z  9.060844
    #   3:   H a  4.293426
    #  ---
    # 674:   V z 11.439100
    # 675:   M x  1.736050
    # 676:   U
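A minimal sketch for grouping by column numbers: by also accepts a character vector of column names, so positions can be translated through names():

    library(data.table)
    group_cols <- names(DT)[c(1, 2)]    # column numbers -> names ("A", "B" here)
    DT[, .(sumN = sum(N)), by = group_cols]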