data.table

Group by two columns and union of levels in R

断了今生、忘了曾经 submitted on 2020-01-17 07:22:08
Question: I am stuck on a problem that seems trivial, but I am unable to figure it out right now. I don't even know how to formulate it properly, so any suggestions are welcome. I have a data.frame which I want to group/index depending on two columns. The thing is, the rows I want to group do not share the same values in those columns. Rather, some rows have the same value in one column, and then some of those rows have a common value with different rows in the second column (which I also

Drop ID with NA in a conditional group

僤鯓⒐⒋嵵緔 submitted on 2020-01-17 02:13:17
Question: Extending this question: I have some data prepared using the code below:

    # # Data Preparation ----------------------
    library(lubridate)
    start_date <- "2018-10-30 00:00:00"
    start_date <- as.POSIXct(start_date, origin="1970-01-01")
    dates <- c(start_date)
    for(i in 1:287) { dates <- c(dates, start_date + minutes(i * 10)) }
    dates <- as.POSIXct(dates, origin="1970-01-01")
    date_val <- format(dates, '%d-%m-%Y')
    weather.forecast.data <- data.frame(dateTime = dates, date = date_val)
    weather.forecast
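
The loop in the excerpt above grows `dates` one element at a time. A minimal sketch of a vectorised way to build the same kind of 10-minute grid, using base R only and assuming a fixed UTC time zone (the original relies on the session's local time zone, so results may differ around daylight-saving changes):

```r
# Assumption: UTC is used so that every step is exactly 600 seconds;
# the code in the question uses the session's local time zone instead.
start_date <- as.POSIXct("2018-10-30 00:00:00", tz = "UTC")

# 288 timestamps at 10-minute intervals (the start plus 287 steps).
dates <- seq(start_date, by = "10 min", length.out = 288)

# The same two columns as in the question's data.frame.
weather.forecast.data <- data.frame(
  dateTime = dates,
  date = format(dates, "%d-%m-%Y")
)
```

Building the vector in one call avoids repeatedly copying `dates` inside the loop.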

Drop ID with NA in a conditional group

天涯浪子 submitted on 2020-01-17 02:13:10
Question: Extending this question: I have some data prepared using the code below:

    # # Data Preparation ----------------------
    library(lubridate)
    start_date <- "2018-10-30 00:00:00"
    start_date <- as.POSIXct(start_date, origin="1970-01-01")
    dates <- c(start_date)
    for(i in 1:287) { dates <- c(dates, start_date + minutes(i * 10)) }
    dates <- as.POSIXct(dates, origin="1970-01-01")
    date_val <- format(dates, '%d-%m-%Y')
    weather.forecast.data <- data.frame(dateTime = dates, date = date_val)
    weather.forecast

To stack up results in one masterfile in R

吃可爱长大的小学妹 submitted on 2020-01-17 00:41:13
Question: Using this script, I have created a separate folder for each csv file and saved all my further analysis results in that folder. The folder and the csv file have the same name. The csv files are stored in the main/master directory. I have now created a csv file in each of these folders that contains a list of all the fitted values. I would now like to do the following:
1. Set the working directory to the particular filename
2. Read fitted values file
3. Add a row/column stating the name of the site
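
The steps listed above can be sketched with data.table as follows. This is only an illustration under stated assumptions: the folder layout, the file name `fitted_values.csv`, the site names, and the output file name are all hypothetical, and the excerpt is cut off before any further steps. The sketch reads each file by its full path instead of changing the working directory.

```r
library(data.table)

# Hypothetical layout for illustration: one folder per site under `main_dir`,
# each holding a file named "fitted_values.csv" (all names are assumptions).
main_dir <- file.path(tempdir(), "master")
for (site in c("siteA", "siteB")) {
  dir.create(file.path(main_dir, site), recursive = TRUE, showWarnings = FALSE)
  fwrite(data.table(fitted = c(1.5, 2.5)),
         file.path(main_dir, site, "fitted_values.csv"))
}

# Read one site's file by full path and add a column with the site name.
read_site <- function(d) {
  out <- fread(file.path(d, "fitted_values.csv"))
  out[, site := basename(d)]
  out
}

# Stack all sites into one table and write the master file.
site_dirs <- list.dirs(main_dir, recursive = FALSE)
master <- rbindlist(lapply(site_dirs, read_site), use.names = TRUE)
fwrite(master, file.path(main_dir, "master_fitted_values.csv"))
```

Reading by full path keeps the session's working directory unchanged, which makes the loop easier to rerun.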

Reshaping 2 column data.table from long to wide

送分小仙女□ submitted on 2020-01-16 19:37:09
Question: This is my data.frame:

    library(data.table)
    df <- fread('
    predictions Label
    3 A
    4 B
    5 C
    1 A
    2 B
    3 C
    ')

Desired output:

    A B C
    3 4 5
    1 2 3

I am trying

    DesiredOutput <- dcast(df, Label+predictions ~ Label, value.var = "predictions")

with no success. Your help is appreciated!

Answer 1: Maybe the base R function unstack is the cleanest solution:

    unstack(df)
      A B C
    1 3 4 5
    2 1 2 3

Note that this returns a data.frame rather than a data.table, so if you want a data.table at the end:

    df2 <- setDT(unstack(df))
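
For comparison, a minimal sketch of a dcast()-based approach that stays within data.table. It is not taken from the answer above; it assumes that the n-th occurrence of each label belongs to the n-th output row, and uses `rowid()` from data.table to build that row identifier.

```r
library(data.table)

df <- fread("
predictions Label
3 A
4 B
5 C
1 A
2 B
3 C
")

# rowid(Label) numbers the occurrences of each label (1, 2, ...), which gives
# dcast() a row identifier to spread the labels against.
wide <- dcast(df, rowid(Label) ~ Label, value.var = "predictions")

# Keep only the label columns, dropping the helper identifier column.
wide <- wide[, .(A, B, C)]
```

The attempt in the question fails to produce this shape because `Label+predictions` on the left-hand side yields one row per distinct label/value pair instead of one row per occurrence.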

dplyr lag() inside mutate() for rolling values forward

给你一囗甜甜゛ submitted on 2020-01-16 18:18:44
Question: I'm attempting to roll a value forward using dplyr's mutate() and lag(). I'm trying the code below to make it work. Instead of working as I expect, it gives me zeros in the BegFund column after the first row. I've tried using data.table's shift() with no luck, and stats::lag() with no luck as well. Anyone have any ideas? Below is a simplified example of what I'm attempting to do. It reproduces when I test.

    library(dplyr) # 0.4.3
    payments <- 1:10
    fund.start <- 1000
    payment.percent <- .05
    fund

What does “data.table” “data.frame” class mean?

会有一股神秘感。 submitted on 2020-01-16 16:11:18
Question: I am running data.frames through an analysis that takes around 45 minutes to complete. I recently began working with a new data set with similar content and (I thought) similar structure, but found that it was taking exponentially longer to analyze and producing errors that made it evident that there were structural differences. Looking at attributes, I found that it was $class "data.table" "data.frame". Running as.data.frame() seems to have converted it to just "data.frame", and processing
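
A small sketch of what the two-element class vector means in practice. The object names `dt` and `df` are illustrative only. A data.table inherits from data.frame, but some operations, such as indexing with a single number, behave differently, which is one way data.frame-style code can produce unexpected results or errors.

```r
library(data.table)

dt <- data.table(x = 1:3)
class(dt)                      # "data.table" "data.frame"

# A data.table inherits from data.frame, so both checks are TRUE.
inherits(dt, "data.frame")
is.data.table(dt)

# as.data.frame() returns a copy whose class is just "data.frame".
df <- as.data.frame(dt)
class(df)                      # "data.frame"

# One behavioural difference: a single number inside [ ] selects a row
# of a data.table but a column of a data.frame.
dim(dt[2])   # 1 row, 1 column
dim(df[1])   # 3 rows, 1 column
```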

create counter variable with Boolean condition using value from the previous row

泄露秘密 submitted on 2020-01-16 07:18:26
Question: I want to create a counter variable c based on the group variable user and the true/false variable B.

    DT <- data.table(time=c(1,2,3,1,1,2,3,1,1,1),
                     user=c(1,1,1,2,3,3,3,4,4,5),
                     B=c('t','f','t','f','f','t','t','t','t','t'))
    DT

The desired output of variable c:

        time user B C
     1:    1    1 t 1
     2:    2    1 f 1
     3:    3    1 t 2
     4:    1    2 f 0
     5:    1    3 f 0
     6:    2    3 t 1
     7:    3    3 t 2
     8:    1    4 t 1
     9:    2    4 t 2
    10:    1    5 t 1

Variable c is a counter within the group when B is true. The logic (NOT code) of variable c is as follows. The
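
A minimal sketch that reproduces the C column shown above with a grouped cumulative sum. The full rule is cut off in the excerpt, so this assumes the counter is simply the running number of rows with B equal to 't' within each user; any extra condition on the previous row is not covered.

```r
library(data.table)

DT <- data.table(time = c(1, 2, 3, 1, 1, 2, 3, 1, 1, 1),
                 user = c(1, 1, 1, 2, 3, 3, 3, 4, 4, 5),
                 B    = c('t', 'f', 't', 'f', 'f', 't', 't', 't', 't', 't'))

# Running count of rows where B is 't', restarted for each user.
DT[, C := cumsum(B == "t"), by = user]
```

Because `B == "t"` is a logical vector, `cumsum()` counts the TRUE values seen so far, and `by = user` resets the count for each group.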

Selected rows in data.table not being removed first time (must remove twice)

|▌冷眼眸甩不掉的悲伤 submitted on 2020-01-16 05:05:11
Question: I'm getting some strange behaviour with data.table in R. I want to keep only a certain subset of rows, e.g., DT <- DT[max.seq == 1], which (I thought) always worked fine in the past. But with this particular data set I don't know if it's my code or some data.table functionality that I've misunderstood. It seems the command to remove rows I don't want needs to be run twice to work properly. Specifically, I'm trying to remove non-sequential firm-level time series by keeping only the longest

R- collapse rows based on contents of two columns

一曲冷凌霜 submitted on 2020-01-16 04:12:07
Question: I apologize in advance if this question is too specific or involved for this type of forum. I have been a long-time lurker on this site, and this is the first time I haven't been able to solve my issue by looking at previous questions, so I finally decided to post. Please let me know if there is a better place to post this, or if you have advice on making it clearer. Here goes. I have a data.table with the following structure: library(data.table) dt = structure(list(chr = c("chr1", "chr1",