data.table | 易学教程

Group by two columns and union of levels in R

Question: I am stuck on a problem that seems trivial, but I am unable to figure it out right now. I don't even know how to formulate it properly, so any suggestions are welcome. I have a data.frame which I want to group/index depending on two columns. The thing is, the rows I want to group do not share the same values in those columns. Rather, some rows have the same value in one column, and then some of those rows have a common value with different rows in the second column (which I also
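The question is cut off, so the following is only one reading of it: rows belong to the same group when they are linked, directly or through a chain of other rows, by a shared value in either column. A minimal base R sketch under that assumption follows. The data and the column names `a`, `b` and `group` are hypothetical. It repeatedly pushes the smallest row label through each column until nothing changes.

```r
# Hypothetical data: rows 1-2 share a == 1, rows 2-3 share b == "y",
# rows 4-5 share b == "z".
df <- data.frame(a = c(1, 1, 2, 3, 4),
                 b = c("x", "y", "y", "z", "z"))

# Start with one label per row, then alternately take the minimum label
# within each value of `a` and within each value of `b` until stable.
group <- seq_len(nrow(df))
repeat {
  new <- ave(group, df$a, FUN = min)
  new <- ave(new, df$b, FUN = min)
  if (identical(new, group)) break
  group <- new
}

# Renumber the surviving labels as 1, 2, ...
df$group <- match(group, unique(group))
print(df)
```

Labels only ever travel between rows that share a value, so rows that are not linked keep distinct labels. For large data a graph library's connected-components routine would likely be faster than this loop.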

Drop ID with NA in a conditional group

Question: Extending this question: I have some data prepared using the below code:

    # # Data Preparation ----------------------
    library(lubridate)
    start_date <- "2018-10-30 00:00:00"
    start_date <- as.POSIXct(start_date, origin="1970-01-01")
    dates <- c(start_date)
    for(i in 1:287) {
      dates <- c(dates, start_date + minutes(i * 10))
    }
    dates <- as.POSIXct(dates, origin="1970-01-01")
    date_val <- format(dates, '%d-%m-%Y')
    weather.forecast.data <- data.frame(dateTime = dates, date = date_val)
    weather.forecast
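The preparation code above grows `dates` one element at a time in a loop. As a hedged aside, the same 288 ten-minute timestamps can probably be built in one vectorized call with base R's `seq()`. The `tz = "UTC"` argument is an assumption added here so the result does not depend on the session time zone; the original code uses the session default.

```r
# Assumption: a fixed UTC time zone (the original relies on the session default).
start_date <- as.POSIXct("2018-10-30 00:00:00", tz = "UTC")

# 288 timestamps spaced 10 minutes apart, starting at start_date.
dates <- seq(start_date, by = "10 min", length.out = 288)

weather.forecast.data <- data.frame(
  dateTime = dates,
  date     = format(dates, "%d-%m-%Y")
)
str(weather.forecast.data)
```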

To stack up results in one masterfile in R

Question: Using this script I have created a specific folder for each csv file and then saved all my further analysis results in this folder. The folder and the csv file have the same name. The csv files are stored in the main/master directory. Now, I have created a csv file in each of these folders which contains a list of all the fitted values. I would now like to do the following:

- Set the working directory to the particular filename
- Read fitted values file
- Add a row/column stating the name of the site
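The steps listed in the question can be sketched as follows, without changing the working directory at all. This is a minimal illustration and not the asker's script: the folder names `siteA` and `siteB`, the file name `fitted_values.csv`, and the `fitted` column are all hypothetical and are created in a temporary directory so the example is self-contained.

```r
library(data.table)

# Hypothetical layout: one folder per site, each holding fitted_values.csv.
root <- file.path(tempdir(), "master_demo")
for (site in c("siteA", "siteB")) {
  dir.create(file.path(root, site), recursive = TRUE, showWarnings = FALSE)
  fwrite(data.table(fitted = c(1.5, 2.5)),
         file.path(root, site, "fitted_values.csv"))
}

# Find every fitted-values file and name each path after its folder.
files <- list.files(root, pattern = "^fitted_values\\.csv$",
                    recursive = TRUE, full.names = TRUE)
names(files) <- basename(dirname(files))

# Read all files and stack them; idcol adds the site name as a column.
master <- rbindlist(lapply(files, fread), idcol = "site")
print(master)
```

Using full paths instead of `setwd()` keeps the loop free of side effects, and `rbindlist(..., idcol = )` records which folder each row came from.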

Reshaping 2 column data.table from long to wide

Question: This is my data.frame:

    library(data.table)
    df <- fread('
    predictions Label
    3 A
    4 B
    5 C
    1 A
    2 B
    3 C
    ')

Desired output:

    A B C
    3 4 5
    1 2 3

I am trying

    DesiredOutput <- dcast(df, Label + predictions ~ Label, value.var = "predictions")

with no success. Your help is appreciated!

Answer 1: Maybe the base R function unstack is the cleanest solution:

    unstack(df)
      A B C
    1 3 4 5
    2 1 2 3

Note that this returns a data.frame rather than a data.table, so if you want a data.table at the end:

    df2 <- setDT(unstack(df))
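For readers who want to stay with `dcast()`, here is a hedged sketch of one way it could work. The attempted formula puts `Label` on both sides, so nothing tells the first A/B/C triple apart from the second. Adding an explicit within-label row counter (the helper column `id` is an assumption introduced here, not part of the original data) gives `dcast()` a row identifier.

```r
library(data.table)

# Same data as in the question, built directly instead of via fread().
df <- data.table(predictions = c(3L, 4L, 5L, 1L, 2L, 3L),
                 Label       = c("A", "B", "C", "A", "B", "C"))

# rowid() numbers the occurrences of each Label: 1, 1, 1, 2, 2, 2.
df[, id := rowid(Label)]

# One output row per id, one column per Label.
wide <- dcast(df, id ~ Label, value.var = "predictions")
wide[, id := NULL]  # drop the helper column
print(wide)
```

Unlike `unstack()`, this returns a data.table directly.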

dplyr lag() inside mutate() for rolling values forward

Question: I'm attempting to roll a value forward using dplyr's mutate() and lag(). I'm trying the code below to make it work. Instead of working as I expect, it gives me zeros in the BegFund column after the first row. I've tried data.table's shift() with no luck, and stats::lag() with no luck as well. Does anyone have any ideas? Below is a simplified example of what I'm attempting to do. It reproduces when I test.

    library(dplyr) # 0.4.3
    payments <- 1:10
    fund.start <- 1000
    payment.percent <- .05
    fund
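A general point that may explain the symptom: `lag()` shifts a vector that already exists in full, so it cannot see values that the same `mutate()` call is still computing row by row. A value that depends on its own previous result is a recurrence, and base R's `Reduce(..., accumulate = TRUE)` is one way to express it. The asker's actual formula is cut off, so the rule below (each period's beginning balance is the previous one less `payment.percent` of it) is purely a hypothetical stand-in, and `beg_fund` is an illustrative name.

```r
payments        <- 1:10
fund.start      <- 1000
payment.percent <- 0.05

# Hypothetical recurrence: balance[t] = balance[t - 1] * (1 - payment.percent).
# Reduce() feeds each result back in as the next step's starting balance.
beg_fund <- Reduce(
  function(balance, period) balance * (1 - payment.percent),
  payments[-1],          # one step for each period after the first
  accumulate = TRUE,
  init = fund.start      # balance at the start of period 1
)

data.frame(payment = payments, BegFund = beg_fund)
```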

What does “data.table” “data.frame” class mean?

Question: I am running data.frames through an analysis that takes around 45 minutes to complete. I recently began working with a new data set with similar content and (I thought) similar structure, but found that it was taking far longer to analyze and producing errors that made it evident that there were structural differences. Looking at attributes, I found that it was $class "data.table" "data.frame". Running as.data.frame() seems to have converted it to just "data.frame", and processing
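As a small illustration of what that class vector means: a data.table inherits from data.frame, but some indexing behaves differently, which is one plausible source of the errors described. The object names below are made up for the example.

```r
library(data.table)

dt <- data.table(x = 1:3, y = c("a", "b", "c"))
class(dt)            # "data.table" "data.frame"

# as.data.frame() returns a plain data.frame copy.
df <- as.data.frame(dt)
class(df)            # "data.frame"

# setDF() converts in place instead of copying (copy() protects dt here).
dt2 <- copy(dt)
setDF(dt2)
class(dt2)           # "data.frame"

# Same syntax, different result: single-column indexing.
dt[, 1]              # still a data.table with one column
df[, 1]              # a plain integer vector
```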

create counter variable with Boolean condition using value from the previous row

Question: I want to create a counter variable c based on the group variable user and the True/False variable B.

    DT <- data.table(time=c(1,2,3,1,1,2,3,1,1,1),
                     user=c(1,1,1,2,3,3,3,4,4,5),
                     B=c('t','f','t','f','f','t','t','t','t','t'))
    DT

The desired output of variable c:

        time user B C
     1:    1    1 t 1
     2:    2    1 f 1
     3:    3    1 t 2
     4:    1    2 f 0
     5:    1    3 f 0
     6:    2    3 t 1
     7:    3    3 t 2
     8:    1    4 t 1
     9:    2    4 t 2
    10:    1    5 t 1

Variable c is a counter within the group when B is true. The logic (NOT code) of variable c is as follows. The
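The stated logic is cut off, so what follows is only one reading that happens to reproduce the `C` column shown above: a running count, within each `user`, of the rows where `B` is `'t'`. If the full rule depends on the previous row in some other way, this sketch would need adjusting.

```r
library(data.table)

DT <- data.table(time = c(1, 2, 3, 1, 1, 2, 3, 1, 1, 1),
                 user = c(1, 1, 1, 2, 3, 3, 3, 4, 4, 5),
                 B    = c("t", "f", "t", "f", "f", "t", "t", "t", "t", "t"))

# Assumption: C is the cumulative number of 't' rows seen so far per user.
DT[, C := cumsum(B == "t"), by = user]
print(DT)
```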

Selected rows in data.table not being removed first time (must remove twice)

Question: I'm getting some strange behaviour with data.table in R. I want to keep only a certain subset of rows, e.g., DT <- DT[max.seq == 1], which (I thought) always worked fine in the past. But with this particular data set I don't know if it's my code or some data.table functionality that I've misunderstood. It seems the command to remove rows I don't want needs to be run twice to work properly. Specifically, I'm trying to remove non-sequential firm-level time series by keeping only the longest

R- collapse rows based on contents of two columns

Question: I apologize in advance if this question is too specific or involved for this type of forum. I have been a long-time lurker on this site, and this is the first time I haven't been able to solve my issue by looking at previous questions, so I finally decided to post. Please let me know if there is a better place to post this, or if you have advice on making it clearer. Here goes. I have a data.table with the following structure: library(data.table) dt = structure(list(chr = c("chr1", "chr1",