dplyr | 易学教程

Calculate change since base year?

Question: I have a dataset that looks something like this:

    df1 <- data.frame(id = c(rep("A1",4), rep("A2",4)),
                      time = rep(c(0,2:4), 2),
                      y1 = rnorm(8), y2 = rnorm(8))

For each of the y variables, I want to calculate the change since time==0. Basically, I want to do this:

    calc_chage <- function(id, data){
      #y1
      y1_0 <- data$y1[which(data$time==0 & data$id==id)]
      D2y1 <- data$y1[which(data$time==2 & data$id==id)] - y1_0
      D3y1 <- data$y1[which(data$time==3 & data$id==id)] - y1_0
      D4y1 <- data$y1[which(data
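One possible way to get the change since the base year without a hand-written helper is a grouped `mutate()` with `across()`. This is only a sketch, not the accepted answer to the question: the `set.seed()` call, the `change` name, and the `D_` prefix for the new columns are assumptions added for illustration.

```r
library(dplyr)

set.seed(1)  # assumption: seed added only to make the sketch reproducible
df1 <- data.frame(id = c(rep("A1", 4), rep("A2", 4)),
                  time = rep(c(0, 2:4), 2),
                  y1 = rnorm(8), y2 = rnorm(8))

# Within each id, subtract the value observed at time == 0 from every row.
# The "D_" prefix for the new columns is a hypothetical naming choice.
change <- df1 %>%
  group_by(id) %>%
  mutate(across(c(y1, y2), ~ .x - .x[time == 0], .names = "D_{.col}")) %>%
  ungroup()

print(change)
```

This relies on each id having exactly one row with time == 0; if that does not hold, the subtraction would need an explicit check.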

Join big dataframe in r and filter in the same time

Question:

    df1 = data.frame(id=1,start=as.Date("2012-07-05"),end=as.Date("2012-07-15"))
    df2 = data.frame(id=rep(1,1371),date = as.Date(as.Date("2012-05-06"):as.Date("2016-02-05")))
    output = dplyr::inner_join(x=df1,y=df2,by="id") %>% filter(date>=start & date<= end)

I have two data frames, each with about one million rows, and I want to join them by id and then filter so that, for each row, the value of column date falls between the values of start and end. A dplyr::inner_join is not
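If a recent dplyr is available, one option is to push the date condition into the join itself so that the full id-by-id product is never materialized. The sketch below assumes dplyr 1.1.0 or later, where `join_by()` accepts inequality conditions; building `df2` with `seq()` instead of the original integer range is also a substitution made here.

```r
library(dplyr)  # assumption: dplyr >= 1.1.0, which provides join_by()

df1 <- data.frame(id = 1,
                  start = as.Date("2012-07-05"),
                  end = as.Date("2012-07-15"))
# seq() is used here in place of the original as.Date(<integer range>) call
df2 <- data.frame(id = 1,
                  date = seq(as.Date("2012-05-06"), as.Date("2016-02-05"), by = "day"))

# Join on id and keep only the rows where start <= date <= end
output <- inner_join(df1, df2, by = join_by(id, start <= date, end >= date))

print(nrow(output))
```

On older dplyr versions this call is not available, and a non-equi join from another package would be needed instead.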

R sum by group if date within date range

Question: Suppose I have two data frames. The first one includes the "Date" on which a "Name" issues a "Rec" for an "ID", and the "Stop.Date" on which the "Rec" becomes invalid.

df (only a part):

    structure(list(Date = structure(c(13236, 13363, 14074, 13199, 14554), class = "Date"), ID = c("AU0000XINAA9", "AU0000XINAA9", "AU0000XINAC5", "AU0000XINAI2", "AU0000XINAJ0"), Name = c("N+1 BREWIN", "N+1 BREWIN", "ARBUTHNOT SECURITIES LTD.", "INVESTEC BANK (UK) PLC", "AWRAQ INVESTMENTS"), Rec = c(1, 2, 2, 2, 1), Stop.Date

Unlist column in data frame with listed

Question: I have a list with multiple levels, and I would like to convert the data level into a data frame in which the variable chr is collapsed into single strings.

    myList <- list(total_reach = list(4),
                   data = list(list(reach = 2, chr = list("A", "B", "C"), nr = 3, company = "Company A"),
                               list(reach = 2, chr = list("A", "B", "C"), nr = 3, company = "Company B")))

I would like to transform this into a data frame that looks like this:

      reach     chr nr   company
    1     2 A, B, C  3 Company A
    2     2 A, B, C  3 Company B

Using
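A minimal sketch of one way to do this: build a one-row tibble per entry of `myList$data`, collapsing `chr` with `paste()`, and then stack the rows. The names `rows`, `entry`, and `result` are hypothetical, and this is not necessarily the approach the question goes on to try.

```r
library(dplyr)

myList <- list(total_reach = list(4),
               data = list(list(reach = 2, chr = list("A", "B", "C"), nr = 3, company = "Company A"),
                           list(reach = 2, chr = list("A", "B", "C"), nr = 3, company = "Company B")))

# One single-row tibble per entry; the nested chr list becomes one string
rows <- lapply(myList$data, function(entry) {
  tibble(reach = entry$reach,
         chr = paste(unlist(entry$chr), collapse = ", "),
         nr = entry$nr,
         company = entry$company)
})

# Stack the per-entry rows into one data frame
result <- bind_rows(rows)
print(result)
```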

Utilizing functions within across() in dplyr to work with paired-columns

Question:

    set.seed(3)
    library(dplyr)
    x <- tibble(Measure = c("Height","Weight","Width","Length"),
                AD1_1= rpois(4,10), AD1_2= rpois(4,9),
                AD2_1= rpois(4,10), AD2_2= rpois(4,9),
                AD3_1= rpois(4,10), AD3_2= rpois(4,9))

Suppose I have data that looks like this. I wish to run a function for each AD, paired by the number after the underscore, i.e., AD1fun, AD2fun, AD3fun. Instead of writing

    fun <- function(x,y){x-y}
    dat %>% mutate(AD1fun = fun(AD1_1,AD1_2), AD2fun = fun(AD2_1,AD2_2), ...)

Finding the differences of
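One way to avoid spelling out each pair is to iterate over the `_1` columns with `across()` and look up the matching `_2` column by name through `cur_column()`. This is a sketch under the assumption that every `_1` column has a `_2` partner; the result name `res` is hypothetical, and the pipeline is applied to `x` because that is the tibble the question defines.

```r
library(dplyr)

set.seed(3)
x <- tibble(Measure = c("Height", "Weight", "Width", "Length"),
            AD1_1 = rpois(4, 10), AD1_2 = rpois(4, 9),
            AD2_1 = rpois(4, 10), AD2_2 = rpois(4, 9),
            AD3_1 = rpois(4, 10), AD3_2 = rpois(4, 9))

fun <- function(x, y) x - y

# For each *_1 column, fetch its *_2 partner by name and apply fun();
# the output is named AD1fun, AD2fun, AD3fun via the .names glue spec.
res <- x %>%
  mutate(across(ends_with("_1"),
                ~ fun(.x, .data[[sub("_1", "_2", cur_column())]]),
                .names = "{sub('_1', '', .col)}fun"))

print(res)
```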

How can I dynamically create new variables/columns on databases in R using dplyr?

Question: I am new to Stack Overflow and quite new to R. I would really appreciate your help. I am using dplyr's mutate() function to create a set of new columns based on one initial column. When the number of columns to be created is known a priori, everything works fine. However, in my application, the number of new columns to be created is unknown (or rather, it is determined by an input parameter before running the code). For illustration, consider the following minimal working example: library(RSQLite) library

R sampling into groups of specific size based on count data

Question: I want to take a df like the one below and cut/bin/group/sample it into groups of size 20. Ideally, this "binning" occurs randomly across IDs rather than consecutively from the top row to the bottom row. E.g., IDs 2, 29 and 71 have counts of 7, 7 and 6 and would fit nicely into a "bin" of size 20. I want to achieve the minimum number of bins and do not care about the order of IDs (the more random they are, the better).

    set.seed(123)
    df <- data.frame( ID = as.numeric(1:100), Count = as.numeric(sample(1:8,
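As a rough illustration of the binning idea, the sketch below shuffles the rows and then places each ID into the first bin that still has room (a first-fit heuristic). Note the assumptions: the data frame `toy`, its size, and its seed are made up here because the original data definition is cut off, and first-fit keeps every bin at or below 20 but does not guarantee the minimum number of bins.

```r
set.seed(42)  # assumption: hypothetical toy data, not the data from the question
toy <- data.frame(ID = 1:30, Count = sample(1:8, 30, replace = TRUE))

capacity <- 20
shuffled <- toy[sample(nrow(toy)), ]   # random order, so bins mix IDs

bin_load <- numeric(0)                 # running total per bin
bin_id <- integer(nrow(shuffled))      # bin assigned to each shuffled row

for (i in seq_len(nrow(shuffled))) {
  fits <- which(bin_load + shuffled$Count[i] <= capacity)
  if (length(fits) == 0) {
    # No existing bin has room: open a new one
    bin_load <- c(bin_load, shuffled$Count[i])
    bin_id[i] <- length(bin_load)
  } else {
    # Put the row into the first bin that can take it
    bin_load[fits[1]] <- bin_load[fits[1]] + shuffled$Count[i]
    bin_id[i] <- fits[1]
  }
}

shuffled$bin <- bin_id
print(tapply(shuffled$Count, shuffled$bin, sum))
```

Sorting by descending Count before the loop (first-fit decreasing) usually gives fewer bins, at the cost of a less random mix of IDs.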