dplyr

Calculate change since base year?

情到浓时终转凉″ submitted on 2021-01-28 17:36:17
Question: I have a dataset that looks something like this:

    df1 <- data.frame(id = c(rep("A1", 4), rep("A2", 4)),
                      time = rep(c(0, 2:4), 2),
                      y1 = rnorm(8),
                      y2 = rnorm(8))

For each of the y variables, I want to calculate the change since time == 0. Basically, I want to do this:

    calc_chage <- function(id, data){
      #y1
      y1_0 <- data$y1[which(data$time==0 & data$id==id)]
      D2y1 <- data$y1[which(data$time==2 & data$id==id)] - y1_0
      D3y1 <- data$y1[which(data$time==3 & data$id==id)] - y1_0
      D4y1 <- data$y1[which(data
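The change-from-baseline calculation described in this question can be sketched with a grouped mutate(). This is only one possible approach, not necessarily the answer given in the original thread. It assumes every id has exactly one row with time == 0; the D_ prefix for the new columns and the set.seed() call are illustrative choices.

```r
library(dplyr)

set.seed(1)  # assumption: seed added so the example is reproducible
df1 <- data.frame(id = c(rep("A1", 4), rep("A2", 4)),
                  time = rep(c(0, 2:4), 2),
                  y1 = rnorm(8),
                  y2 = rnorm(8))

change <- df1 %>%
  group_by(id) %>%
  # Within each id, subtract the value observed at time == 0 from every row.
  mutate(across(c(y1, y2), ~ .x - .x[time == 0], .names = "D_{.col}")) %>%
  ungroup()

print(change)
```

The baseline rows get a change of 0, and the result stays in long format, so extra time points or extra y columns need no additional code.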

Join big dataframe in r and filter in the same time

笑着哭i submitted on 2021-01-28 14:30:35
Question:

    df1 = data.frame(id=1,start=as.Date("2012-07-05"),end=as.Date("2012-07-15"))
    df2 = data.frame(id=rep(1,1371),date = as.Date(as.Date("2012-05-06"):as.Date("2016-02-05")))
    output = dplyr::inner_join(x=df1,y=df2,by="id") %>% filter(date>=start & date<= end)

I have two data frames of about one million rows each. I want to join them by id and then filter so that, for each row, the value of the date column lies between the start and end dates. A dplyr::inner_join is not
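Joining and filtering in a single step can be sketched with a non-equi join. The example below assumes dplyr 1.1.0 or later, which introduced join_by() and its between() helper and was released after this post, so it would not have been available to the original poster. It uses seq() to build the dates, which is a substitution for the original as.Date(a:b) construction.

```r
library(dplyr)

df1 <- data.frame(id = 1,
                  start = as.Date("2012-07-05"),
                  end = as.Date("2012-07-15"))
df2 <- data.frame(id = 1,
                  date = seq(as.Date("2012-05-06"), as.Date("2016-02-05"), by = "day"))

# Match on id and keep only pairs where date falls within [start, end].
# The range condition is part of the join, so the full id-by-id
# cross product is never materialised as an intermediate data frame.
output <- inner_join(df2, df1, by = join_by(id, between(date, start, end)))

print(output)
```

For very large inputs, a data.table non-equi join is another commonly used option; which one is faster depends on the data and is not something this sketch measures.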

R sum by group if date within date range

浪子不回头ぞ submitted on 2021-01-28 14:26:17
Question: Suppose I have two data frames. The first one includes the "Date" on which a "Name" issues a "Rec" for an "ID", and the "Stop.Date" on which the "Rec" becomes invalid.

df (only a part):

    structure(list(Date = structure(c(13236, 13363, 14074, 13199, 14554), class = "Date"),
                   ID = c("AU0000XINAA9", "AU0000XINAA9", "AU0000XINAC5", "AU0000XINAI2", "AU0000XINAJ0"),
                   Name = c("N+1 BREWIN", "N+1 BREWIN", "ARBUTHNOT SECURITIES LTD.", "INVESTEC BANK (UK) PLC", "AWRAQ INVESTMENTS"),
                   Rec = c(1, 2, 2, 2, 1),
                   Stop.Date

Unlist column in data frame with listed

霸气de小男生 submitted on 2021-01-28 12:53:26
Question: I have a list with multiple levels, and I would like to convert its data level into a data frame in which the variable chr is collapsed into a single string per row.

    myList <- list(total_reach = list(4),
                   data = list(list(reach = 2, chr = list("A", "B", "C"), nr = 3, company = "Company A"),
                               list(reach = 2, chr = list("A", "B", "C"), nr = 3, company = "Company B")))

I would like to transform this into a data frame that looks like this:

      reach     chr nr   company
    1     2 A, B, C  3 Company A
    2     2 A, B, C  3 Company B

Using
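A minimal sketch of one way to get the requested data frame: build a one-row data frame per record, collapsing chr with paste(), then stack the rows. It is not necessarily the approach the truncated question goes on to try, and it assumes every element of myList$data has the same four fields.

```r
library(dplyr)

myList <- list(total_reach = list(4),
               data = list(list(reach = 2, chr = list("A", "B", "C"), nr = 3, company = "Company A"),
                           list(reach = 2, chr = list("A", "B", "C"), nr = 3, company = "Company B")))

# One single-row data frame per record; the nested chr list is
# flattened and joined into a single comma-separated string.
rows <- lapply(myList$data, function(rec) {
  data.frame(reach = rec$reach,
             chr = paste(unlist(rec$chr), collapse = ", "),
             nr = rec$nr,
             company = rec$company,
             stringsAsFactors = FALSE)
})

result <- bind_rows(rows)
print(result)
```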

Utilizing functions within across() in dplyr to work with paired-columns

别说谁变了你拦得住时间么 submitted on 2021-01-28 12:25:33
Question:

    set.seed(3)
    library(dplyr)
    x <- tibble(Measure = c("Height","Weight","Width","Length"),
                AD1_1= rpois(4,10), AD1_2= rpois(4,9),
                AD2_1= rpois(4,10), AD2_2= rpois(4,9),
                AD3_1= rpois(4,10), AD3_2= rpois(4,9))

Suppose I have data that looks like this. I wish to run a function for each AD, paired by the number after the underscore, producing AD1fun, AD2fun, AD3fun. Instead of writing

    fun <- function(x,y){x-y}
    x %>% mutate(AD1fun = fun(AD1_1,AD1_2), AD2fun = fun(AD2_1,AD2_2), ...)

Finding the differences of
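The paired-column pattern in this question can be sketched with across() and cur_column(): iterate over the _1 columns and look up the matching _2 column by name. This is one commonly used idiom rather than the thread's confirmed answer, and it assumes every _1 column has a partner ending in _2.

```r
library(dplyr)

set.seed(3)
x <- tibble(Measure = c("Height", "Weight", "Width", "Length"),
            AD1_1 = rpois(4, 10), AD1_2 = rpois(4, 9),
            AD2_1 = rpois(4, 10), AD2_2 = rpois(4, 9),
            AD3_1 = rpois(4, 10), AD3_2 = rpois(4, 9))

fun <- function(x, y) x - y

result <- x %>%
  mutate(across(ends_with("_1"),
                # cur_column() is the name of the _1 column being processed;
                # swap its suffix to fetch the paired _2 column.
                ~ fun(.x, get(sub("_1$", "_2", cur_column()))),
                # Strip the suffix and append "fun": AD1_1 becomes AD1fun.
                .names = "{sub('_1$', '', .col)}fun"))

print(result)
```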

How can I dynamically create new variables/columns on databases in R using dplyr?

余生颓废 submitted on 2021-01-28 11:48:19
Question: I am new to Stack Overflow and quite new to R. I would really appreciate your help. I am using dplyr's mutate() function to create a set of new columns based on one initial column. When the number of columns to be created is known a priori, everything works fine. However, in my application the number of new columns is unknown (or rather, it is determined by an input parameter before the code runs). For illustration, consider the following minimal working example:

    library(RSQLite)
    library
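Because the question's own example is cut off, the following is only a generic sketch of creating a parameter-driven number of columns in mutate(). It builds a named list of expressions and splices it in with !!!. The data frame df, the column x, the parameter n_new, and the x_times_ names are all hypothetical, and the example runs on a local data frame rather than a database table.

```r
library(dplyr)
library(rlang)

df <- data.frame(x = c(1, 2, 3))  # hypothetical input
n_new <- 3                        # hypothetical input parameter

# Build one unevaluated expression per new column: x * 1L, x * 2L, ...
new_cols <- lapply(seq_len(n_new), function(k) expr(x * !!k))
names(new_cols) <- paste0("x_times_", seq_len(n_new))

# Splice the named expressions into mutate() as if typed by hand.
out <- df %>% mutate(!!!new_cols)
print(out)
```

Since the spliced arguments are plain expressions, the same pattern is often used with database-backed tables as well, though that is not demonstrated here.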

R sampling into groups of specific size based on count data

半腔热情 submitted on 2021-01-28 11:45:34
Question: I want to take a df like the one below and cut/bin/group/sample it into groups of size = 20. Ideally, this "binning" occurs randomly across IDs rather than consecutively from the top row to the bottom row. For example, IDs 2, 29 and 71 have counts of 7, 7 and 6 and would fit nicely into a "bin" of size = 20. I want to achieve the minimum number of bins and do not care about the order of IDs (the more random they are, the better).

    set.seed(123)
    df <- data.frame( ID = as.numeric(1:100), Count = as.numeric(sample(1:8,
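Grouping counts into bins of capacity 20 is a bin-packing problem. A minimal sketch using the first-fit-decreasing heuristic follows. It gives a small number of bins but is not guaranteed to reach the true minimum, and it does not randomise which IDs end up together. The arguments to sample() are an assumption, since the original call is truncated.

```r
set.seed(123)
# Assumption: the original sample() arguments are cut off; these are illustrative.
df <- data.frame(ID = 1:100,
                 Count = sample(1:8, 100, replace = TRUE))

bin_size <- 20
remaining <- numeric(0)      # free capacity left in each open bin
bin <- integer(nrow(df))     # bin index assigned to each row

# First-fit decreasing: place the largest counts first, each into the
# first open bin that still has room; open a new bin when none fits.
for (i in order(df$Count, decreasing = TRUE)) {
  fits <- which(remaining >= df$Count[i])
  if (length(fits) == 0) {
    remaining <- c(remaining, bin_size)
    fits <- length(remaining)
  }
  b <- fits[1]
  remaining[b] <- remaining[b] - df$Count[i]
  bin[i] <- b
}

df$bin <- bin
bin_totals <- tapply(df$Count, df$bin, sum)
print(bin_totals)
```

The lower bound on the number of bins is ceiling(sum(df$Count) / bin_size), which gives a quick check of how close the heuristic gets.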