dplyr

Conditionally sum dynamic columns in r

女生的网名这么多〃 posted on 2021-01-27 19:50:34
Question: I am trying to conditionally sum across many columns depending on whether they are greater than or less than 0. I am surprised I cannot find a dplyr or data.table workaround for this. I want to calculate 4 new columns for a large data.frame (columns to calculate are at bottom of post).

    dat2 = matrix(nrow = 10, rnorm(100))
    colnames(dat2) = paste0('V', rep(1:10))
    dat2 %>%
      as.data.frame() %>%
      rowwise() %>%
      select_if(function(col) { mean(col) > 0 }) %>%
      mutate(sum_pos = rowSums(.))  ## Obviously doesn't work

These
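One possible reading of the question, sketched in base R so that it runs without extra packages: treat the `V` columns as a numeric matrix and mask each cell by its sign before calling `rowSums()`. The four target columns are cut off in the excerpt above, so the names `sum_pos`, `sum_neg`, `n_pos` and `n_neg` are assumptions, not the asker's specification.

```r
set.seed(42)  # seed added only to make the sketch reproducible

dat2 <- matrix(rnorm(100), nrow = 10)
colnames(dat2) <- paste0("V", 1:10)
df <- as.data.frame(dat2)

# Work on the value columns as a matrix so the comparisons are vectorised
value_cols <- paste0("V", 1:10)
m <- as.matrix(df[value_cols])

# (m > 0) is a logical mask; multiplying zeroes out the cells that fail it
df$sum_pos <- rowSums(m * (m > 0))  # row-wise sum of positive values
df$sum_neg <- rowSums(m * (m < 0))  # row-wise sum of negative values
df$n_pos   <- rowSums(m > 0)        # row-wise count of positive values
df$n_neg   <- rowSums(m < 0)        # row-wise count of negative values
```

Because the value columns are selected by name into `m` first, the new columns never feed back into later sums.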

Compare groups with each other

谁说胖子不能爱 posted on 2021-01-27 19:21:01
Question: Is there a way in dplyr to compare groups with each other? Here is a concrete example: I would like to apply a t-test to the following combinations: a vs b, a vs c and b vs c.

    set.seed(1)
    tibble(value = c(rnorm(1000, 1, 1), rnorm(1000, 5, 1), rnorm(1000, 10, 1)),
           group = c(rep("a", 1000), rep("b", 1000), rep("c", 1000))) %>%
      nest(value)

    # A tibble: 3 x 2
      group data
      <chr> <list>
    1 a     <tibble [1,000 × 1]>
    2 b     <tibble [1,000 × 1]>
    3 c     <tibble [1,000 × 1]>

If dplyr provides no solution, I would also be
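A minimal base R sketch of the pairwise comparison, assuming an unadjusted Welch t-test per pair is what is wanted. `combn()` enumerates the group pairs and `t.test()` is run on each; the names `df`, `group_pairs` and `results` are illustrative only.

```r
set.seed(1)
df <- data.frame(
  value = c(rnorm(1000, 1, 1), rnorm(1000, 5, 1), rnorm(1000, 10, 1)),
  group = rep(c("a", "b", "c"), each = 1000),
  stringsAsFactors = FALSE
)

# All unordered pairs of group labels: (a, b), (a, c), (b, c)
group_pairs <- combn(unique(df$group), 2, simplify = FALSE)

# Run one t-test per pair and collect the results in a data frame
results <- do.call(rbind, lapply(group_pairs, function(p) {
  tt <- t.test(df$value[df$group == p[1]], df$value[df$group == p[2]])
  data.frame(
    group1    = p[1],
    group2    = p[2],
    statistic = unname(tt$statistic),
    p_value   = tt$p.value,
    stringsAsFactors = FALSE
  )
}))
```

If p-values adjusted for multiple comparisons are needed, `stats::pairwise.t.test(df$value, df$group)` covers all pairs in a single call.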

Conditionally sum dynamic columns in r

倖福魔咒の posted on 2021-01-27 19:20:39
Question: I am trying to conditionally sum across many columns depending on whether they are greater than or less than 0. I am surprised I cannot find a dplyr or data.table workaround for this. I want to calculate 4 new columns for a large data.frame (columns to calculate are at bottom of post).

    dat2 = matrix(nrow = 10, rnorm(100))
    colnames(dat2) = paste0('V', rep(1:10))
    dat2 %>%
      as.data.frame() %>%
      rowwise() %>%
      select_if(function(col) { mean(col) > 0 }) %>%
      mutate(sum_pos = rowSums(.))  ## Obviously doesn't work

These

removing the first 3 rows of a group with conditional statement in r

喜夏-厌秋 posted on 2021-01-27 19:10:35
Question: I would like to remove rows that do not fulfill a condition. For example:

    Event Value
        1     1
        1     0
        1     0
        1     0
        2     8
        2     7
        2     1
        2     0
        2     0
        2     0
        3     8
        3     0
        3     0
        3     0
        3     0

If, within an event, the Value column contains a number higher than 2 (Value > 2), remove 3 rows starting from the row with that Value, since they do not fulfill the criteria. It should look like this:

    Event Value
        1     1
        1     0
        1     0
        1     0
        2     0
        2     0
        3     0
        3     0

I have been able to remove the first row of each Event that meets the criteria, but haven't
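One reading that reproduces the expected output above: every row with `Value > 2` removes itself and the next two rows of the same Event, so overlapping hits (8 and 7 in Event 2) remove four rows in total. A base R sketch under that assumption, with `drop_flag` and `result` as illustrative names:

```r
df <- data.frame(
  Event = c(rep(1, 4), rep(2, 6), rep(3, 5)),
  Value = c(1, 0, 0, 0,  8, 7, 1, 0, 0, 0,  8, 0, 0, 0, 0)
)

# Per Event, flag a row (1) when it or one of the two rows before it has Value > 2
drop_flag <- ave(as.numeric(df$Value > 2), df$Event, FUN = function(hit) {
  n <- length(hit)
  prev1 <- c(0, hit)[seq_len(n)]     # hit shifted down by one row
  prev2 <- c(0, 0, hit)[seq_len(n)]  # hit shifted down by two rows
  pmax(hit, prev1, prev2)
})

# Keep only the rows that were never flagged
result <- df[drop_flag == 0, ]
```

The shifts are computed inside `ave()`, so a hit near the end of one Event never spills into the next Event.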

case_when with partial string match and contains()

自古美人都是妖i posted on 2021-01-27 18:09:01
Question: I'm working with a dataset that has many columns called status1, status2, etc. Within those columns, it says if someone is exempt, complete, registered, etc. Unfortunately, the exempt inputs are not consistent; here's a sample:

    library(dplyr)
    problem <- tibble(person = c("Corey", "Sibley", "Justin", "Ruth"),
                      status1 = c("7EXEMPT", "Completed", "Completed", "Pending"),
                      status2 = c("exempt", "Completed", "Completed", "Pending"),
                      status3 = c("EXEMPTED", "Completed", "Completed", "ExempT - 14"))
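The desired output is not visible in the excerpt, so the following is only a sketch of one plausible goal: collapse every spelling that contains "exempt" (in any case) to the single label `"Exempt"` across all `status*` columns. It uses base R to stay dependency-free; the label and the name `status_cols` are assumptions.

```r
problem <- data.frame(
  person  = c("Corey", "Sibley", "Justin", "Ruth"),
  status1 = c("7EXEMPT", "Completed", "Completed", "Pending"),
  status2 = c("exempt", "Completed", "Completed", "Pending"),
  status3 = c("EXEMPTED", "Completed", "Completed", "ExempT - 14"),
  stringsAsFactors = FALSE
)

# Pick every column whose name starts with "status"
status_cols <- grep("^status", names(problem), value = TRUE)

# Case-insensitive partial match; all other values pass through unchanged
problem[status_cols] <- lapply(problem[status_cols], function(x) {
  ifelse(grepl("exempt", x, ignore.case = TRUE), "Exempt", x)
})
```

The same `grepl("exempt", x, ignore.case = TRUE)` test can serve as the left-hand condition of a `case_when()` branch when working in dplyr.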

Assign max value of group to all rows in that group

断了今生、忘了曾经 posted on 2021-01-27 17:47:44
Question: I would like to assign the max value of a group to all rows within that group. How do I do that? I have a dataframe containing the names of the groups and the max number of credits that belongs to each.

    course_credits <- aggregate(bsc_academic$Credits, by = list(bsc_academic$Course_code), max)

which gives

       Course Credits
    1 ABC1000     6.5
    2 ABC1003     6.5
    3 ABC1004     6.5
    4 ABC1007     5.0
    5 ABC1010     6.5
    6 ABC1021     6.5
    7 ABC1023     6.5

The main dataframe looks like this: Appraisal.Type Resits Credits Course_code
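A short sketch of broadcasting a group maximum back onto every row with base R's `ave()`, which returns a vector as long as its input. The rows below are made-up stand-ins for `bsc_academic` (the real data is truncated above), and `Max_credits` is a hypothetical column name.

```r
# Hypothetical sample rows; only the two columns needed for the sketch
bsc_academic <- data.frame(
  Course_code = c("ABC1000", "ABC1000", "ABC1007", "ABC1007", "ABC1007"),
  Credits     = c(6.5, 0, 5, 0, 2.5),
  stringsAsFactors = FALSE
)

# ave() applies max() within each Course_code and repeats the result per row
bsc_academic$Max_credits <- ave(
  bsc_academic$Credits,
  bsc_academic$Course_code,
  FUN = max
)
```

This avoids the separate `aggregate()` plus merge step, because the per-group result is already aligned with the original rows.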

multidplyr : assign functions to cluster

北慕城南 posted on 2021-01-27 16:36:12
Question: (see working solution below) I want to use multidplyr to parallelize a function:

calculs.R

    f <- function(x) {
      return(x + 1)
    }

main.R

    library(dplyr)
    library(multidplyr)
    source("calculs.R")
    d <- data.frame(a = 1:1000, b = sample(1:2, 1000, replace = TRUE))
    result <- d %>% partition(b) %>% do(f(.)) %>% collect()

I then get:

    Initialising 3 core cluster.
    Error in checkForRemoteErrors(lapply(cl, recvResult)) :
      2 nodes produced errors; first error: could not find function "f"
    In addition: Warning message: group
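The error above arises because worker processes start with empty workspaces, so a function defined in the main session has to be copied to them explicitly. The sketch below shows that mechanism with base R's `parallel` package rather than multidplyr itself, since multidplyr's helper names have changed between versions; the cluster size and the toy input are assumptions.

```r
library(parallel)

f <- function(x) {
  x + 1
}

# Start two local worker processes; each begins without f defined
cl <- makeCluster(2)

# Copy f from the current environment to every worker.
# In a top-level script, skipping this step typically yields:
#   could not find function "f"
clusterExport(cl, "f", envir = environment())

# Each worker receives one chunk and calls f by name
result <- parLapply(cl, list(1:3, 4:6), function(chunk) f(chunk))

stopCluster(cl)
```

The same principle applies to multidplyr: whatever the partitioned pipeline calls must first be made available on the cluster.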

multidplyr : assign functions to cluster

↘锁芯ラ posted on 2021-01-27 16:33:34
Question: (see working solution below) I want to use multidplyr to parallelize a function:

calculs.R

    f <- function(x) {
      return(x + 1)
    }

main.R

    library(dplyr)
    library(multidplyr)
    source("calculs.R")
    d <- data.frame(a = 1:1000, b = sample(1:2, 1000, replace = TRUE))
    result <- d %>% partition(b) %>% do(f(.)) %>% collect()

I then get:

    Initialising 3 core cluster.
    Error in checkForRemoteErrors(lapply(cl, recvResult)) :
      2 nodes produced errors; first error: could not find function "f"
    In addition: Warning message: group

Expanding a list to include all possible pairwise combinations within a group

蹲街弑〆低调 posted on 2021-01-27 15:13:29
Question: I am currently running a randomization where individuals of a given population are sampled and placed into groups of defined size. The result is the data frame below:

    Ind   Group
    Sally 1
    Bob   1
    Sue   1
    Joe   2
    Jeff  2
    Jess  2
    Mary  2
    Jim   3
    James 3

Is there a function which will allow me to expand the data set to show every possible within-group pairing? (Desired output below.) The pairings do not need to be reciprocal.

    Group Ind1  Ind2
    1     Sally Bob
    1     Sally Sue
    1     Sue   Bob
    2     Joe   Jeff
    2     Joe   Jess
    2     Joe
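A base R sketch of the expansion, assuming every group has at least two members: split the data by `Group`, let `combn()` list the unordered pairs inside each piece, and bind the pieces back together. The name `pairs_df` is illustrative, and the order within a pair may differ from the desired output (for example Bob/Sue instead of Sue/Bob), which the question allows.

```r
df <- data.frame(
  Ind   = c("Sally", "Bob", "Sue", "Joe", "Jeff", "Jess", "Mary", "Jim", "James"),
  Group = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
  stringsAsFactors = FALSE
)

# For each group, combn() returns a 2-row matrix with one column per pair.
# Note: combn() raises an error for a group with a single member.
pairs_df <- do.call(rbind, lapply(split(df, df$Group), function(g) {
  p <- combn(g$Ind, 2)
  data.frame(
    Group = g$Group[1],
    Ind1  = p[1, ],
    Ind2  = p[2, ],
    stringsAsFactors = FALSE
  )
}))
rownames(pairs_df) <- NULL
```

A group of size n contributes n * (n - 1) / 2 rows, so the three groups here yield 3, 6 and 1 pairs.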

Aggregating strings using tostring and counting them in r

隐身守侯 posted on 2021-01-27 11:53:34
Question: I have the following dataframe, obtained after applying this dplyr code:

    Final_df <- df %>%
      group_by(clientID, month) %>%
      summarise(test = toString(Sector)) %>%
      as.data.frame()

which gives me the following output:

    ClientID month test
    ASD      Sep   Auto,Auto,Finance
    DFG      Oct   Finance,Auto,Oil

What I want is to count the sectors as well:

    ClientID month test
    ASD      Sep   Auto:2,Finance:1
    DFG      Oct   Finance:1,Auto:1,Oil:1

How can I achieve this with dplyr?

Answer 1: We can try

    df %>% group_by(client_id, month, Sector) %>% tally() %>% group_by(client
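Since the answer above is cut off, here is an independent base R sketch of the same idea (count per sector, then paste the counts together per client and month). It is not a reconstruction of the truncated answer; the sample rows are inferred from the output shown in the question, and `count_sectors` is a hypothetical helper.

```r
df <- data.frame(
  clientID = c("ASD", "ASD", "ASD", "DFG", "DFG", "DFG"),
  month    = c("Sep", "Sep", "Sep", "Oct", "Oct", "Oct"),
  Sector   = c("Auto", "Auto", "Finance", "Finance", "Auto", "Oil"),
  stringsAsFactors = FALSE
)

# Build "Sector:count" pairs, keeping sectors in first-seen order
count_sectors <- function(x) {
  counts <- table(factor(x, levels = unique(x)))
  paste0(names(counts), ":", counts, collapse = ",")
}

# One row per clientID/month combination, with the pasted counts
Final_df <- aggregate(Sector ~ clientID + month, data = df, FUN = count_sectors)
names(Final_df)[3] <- "test"
```

Setting the factor levels to `unique(x)` keeps the sectors in order of appearance instead of alphabetical order, which matches the `Finance:1,Auto:1,Oil:1` ordering requested.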