dplyr | 易学教程

Conditionally sum dynamic columns in r

Question: I am trying to conditionally sum across many columns depending on whether their values are greater than or less than 0. I am surprised that I cannot find a dplyr or data.table workaround for this. I want to calculate 4 new columns for a large data.frame (the columns to calculate are at the bottom of the post).

    dat2 = matrix(nrow = 10, rnorm(100)); colnames(dat2) = paste0('V', rep(1:10))
    dat2 %>% as.data.frame() %>% rowwise() %>% select_if(function(col){mean(col)>0}) %>% mutate(sum_pos=rowSums(.)) ## Obviously doesn't work

These

Compare groups with each other

Question: Is there a way in dplyr to compare groups with each other? Here is a concrete example: I would like to apply a t-test to the following combinations: a vs b, a vs c, and b vs c.

    set.seed(1)
    tibble(value = c(rnorm(1000, 1, 1), rnorm(1000, 5, 1), rnorm(1000, 10, 1)),
           group = c(rep("a", 1000), rep("b", 1000), rep("c", 1000))) %>%
      nest(value)
    # A tibble: 3 x 2
      group data
      <chr> <list>
    1 a     <tibble [1,000 × 1]>
    2 b     <tibble [1,000 × 1]>
    3 c     <tibble [1,000 × 1]>

If dplyr provides no solution, I would also be
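One possible way to run the three comparisons, sketched here in base R rather than dplyr (a choice made for this sketch, not something the excerpt settles): enumerate the group pairs with `combn()` and call `t.test()` on each two-group subset. The names `df`, `group_pairs`, `tests`, and `p_values` are illustrative.

```r
# Same simulated data as in the question, built as a plain data frame.
set.seed(1)
df <- data.frame(
  value = c(rnorm(1000, 1, 1), rnorm(1000, 5, 1), rnorm(1000, 10, 1)),
  group = rep(c("a", "b", "c"), each = 1000),
  stringsAsFactors = FALSE
)

# All unordered pairs of group labels: (a, b), (a, c), (b, c).
group_pairs <- combn(unique(df$group), 2, simplify = FALSE)

# Welch two-sample t-test on each pair (t.test's default for two groups).
tests <- lapply(group_pairs, function(p) {
  t.test(value ~ group, data = df[df$group %in% p, ])
})
names(tests) <- vapply(group_pairs, paste, character(1), collapse = " vs ")

# One unadjusted p-value per comparison.
p_values <- vapply(tests, function(t) t$p.value, numeric(1))
```

The p-values here are not corrected for multiple comparisons. If a correction is wanted, `pairwise.t.test()` from the stats package covers the same pairs and adjusts the p-values (Holm by default).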

removing the first 3 rows of a group with conditional statement in r

Question: I would like to remove rows that do not fulfill a condition. For example:

    Event Value
    1     1
    1     0
    1     0
    1     0
    2     8
    2     7
    2     1
    2     0
    2     0
    2     0
    3     8
    3     0
    3     0
    3     0
    3     0

If, within an event, the Value column contains a number higher than 2 (Value > 2), remove the first 3 rows starting from the row with that value. It should look like this:

    Event Value
    1     1
    1     0
    1     0
    1     0
    2     0
    2     0
    3     0
    3     0

I have been able to remove the first row of each Event that meets the criteria, but haven't

case_when with partial string match and contains()

Question: I'm working with a dataset that has many columns called status1, status2, etc. Those columns record whether someone is exempt, complete, registered, etc. Unfortunately, the exempt inputs are not consistent; here's a sample:

    library(dplyr)
    problem <- tibble(person = c("Corey", "Sibley", "Justin", "Ruth"),
                      status1 = c("7EXEMPT", "Completed", "Completed", "Pending"),
                      status2 = c("exempt", "Completed", "Completed", "Pending"),
                      status3 = c("EXEMPTED", "Completed", "Completed", "ExempT - 14"))
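The excerpt stops before stating the desired output, so the following is only a sketch under an assumed goal: replace every value that contains "exempt", in any letter case, with the single label "Exempt" across all status columns. It uses `across()` with `starts_with()` (dplyr 1.0 or later) and base `grepl()`; the label "Exempt" and the name `cleaned` are assumptions of this sketch.

```r
library(dplyr)

# Sample data copied from the question.
problem <- tibble(person = c("Corey", "Sibley", "Justin", "Ruth"),
                  status1 = c("7EXEMPT", "Completed", "Completed", "Pending"),
                  status2 = c("exempt", "Completed", "Completed", "Pending"),
                  status3 = c("EXEMPTED", "Completed", "Completed", "ExempT - 14"))

# In every column whose name starts with "status", replace any value that
# contains "exempt" (case-insensitive) with "Exempt"; leave other values alone.
cleaned <- problem %>%
  mutate(across(
    starts_with("status"),
    ~ if_else(grepl("exempt", .x, ignore.case = TRUE), "Exempt", .x)
  ))
```

If several patterns need distinct labels, the `if_else()` call could be swapped for `case_when()` with one `grepl()` condition per label.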

Assign max value of group to all rows in that group

Question: I would like to assign the max value of a group to all rows within that group. How do I do that? I have a data frame containing the names of the groups and the max number of credits that belongs to each.

    course_credits <- aggregate(bsc_academic$Credits, by = list(bsc_academic$Course_code), max)

which gives

       Course Credits
    1 ABC1000     6.5
    2 ABC1003     6.5
    3 ABC1004     6.5
    4 ABC1007     5.0
    5 ABC1010     6.5
    6 ABC1021     6.5
    7 ABC1023     6.5

The main data frame looks like this:

    Appraisal.Type Resits Credits Course_code
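A common dplyr pattern for this is a grouped `mutate()`, which computes the maximum per group and repeats it on every row of that group, so no separate lookup table is needed. The sketch below uses a small made-up stand-in for `bsc_academic` (only the column names `Credits` and `Course_code` come from the excerpt); the column name `Max_credits` is hypothetical.

```r
library(dplyr)

# Hypothetical stand-in for bsc_academic; values are invented for illustration.
bsc_academic <- data.frame(
  Course_code = c("ABC1000", "ABC1000", "ABC1007", "ABC1007"),
  Credits     = c(6.5, 3.0, 5.0, 2.5),
  stringsAsFactors = FALSE
)

# Within each course, max(Credits) is a single value that mutate() recycles
# across all rows of that course.
with_max <- bsc_academic %>%
  group_by(Course_code) %>%
  mutate(Max_credits = max(Credits)) %>%
  ungroup()
```

In base R, `ave(bsc_academic$Credits, bsc_academic$Course_code, FUN = max)` produces the same per-row vector.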

multidplyr : assign functions to cluster

Question (see working solution below): I want to use multidplyr to parallelize a function:

calculs.R

    f <- function(x){ return(x+1) }

main.R

    library(dplyr)
    library(multidplyr)
    source("calculs.R")
    d <- data.frame(a = 1:1000, b = sample(1:2, 1000, replace = TRUE))
    result <- d %>% partition(b) %>% do(f(.)) %>% collect()

I then get:

    Initialising 3 core cluster.
    Error in checkForRemoteErrors(lapply(cl, recvResult)) :
      2 nodes produced errors; first error: could not find function "f"
    In addition: Warning message:
    group

Expanding a list to include all possible pairwise combinations within a group

Question: I am currently running a randomization where individuals of a given population are sampled and placed into groups of defined size. The result is the data frame below:

    Ind   Group
    Sally 1
    Bob   1
    Sue   1
    Joe   2
    Jeff  2
    Jess  2
    Mary  2
    Jim   3
    James 3

Is there a function that will allow me to expand the data set to show every possible within-group pairing? (Desired output below.) The pairings do not need to be reciprocal.

    Group Ind1  Ind2
    1     Sally Bob
    1     Sally Sue
    1     Sue   Bob
    2     Joe   Jeff
    2     Joe   Jess
    2     Joe
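One way to build the within-group pairings, sketched in base R: split the data by `Group`, take all 2-element combinations of `Ind` with `combn()`, and bind the pieces back together. The names `df` and `pairings` are illustrative, and the sketch assumes every group has at least two members, since `combn()` raises an error otherwise.

```r
# Data from the question.
df <- data.frame(
  Ind   = c("Sally", "Bob", "Sue", "Joe", "Jeff", "Jess", "Mary", "Jim", "James"),
  Group = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
  stringsAsFactors = FALSE
)

# For each group, combn() returns a 2 x k matrix of unordered pairs;
# transposing gives one pair per row.
pairings <- do.call(rbind, lapply(split(df, df$Group), function(g) {
  p <- t(combn(g$Ind, 2))
  data.frame(Group = g$Group[1], Ind1 = p[, 1], Ind2 = p[, 2],
             stringsAsFactors = FALSE)
}))
rownames(pairings) <- NULL
```

Each unordered pair appears once, which matches the statement that pairings need not be reciprocal. The order of names within a pair follows the row order of the input, so it may differ from the order shown in the desired output.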

Aggregating strings using tostring and counting them in r

Question: I have the following data frame, obtained after applying this dplyr code:

    Final_df <- df %>% group_by(clientID, month) %>% summarise(test = toString(Sector)) %>% as.data.frame()

which gives me the following output:

    ClientID month test
    ASD      Sep   Auto,Auto,Finance
    DFG      Oct   Finance,Auto,Oil

I also want to count the sectors:

    ClientID month test
    ASD      Sep   Auto:2,Finance:1
    DFG      Oct   Finance:1,Auto:1,Oil:1

How can I achieve this with dplyr?

Answer 1: We can try

    df %>% group_by(client_id, month, Sector) %>% tally() %>% group_by(client
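The answer above is cut off in the excerpt. As an independent sketch of the same count-then-collapse idea (not a reconstruction of the original answer), one could count rows per client, month, and sector with `count()`, then paste the sector and its count together within each client and month. The input `df` below is made up to match the output shown in the question, and the `.groups` argument assumes dplyr 1.0 or later.

```r
library(dplyr)

# Hypothetical input consistent with the question's output.
df <- data.frame(
  clientID = c("ASD", "ASD", "ASD", "DFG", "DFG", "DFG"),
  month    = c("Sep", "Sep", "Sep", "Oct", "Oct", "Oct"),
  Sector   = c("Auto", "Auto", "Finance", "Finance", "Auto", "Oil"),
  stringsAsFactors = FALSE
)

Final_df <- df %>%
  count(clientID, month, Sector) %>%            # one row per sector with its count in n
  group_by(clientID, month) %>%
  summarise(test = paste0(Sector, ":", n, collapse = ","),
            .groups = "drop") %>%
  as.data.frame()
```

Because `count()` sorts by its grouping columns, the sectors come out alphabetically within each client and month, which can differ from the order of first appearance shown in the question.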