tidyverse

Using tidy eval for multiple dplyr filter conditions

这一生的挚爱 submitted on 2019-12-01 17:48:17
Question: I'm new to tidy eval and trying to write generic functions. One thing I'm struggling with right now is writing multiple filter conditions for categorical variables. This is what I'm using right now:

    create_expr <- function(name, val) {
      if (!is.null(val))
        val <- paste0("c('", paste0(val, collapse = "','"), "')")
      paste(name, "%in%", val)
    }

    my_filter <- function(df, cols, conds) {
      # Args:
      #   df: dataframe which is to be filtered
      #   cols: list of column names which are to be filtered
      #   conds:
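
One way to avoid pasting strings together is to build each condition as a quoted expression and splice the whole list into `filter()`. The following is a minimal sketch, not the asker's final solution. It assumes `cols` is a character vector, `conds` is a list of allowed values in the same order, and a `NULL` entry means "no condition on this column". The names `my_filter_sketch` and `toy` are hypothetical.

```r
library(dplyr)
library(purrr)
library(rlang)

# Hypothetical helper: one `%in%` condition per column, combined with AND.
my_filter_sketch <- function(df, cols, conds) {
  # Assumption: a NULL entry in `conds` means "do not filter this column".
  keep <- !map_lgl(conds, is.null)
  exprs <- map2(cols[keep], conds[keep], function(col, vals) {
    # Builds e.g. .data[["fruit"]] %in% c("apple", "kiwi")
    expr(.data[[!!col]] %in% !!vals)
  })
  # `!!!` splices the list of expressions in as separate filter() arguments.
  filter(df, !!!exprs)
}

toy <- tibble(
  fruit = c("apple", "pear", "apple", "kiwi"),
  shop  = c("north", "north", "south", "south"),
  price = c(1, 2, 3, 4)
)

res <- my_filter_sketch(toy, c("fruit", "shop"), list(c("apple", "kiwi"), "south"))
```

Because the values are unquoted into the expression directly, no quoting or escaping of the category labels is needed.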

How to use dplyr programming syntax to create and evaluate variable names

安稳与你 submitted on 2019-12-01 17:27:53
Question: I would like to dynamically input a variable name using dplyr programming syntax; however, as many have described, this can be quite confusing. I've played around with various combinations of quo/enquo, !!, etc. to no avail. Here is the simplest form of my code:

    library(tidyverse)
    df <- tibble(
      color1 = c("blue", "blue", "blue", "blue", "blue"),
      color2 = c("black", "black", "black", "black", "black"),
      value = 1:5
    )
    num <- 2
    df %>% mutate(color3 = !!(paste0("color", num)))
    #> # A tibble: 5 x 4
    #>
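
Unquoting a plain string with `!!` inserts the string itself, so `color3` would be filled with the literal text "color2" rather than the column's values. A minimal sketch of two commonly used alternatives, assuming the goal is to copy the column whose name is built at run time (`out` and `out2` are illustrative names):

```r
library(dplyr)
library(rlang)

df <- tibble(
  color1 = rep("blue", 5),
  color2 = rep("black", 5),
  value  = 1:5
)
num <- 2

# Option 1: turn the string into a symbol, then unquote it.
out <- df %>% mutate(color3 = !!sym(paste0("color", num)))

# Option 2: look the column up by name through the .data pronoun.
out2 <- df %>% mutate(color3 = .data[[paste0("color", num)]])
```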

Creating models and augmenting data without losing additional columns in dplyr/broom

时光毁灭记忆、已成空白 submitted on 2019-12-01 16:31:02
Consider the following data / example. Each dataset contains a number of samples with one observation and one estimate:

    library(tidyverse)
    library(broom)
    data = read.table(text = '
    dataset sample_id observation estimate
    A A1 4.8 4.7
    A A2 4.3 4.5
    A A3 3.1 2.9
    A A4 2.1 2
    A A5 1.1 1
    B B1 4.5 4.3
    B B2 3.9 4.1
    B B3 2.9 3
    B B4 1.8 2
    B B5 1 1.2
    ', header = TRUE)

I want to calculate a linear model per dataset to remove any linear bias between observation and estimate, and get the fitted values next to the original ones:

    data %>% group_by(dataset) %>% do(lm(observation ~ estimate, data = .) %>% augment
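
`augment()` on a bare model only returns the columns that took part in the fit, which is why extra columns such as `sample_id` go missing. Passing the original rows through its `data` argument keeps them. A minimal sketch, assuming dplyr 0.8.0 or later for `group_modify()`; the result name `fitted_tbl` is illustrative:

```r
library(dplyr)
library(broom)

data <- read.table(text = "
dataset sample_id observation estimate
A A1 4.8 4.7
A A2 4.3 4.5
A A3 3.1 2.9
A A4 2.1 2
A A5 1.1 1
B B1 4.5 4.3
B B2 3.9 4.1
B B3 2.9 3
B B4 1.8 2
B B5 1 1.2
", header = TRUE)

fitted_tbl <- data %>%
  group_by(dataset) %>%
  # .x holds the rows of one dataset; data = .x makes augment() return
  # every original column alongside .fitted, .resid, and so on.
  group_modify(~ augment(lm(observation ~ estimate, data = .x), data = .x)) %>%
  ungroup()
```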

R: create dummy variables based on a categorical variable *of lists* [duplicate]

荒凉一梦 submitted on 2019-12-01 15:24:56
Question: This question already has answers here: How can I split a character string into column vectors with a 1/0 value flag? (7 answers) Closed 7 months ago. I have a data frame with a categorical variable holding lists of strings of variable length (this matters because otherwise this question would be a duplicate of this or this), e.g.:

    df <- data.frame(x = 1:5)
    df$y <- list("A", c("A", "B"), "C", c("B", "D", "C"), "E")
    df
      x       y
    1 1       A
    2 2    A, B
    3 3       C
    4 4 B, D, C
    5 5       E

And the desired form is
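
The desired output is cut off in this excerpt, so the following sketch assumes the usual layout: one 0/1 column per distinct string, flagging whether that string occurs in the row's list. It uses base R only; `lvls`, `flags`, and `result` are illustrative names.

```r
df <- data.frame(x = 1:5)
df$y <- list("A", c("A", "B"), "C", c("B", "D", "C"), "E")

# All distinct strings across the list column, in sorted order.
lvls <- sort(unique(unlist(df$y)))

# For each row, mark which levels are present; vapply returns one column
# per row, so transpose to get one row per observation.
flags <- t(vapply(df$y, function(v) as.integer(lvls %in% v), integer(length(lvls))))
colnames(flags) <- lvls

result <- cbind(df["x"], flags)
```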

Creating models and augmenting data without losing additional columns in dplyr/broom

故事扮演 submitted on 2019-12-01 15:11:08
Question: Consider the following data / example. Each dataset contains a number of samples with one observation and one estimate:

    library(tidyverse)
    library(broom)
    data = read.table(text = '
    dataset sample_id observation estimate
    A A1 4.8 4.7
    A A2 4.3 4.5
    A A3 3.1 2.9
    A A4 2.1 2
    A A5 1.1 1
    B B1 4.5 4.3
    B B2 3.9 4.1
    B B3 2.9 3
    B B4 1.8 2
    B B5 1 1.2
    ', header = TRUE)

I want to calculate a linear model per dataset to remove any linear bias between observation and estimate, and get the fitted values next

Add row in each group using dplyr and add_row()

我们两清 submitted on 2019-12-01 14:27:45
Question: If I add a new row to the iris dataset with:

    iris <- as_tibble(iris)
    > iris %>% add_row(.before=0)
    # A tibble: 151 × 5
      Sepal.Length Sepal.Width Petal.Length Petal.Width Species
             <dbl>       <dbl>        <dbl>       <dbl>   <chr>
    1           NA          NA           NA          NA    <NA>   <--- Good!
    2          5.1         3.5          1.4         0.2  setosa
    3          4.9         3.0          1.4         0.2  setosa

It works. So why can't I add a new row on top of each "subset" with:

    iris %>% group_by(Species) %>% add_row(.before=0)
    Error: is.data.frame(df) is not TRUE

Answer 1: If you want to use a grouped operation, you need
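
The answer is cut off in this excerpt, so the following is only one possible workaround, not necessarily the one the answer gave: call `add_row()` on each group's rows separately. It assumes dplyr 0.8.0 or later for `group_modify()`, and it uses `.before = 1` with an explicit `NA` value to stay on well-documented ground; `with_blank` is an illustrative name.

```r
library(dplyr)
library(tibble)

iris_tbl <- as_tibble(iris)

with_blank <- iris_tbl %>%
  group_by(Species) %>%
  # .x is the ungrouped block of rows for one species, so add_row() accepts it.
  # Columns that are not named in the call are filled with NA.
  group_modify(~ add_row(.x, Sepal.Length = NA_real_, .before = 1)) %>%
  ungroup()
```

Since the grouping column is added back by `group_modify()`, the inserted rows carry their species label while the measurement columns stay `NA`.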

Separate contents of field

▼魔方 西西 submitted on 2019-12-01 14:07:18
I'm sure this is very simple, and I think it's a case of using separate and gather. I have a single field in a dataframe, authorlist, an edited export of a PubMed search. It contains the authors of the publications, which can be either a single author or several collaborating authors. For example, this is just a selection of the options available:

    Author
    Drijgers RL, Verhey FR, Leentjens AF, Kahler S, Aalten P.

What I'd like to do is create a single list of ALL authors so that I'd have something like:

    Author
    Drijgers RL
    Verhey FR
    Leentjens AF
    Kahler S
    Aalten P

How do I do that? I
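
A minimal sketch of one way to get one author per row, using `tidyr::separate_rows()` rather than separate plus gather. It assumes authors are separated by commas and that the only other punctuation to remove is a trailing period; `pubs` and `authors` are illustrative names.

```r
library(dplyr)
library(tidyr)

pubs <- tibble(
  Author = "Drijgers RL, Verhey FR, Leentjens AF, Kahler S, Aalten P."
)

authors <- pubs %>%
  # Drop the trailing period that closes the author list.
  mutate(Author = sub("\\.$", "", Author)) %>%
  # Split on commas (plus any following spaces) into one row per author.
  separate_rows(Author, sep = ",\\s*")
```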

counting the number of times a value appears in a column in relation to other columns in r

偶尔善良 submitted on 2019-12-01 13:45:49
I am new to R and I have a dataframe very close to the one below. I would love to find a general way to tell how many times (plus 1) the number "0" appears for each country (Intro4) and id.

        Intro4 number  id
    221    TAN      0  19
    222    TAN      0  73
    223    TAN      0  73
    224    TOG      0  37
    225    TOG      0  58
    226    UGA      0  96
    227    UGA      0 112
    228    UGA      0  96
    229    ZAM      0  40
    230    ZAM      0  99
    231    ZAM      0 139

I can do it by hand, but it is a big data frame and that would take forever. count() gives me the frequency but doesn't divide it between different countries. I have found a way to do it but I will have to select and filter for each individual
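
`count()` accepts several columns and then tallies each combination, which avoids filtering country by country. A minimal sketch using the rows shown above; the column names `n_zero` and `n_zero_plus_1` are made up for illustration.

```r
library(dplyr)

df <- tibble(
  Intro4 = c("TAN", "TAN", "TAN", "TOG", "TOG", "UGA", "UGA", "UGA", "ZAM", "ZAM", "ZAM"),
  number = 0,
  id     = c(19, 73, 73, 37, 58, 96, 112, 96, 40, 99, 139)
)

counts <- df %>%
  filter(number == 0) %>%                    # keep only the zeros
  count(Intro4, id, name = "n_zero") %>%     # one row per country/id pair
  mutate(n_zero_plus_1 = n_zero + 1)         # the "plus 1" from the question
```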

How to apply same operation to multiple data frames in dplyr-R?

夙愿已清 submitted on 2019-12-01 12:55:05
I would like to apply the same operation to multiple data frames in R but cannot work out how to do it. This is an example of a pipe operation in dplyr:

    library(dplyr)
    iris %>%
      mutate(Sepal = rowSums(select(., starts_with("Sepal"))),
             Length = rowSums(select(., ends_with("Length"))),
             Width = rowSums(select(., ends_with("Width"))))
    iris2 <- iris
    iris3 <- iris

Could you suggest how to apply the same pipe function to iris, iris2 and iris3? I need to use the dplyr piping operation. I suppose the map function may help, but as I have not fully understood its concept, I get errors when applying it. Sample
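
A minimal sketch of the `map()` approach mentioned in the question: wrap the pipeline in a function, put the data frames in a named list, and map the function over it. The function name `add_sums` and the list name `results` are illustrative.

```r
library(dplyr)
library(purrr)

# Illustrative wrapper around the pipeline from the question.
add_sums <- function(d) {
  d %>%
    mutate(
      Sepal  = rowSums(select(d, starts_with("Sepal"))),
      Length = rowSums(select(d, ends_with("Length"))),
      Width  = rowSums(select(d, ends_with("Width")))
    )
}

iris2 <- iris
iris3 <- iris

# map() returns a list with one transformed data frame per input.
results <- map(list(iris = iris, iris2 = iris2, iris3 = iris3), add_sums)
```

Individual results can then be pulled out by name, for example `results$iris2`.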

Using nested function with lapply

懵懂的女人 submitted on 2019-12-01 12:13:19
This code works (it takes hours, minutes and seconds and converts them to seconds only):

    library(lubridate)
    library(tidyverse)
    original_date_time <- "2018-01-31 11:59:59"
    period_to_seconds(hms(paste(hour(original_date_time), minute(original_date_time), second(original_date_time), sep = ":")))

I have this tibble:

    df <- data.frame("id" = c(1, 2, 3, 4, 5),
                     "Time" = c("1999-12-31 10:10:10", "1999-12-31 09:05:13", "1999-12-31 00:05:25", "1999-12-31 07:04", "1999-12-31 03:05:07"))
    tib <- as_tibble(df)
    tib

Result:

    # A tibble: 5 x 2
         id Time
      <dbl> <fct>
    1     1 1999-12-31 10:10:10
    2     2 1999-12-31 09:05:13
    3     3 1999-12-31 00:05:25
    4     4
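
The rest of the question is cut off, so this sketch assumes the goal is the number of seconds since midnight for every row. lubridate's accessors are vectorised, so `mutate()` can do this without `lapply()`. The `truncated = 1` argument lets `ymd_hms()` parse the one entry that has no seconds field; `parsed` and `secs` are illustrative column names.

```r
library(dplyr)
library(lubridate)

tib <- tibble(
  id   = 1:5,
  Time = c("1999-12-31 10:10:10", "1999-12-31 09:05:13", "1999-12-31 00:05:25",
           "1999-12-31 07:04", "1999-12-31 03:05:07")
)

out <- tib %>%
  mutate(
    # as.character() guards against Time being a factor, as in the question.
    parsed = ymd_hms(as.character(Time), truncated = 1),
    secs   = hour(parsed) * 3600 + minute(parsed) * 60 + second(parsed)
  )
```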