dplyr

Conditionally replace values of multiple columns, from values of other multiple columns

有些话、适合烂在心里, submitted on 2021-02-04 06:57:47
Question: Suppose I have this dataset:
    set.seed(1234); data.frame(cbind(a=rep(c("si","no"),30),b=rnorm(60)), c=rep(c("d","e","f"),20)) %>% head()
Then I want to add many columns (in this example I only added two) to identify distinct cases within each group (in this case, column "a"):
    set.seed(1234); data.frame(cbind(a=rep(c("si","no"),30),b=rnorm(60)),c=rep(c("d","e","f"),20)) %>% group_by(a) %>% dplyr::mutate_at(vars(c(b,c)), .funs= list(dups_hash_ing= ~n_distinct(.)))
This code leaves the
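The per-group distinct counts in the excerpt can also be written with `across()`, which supersedes `mutate_at()` in dplyr 1.0.0 and later. The sketch below only reproduces the step shown in the question; the excerpt is cut off before the actual replacement rule, so none is attempted. The names `df` and `res` are introduced here for illustration, and the data frame is built without `cbind()` so that `b` stays numeric (`cbind()` on a character and a numeric vector coerces both to character).

```r
library(dplyr)

set.seed(1234)
# Same columns as in the question; built directly so that b remains numeric
df <- data.frame(
  a = rep(c("si", "no"), 30),
  b = rnorm(60),
  c = rep(c("d", "e", "f"), 20)
)

res <- df %>%
  group_by(a) %>%
  # One new column per selected column, named <column>_dups_hash_ing,
  # holding the number of distinct values within each group of a
  mutate(across(c(b, c), n_distinct, .names = "{.col}_dups_hash_ing")) %>%
  ungroup()

head(res)
```

The `.names` glue pattern yields the same column names that the named list in `mutate_at()` would produce.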

How to merge two different groupings if they are not disjoint with dplyr

℡╲_俬逩灬., submitted on 2021-02-04 06:31:55
Question: Suppose that I have two sets of identifiers, id1 and id2, in a data frame. How can I create a new identifier id3 that works as follows: I consider id1 the stricter key, so that observations are grouped first by id1 and then by id2. If two sets of rows with different values of id2 have some of their elements sharing the same id1, these two sets should have the same value for id3 (the exact value in id3 doesn't matter much).
    df <- data.frame(id1 = c(1, 1, 2, 2, 5, 6), id2 = c(4, 3,
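One way to read this requirement is as a connected-components problem: rows belong to the same id3 group if they are linked through a shared id1 or a shared id2, directly or through a chain of such links. The sketch below is not taken from the original thread. It uses base R only and a hypothetical data frame `toy`, because the question's own `df` is truncated. The helper name `merge_groups` is also made up for illustration.

```r
# Propagate the smallest row label through both keys until nothing changes.
# Rows that end up with the same label are in the same connected component.
merge_groups <- function(id1, id2) {
  label <- seq_along(id1)
  repeat {
    new_label <- ave(label, id1, FUN = min)      # share labels within id1
    new_label <- ave(new_label, id2, FUN = min)  # share labels within id2
    if (all(new_label == label)) break
    label <- new_label
  }
  # Recode the surviving labels as 1, 2, 3, ... in order of appearance
  match(label, unique(label))
}

# Hypothetical data (not the truncated df from the question)
toy <- data.frame(
  id1 = c(1, 1, 2, 3, 4, 5),
  id2 = c(10, 11, 11, 12, 13, 13)
)
toy$id3 <- merge_groups(toy$id1, toy$id2)
toy
```

Because the helper takes and returns plain vectors, it can also be called inside `dplyr::mutate(id3 = merge_groups(id1, id2))`. For large data, a graph library would likely be faster than this repeated pass.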

How to filter dataframe with multiple conditions?

我与影子孤独终老i, submitted on 2021-02-04 05:57:07
Question: I have this dataframe that I'd like to subset (if possible, with dplyr or base R functions):
    df <- data.frame(x = c(1,1,1,2,2,2), y = c(30,10,8,10,18,5))
    x  y
    1 30
    1 10
    1  8
    2 10
    2 18
    2  5
Assuming x is a factor (so 2 conditions/levels), how can I subset/filter this dataframe so that I get only the df$y values that are greater than 15 for df$x == 1, and the df$y values that are greater than 5 for df$x == 2? This is what I'd like to get:
    df2 <- data.frame(x = c(1,2,2), y = c(30,10,18))
    x  y
    1 30
    2 10
    2
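A minimal sketch of the filter described in the question, using the `df` given there. The first form spells out both conditions. The second uses a lookup vector of per-level thresholds, which may scale better when x has many levels. The names `res_explicit`, `res_lookup`, and `thresholds` are introduced here for illustration.

```r
library(dplyr)

df <- data.frame(x = c(1, 1, 1, 2, 2, 2), y = c(30, 10, 8, 10, 18, 5))

# Explicit form: one clause per level of x, combined with |
res_explicit <- df %>%
  filter((x == 1 & y > 15) | (x == 2 & y > 5))

# Lookup form: a named vector maps each level of x to its threshold
thresholds <- c("1" = 15, "2" = 5)
res_lookup <- df %>%
  filter(y > unname(thresholds[as.character(x)]))

res_explicit
```

The base R equivalent of the explicit form is `df[(df$x == 1 & df$y > 15) | (df$x == 2 & df$y > 5), ]`.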

Error with select function from dplyr

試著忘記壹切, submitted on 2021-02-04 04:45:32
Question: When I use the select function from dplyr, it doesn't work and gives me an error stating that the column names I want to select are unused arguments. However, if I specify dplyr before the function call, like so: "dplyr::select", then it works as normal. Here is a sample df:
    sampledf <- structure(list(CRN = c(5497L, 6515L, 7248L, 36956L, 37021L), varA = structure(c(2L, 2L, 2L, 2L, 2L), .Label = c("A", "B"), class = "factor"), varB = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA
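An "unused arguments" error from a bare `select()` that disappears with `dplyr::select()` is often a sign that another attached package is masking dplyr's function (MASS, for instance, also exports a `select`). That diagnosis is an assumption here, since the excerpt is cut off before the session details. The sketch below shows how to check which package a bare `select` resolves to, and uses a shortened, hypothetical stand-in for the truncated `sampledf`.

```r
library(dplyr)

# Hypothetical stand-in: only the two complete columns from the question
sampledf <- data.frame(
  CRN  = c(5497L, 6515L, 7248L, 36956L, 37021L),
  varA = factor(rep("B", 5), levels = c("A", "B"))
)

# Name of the package whose select() is found first on the search path
select_owner <- environmentName(environment(select))
print(select_owner)

# The namespace-qualified call does not depend on attach order
picked <- dplyr::select(sampledf, CRN, varA)
picked
```

If `select_owner` is not `"dplyr"`, either keep the `dplyr::` prefix or attach dplyr after the conflicting package so that its `select` comes first.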

Difference between Distinct vs Unique

房东的猫, submitted on 2021-02-04 03:33:41
Question: What are the differences between distinct and unique in R using dplyr, with respect to: speed; capabilities (valid inputs, parameters, etc.) and uses; output? For example:
    library(dplyr)
    data(iris)
    # creating data with duplicates
    iris_dup <- bind_rows(iris, iris)
    d <- distinct(iris_dup)
    u <- unique(iris_dup)
    all(d==u) # returns TRUE
In this example distinct and unique perform the same function. Are there examples of times you should use one but not the other? Are there any tricks or common uses
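A short sketch of two capability differences, building on the `iris_dup` example from the question. `distinct()` can deduplicate on selected columns and optionally keep the remaining ones, while `unique()` is a base R generic that also works on plain vectors. Speed is not measured here. The names `species_only`, `first_per_species`, and `v` are introduced for illustration.

```r
library(dplyr)

iris_dup <- bind_rows(iris, iris)

d <- distinct(iris_dup)
u <- unique(iris_dup)
# Both keep the same number of rows on a whole data frame
nrow(d) == nrow(u)

# distinct() on chosen columns: one row per Species, only that column
species_only <- distinct(iris_dup, Species)

# .keep_all = TRUE keeps every column, taking the first row per Species
first_per_species <- distinct(iris_dup, Species, .keep_all = TRUE)

# unique() also handles atomic vectors
v <- unique(c(1, 1, 2, 3, 3))
```

In practice, `distinct()` fits naturally into a pipeline on data frames, and `unique()` is the usual choice for vectors or when dplyr is not loaded.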
