dplyr | 易学教程

Self Joining in R

Question: Here is a sample tibble:

    test <- tibble(a = c("dd1","dd2","dd3","dd4","dd5"),
                   name = c("a", "b", "c", "d", "e"),
                   b = c("dd3","dd4","dd1","dd5","dd2"))

I want to add a new column b_name by self-joining test:

    dplyr::inner_join(test, test, by = c("a" = "b"))

My table is far too large (2.7M rows with 4 columns) and I get the following error:

    Error: std::bad_alloc

Please advise on the right way (best practice) to do this. My final goal is the following structure:

    a    name  b    b_name
    dd1  a     dd3  c
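One way to get a b_name column without materialising a join is a vectorised lookup. The sketch below is an illustration, not the accepted answer to the question: it assumes the values in a are unique, and it uses a plain data.frame so that it runs without extra packages. If a contained duplicates, a join could multiply rows, which is one possible (unconfirmed) cause of the memory error.

```r
# Sample data from the question, as a plain data.frame
# (a tibble would be indexed the same way).
test <- data.frame(
  a    = c("dd1", "dd2", "dd3", "dd4", "dd5"),
  name = c("a", "b", "c", "d", "e"),
  b    = c("dd3", "dd4", "dd1", "dd5", "dd2"),
  stringsAsFactors = FALSE
)

# match() gives, for each value of b, the position of its first match in a.
# Indexing name by those positions yields exactly one b_name per row,
# so the number of rows never grows.
test$b_name <- test$name[match(test$b, test$a)]

print(test)
```

For the first row, b is "dd3", which sits in the third position of a, so b_name becomes "c", as in the target structure.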

how to replace the now deprecated funs within rename_at

Question: I'm trying (and succeeding) to rename multiple columns in my data frame using this code:

    rename_at(c("a", "b", "c"), funs(paste0(., "_revenue")))

However, I then get this warning:

    funs() is soft deprecated as of dplyr 0.8.0
    Please use a list of either functions or lambdas:

      # Simple named list:
      list(mean = mean, median = median)

      # Auto named with `tibble::lst()`:
      tibble::lst(mean, median)

      # Using lambdas
      list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))

I tried looking at https://dplyr
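As the warning suggests, the funs() call can be swapped for a lambda. The sketch below shows two options on a made-up one-row data frame (the data and the extra column d are assumptions for illustration): passing a formula lambda to rename_at(), and using rename_with(), which assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical data: three columns to rename and one to leave alone.
df <- data.frame(a = 1, b = 2, c = 3, d = 4)

# Option 1: keep rename_at(), replace funs() with a formula lambda.
out_at <- rename_at(df, c("a", "b", "c"), ~ paste0(., "_revenue"))

# Option 2 (dplyr >= 1.0.0): rename_with() takes the function first,
# then a tidyselect specification of the columns.
out_with <- rename_with(df, ~ paste0(.x, "_revenue"), all_of(c("a", "b", "c")))

names(out_with)
```

Both calls leave column d untouched and append "_revenue" to the other three names.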

How to create new columns in a data.frame based on row values in R?

Question: Hi, I have a data.frame with family trios, and I would like to add columns with the full sibs of every "id" (= offspring). My data:

    df
          id   dam  sire
    1: 83295 67606 79199
    2: 83297 67606 79199
    3: 89826 67606 79199

What I would like to retrieve:

    df2
          id   dam  sire   fs1   fs2
    1: 83295 67606 79199 83297 89826
    2: 83297 67606 79199 83295 89826
    3: 89826 67606 79199 83295 83297

What I've tried (similar to: How to transform a dataframes row into columns in R?):

    library(dplyr)
    library(splitstackshape)
    df2 <-
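The target layout can be reproduced in base R by collecting, for each row, the ids that share both parents. This is only a sketch and not the approach the question was attempting: it assumes at least one animal has a full sib, and it compares every row against the whole table, so it would be slow on large pedigrees.

```r
# Data from the question.
df <- data.frame(
  id   = c(83295, 83297, 89826),
  dam  = c(67606, 67606, 67606),
  sire = c(79199, 79199, 79199)
)

# For every row, collect the ids with the same dam and sire,
# excluding the row's own id.
sib_list <- lapply(seq_len(nrow(df)), function(i) {
  same_parents <- df$dam == df$dam[i] & df$sire == df$sire[i]
  df$id[same_parents & df$id != df$id[i]]
})

# Pad each sib vector with NA so that all rows have the same width,
# then bind the result as columns fs1, fs2, ...
n_max  <- max(lengths(sib_list))
padded <- lapply(sib_list, function(s) c(s, rep(NA, n_max - length(s))))
sib_mat <- matrix(
  unlist(padded), ncol = n_max, byrow = TRUE,
  dimnames = list(NULL, paste0("fs", seq_len(n_max)))
)
df2 <- cbind(df, sib_mat)

print(df2)
```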

Quosures in R, how to use the !! operator (tidy-evaluation)

Question: I'm trying to understand tidy evaluation in R.

    grouped_mean <- function(data, group_var, summary_var) {
      group_var <- enquo(group_var)
      summary_var <- enquo(summary_var)
      data %>%
        group_by(!!group_var) %>%
        summarise(mean = mean(!!summary_var))
    }

I understand why and how to use it, but I guess not what actually happens.

    var <- "test"
    var <- enquo(var)
    !!var
    Error in is_quosure(e2) : argument "e2" is missing, with no default

This gives me an error, while I expected it to work outside dplyr too. Why
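A point that may help here: R's parser has no `!!` operator. It reads `!!var` as two nested calls to `!`, and the unquoting only happens inside functions that support quasiquotation, such as the dplyr verbs or rlang::expr(). The sketch below, which uses mtcars purely as example data, runs grouped_mean() and then performs the same kind of unquoting outside dplyr with expr() and eval_tidy().

```r
library(dplyr)
library(rlang)

grouped_mean <- function(data, group_var, summary_var) {
  group_var   <- enquo(group_var)
  summary_var <- enquo(summary_var)
  data %>%
    group_by(!!group_var) %>%
    summarise(mean = mean(!!summary_var))
}

# Inside dplyr verbs, !! is processed before the expression is evaluated.
res <- grouped_mean(mtcars, cyl, mpg)

# Outside dplyr, !! needs a quasiquoting function such as expr().
col      <- sym("mpg")
call_mpg <- expr(mean(!!col))          # builds the call mean(mpg)
val      <- eval_tidy(call_mpg, data = mtcars)

print(call_mpg)
print(val)
```

Typing `!!var` at the top level skips that processing step, so R simply applies `!` twice to the quosure, which is where the error in the question comes from.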

Grouped non-dense rank without omitted values

Question: I have the following data.frame:

    df <- data.frame(date = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
                     id = c(4, 4, 2, 4, 1, 2, 3, 1, 2, 2, 1, 1))

I want to add a new column grp which, for each date, ranks the IDs. Ties should have the same value, but there should be no omitted values. That is, if two values are equally minimal, they should both get rank 1, and the next lowest values should get rank 2. The expected result would therefore look like this. Note that, as mentioned,
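Read literally, the description (ties share a rank, no gaps between ranks) matches dplyr's dense_rank() applied within each date. The expected output is cut off in this excerpt, so the sketch below is an interpretation rather than a confirmed answer.

```r
library(dplyr)

df <- data.frame(
  date = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
  id   = c(4, 4, 2, 4, 1, 2, 3, 1, 2, 2, 1, 1)
)

# dense_rank() gives tied values the same rank and leaves no gaps;
# group_by(date) restarts the ranking for every date.
out <- df %>%
  group_by(date) %>%
  mutate(grp = dense_rank(id)) %>%
  ungroup()

print(out)
```

For date 1 the ids 4, 4, 2, 4 receive ranks 2, 2, 1, 2: the smallest id gets 1 and the tied ids share the next rank.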

replace values with NA across multiple columns if a condition is met in R

Question: I'm trying to replace values with NA across multiple columns if a condition is met. Here's a sample dataset:

    library(tidyverse)
    sample <- tibble(id = 1:6,
                     team_score = 5:10,
                     cent_dept_test_agg = c(1, 2, 3, 4, 5, 6),
                     cent_dept_blue_agg = c(15:20),
                     num_in_dept = c(1, 1, 2, 5, 100, 6))

I want the columns that match cent_dept_.*_agg to be NA when num_in_dept is 1, so it looks like this:

    library(tidyverse)
    solution <- tibble(id = 1:6,
                       team_score = 5:10,
                       cent_dept_test_agg = c(NA,
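One possible approach is across() with a regular-expression column selection. The sketch assumes dplyr 1.0.0 or later, and it renames the data to sample_df (a name chosen here) to avoid masking base::sample().

```r
library(dplyr)

sample_df <- tibble(
  id = 1:6,
  team_score = 5:10,
  cent_dept_test_agg = c(1, 2, 3, 4, 5, 6),
  cent_dept_blue_agg = 15:20,
  num_in_dept = c(1, 1, 2, 5, 100, 6)
)

# matches() selects columns by regular expression; replace() sets the
# entries where num_in_dept is 1 to NA and keeps each column's type.
out <- sample_df %>%
  mutate(across(matches("^cent_dept_.*_agg$"),
                ~ replace(.x, num_in_dept == 1, NA)))

print(out)
```

Only the first two rows have num_in_dept equal to 1, so only those rows become NA in the two selected columns; team_score is not touched.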

Operations between groups with dplyr

Question: I have a data frame as follows, where I would like to group the data by grp and index and use group a as a reference to perform some simple calculations. I would like to subtract the variable value of the other groups from the values of group a.

    df <- data.frame(grp = rep(letters[1:3], each = 2),
                     index = rep(1:2, times = 3),
                     value = seq(10, 60, length.out = 6))
    df
    ##   grp index value
    ## 1   a     1    10
    ## 2   a     2    20
    ## 3   b     1    30
    ## 4   b     2    40
    ## 5   c     1    50
    ## 6   c     2    60

The desired output would be like: ## grp
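A reference-group calculation like this can be sketched by grouping on index and subtracting the value that belongs to grp "a". The desired output is cut off in this excerpt, so the direction of the subtraction (each value minus the group a value) and the column name diff are assumptions; swap the operands for the opposite sign.

```r
library(dplyr)

df <- data.frame(
  grp   = rep(letters[1:3], each = 2),
  index = rep(1:2, times = 3),
  value = seq(10, 60, length.out = 6)
)

# Within each index there is exactly one row for group "a";
# value[grp == "a"] picks it and it is recycled across the group.
out <- df %>%
  group_by(index) %>%
  mutate(diff = value - value[grp == "a"]) %>%
  ungroup()

print(out)
```

This relies on every index having exactly one group a row; with zero or several such rows, mutate() would fail or recycle incorrectly.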

Creating new variables with purrr (how does one go about that?)

Question: I have a large data set with a bunch of columns that I want to run the same function on, based on either prefix or suffix, to create a new variable. What I would like to be able to do is provide a list to map and create new variables.

    dataframe <- data_frame(x_1 = c(1,2,3,4,5,6),
                            x_2 = c(1,1,1,2,2,2),
                            y_1 = c(200,400,120,300,100,100),
                            y_2 = c(250,500,150,240,140,400))

    newframe <- dataframe %>%
      mutate(x_ratio = x_1/x_2,
             y_ratio = y_1/y_2)

In the past, I have written code in a string
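One way to drive this from a list of prefixes is to map over them, build each ratio from the matching `_1` and `_2` columns, and bind the results back on. The sketch below is one possibility among several; the prefixes vector is introduced here for illustration, and tibble() stands in for the deprecated data_frame().

```r
library(dplyr)
library(purrr)

dataframe <- tibble(
  x_1 = c(1, 2, 3, 4, 5, 6),
  x_2 = c(1, 1, 1, 2, 2, 2),
  y_1 = c(200, 400, 120, 300, 100, 100),
  y_2 = c(250, 500, 150, 240, 140, 400)
)

# Hypothetical list of prefixes, named after the columns to create.
prefixes <- c("x", "y")
prefixes <- set_names(prefixes, paste0(prefixes, "_ratio"))

# map() keeps the names, so the result is a named list of ratio vectors.
ratios <- map(prefixes, ~ dataframe[[paste0(.x, "_1")]] / dataframe[[paste0(.x, "_2")]])

newframe <- bind_cols(dataframe, as_tibble(ratios))

print(newframe)
```

Adding another variable pair then only requires extending prefixes, rather than writing another mutate() term.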

Calculate change since base year?

Question: I have a dataset that looks something like this:

    df1 <- data.frame(id = c(rep("A1",4), rep("A2",4)),
                      time = rep(c(0,2:4), 2),
                      y1 = rnorm(8),
                      y2 = rnorm(8))

For each of the y variables, I want to calculate their change since time==0. Basically, I want to do this:

    calc_chage <- function(id, data){
      #y1
      y1_0 <- data$y1[which(data$time==0 & data$id==id)]
      D2y1 <- data$y1[which(data$time==2 & data$id==id)] - y1_0
      D3y1 <- data$y1[which(data$time==3 & data$id==id)] - y1_0
      D4y1 <- data$y1[which(data
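The per-id, per-variable subtraction can be written once with group_by() and across(). The sketch below assumes dplyr 1.0.0 or later and exactly one time == 0 row per id. It keeps the data in long form rather than producing the wide D2y1, D3y1 layout hinted at in the function, and the D_ prefix and the seed are choices made here.

```r
library(dplyr)

set.seed(1)  # only so that the example is reproducible
df1 <- data.frame(
  id   = c(rep("A1", 4), rep("A2", 4)),
  time = rep(c(0, 2:4), 2),
  y1   = rnorm(8),
  y2   = rnorm(8)
)

# Within each id, subtract the baseline (time == 0) value from every row.
# .names controls the new column names: D_y1 and D_y2.
out <- df1 %>%
  group_by(id) %>%
  mutate(across(c(y1, y2), ~ .x - .x[time == 0], .names = "D_{.col}")) %>%
  ungroup()

print(out)
```

The baseline rows themselves come out as 0, and every other row holds the change since time 0 for its own id.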