dplyr

Self Joining in R

淺唱寂寞╮ submitted on 2021-01-28 21:06:13
Question: Here is a sample tibble:

    test <- tibble(a = c("dd1","dd2","dd3","dd4","dd5"),
                   name = c("a", "b", "c", "d", "e"),
                   b = c("dd3","dd4","dd1","dd5","dd2"))

I want to add a new column b_name by self-joining test:

    dplyr::inner_join(test, test, by = c("a" = "b"))

My table is far too large (2.7M rows with 4 columns) and I get the following error:

    Error: std::bad_alloc

Please advise how to do this correctly, following best practice. My final goal is the following structure:

    a    name  b    b_name
    dd1  a     dd3  c
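One possible approach, sketched under the assumption that b_name should hold the name of the row whose a equals the current row's b, and that the values in a are unique. It uses a vector lookup with match() instead of a join, so only one new column is built, which may be lighter on memory than joining the table to itself.

```r
library(dplyr)

test <- tibble(
  a    = c("dd1", "dd2", "dd3", "dd4", "dd5"),
  name = c("a", "b", "c", "d", "e"),
  b    = c("dd3", "dd4", "dd1", "dd5", "dd2")
)

# For each value of b, find its position in a and take the name at that position.
# match() returns only the first match, so this assumes a has no duplicates.
result <- test %>%
  mutate(b_name = name[match(b, a)])

print(result)
```

If a contains duplicated keys, a join multiplies rows for every duplicate, which could be one reason the original join exhausts memory; checking anyDuplicated(test$a) first would confirm or rule that out.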

how to replace the now deprecated funs within rename_at

心不动则不痛 submitted on 2021-01-28 19:54:11
Question: I'm trying (and succeeding) at renaming multiple columns in my data frame using this code:

    rename_at(c("a", "b", "c"), funs(paste0(., "_revenue")))

However, I then get this warning:

    funs() is soft deprecated as of dplyr 0.8.0
    Please use a list of either functions or lambdas:

      # Simple named list:
      list(mean = mean, median = median)

      # Auto named with `tibble::lst()`:
      tibble::lst(mean, median)

      # Using lambdas
      list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))

I tried looking at https://dplyr
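A minimal sketch of what the warning suggests: replace funs(...) with a formula lambda. The data frame df and its column d are made up for illustration. The second form uses rename_with(), which assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical data; only a, b and c should be renamed.
df <- tibble(a = 1:2, b = 3:4, c = 5:6, d = 7:8)

# Same call as before, with a lambda in place of funs().
renamed <- df %>%
  rename_at(c("a", "b", "c"), ~ paste0(., "_revenue"))

# Equivalent with rename_with() (dplyr >= 1.0.0).
renamed_with <- df %>%
  rename_with(~ paste0(.x, "_revenue"), all_of(c("a", "b", "c")))

names(renamed)
```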

How to create new columns in a data.frame based on row values in R?

≡放荡痞女 submitted on 2021-01-28 19:30:40
Question: Hi, I have a data.frame with family trios, and I would like to add a column with the full sibs of every "id" (= offspring). My data:

    df
          id   dam  sire
    1: 83295 67606 79199
    2: 83297 67606 79199
    3: 89826 67606 79199

What I would like to retrieve:

    df2
          id   dam  sire   fs1   fs2
    1: 83295 67606 79199 83297 89826
    2: 83297 67606 79199 83295 89826
    3: 89826 67606 79199 83295 83297

What I've tried (similar to: How to transform a dataframes row into columns in R?):

    library(dplyr)
    library(splitstackshape)
    df2 <-
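A rough sketch of one way to get the fs columns, assuming full sibs are rows that share both dam and sire. It self-joins on the parents, drops the self-matches, numbers each individual's sibs, and spreads them into columns with tidyr::pivot_wider(). The column names id_sib and fs are introduced here for illustration.

```r
library(dplyr)
library(tidyr)

df <- tibble(
  id   = c(83295, 83297, 89826),
  dam  = 67606,
  sire = 79199
)

df2 <- df %>%
  # Pair every individual with every individual that has the same parents.
  inner_join(df, by = c("dam", "sire"), suffix = c("", "_sib")) %>%
  # Remove the pairing of an individual with itself.
  filter(id != id_sib) %>%
  # Number the sibs of each individual: fs1, fs2, ...
  group_by(id) %>%
  mutate(fs = paste0("fs", row_number())) %>%
  ungroup() %>%
  # One column per numbered sib.
  pivot_wider(names_from = fs, values_from = id_sib)

print(df2)
```

Individuals without any full sib are dropped by the filter step; a left_join() back onto df would restore them. Recent dplyr versions may warn about a many-to-many relationship in the join, which is expected here.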

Quosures in R, how to use the !! operator (tidy-evaluation)

故事扮演 submitted on 2021-01-28 19:10:56
Question: I'm trying to understand tidy evaluation in R.

    grouped_mean <- function(data, group_var, summary_var) {
      group_var <- enquo(group_var)
      summary_var <- enquo(summary_var)
      data %>%
        group_by(!!group_var) %>%
        summarise(mean = mean(!!summary_var))
    }

I understand why and how to use it, but I guess not what actually happens.

    var <- "test"
    var <- enquo(var)
    !!var
    Error in is_quosure(e2) : argument "e2" is missing, with no default

This gives me an error, while I expected it to work outside dplyr too. Why
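A short sketch of the likely explanation: !! is not a real R operator. It is only given its unquoting meaning inside functions that support quasiquotation, such as group_by(), summarise(), rlang::quo() and rlang::expr(). At the top level R parses !!var as plain double negation, !(!var), and negating a quosure appears to be what triggers the error. The example below uses rlang directly to show unquoting outside dplyr.

```r
library(rlang)

var <- "test"

# quo() captures the expression `var` together with its environment.
q <- quo(var)
# eval_tidy() evaluates the quosure and returns the value of var.
value <- eval_tidy(q)

# Inside expr(), !! is processed: the value of var is inlined into the call.
e <- expr(toupper(!!var))
print(e)

# Evaluating the built expression runs toupper() on the inlined value.
upper <- eval_tidy(e)
```

So !! works outside dplyr as long as it appears inside a quoting function; on its own line it is just two negations.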

Grouped non-dense rank without omitted values

♀尐吖头ヾ submitted on 2021-01-28 19:08:23
Question: I have the following data.frame:

    df <- data.frame(date = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
                     id = c(4, 4, 2, 4, 1, 2, 3, 1, 2, 2, 1, 1))

I want to add a new column grp which, for each date, ranks the IDs. Ties should have the same value, but there should be no omitted values. That is, if two values are tied for the minimum, they should both get rank 1, and the next lowest values should get rank 2. The expected result would therefore look like this. Note that, as mentioned,
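The ranking described (ties share a rank, no ranks are skipped) matches what dplyr::dense_rank() computes, so a grouped sketch could look like this. The expected output is cut off in the question, so the result here is only what that description implies.

```r
library(dplyr)

df <- data.frame(
  date = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
  id   = c(4, 4, 2, 4, 1, 2, 3, 1, 2, 2, 1, 1)
)

# Rank the ids within each date; tied ids share a rank and no rank is skipped.
result <- df %>%
  group_by(date) %>%
  mutate(grp = dense_rank(id)) %>%
  ungroup()

print(result)
```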

replace values with NA across multiple columns if a condition is met in R

a 夏天 submitted on 2021-01-28 18:53:23
Question: I'm trying to replace values with NA across multiple columns if a condition is met. Here's a sample dataset:

    library(tidyverse)
    sample <- tibble(id = 1:6,
                     team_score = 5:10,
                     cent_dept_test_agg = c(1, 2, 3, 4, 5, 6),
                     cent_dept_blue_agg = c(15:20),
                     num_in_dept = c(1, 1, 2, 5, 100, 6))

I want the columns that match cent_dept_.*_agg to be NA when num_in_dept is 1, so it looks like this:

    library(tidyverse)
    solution <- tibble(id = 1:6,
                       team_score = 5:10,
                       cent_dept_test_agg = c(NA,
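A minimal sketch using across() with a regular-expression column selection, assuming dplyr 1.0.0 or later. Inside the lambda, num_in_dept refers to the column of the same data frame.

```r
library(dplyr)

sample <- tibble(
  id                 = 1:6,
  team_score         = 5:10,
  cent_dept_test_agg = c(1, 2, 3, 4, 5, 6),
  cent_dept_blue_agg = 15:20,
  num_in_dept        = c(1, 1, 2, 5, 100, 6)
)

# In every column matching the pattern, set the value to NA
# on rows where num_in_dept equals 1; other columns are untouched.
result <- sample %>%
  mutate(across(matches("^cent_dept_.*_agg$"),
                ~ replace(.x, num_in_dept == 1, NA)))

print(result)
```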

Operations between groups with dplyr

你离开我真会死。 submitted on 2021-01-28 18:20:34
Question: I have a data frame as follows, where I would like to group the data by grp and index and use group a as a reference to perform some simple calculations. I would like to subtract the values of group a from the values of the other groups.

    df <- data.frame(grp = rep(letters[1:3], each = 2),
                     index = rep(1:2, times = 3),
                     value = seq(10, 60, length.out = 6))
    df
    ##   grp index value
    ## 1   a     1    10
    ## 2   a     2    20
    ## 3   b     1    30
    ## 4   b     2    40
    ## 5   c     1    50
    ## 6   c     2    60

The desired output would be like:

    ## grp
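A sketch of one reading of the request, since the desired output is cut off: within each index, subtract group a's value from every group's value. The direction of the subtraction and the column name diff_from_a are assumptions.

```r
library(dplyr)

df <- data.frame(
  grp   = rep(letters[1:3], each = 2),
  index = rep(1:2, times = 3),
  value = seq(10, 60, length.out = 6)
)

# Within each index, value[grp == "a"] is the single reference value;
# it is recycled and subtracted from every row of that index.
result <- df %>%
  group_by(index) %>%
  mutate(diff_from_a = value - value[grp == "a"]) %>%
  ungroup()

print(result)
```

This relies on there being exactly one row of group a per index; with zero or several reference rows the subtraction would fail or recycle incorrectly.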

Creating new variables with purrr (how does one go about that?)

蹲街弑〆低调 submitted on 2021-01-28 18:19:42
Question: I have a large data set with a bunch of columns that I want to run the same function on, based on either prefix or suffix, to create a new variable. What I would like to be able to do is provide a list to map and create new variables.

    dataframe <- data_frame(x_1 = c(1,2,3,4,5,6),
                            x_2 = c(1,1,1,2,2,2),
                            y_1 = c(200,400,120,300,100,100),
                            y_2 = c(250,500,150,240,140,400))
    newframe <- dataframe %>%
      mutate(x_ratio = x_1/x_2, y_ratio = y_1/y_2)

In the past, I have written code in a string
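One way this might be done with purrr, sketched under the assumption that every prefix has matching _1 and _2 columns. The vector prefixes is introduced here for illustration, and tibble() is used because data_frame() is deprecated.

```r
library(dplyr)
library(purrr)

dataframe <- tibble(
  x_1 = c(1, 2, 3, 4, 5, 6),
  x_2 = c(1, 1, 1, 2, 2, 2),
  y_1 = c(200, 400, 120, 300, 100, 100),
  y_2 = c(250, 500, 150, 240, 140, 400)
)

# Hypothetical list of prefixes to process.
prefixes <- c("x", "y")

# Build a named list of ratio vectors, one per prefix.
ratios <- prefixes %>%
  set_names(paste0(prefixes, "_ratio")) %>%
  map(~ dataframe[[paste0(.x, "_1")]] / dataframe[[paste0(.x, "_2")]])

# Attach the new columns to the original data.
newframe <- bind_cols(dataframe, as_tibble(ratios))

print(newframe)
```

Adding another variable then only requires extending prefixes, rather than writing a new mutate() term by hand.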

Calculate change since base year?

心不动则不痛 submitted on 2021-01-28 17:37:12
Question: I have a dataset that looks something like this:

    df1 <- data.frame(id = c(rep("A1",4), rep("A2",4)),
                      time = rep(c(0,2:4), 2),
                      y1 = rnorm(8),
                      y2 = rnorm(8))

For each of the y variables, I want to calculate their change since time==0. Basically, I want to do this:

    calc_chage <- function(id, data){
      #y1
      y1_0 <- data$y1[which(data$time==0 & data$id==id)]
      D2y1 <- data$y1[which(data$time==2 & data$id==id)] - y1_0
      D3y1 <- data$y1[which(data$time==3 & data$id==id)] - y1_0
      D4y1 <- data$y1[which(data
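A grouped sketch of the same calculation, assuming each id has exactly one row with time == 0 and dplyr 1.0.0 or later. It keeps the data in long form, one change value per row, rather than separate D2y1, D3y1 variables. The seed and the D_ prefix are additions for illustration.

```r
library(dplyr)

set.seed(1)  # added so the example is reproducible
df1 <- data.frame(
  id   = c(rep("A1", 4), rep("A2", 4)),
  time = rep(c(0, 2:4), 2),
  y1   = rnorm(8),
  y2   = rnorm(8)
)

# Within each id, subtract the baseline (time == 0) value from every
# observation of y1 and y2, storing the results as D_y1 and D_y2.
changes <- df1 %>%
  group_by(id) %>%
  mutate(across(c(y1, y2), ~ .x - .x[time == 0], .names = "D_{.col}")) %>%
  ungroup()

print(changes)
```

The baseline rows get a change of 0, and adding further y variables only requires extending the column selection inside across().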