dplyr

Aggregating strings using toString and counting them in R

会有一股神秘感。 submitted on 2021-01-27 11:51:44
Question: I have the following data frame, obtained after applying this dplyr code:

    Final_df <- df %>%
      group_by(clientID, month) %>%
      summarise(test = toString(Sector)) %>%
      as.data.frame()

which gives me the following output:

    ClientID  month  test
    ASD       Sep    Auto,Auto,Finance
    DFG       Oct    Finance,Auto,Oil

What I want is to count the sectors as well:

    ClientID  month  test
    ASD       Sep    Auto:2,Finance:1
    DFG       Oct    Finance:1,Auto:1,Oil:1

How can I achieve this with dplyr?

Answer 1: We can try

    df %>% group_by(client_id, month, Sector) %>% tally() %>% group_by(client
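A minimal sketch of one way to get the per-sector counts asked for above. It is not necessarily what the truncated answer goes on to do. The input `df` is hypothetical sample data built to match the column names in the question, and `.groups = "drop"` assumes dplyr 1.0.0 or later. Note that `count()` sorts sectors alphabetically within each group, so the order can differ from the order of first appearance shown in the desired output.

```r
library(dplyr)

# Hypothetical input with the columns named in the question
df <- data.frame(
  clientID = c("ASD", "ASD", "ASD", "DFG", "DFG", "DFG"),
  month    = c("Sep", "Sep", "Sep", "Oct", "Oct", "Oct"),
  Sector   = c("Auto", "Auto", "Finance", "Finance", "Auto", "Oil"),
  stringsAsFactors = FALSE
)

Final_df <- df %>%
  # One row per client/month/sector, with the number of occurrences in n
  count(clientID, month, Sector) %>%
  group_by(clientID, month) %>%
  # Collapse each group into a single "Sector:n" string
  summarise(test = paste0(Sector, ":", n, collapse = ","), .groups = "drop") %>%
  as.data.frame()

print(Final_df)
```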

Parse and Evaluate Column of String Expressions in R?

丶灬走出姿态 submitted on 2021-01-27 11:39:58
Question: How can I parse and evaluate a column of string expressions in R as part of a pipeline? In the example below, I produce my desired column, evaluated. But I know this isn't the right approach. I tried taking a tidyverse approach, but I'm just very confused.

    library(tidyverse)
    df <- tibble(name = LETTERS[1:3],
                 to_evaluate = c("1-1+1", "iter+iter", "4*iter-1"),
                 evaluated = NA)
    iter = 1
    for (i in 1:nrow(df)) {
      df[i, "evaluated"] <- eval(parse(text = df$to_evaluate[[i]]))
    }
    print(df)
    # # A tibble: 3
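One possible pipeline version of the loop above, sketched under the assumption that every expression evaluates to a single number. The list `vars` is a hypothetical name introduced here to hold the variables the expressions may refer to, which keeps the lookup of `iter` explicit rather than relying on the calling environment.

```r
library(dplyr)

df <- tibble(name = LETTERS[1:3],
             to_evaluate = c("1-1+1", "iter+iter", "4*iter-1"))

# Hypothetical container for the variables the expressions may use
vars <- list(iter = 1)

df <- df %>%
  mutate(evaluated = vapply(
    to_evaluate,
    # Parse each string and evaluate it with `vars` supplying the variables
    function(s) eval(parse(text = s), envir = vars),
    numeric(1),
    USE.NAMES = FALSE
  ))

print(df)
```

As with any use of `eval(parse())`, this runs whatever code the strings contain, so it is only appropriate for trusted input.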

dplyr case_when with dynamic number of cases

蹲街弑〆低调 submitted on 2021-01-27 07:23:41
Question: I want to use dplyr and case_when to collapse a series of indicator columns into a single column. The challenge is that I want to be able to collapse over an unspecified/dynamic number of columns. Consider the following dataset, where gear has been split into a series of indicator columns.

    library(dplyr)
    data(mtcars)
    mtcars = mtcars %>%
      mutate(g2 = ifelse(gear == 2, 1, 0),
             g3 = ifelse(gear == 3, 1, 0),
             g4 = ifelse(gear == 4, 1, 0)) %>%
      select(g2, g3, g4)

I am trying to write a function that does the

Replace entire string anywhere in dataframe based on partial match with dplyr

自古美人都是妖i submitted on 2021-01-27 06:37:24
Question: I'm struggling to find the right dplyr code to use grepl or an equivalent to replace values throughout an entire data frame, i.e. any cell that contains 'mazda' should have its entire content replaced with the new string 'A car'. After lots of searching online, the closest I came was: The emphasis being on applying it to ALL columns.

    library(dplyr)
    mtcars$carnames <- rownames(mtcars) # dummy data to test on

This line does the trick for the entire string being an exact match:

    mtcars %>%
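A hedged sketch of a partial-match replacement across all columns, assuming dplyr 1.0.0 or later for `across()`. The match is made case-insensitive here as an assumption, because the row names in `mtcars` spell the make as "Mazda" while the question writes 'mazda'.

```r
library(dplyr)

cars <- mtcars
cars$carnames <- rownames(cars)  # dummy data to test on, as in the question

replaced <- cars %>%
  mutate(across(
    everything(),
    # Replace the whole cell when it contains the pattern, otherwise keep it.
    # Columns without any match keep their original type.
    ~ ifelse(grepl("mazda", .x, ignore.case = TRUE), "A car", .x)
  ))

head(replaced$carnames, 3)
```

A numeric column that did contain a match would be coerced to character, since a single column cannot hold both numbers and the replacement string.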

use of other columns as arguments to function in summarize_at()

瘦欲@ submitted on 2021-01-27 06:31:44
Question: This works great:

    > mtcars %>% group_by(cyl) %>% summarize_at(vars(disp, hp), weighted.mean)
    # A tibble: 3 x 3
        cyl  disp    hp
      <dbl> <dbl> <dbl>
    1  4.00   105  82.6
    2  6.00   183 122
    3  8.00   353 209

But now I want to use one of the columns from mtcars as the w argument to weighted.mean. Sadly, the obvious attempt fails:

    > mtcars %>% group_by(cyl) %>% summarize_at(vars(disp, hp), weighted.mean, w = wt)
    Error in dots_list(...) : object 'wt' not found

Even though wt is, indeed, part of mtcars. How can I use
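One way this can be written, sketched with `across()` rather than `summarize_at()` and therefore assuming dplyr 1.0.0 or later. Inside the lambda, `wt` is looked up in the current group's data, so it can be passed as the `w` argument.

```r
library(dplyr)

weighted <- mtcars %>%
  group_by(cyl) %>%
  # .x is the column being summarised; wt is the weight column of the same group
  summarise(across(c(disp, hp), ~ weighted.mean(.x, w = wt)))

print(weighted)
```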

How to group rows in a range and consider a 3rd column?

我们两清 submitted on 2021-01-27 06:22:30
Question: I have a genetic dataset where I want to group genetic variants/rows that are physically close together in the genome. I want to group genes that are within ranges from certain spots in the genome per chromosome (chrom). My 'spots' dataset contains the positions that variants/rows need to be within range of, and looks like:

    chrom  low    high
    1      500    1700
    1      19500  20600
    5      400    1500

My low and high columns are the ranges that I want to see if any rows in my next dataset fall into, with also accounting
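A rough sketch of matching rows to ranges per chromosome. The second dataset is not shown in the excerpt, so `variants` and its columns `variant` and `pos` are entirely hypothetical; only `spots` follows the table above. The idea is to pair every variant with every spot on the same chromosome and then keep the pairs whose position falls inside the range.

```r
library(dplyr)

spots <- data.frame(chrom = c(1, 1, 5),
                    low   = c(500, 19500, 400),
                    high  = c(1700, 20600, 1500))

# Hypothetical variant table: names and columns are assumptions
variants <- data.frame(variant = c("v1", "v2", "v3", "v4"),
                       chrom   = c(1, 1, 5, 5),
                       pos     = c(600, 19800, 450, 9000),
                       stringsAsFactors = FALSE)

in_range <- merge(variants, spots, by = "chrom") %>%  # all variant/spot pairs per chromosome
  filter(pos >= low, pos <= high)                     # keep pairs where pos lies in [low, high]

print(in_range)
```

For large genomic tables the all-pairs step can become expensive, in which case a dedicated interval join would be the more usual choice.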

summarize to vector output

一世执手 submitted on 2021-01-27 06:18:06
Question: Let's say I have the following (simplified) tibble containing a group and values in vectors:

    set.seed(1)
    (tb_vec <- tibble(group = factor(rep(c("A","B"), c(2,3))),
                      values = replicate(5, sample(3), simplify = FALSE)))
    # A tibble: 5 x 2
      group values
      <fct> <list>
    1 A     <int [3]>
    2 A     <int [3]>
    3 B     <int [3]>
    4 B     <int [3]>
    5 B     <int [3]>
    tb_vec[[1,2]]
    [1] 1 3 2

I would like to summarize the values vectors per group by summing them (vectorized) and tried the following:

    tb_vec %>% group_by(group) %>%
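A minimal sketch of an element-wise sum per group, assuming all vectors within a group have the same length. The result is wrapped in `list()` so that each group still yields exactly one row, with the summed vector stored in a list column.

```r
library(dplyr)

set.seed(1)
tb_vec <- tibble(group  = factor(rep(c("A", "B"), c(2, 3))),
                 values = replicate(5, sample(3), simplify = FALSE))

summed <- tb_vec %>%
  group_by(group) %>%
  # Reduce(`+`, ...) adds the group's vectors element by element
  summarise(values = list(Reduce(`+`, values)))

summed$values
```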

Using dplyr to filter rows which contain partial string of column

余生长醉 submitted on 2021-01-27 06:06:19
Question: Assuming I have a data frame like

    term        cnt
    apple       10
    apples      5
    a apple on  3
    blue pears  3
    pears       1

how could I filter out all partially matching strings within this column, e.g. getting as a result

    term   cnt
    apple  10
    pears  1

without specifying which terms to filter on (apple|pears), but in a self-referencing manner (i.e. each term is checked against the whole column, and terms that are a partial match are removed). The number of tokens is not limited, nor the consistency of strings (i.e. "mapples
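A sketch under one reading of the question: a term is dropped when it contains any other term from the same column as a literal substring. The helper name `contains_other_term` is hypothetical, and the excerpt is cut off before its edge cases are spelled out, so this may not cover everything the asker intended.

```r
library(dplyr)

df <- data.frame(term = c("apple", "apples", "a apple on", "blue pears", "pears"),
                 cnt  = c(10, 5, 3, 3, 1),
                 stringsAsFactors = FALSE)

# Hypothetical helper: TRUE where a term contains some other term of the column
contains_other_term <- function(terms) {
  vapply(seq_along(terms), function(i) {
    # fixed = TRUE treats each term as a literal string, not a regular expression
    any(vapply(terms[-i], grepl, logical(1), x = terms[i], fixed = TRUE))
  }, logical(1))
}

result <- df %>% filter(!contains_other_term(term))

print(result)
```

The check compares every term with every other term, so its cost grows quadratically with the number of rows. Exact duplicates would remove each other under this rule and may need deduplicating first.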