dplyr | 易学教程

Aggregating strings using toString and counting them in R

Question: I have the following data frame, obtained after applying this dplyr code:

    Final_df <- df %>% group_by(clientID, month) %>% summarise(test = toString(Sector)) %>% as.data.frame()

which gives me the following output:

    ClientID month test
    ASD      Sep   Auto,Auto,Finance
    DFG      Oct   Finance,Auto,Oil

What I want is to count the sectors as well:

    ClientID month test
    ASD      Sep   Auto:2,Finance:1
    DFG      Oct   Finance:1,Auto:1,Oil:1

How can I achieve this with dplyr?

Answer 1: We can try df %>% group_by(client_id, month, Sector) %>% tally() %>% group_by(client
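A minimal sketch along the lines of Answer 1's tally() idea, assuming the column names `clientID`, `month` and `Sector` from the question and a small made-up `df`: count() tallies each sector within a client/month, then paste0(..., collapse = ",") glues the "Sector:n" pairs into one string per group. Note that count() orders sectors alphabetically within each group, so DFG comes out as Auto, Finance, Oil rather than in first-appearance order.

```r
library(dplyr)

# Hypothetical sample data mirroring the question (values are made up)
df <- data.frame(
  clientID = c("ASD", "ASD", "ASD", "DFG", "DFG", "DFG"),
  month    = c("Sep", "Sep", "Sep", "Oct", "Oct", "Oct"),
  Sector   = c("Auto", "Auto", "Finance", "Finance", "Auto", "Oil"),
  stringsAsFactors = FALSE
)

Final_df <- df %>%
  count(clientID, month, Sector) %>%        # one row per sector, with its count in n
  group_by(clientID, month) %>%
  summarise(test = paste0(Sector, ":", n, collapse = ","),  # "Sector:n" pairs joined by commas
            .groups = "drop") %>%
  as.data.frame()

print(Final_df)
```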

Parse and Evaluate Column of String Expressions in R?

Question: How can I parse and evaluate a column of string expressions in R as part of a pipeline? In the example below, I produce my desired column, evaluated. But I know this isn't the right approach. I tried taking a tidyverse approach, but I'm just very confused.

    library(tidyverse)
    df <- tibble(name = LETTERS[1:3],
                 to_evaluate = c("1-1+1", "iter+iter", "4*iter-1"),
                 evaluated = NA)
    iter = 1
    for (i in 1:nrow(df)) {
      df[i, "evaluated"] <- eval(parse(text = df$to_evaluate[[i]]))
    }
    print(df)
    # # A tibble: 3
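One way to fold the loop into a pipeline is to map eval(parse(text = ...)) over the column inside mutate(). This is a sketch, not the asker's method: it assumes every expression returns a single number, and it passes the variables the strings may use through a list named `vars` (a name chosen here for illustration). Keep in mind that eval(parse()) executes arbitrary code, so it is only appropriate for trusted strings.

```r
library(dplyr)

df <- tibble(name = LETTERS[1:3],
             to_evaluate = c("1-1+1", "iter+iter", "4*iter-1"))

# Hypothetical lookup list: the variables the expressions are allowed to reference
vars <- list(iter = 1)

df <- df %>%
  mutate(evaluated = vapply(
    to_evaluate,
    function(expr) eval(parse(text = expr), envir = vars),  # parse the string, then evaluate it
    numeric(1),                                            # each expression must yield one number
    USE.NAMES = FALSE
  ))

print(df)
```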

dplyr case_when with dynamic number of cases

Question: I want to use dplyr and case_when to collapse a series of indicator columns into a single column. The challenge is that I want to be able to collapse over an unspecified (dynamic) number of columns. Consider the following dataset, in which gear has been split into a series of indicator columns:

    library(dplyr)
    data(mtcars)
    mtcars = mtcars %>%
      mutate(g2 = ifelse(gear == 2, 1, 0),
             g3 = ifelse(gear == 3, 1, 0),
             g4 = ifelse(gear == 4, 1, 0)) %>%
      select(g2, g3, g4)

I am trying to write a function that does the
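A possible sketch: build one `<col> == 1 ~ "<col>"` formula per indicator column with rlang, then splice the whole list into case_when() with `!!!`. The function name `collapse_indicators` and the output column `gear_group` are hypothetical, and the sketch assumes at most one indicator is 1 per row (case_when takes the first match). Rows where no indicator is set become NA, which is what happens for gear == 5 here.

```r
library(dplyr)
library(rlang)

# Hypothetical helper: collapse any number of 0/1 indicator columns into one label column
collapse_indicators <- function(data, cols = names(data)) {
  # One unevaluated formula per column, e.g. (g3) == 1 ~ "g3"
  conds <- lapply(cols, function(nm) expr((!!sym(nm)) == 1 ~ !!nm))
  # !!! splices the formulas in as separate case_when() arguments
  data %>% mutate(gear_group = case_when(!!!conds))
}

ind <- mtcars %>%
  mutate(g2 = ifelse(gear == 2, 1, 0),
         g3 = ifelse(gear == 3, 1, 0),
         g4 = ifelse(gear == 4, 1, 0)) %>%
  select(g2, g3, g4)

out <- collapse_indicators(ind, c("g2", "g3", "g4"))
head(out)
```

Because the conditions are generated from `cols`, adding a g5 column only means passing one more name.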

Replace entire string anywhere in dataframe based on partial match with dplyr

Question: I'm struggling to find the right dplyr code to use grepl, or an equivalent, to replace values throughout an entire data frame, i.e. any cell that contains 'mazda' should have its entire content replaced with the new string 'A car'. After lots of searching online, the closest I came is shown below; the emphasis is on applying it to ALL columns.

    library(dplyr)
    mtcars$carnames <- rownames(mtcars)  # dummy data to test on

This line does the trick when the entire string is an exact match:

    mtcars %>%
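A minimal sketch using across() (dplyr 1.0.0 or later) to apply the same grepl() test to every column at once. Two assumptions worth flagging: the match is made case-insensitive because the row names read "Mazda RX4", and only character columns are touched, so the numeric columns keep their type. Swap `where(is.character)` for `everything()` if non-character columns should be matched too.

```r
library(dplyr)

cars <- mtcars
cars$carnames <- rownames(cars)  # dummy data, as in the question

replaced <- cars %>%
  mutate(across(where(is.character),
                # any cell containing "mazda" (any case) is replaced wholesale
                ~ if_else(grepl("mazda", .x, ignore.case = TRUE), "A car", .x)))

head(replaced$carnames)
```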

Using other columns as arguments to a function in summarize_at()

Question: This works great:

    > mtcars %>% group_by(cyl) %>% summarize_at(vars(disp, hp), weighted.mean)
    # A tibble: 3 x 3
        cyl  disp    hp
      <dbl> <dbl> <dbl>
    1  4.00   105  82.6
    2  6.00   183 122
    3  8.00   353 209

But now I want to use one of the columns from mtcars as the w argument to weighted.mean. Sadly, the obvious attempt fails:

    > mtcars %>% group_by(cyl) %>% summarize_at(vars(disp, hp), weighted.mean, w = wt)
    Error in dots_list(...) : object 'wt' not found

Even though wt is, indeed, part of mtcars. How can I use
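A sketch of the usual workaround: wrap the call in a lambda so that `wt` is looked up inside the data mask (per group) instead of being evaluated as an ordinary argument. It uses across(), which supersedes summarize_at() in dplyr 1.0.0 and later; the equivalent lambda `~ weighted.mean(., w = wt)` should also work with summarize_at().

```r
library(dplyr)

res <- mtcars %>%
  group_by(cyl) %>%
  # .x is the current column (disp, then hp); wt is the group's wt column
  summarise(across(c(disp, hp), ~ weighted.mean(.x, w = wt)))

print(res)
```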

How to group rows in a range and consider a 3rd column?

Question: I have a genetic dataset in which I want to group genetic variants/rows that are physically close together in the genome. I want to group genes that fall within ranges around certain spots in the genome, per chromosome (chrom). My 'spots' dataset holds the positions that variants/rows need to be within a range of, and looks like this:

    chrom   low  high
        1   500  1700
        1 19500 20600
        5   400  1500

The low and high columns are the ranges I want to check the rows of my next dataset against, while also accounting
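The second dataset is cut off above, so this sketch assumes a hypothetical `variants` table with columns `variant`, `chrom` and `pos`. It uses a non-equi join (join_by() with between(), available from dplyr 1.1.0) so that a row matches a spot only when the chromosome is equal and the position lies in [low, high]; the `region` label is likewise made up for illustration.

```r
library(dplyr)  # join_by() requires dplyr >= 1.1.0

spots <- tibble(chrom = c(1, 1, 5),
                low   = c(500, 19500, 400),
                high  = c(1700, 20600, 1500))

# Hypothetical variant table; the real column names are not shown in the question
variants <- tibble(variant = c("v1", "v2", "v3", "v4"),
                   chrom   = c(1, 1, 5, 5),
                   pos     = c(600, 20000, 1200, 9000))

grouped <- variants %>%
  # same chromosome AND low <= pos <= high
  inner_join(spots, by = join_by(chrom, between(pos, low, high))) %>%
  mutate(region = paste0(chrom, ":", low, "-", high))  # one label per spot

print(grouped)
```

With inner_join() a variant outside every range (v4 here) is dropped; left_join() would keep it with NA in low, high and region.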

summarize to vector output

Question: Let's say I have the following (simplified) tibble containing a group and values in vectors:

    set.seed(1)
    (tb_vec <- tibble(group = factor(rep(c("A","B"), c(2,3))),
                      values = replicate(5, sample(3), simplify = FALSE)))
    # A tibble: 5 x 2
      group values
      <fct> <list>
    1 A     <int [3]>
    2 A     <int [3]>
    3 B     <int [3]>
    4 B     <int [3]>
    5 B     <int [3]>

    tb_vec[[1,2]]
    [1] 1 3 2

I would like to summarize the values vectors per group by summing them (vectorized) and tried the following:

    tb_vec %>% group_by(group) %>%
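One way to get an element-wise sum per group, sketched under the assumption that all vectors within a group have the same length: Reduce(`+`, values) adds the group's vectors pairwise, and wrapping the result in list() keeps it as a single list-column cell, since summarise() expects one value per group.

```r
library(dplyr)

set.seed(1)
tb_vec <- tibble(group = factor(rep(c("A", "B"), c(2, 3))),
                 values = replicate(5, sample(3), simplify = FALSE))

res <- tb_vec %>%
  group_by(group) %>%
  # Reduce(`+`, ...) sums the list of vectors element by element;
  # list() stores the resulting vector in one list-column cell
  summarise(values = list(Reduce(`+`, values)))

res$values
```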

Using dplyr to filter rows which contain partial string of column

Question: Assuming I have a data frame like

    term       cnt
    apple       10
    apples       5
    a apple on   3
    blue pears   3
    pears        1

how could I filter out all strings that partially match other strings within this column, e.g. getting as a result

    term  cnt
    apple  10
    pears   1

without specifying which terms I want to filter on (apple|pears), but in a self-referencing manner (i.e. it checks each term against the whole column and removes terms that are a partial match)? The number of tokens is not limited, nor is the consistency of strings (i.e. "mapples
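A minimal sketch of the self-referencing check: for each term, test whether any *other* term occurs inside it as a fixed substring, and drop it if so. The helper name `contains_other` is hypothetical. This is O(n²) in the number of terms, and exact duplicate terms would remove each other, so treat it as a starting point rather than a complete answer.

```r
library(dplyr)

df <- tibble(term = c("apple", "apples", "a apple on", "blue pears", "pears"),
             cnt  = c(10, 5, 3, 3, 1))

# Hypothetical helper: TRUE when some other term is a substring of term i
contains_other <- function(terms) {
  vapply(seq_along(terms), function(i) {
    # each other term is used as a literal (fixed = TRUE) pattern against terms[i]
    any(vapply(terms[-i], grepl, logical(1), x = terms[i], fixed = TRUE))
  }, logical(1))
}

filtered <- df %>% filter(!contains_other(term))
print(filtered)
```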