tidyverse | 易学教程

mutate with case_when and contains

Question: I feel like there should be an efficient way to mutate new columns with dplyr using case_when and contains, but I cannot get it to work. I understand using case_when within mutate is "somewhat experimental" (as in this post), but would be grateful for any suggestions. Doesn't work: library(tidyverse) set.seed(1234) x <- c("Black", "Blue", "Green", "Red") df <- data.frame(a = 1:20, b = sample(x, 20, replace = TRUE)) df <- df %>% mutate(group = case_when(.$b(contains("Bl")) ~ "Group1", case_when(.
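A minimal sketch of one way to approach the question above, assuming the goal is to test the values of column b for a substring. In dplyr, contains() is a selection helper that matches column names, so this sketch swaps in base R's grepl() for the value test. The fallback label "Group2" is hypothetical, since the snippet is cut off before the other groups are defined.

```r
library(dplyr)

set.seed(1234)
x <- c("Black", "Blue", "Green", "Red")
df <- data.frame(a = 1:20, b = sample(x, 20, replace = TRUE))

df <- df %>%
  mutate(group = case_when(
    grepl("Bl", b) ~ "Group1",  # values of b that contain "Bl" (Black, Blue)
    TRUE ~ "Group2"             # hypothetical catch-all label for everything else
  ))
```

case_when() evaluates its conditions in order, so the TRUE branch only applies to rows that no earlier condition matched.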

Bulk recoding of variables in the tidyverse (functional / meta-programming)

I want to recode a bunch of variables with as few function calls as possible. I have one data.frame where I want to recode a number of variables. I create a named list of all variable names and the recoding arguments I want to execute. Here I have no problem using map and dplyr. However, when it comes to recoding I find it much easier to use recode from the car package instead of dplyr's own recoding function. A side question is whether there is a nice way of doing the same thing with dplyr::recode. As a next step I break the data.frame down into a nested tibble. Here I want to do specific

Using R with tidyquant and massive data

While working with R I encountered a strange problem. I am processing data in the following manner: reading data from a database into a data frame, filling missing values, grouping and nesting the data by a combined primary key, creating a time series and forecasting it for every group, ungrouping and cleaning the data, and writing it back into the DB. Something like this: https://cran.rstudio.com/web/packages/sweep/vignettes/SW01_Forecasting_Time_Series_Groups.html For small data sets this works like a charm, but with larger ones (over about 100000 entries) I get the "R Session Aborted" screen from R

use dplyr mutate() in programming

Question: I am trying to assign a column name to a variable using mutate. df <- data.frame(x = sample(1:100, 50), y = rnorm(50)) new <- function(name){ df %>% mutate(name = ifelse(x < 50, "small", "big")) } When I run new(name = "newVar") it doesn't work. I know mutate_() could help but I'm struggling to use it together with ifelse. Any help would be appreciated. Answer 1: Using dplyr 0.7.1 and its advances in NSE, you have to UQ the argument to mutate and then use := when assigning. There is lots of info on
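The approach described in the answer can be sketched as follows. This version uses !!, the operator form of UQ(), together with :=. The function name add_size_column and the explicit data argument are illustrative choices, not part of the original post.

```r
library(dplyr)

set.seed(1)
df <- data.frame(x = sample(1:100, 50), y = rnorm(50))

# `name` is a string; `!!name :=` tells mutate() to use its value as the column name
add_size_column <- function(data, name) {
  data %>%
    mutate(!!name := ifelse(x < 50, "small", "big"))
}

result <- add_size_column(df, "newVar")
```

Without the unquoting, mutate(name = ...) creates a column literally called name, which is the behaviour the question runs into.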

Error casted by simple mutate using tidyverse or dplyr

I am having serious trouble with the tidyverse package that I cannot debug. As an example, "mutate" does not work properly even on past projects I have already produced. This all started when I installed the following packages: library(pdftools) library(tm) library(stringi) library(tidyverse) (or library(dplyr) library(tidyr) library(purrr)). The problem remains when I run rm(list=ls()). The only thing I haven't tried so far is uninstalling R/RStudio and reinstalling it. I use RStudio version 1.0.153 and R version 3.4.1. I actually tried to reproduce the bug on other computers and this

Replace NA in all columns of a dplyr chain

The question "replace NA in a dplyr chain" results in the solution dt %.% group_by(a) %.% mutate(b = ifelse(is.na(b), mean(b, na.rm = T), b)) with dplyr. I want to impute all columns with a dplyr chain. There is no single column to group by; rather, I want all numeric columns to have all NAs replaced by means such as the column means. What is the most elegant way to replace all NAs with column means with tidyverse/dp? We can use mutate_all with ifelse: dt %>% group_by(a) %>% mutate_all(funs(ifelse(is.na(.), mean(., na.rm = TRUE), .))) If we want a compact option, then use na.aggregate from zoo
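A hedged sketch of the same idea in more recent dplyr syntax, assuming dplyr 1.0.0 or later, where across() supersedes mutate_all() and funs(). It is ungrouped, so it imputes plain column means as the question asks. The toy data frame dt and the result name imputed are made up for illustration.

```r
library(dplyr)

# Toy data (hypothetical): two numeric columns with NAs and one character column
dt <- data.frame(
  a = c(1, NA, 3),
  b = c(NA, 4, 6),
  label = c("x", "y", "z"),
  stringsAsFactors = FALSE
)

imputed <- dt %>%
  mutate(across(
    where(is.numeric),                                   # only touch numeric columns
    ~ ifelse(is.na(.x), mean(.x, na.rm = TRUE), .x)      # NA -> mean of that column
  ))
```

Restricting the operation with where(is.numeric) leaves non-numeric columns such as label untouched, which mutate_all() would not do.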

Using dplyr filter() in programming

I am writing my own function and want to use dplyr's filter() function to select rows of my data frame that satisfy a condition. This is my code: library(tidyverse) df <- data.frame(x = sample(1:100, 50), y = rnorm(50), z = sample(1:100, 50), w = sample(1:100, 50), p = sample(1:100, 50)) new <- function(ang, brad, drau){ df %>% filter(!!drau %in% 1:50) %>% select(ang, brad) -> A return(A) } brand <- c("z", "w", "p") lapply(1:3, function(i) new(ang = "x", brad = "y", drau = brand[i])) %>% bind_rows() Whenever I run this function, it looks like filter doesn't select any rows that satisfy the condition. How can I
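One possible explanation and fix, offered as a sketch rather than the accepted answer: when drau is a character string, !!drau inserts the string itself, so the condition compares "z" (not column z) against 1:50 and is always FALSE. The version below assumes dplyr 1.0.0 or later and looks the column up with the .data pronoun, selecting the output columns with all_of().

```r
library(dplyr)

set.seed(1)
df <- data.frame(
  x = sample(1:100, 50), y = rnorm(50),
  z = sample(1:100, 50), w = sample(1:100, 50), p = sample(1:100, 50)
)

new <- function(ang, brad, drau) {
  df %>%
    filter(.data[[drau]] %in% 1:50) %>%   # look up the column named by the string drau
    select(all_of(c(ang, brad)))          # select columns by character names
}

brand <- c("z", "w", "p")
result <- lapply(brand, function(col) new(ang = "x", brad = "y", drau = col)) %>%
  bind_rows()
```

An alternative with the same effect is filter(!!rlang::sym(drau) %in% 1:50), which converts the string to a symbol before unquoting it.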

Extract longest word in string

I would like to find and extract the longest word of a string, if possible using a tidyverse package. library(tidyverse) tbl <- tibble(a=c("ab cde", "bcde f", "cde fg"), b=c("cde", "bcde", "cde")) tbl # A tibble: 3 x 1 a <chr> 1 ab cde 2 bcde f 3 cde fg The result I am looking for is: # A tibble: 3 x 2 a b <chr> <chr> 1 ab cde cde 2 bcde f bcde 3 cde fg cde The closest post to the question I have found is this: longest word in a string. Does anyone have an idea for an even simpler way? Solution using base R: # Using OP's provided data tbl$b <- sapply(strsplit(tbl$a, " "), function(x) x[which
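As a separate illustration (not a reconstruction of the truncated answer above), here is a minimal base R sketch of extracting the longest space-separated word per row. The helper name longest_word is hypothetical, and ties are resolved in favour of the first longest word because which.max() returns the first maximum.

```r
tbl <- data.frame(
  a = c("ab cde", "bcde f", "cde fg"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split on single spaces and keep the word with the most characters
longest_word <- function(s) {
  words <- strsplit(s, " ", fixed = TRUE)[[1]]
  words[which.max(nchar(words))]
}

tbl$b <- vapply(tbl$a, longest_word, character(1), USE.NAMES = FALSE)
```

vapply() is used instead of sapply() so that the result is guaranteed to be a character vector with one element per row.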

Separate contents of field

Question: I'm sure this is very simple, and I think it's a case of using separate and gather. I have a single field in a dataframe, authorlist, an edited export of a PubMed search. It contains the authors of the publications. It can, obviously, contain either a single author or a collaboration of authors. For example, this is just a selection of the options available: Author Drijgers RL, Verhey FR, Leentjens AF, Kahler S, Aalten P. What I'd like to do is create a single list of ALL authors so that I'd

Using nested function with lapply

Question: This code works (it takes hours, minutes and seconds and converts them to seconds only): library(lubridate) library(tidyverse) original_date_time<-"2018-01-3111:59:59" period_to_seconds(hms(paste(hour(original_date_time), minute(original_date_time),second(original_date_time), sep = ":"))) I have this tibble: df<-data.frame("id"=c(1,2,3,4,5), "Time"=c("1999-12-31 10:10:10","1999-12-31 09:05:13","1999-12-31 00:05:25","1999-12-31 07:04","1999-12-31 03:05:07")) tib<-as_tibble(df) tib result: # A tibble: 5