tidyverse

mutate with case_when and contains

若如初见. Submitted on 2019-12-05 01:37:03
Question: I feel like there should be an efficient way to mutate new columns with dplyr using case_when and contains, but I cannot get it to work. I understand that using case_when within mutate is "somewhat experimental" (as in this post), but I would be grateful for any suggestions.

Doesn't work:

    library(tidyverse)
    set.seed(1234)
    x <- c("Black", "Blue", "Green", "Red")
    df <- data.frame(a = 1:20, b = sample(x,20, replace=TRUE))
    df <- df %>% mutate(group = case_when(.$b(contains("Bl")) ~ "Group1", case_when(.
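
A minimal sketch of one way the idea in this question could be made to work, assuming the goal is to assign a group based on a substring of b. contains() is a column-selection helper and does not test cell values, so the sketch swaps in base R's grepl(), which returns the logical vector that case_when() expects. The second rule and the "Other" fallback are hypothetical, since the original code is cut off.

```r
library(dplyr)

set.seed(1234)
x <- c("Black", "Blue", "Green", "Red")
df <- data.frame(a = 1:20, b = sample(x, 20, replace = TRUE))

# grepl() tests each value of b against a pattern and returns TRUE/FALSE,
# which is what the left-hand side of a case_when() formula needs.
df <- df %>%
  mutate(group = case_when(
    grepl("Bl", b) ~ "Group1",                      # "Black" and "Blue"
    grepl("re", b, ignore.case = TRUE) ~ "Group2",  # hypothetical rule: "Green" and "Red"
    TRUE ~ "Other"                                  # hypothetical fallback
  ))
```

Inside mutate() the column can be referred to as b directly, so the .$b prefix is not needed.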

Bunch recoding of variables in the tidyverse (functional / meta-programming)

会有一股神秘感。 Submitted on 2019-12-04 19:24:54
I want to recode a bunch of variables with as few function calls as possible. I have one data.frame in which I want to recode a number of variables. I create a named list of all variable names and the recoding arguments I want to execute. Here I have no problem using map and dplyr. However, when it comes to recoding, I find it much easier to use recode from the car package instead of dplyr's own recoding function. A side question is whether there is a nice way of doing the same thing with dplyr::recode. As a next step I break the data.frame down into a nested tibble. Here I want to do specific

Using R with tidyquant and massive data

你。 Submitted on 2019-12-04 15:40:14
While working with R I encountered a strange problem. I am processing data in the following manner: reading data from a database into a data frame, filling missing values, grouping and nesting the data by a combined primary key, creating a time series and forecasting it for every group, ungrouping and cleaning the data, and writing it back into the DB. Something like this: https://cran.rstudio.com/web/packages/sweep/vignettes/SW01_Forecasting_Time_Series_Groups.html For small data sets this works like a charm, but with larger ones (over about 100000 entries) I get the "R Session Aborted" screen from R

use dplyr mutate() in programming

╄→尐↘猪︶ㄣ Submitted on 2019-12-04 14:06:50
Question: I am trying to assign a column name to a variable using mutate.

    df <- data.frame(x = sample(1:100, 50), y = rnorm(50))
    new <- function(name){
      df %>% mutate(name = ifelse(x < 50, "small", "big"))
    }

When I run new(name = "newVar") it doesn't work. I know mutate_() could help, but I'm struggling to use it together with ifelse. Any help would be appreciated.

Answer 1: Using dplyr 0.7.1 and its advances in NSE, you have to UQ the argument to mutate and then use := when assigning. There is lots of info on
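
The unquote-and-:= approach described in the answer can be sketched as follows. This is an illustration under assumptions rather than the answerer's exact code: it uses the !! operator (the shorthand for UQ) and passes the data frame in as an extra argument named data, which is not in the original function.

```r
library(dplyr)

df <- data.frame(x = sample(1:100, 50), y = rnorm(50))

# `name` is a string; !!name on the left of := tells mutate() to use the
# string's value as the new column name instead of the literal word "name".
new <- function(data, name) {
  data %>% mutate(!!name := ifelse(x < 50, "small", "big"))
}

out <- new(df, name = "newVar")
```

With a plain = the left-hand side is never evaluated, which is why the original function always creates a column literally called name.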

Error caused by simple mutate using tidyverse or dplyr

三世轮回 Submitted on 2019-12-04 13:50:10
I am having serious trouble using the tidyverse package that I cannot debug. As an example, "mutate" does not work properly even on past projects I have already produced. This all started when I installed the following packages: library(pdftools) library(tm) library(stringi) library(tidyverse) (or library(dplyr) library(tidyr)) library(purrr) ) And it still remains when I do a rm(list=ls()). The only thing I haven't tried so far is uninstalling R/RStudio and reinstalling it. I use RStudio version 1.0.153 and R version 3.4.1. I actually tried to reproduce the bug on other computers and this

Replace NA in all columns of a dplyr chain

六月ゝ 毕业季﹏ Submitted on 2019-12-04 11:29:08
The question "replace NA in a dplyr chain" results in the solution

    dt %.% group_by(a) %.% mutate(b = ifelse(is.na(b), mean(b, na.rm = T), b))

with dplyr. I want to impute all columns with a dplyr chain. There is no single column to group by; rather, I want all numeric columns to have all NAs replaced by means, such as the column means. What is the most elegant way to replace all NAs with column means with tidyverse/dp?

We can use mutate_all with ifelse:

    dt %>% group_by(a) %>% mutate_all(funs(ifelse(is.na(.), mean(., na.rm = TRUE), .)))

If we want a compact option, then use the na.aggregate from zoo
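
For readers on a newer dplyr, the same imputation can be sketched with across(), which replaced the mutate_all()/funs() idiom. This assumes dplyr 1.0.0 or later; the small data frame dt is made up for illustration and is not from the question.

```r
library(dplyr)

# Hypothetical example data: one grouping column and two numeric columns with NAs.
dt <- data.frame(a = c("g1", "g1", "g2"),
                 b = c(1, NA, 3),
                 c = c(NA, 4, 6))

# For every numeric column, replace NA with that column's mean.
# Non-numeric columns such as `a` are left untouched.
filled <- dt %>%
  mutate(across(where(is.numeric),
                ~ ifelse(is.na(.x), mean(.x, na.rm = TRUE), .x)))
```

Adding group_by(a) before the mutate() would impute with group means instead of overall column means.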

Using dplyr filter() in programming

北慕城南 Submitted on 2019-12-04 07:08:38
I am writing my own function and want to use dplyr's filter() function to select the rows of my data frame that satisfy a condition. This is my code:

    library(tidyverse)
    df <- data.frame(x = sample(1:100, 50), y = rnorm(50), z = sample(1:100,50), w = sample(1:100, 50), p = sample(1:100,50))
    new <- function(ang,brad,drau){
      df %>% filter(!!drau %in% 1:50) %>% select(ang,brad) -> A
      return(A)
    }
    brand <- c("z","w","p")
    lapply(1:3, function(i) new(ang = "x", brad = "y", drau = brand[i])) %>% bind_rows()

Any time I run this function, it looks like filter doesn't select any rows that satisfy the condition. How can I
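
A likely explanation, offered as a sketch rather than the accepted answer: drau holds a string, so !!drau inserts the string "z" itself into the condition, and "z" %in% 1:50 is FALSE for every row. One way around this is the .data pronoun, which looks a column up by its name. The version below assumes dplyr 1.0.0 or later (for all_of()) and passes the data frame in as an extra data argument, which is not in the original function.

```r
library(dplyr)

set.seed(1)
df <- data.frame(x = sample(1:100, 50), y = rnorm(50), z = sample(1:100, 50),
                 w = sample(1:100, 50), p = sample(1:100, 50))

new <- function(data, ang, brad, drau) {
  data %>%
    filter(.data[[drau]] %in% 1:50) %>%   # look up the column named by the string
    select(all_of(c(ang, brad)))          # select columns by character names
}

brand <- c("z", "w", "p")
res <- lapply(brand, function(col) new(df, ang = "x", brad = "y", drau = col)) %>%
  bind_rows()
```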

Extract longest word in string

旧城冷巷雨未停 Submitted on 2019-12-04 05:30:39
I would like to find and extract the longest word of a string, if possible using a tidyverse package.

    library(tidyverse)
    tbl <- tibble(a=c("ab cde", "bcde f", "cde fg"), b=c("cde", "bcde", "cde"))
    tbl
    # A tibble: 3 x 1
      a
      <chr>
    1 ab cde
    2 bcde f
    3 cde fg

The result I am looking for is:

    # A tibble: 3 x 2
      a      b
      <chr>  <chr>
    1 ab cde cde
    2 bcde f bcde
    3 cde fg cde

The closest post to the question I have found is this: longest word in a string. Does anyone have an idea for an even simpler way?

Solution using base R:

    # Using OPs provided data
    tbl$b <- sapply(strsplit(tbl$a, " "), function(x) x[which
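
A minimal sketch of the split-and-pick-the-longest idea, written out in full because the solution above is cut off. The helper name longest_word is hypothetical, and the tie-breaking rule (the first longest word wins) is an assumption that the question does not specify.

```r
library(dplyr)

tbl <- tibble(a = c("ab cde", "bcde f", "cde fg"))

# Split one string on spaces and return its longest word.
# which.max() returns the first maximum, so ties go to the earliest word.
longest_word <- function(s) {
  words <- strsplit(s, " ", fixed = TRUE)[[1]]
  words[which.max(nchar(words))]
}

tbl <- tbl %>%
  mutate(b = vapply(a, longest_word, character(1), USE.NAMES = FALSE))
```

vapply() is used instead of sapply() so that the result is guaranteed to be a character vector with one element per row.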

Separate contents of field

天大地大妈咪最大 Submitted on 2019-12-04 02:32:22
Question: I'm sure this is very simple, and I think it's a case of using separate and gather. I have a single field in a data frame, authorlist, which is an edited export of a PubMed search. It contains the authors of the publications. It can, obviously, contain either a single author or a collaboration of authors. For example, this is just a selection of the options available:

    Author
    Drijgers RL, Verhey FR, Leentjens AF, Kahler S, Aalten P.

What I'd like to do is create a single list of ALL authors so that I'd
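
One way to get a single list of all authors could look like the sketch below. It swaps in tidyr's separate_rows() in place of the separate-plus-gather idea mentioned in the question, and it assumes that authors are comma-separated and that a trailing period should be dropped. The second record is hypothetical and is included only to show de-duplication.

```r
library(dplyr)
library(tidyr)

pubs <- tibble(authorlist = c(
  "Drijgers RL, Verhey FR, Leentjens AF, Kahler S, Aalten P.",
  "Verhey FR."   # hypothetical second record, for illustration only
))

authors <- pubs %>%
  separate_rows(authorlist, sep = ",\\s*") %>%                  # one author per row
  mutate(authorlist = sub("\\.$", "", trimws(authorlist))) %>%  # drop trailing period
  distinct(authorlist)                                          # keep each author once
```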

Using nested function with lapply

怎甘沉沦 Submitted on 2019-12-04 02:18:04
Question: This code works (it takes hours, minutes and seconds and converts them to seconds only):

    library(lubridate)
    library(tidyverse)
    original_date_time <- "2018-01-31 11:59:59"
    period_to_seconds(hms(paste(hour(original_date_time), minute(original_date_time), second(original_date_time), sep = ":")))

I have this tibble:

    df <- data.frame("id"=c(1,2,3,4,5), "Time"=c("1999-12-31 10:10:10","1999-12-31 09:05:13","1999-12-31 00:05:25","1999-12-31 07:04","1999-12-31 03:05:07"))
    tib <- as_tibble(df)
    tib

result:

    # A tibble: 5