tidyverse | 易学教程 (Yixue Tutorials)

Joining list of data.frames from map() call

Question: Is there a "tidyverse" way to join a list of data.frames (à la full_join(), but for more than two data.frames)? I have a list of data.frames as the result of a call to map(). I've used Reduce() to do something like this before, but would like to merge them as part of a pipeline; I just haven't found an elegant way to do that. Toy example: library(tidyverse) ## Function to make a data.frame with an ID column and a random variable column with mean = df_mean make.df <- function(df_mean){ data.frame(id = 1
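purrr's reduce() is the pipeline-friendly counterpart of base Reduce(): it folds full_join() over the list two data frames at a time. The sketch below assumes every data frame shares an `id` key. make_df() is a hypothetical stand-in for the question's make.df(), which is cut off above.

```r
library(dplyr)
library(purrr)

# Hypothetical stand-in for the question's make.df(): an id column plus one
# random column with mean df_mean, named after the mean so names stay distinct
make_df <- function(df_mean) {
  out <- data.frame(id = 1:5, value = rnorm(5, mean = df_mean))
  names(out)[2] <- paste0("mean_", df_mean)
  out
}

set.seed(1)
joined <- map(c(1, 10, 100), make_df) %>%
  # equivalent to full_join(full_join(df1, df2, by = "id"), df3, by = "id")
  reduce(~ full_join(.x, .y, by = "id"))
```

Giving each value column a distinct name avoids the `.x`/`.y` suffixes that full_join() would otherwise add to clashing columns.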

Computation failed in `stat_smooth()`: object 'C_crspl' not found

Question: I am trying to add a geom_smooth() to a qplot() with the following code: library(ggplot2) library(ggplot2movies) qplot(votes, rating, data = movies) + geom_smooth() However, the smoother is missing from the plot, and I receive the following warning message: Computation failed in stat_smooth(): object 'C_crspl' not found Does anybody know what is wrong here? This is my setup: > sessionInfo() R version 3.4.1 (2017-06-30) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 16.04.1 LTS
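For groups of 1,000 or more observations, geom_smooth()'s automatic method uses mgcv's GAM (formula `y ~ s(x, bs = "cs")`). `C_crspl` appears to be compiled code inside mgcv, so a broken or mismatched mgcv installation is a plausible cause, and reinstalling or updating mgcv is worth trying.

A minimal workaround is to name the smoothing method explicitly. The synthetic data below is an assumption standing in for ggplot2movies::movies, and `method = "lm"` is only one possible choice.

```r
library(ggplot2)

# Synthetic stand-in for ggplot2movies::movies (only votes and rating are
# used); 2000 rows, so the automatic method would otherwise pick the GAM
set.seed(42)
movies_like <- data.frame(votes = rpois(2000, 100))
movies_like$rating <- 5 + 0.01 * movies_like$votes + rnorm(2000)

p <- ggplot(movies_like, aes(votes, rating)) +
  geom_point() +
  # an explicit method bypasses the mgcv-backed default for large groups
  geom_smooth(method = "lm", formula = y ~ x)
```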

Applying group_by and summarise(sum) but keep columns with non-relevant conflicting data?

Question: My question is very similar to "Applying group_by and summarise on data while keeping all the columns' info", but I would like to keep the columns that get dropped because their values conflict within a group. Label <- c("203c","203c","204a","204a","204a","204a","204a","204a","204a","204a") Type <- c("wholefish","flesh","flesh","fleshdelip","formula","formuladelip", "formula","formuladelip","wholefish", "wholefishdelip") Proportion <- c(1,1,0.67714,0.67714,0.32285,0.32285,0.32285, 0.32285, 0.67714,0.67714
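One way to keep a column whose values conflict within a group is to collapse those values into a single string inside summarise(). The sketch below uses the first four rows of the question's vectors, since the rest of the data is cut off. Joining the values with `", "` is an arbitrary choice.

```r
library(dplyr)

# First four rows of the question's Label/Type/Proportion vectors
df <- data.frame(
  Label = c("203c", "203c", "204a", "204a"),
  Type = c("wholefish", "flesh", "flesh", "fleshdelip"),
  Proportion = c(1, 1, 0.67714, 0.67714)
)

result <- df %>%
  group_by(Label) %>%
  summarise(
    Proportion = sum(Proportion),
    # collapse the conflicting values instead of dropping the column
    Type = paste(unique(Type), collapse = ", ")
  )
```

If every original row should be kept instead, `group_by(Label) %>% mutate(Total = sum(Proportion))` adds the group sum as a new column without collapsing rows.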

Opposite of unnest_tokens

Question: This is most likely a stupid question, but I've googled and googled and can't find a solution, probably because I don't know the right way to word it. I have a data frame that I converted to tidy text format in R to get rid of stop words, and I would now like to 'untidy' it back to its original format. What's the opposite (inverse) of unnest_tokens? Edit: here is what the data I'm working with look like. I'm trying to replicate analyses from Silge
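A common way to reverse the tokenization is to group by the identifier of the original row and paste the tokens back together. This sketch assumes an id column (here the hypothetical `doc_id`) was kept when unnest_tokens() was run. Note that the rebuilt text will not match the original exactly: unnest_tokens() lowercases and strips punctuation by default, and the stop words are gone.

```r
library(dplyr)

# Hypothetical one-token-per-row data, shaped like unnest_tokens() output
# after stop words were removed; doc_id identifies the original row
tidy_words <- data.frame(
  doc_id = c(1, 1, 1, 2, 2),
  word = c("quick", "brown", "fox", "lazy", "dog"),
  stringsAsFactors = FALSE
)

untidy <- tidy_words %>%
  group_by(doc_id) %>%
  # paste the remaining tokens back together, in their current row order
  summarise(text = paste(word, collapse = " "))
```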

Calculating Occupancy in hospital from dates with time.

I am looking to calculate occupancy in the emergency department (ED) with tidyverse. Occupancy here means a patient was admitted but did not leave the hospital within the same hour they were admitted. For example, if I arrived at the ED at 12:00:00 and did not leave within that hour, then I am occupying the hospital. For this I need to create a new column, Occupancy. (A little context: I want to plot occupancy by hour of the day. I already know how to plot this, but not how to calculate occupancy. Thus no need for you to be bogged down
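The sketch below follows one reading of the definition. A visit counts toward occupancy only if the discharge falls in a later clock hour than the admission. It then counts in every clock hour from the admission hour through the discharge hour.

That interpretation is an assumption, as are the column names (`admit`, `discharge`), the UTC time zone, and the sample visits, since the question's data is cut off.

```r
library(dplyr)
library(tidyr)

# Hypothetical ED visits (ids and times are made up for illustration)
ed <- data.frame(
  id = 1:3,
  admit = as.POSIXct(c("2020-01-01 12:00:00", "2020-01-01 12:10:00",
                       "2020-01-01 13:30:00"), tz = "UTC"),
  discharge = as.POSIXct(c("2020-01-01 14:20:00", "2020-01-01 12:40:00",
                           "2020-01-01 14:05:00"), tz = "UTC")
)

occupancy <- ed %>%
  # index each timestamp by whole hours since the epoch (UTC)
  mutate(start_h = floor(as.numeric(admit) / 3600),
         end_h = floor(as.numeric(discharge) / 3600)) %>%
  # assumption: visits that end in the same clock hour they began do not count
  filter(end_h > start_h) %>%
  # one row per clock hour the patient is present; k runs 1, 2, ... per visit
  uncount(end_h - start_h + 1, .id = "k") %>%
  mutate(hour = as.POSIXct((start_h + k - 1) * 3600,
                           origin = "1970-01-01", tz = "UTC")) %>%
  count(hour, name = "occupancy")
```

For an hour-of-day plot, `format(occupancy$hour, "%H")` gives the hour label. Summing `occupancy` by that label combines several days.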

dplyr - sum of multiple columns using regular expressions

For the dataset mtcars2 mtcars2 = mtcars mtcars2 = mtcars2 %>% mutate(cyl9=cyl, disp9=disp, gear2=gear) I want to get a new column which is the sum of multiple columns, using a regular expression to capture the pattern. This works, but it hard-codes the columns: select(mtcars2, cyl9) + select(mtcars2, disp9) + select(mtcars2, gear2) I tried something like this, but it gives me a single number instead of a vector: mtcars2 %>% select(matches("[0-9]")) %>% sum Please, dplyr solutions only, since I need to apply these functions to a SQL table later on. Thanks! Update: I need the solution to
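Because the result should later run against a SQL table, one option is to build the plain expression `cyl9 + disp9 + gear2` from the matching names and splice it into mutate(). An expression made only of column additions should be something dbplyr can translate. The sketch runs on the in-memory mtcars2 only; the SQL side is untested here.

```r
library(dplyr)
library(purrr)
library(rlang)

mtcars2 <- mtcars %>% mutate(cyl9 = cyl, disp9 = disp, gear2 = gear)

# names of the columns matching the regex (same pattern as matches("[0-9]"))
sum_cols <- grep("[0-9]", colnames(mtcars2), value = TRUE)

# fold the names into one expression: cyl9 + disp9 + gear2
sum_expr <- reduce(syms(sum_cols), function(a, b) expr(!!a + !!b))

result <- mtcars2 %>% mutate(total = !!sum_expr)
```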

Calculate the mean between several columns of df2 that can vary according to the variable `var1` of df1 and add the value to a new variable in df1

I have a data frame df1 that summarises the depth of different fishes over time at different places. On the other hand, I have df2 that summarises the intensity of the currents over time (EVERY THREE HOURS) from the surface to 39 meters depth at intervals of 8 meters (m0-7, m8-15, m16-23, m24-31 and m32-39) in a specific place. As an example: df1<-data.frame(Datetime=c("2016-08-01 15:34:07","2016-08-01 16:25:16","2016-08-01 17:29:16","2016-08-01 18:33:16","2016-08-01 20:54:16","2016-08-01 22:48:16"),Site=c("BD","HG","BD","BD","BD","BD"),Ind=c(16,17,19,16,17,16), Depth=c(5.3,24,36.4,42,NA
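Before any per-row mean can be taken, each fish record has to be paired with the df2 row for its three-hour period. The sketch below covers only that step. It assumes each fish timestamp uses the df2 record at or before it (rounding down to the 3-hour slot) and that all times are UTC.

The df1 rows are the first four from the question. The df2 current values are placeholders, not data from the question.

```r
library(dplyr)

# First four fish records from the question
df1 <- data.frame(
  Datetime = as.POSIXct(c("2016-08-01 15:34:07", "2016-08-01 16:25:16",
                          "2016-08-01 17:29:16", "2016-08-01 18:33:16"), tz = "UTC"),
  Site = c("BD", "HG", "BD", "BD"),
  Ind = c(16, 17, 19, 16),
  Depth = c(5.3, 24, 36.4, 42)
)

# Hypothetical 3-hourly current records; values are placeholders
df2 <- data.frame(
  Datetime = as.POSIXct(c("2016-08-01 15:00:00", "2016-08-01 18:00:00"), tz = "UTC"),
  `m0-7` = c(0.10, 0.20), `m8-15` = c(0.12, 0.22), `m16-23` = c(0.14, 0.24),
  `m24-31` = c(0.16, 0.26), `m32-39` = c(0.18, 0.28),
  check.names = FALSE
)

three_h <- 3 * 3600
df1_matched <- df1 %>%
  # round each fish timestamp down to the start of its 3-hour slot
  mutate(slot = as.POSIXct(floor(as.numeric(Datetime) / three_h) * three_h,
                           origin = "1970-01-01", tz = "UTC")) %>%
  left_join(df2, by = c("slot" = "Datetime"))
```

Once the layer columns sit next to each fish record, a row-wise mean over whichever layer columns apply can be taken, for example with rowMeans() on those columns.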

How to conditionally mutate multiple columns using “contains” and “ifelse”?

I want to mutate multiple columns containing the string "account". Specifically, I want these columns to take "NA" when a certain condition is met, and another value when the condition is not met. Below I present my attempt, inspired by two related answers. So far it has been unsuccessful; I'm still trying, but any help would be much appreciated. My data df<-as.data.frame(structure(list(low_account = c(1, 1, 0.5, 0.5, 0.5, 0.5), high_account = c(16, 16, 56, 56, 56, 56), mid_account_0 = c(8.5, 8.5, 28.25, 28.25, 28.25, 28.25), mean_account_0 = c(31.174, 30.1922101449275, 30.1922101449275, 33.3055555555556,
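With dplyr 1.0.0 or later, across() combined with contains() applies one function to every matching column. The question's actual condition is cut off, so the sketch uses a hypothetical one (`low_account < 1`). It computes the condition in a separate step first so that blanking out low_account cannot affect the test for the other columns. It also uses a real `NA` rather than the string "NA".

```r
library(dplyr)

# First four rows of the question's columns (the rest was cut off)
df <- data.frame(
  low_account = c(1, 1, 0.5, 0.5),
  high_account = c(16, 16, 56, 56),
  mid_account_0 = c(8.5, 8.5, 28.25, 28.25),
  mean_account_0 = c(31.174, 30.1922101449275, 30.1922101449275, 33.3055555555556)
)

result <- df %>%
  # hypothetical condition: flag rows where low_account is below 1
  mutate(flag = low_account < 1) %>%
  # every column whose name contains "account" becomes NA on flagged rows
  mutate(across(contains("account"), ~ ifelse(flag, NA, .x))) %>%
  select(-flag)
```

With older dplyr versions, mutate_at() with vars(contains("account")) plays the same role.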

Create a variable in `df1` depending on one variable of `df1` (`df1$var1`) and one variable of `df2` that is changeable depending on `df1$var1`

I have a data frame df1 that summarises fish depths over time. df1$Site tells you the site where the fish was, df1$Ind tells you the individual, and df1$Depth tells you the depth of the fish at a specific df1$Datetime. On the other hand, I have df2 that summarises the intensity of the currents over time (EVERY THREE HOURS) from the surface to 39 meters depth at intervals of 8 meters (m0-7, m8-15, m16-23, m24-31 and m32-39). As an example: df1<-data.frame(Datetime=c("2016-08-01 15:34:07","2016-08-01 16:25:16","2016-08-01 17:29:16","2016-08-01 18:33:16","2016-08-01 20:54:16","2016-08
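The sketch below picks the df2 column by depth. cut() maps each depth to the name of its 8-meter layer, and matrix indexing then takes, for each row, the value from the column with that name.

It assumes the layer columns have already been joined onto the fish rows (for example after matching timestamps). It also assumes the layers are half-open intervals [0, 8), [8, 16) and so on, so depths outside 0 to 40 m get NA. The depths come from the question; the current values are placeholders.

```r
library(dplyr)

layer_names <- c("m0-7", "m8-15", "m16-23", "m24-31", "m32-39")

# Fish depths from the question with hypothetical, already-joined currents
fish <- data.frame(
  Depth = c(5.3, 24, 36.4, 42),
  `m0-7` = c(0.10, 0.10, 0.10, 0.20), `m8-15` = c(0.12, 0.12, 0.12, 0.22),
  `m16-23` = c(0.14, 0.14, 0.14, 0.24), `m24-31` = c(0.16, 0.16, 0.16, 0.26),
  `m32-39` = c(0.18, 0.18, 0.18, 0.28),
  check.names = FALSE
)

fish <- fish %>%
  # name of the layer each depth falls into; NA outside [0, 40)
  mutate(layer = as.character(cut(Depth, breaks = c(0, 8, 16, 24, 32, 40),
                                  labels = layer_names, right = FALSE)))

# for each row, take the value from the column named in `layer`
# (an NA column index yields NA)
col_idx <- match(fish$layer, layer_names)
fish$current <- as.matrix(fish[layer_names])[cbind(seq_len(nrow(fish)), col_idx)]
```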

Passing column names through multiple functions with dplyr

I wrote a simple function to create tables of percentages in dplyr: library(dplyr) df = tibble( Gender = sample(c("Male", "Female"), 100, replace = TRUE), FavColour = sample(c("Red", "Blue"), 100, replace = TRUE) ) quick_pct_tab = function(df, col) { col_quo = enquo(col) df %>% count(!! col_quo) %>% mutate(Percent = (100 * n / sum(n))) } df %>% quick_pct_tab(FavColour) # Output: # A tibble: 2 x 3 FavColour n Percent <chr> <int> <dbl> 1 Blue 58 58 2 Red 42 42 This works great. However, when I tried to build on top of this, writing a new function that calculated the same percentages with
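With rlang 0.4.0 or later, the embrace operator `{{ }}` captures a column argument and forwards it unchanged, so the same column name can pass through several function layers. The sketch rewrites quick_pct_tab() that way and adds a hypothetical wrapper, quick_pct_tab_by(), that groups before delegating. It uses small fixed data instead of sample() so the result is reproducible.

In the question's enquo() style, the wrapper would instead unquote the captured argument, e.g. `quick_pct_tab(!!enquo(col))`.

```r
library(dplyr)

quick_pct_tab <- function(df, col) {
  df %>%
    count({{ col }}) %>%
    mutate(Percent = 100 * n / sum(n))
}

# Hypothetical wrapper: percentages of `col` within each level of `group`
quick_pct_tab_by <- function(df, group, col) {
  df %>%
    group_by({{ group }}) %>%
    # {{ col }} forwards the caller's column name to the inner function
    quick_pct_tab({{ col }})
}

df <- tibble(
  Gender = c("Male", "Male", "Female", "Female", "Female"),
  FavColour = c("Red", "Blue", "Blue", "Blue", "Red")
)
res <- quick_pct_tab_by(df, Gender, FavColour)
```

count() keeps the grouping of its input, so `sum(n)` in the inner mutate() is taken within each Gender and the percentages add up to 100 per group.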