tidyverse

dplyr - sum of multiple columns using regular expressions

痞子三分冷 submitted on 2019-12-08 04:19:53
Question: For the dataset mtcars2

    mtcars2 = mtcars
    mtcars2 = mtcars2 %>% mutate(cyl9=cyl, disp9=disp, gear2=gear)

I want to get a new column that is the sum of multiple columns, using a regular expression to capture the column-name pattern. This is one solution, but it hard-codes the column names:

    select(mtcars2, cyl9) + select(mtcars2, disp9) + select(mtcars2, gear2)

I tried something like this, but it gives me a single number instead of a vector:

    mtcars2 %>% select(matches("[0-9]")) %>% sum

Please, dplyr solutions only,…
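
A minimal dplyr-only sketch, assuming dplyr ≥ 1.0.0 (for `c_across()`): `sum()` collapses the whole selection into one number, so the sum has to be taken per row. `rowwise()` plus `c_across()` does that while still letting `matches()` pick the columns by regular expression. The new column name `total` is illustrative.

```r
library(dplyr)

mtcars2 <- mtcars %>%
  mutate(cyl9 = cyl, disp9 = disp, gear2 = gear)

result <- mtcars2 %>%
  rowwise() %>%
  # c_across() gathers this row's values from every column whose name contains a digit
  mutate(total = sum(c_across(matches("[0-9]")))) %>%
  ungroup()

head(result$total)
```

On large data, a vectorised `rowSums()` over the same selection is usually faster than `rowwise()`.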

Calculate the mean across several columns of df2 that vary according to the variable `var1` of df1, and add the value as a new variable in df1

拜拜、爱过 submitted on 2019-12-08 03:56:37
Question: I have a data frame df1 that summarises the depth of different fish over time at different places. I also have df2, which summarises the intensity of the currents over time (every three hours) from the surface down to 39 meters depth, in 8-meter intervals (m0-7, m8-15, m16-23, m24-31 and m32-39), at a specific place. As an example:

    df1<-data.frame(Datetime=c("2016-08-01 15:34:07","2016-08-01 16:25:16","2016-08-01 17:29:16","2016-08-01 18:33:16","2016-08-01 20:54:16","2016-08…

How to fix 'Quosures can only be unquoted within a quasiquotation context' error in R function

狂风中的少年 submitted on 2019-12-08 02:58:11
Question: I am trying to write my first function using rlang and am having some trouble fixing the following error. I've read the vignette, but didn't see a good example of what I'm trying to do.

    library(babynames)
    library(tidyverse)

    name_graph <- function(data, name, sex){
      name <- enquo(name)
      sex <- enquo(sex)
      data %>%
        filter_(name == !!name, sex == !!sex) %>%
        select(year, prop) %>%
        ggplot()+
        geom_line(mapping = aes(year, prop))
    }

    name_graph(babynames, Robert, M)

I'm expecting my distribution graph,…
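
The error most likely comes from `filter_()`, the old standard-evaluation verb, which does not process `!!`, so the quosure ends up being evaluated with a plain `!`. A sketch of a fix, assuming rlang ≥ 0.4.0: use `filter()`, and because `Robert` and `M` are values rather than column names, turn the captured symbols into strings with `rlang::ensym()` and `rlang::as_string()`. The small tibble `toy` is hypothetical stand-in data with the same columns as `babynames`.

```r
library(dplyr)
library(ggplot2)

name_graph <- function(data, name, sex) {
  # ensym() captures the bare arguments (e.g. Robert, M); as_string() turns them into "Robert", "M"
  name <- rlang::as_string(rlang::ensym(name))
  sex  <- rlang::as_string(rlang::ensym(sex))
  data %>%
    # filter() (not the deprecated filter_()) understands !!;
    # .data$ refers to the column, !! inlines the local string
    filter(.data$name == !!name, .data$sex == !!sex) %>%
    select(year, prop) %>%
    ggplot() +
    geom_line(mapping = aes(year, prop))
}

# Hypothetical stand-in for babynames with the same column names
toy <- tibble(
  year = c(2000, 2001, 2000, 2001),
  sex  = c("M", "M", "F", "F"),
  name = "Robert",
  prop = c(0.010, 0.009, 0.0001, 0.0002)
)

p <- name_graph(toy, Robert, M)
```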

Left joining in R between two timestamps

戏子无情 submitted on 2019-12-08 02:17:09
Question: My goal is to perform a left join on intervals where the bike_id matches and the created_at timestamp in records is BETWEEN start and end in the intervals table.

    > class(records)
    [1] "data.table" "data.frame"
    > class(intervals)
    [1] "data.table" "data.frame"
    > records
      bike_id          created_at         resolved_at
    1   28780 2019-05-03 08:29:18 2019-05-03 08:35:37
    2   28780 2019-05-03 21:05:28 2019-05-03 21:07:28
    3   28780 2019-05-04 21:13:39 2019-05-04 21:15:40
    4   28780 2019-05-07 17:24:20 2019-05-07 17:26:39
    5   28780…
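
One way to express this in dplyr, assuming dplyr ≥ 1.1.0 (which added `join_by()` with the `between()` helper for inequality joins). The toy rows below are hypothetical, and `interval_id` stands in for whatever other columns `intervals` has. `left_join()` keeps every row of `records`; a record that falls in no interval gets `NA`, and a record that falls in several intervals would appear once per match.

```r
library(dplyr)

records <- tibble(
  bike_id    = c(28780, 28780, 28780),
  created_at = as.POSIXct(c("2019-05-03 08:29:18", "2019-05-03 21:05:28",
                            "2019-05-04 21:13:39"), tz = "UTC")
)

# Hypothetical intervals table
intervals <- tibble(
  bike_id     = c(28780, 28780),
  start       = as.POSIXct(c("2019-05-03 08:00:00", "2019-05-04 00:00:00"), tz = "UTC"),
  end         = as.POSIXct(c("2019-05-03 12:00:00", "2019-05-04 23:59:59"), tz = "UTC"),
  interval_id = c("a", "b")
)

# Equality on bike_id plus start <= created_at <= end
joined <- left_join(records, intervals,
                    join_by(bike_id, between(created_at, start, end)))
```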

Passing column names through multiple functions with dplyr

夙愿已清 submitted on 2019-12-08 00:08:35
Question: I wrote a simple function to create tables of percentages in dplyr:

    library(dplyr)

    df = tibble(
      Gender = sample(c("Male", "Female"), 100, replace = TRUE),
      FavColour = sample(c("Red", "Blue"), 100, replace = TRUE)
    )

    quick_pct_tab = function(df, col) {
      col_quo = enquo(col)
      df %>%
        count(!! col_quo) %>%
        mutate(Percent = (100 * n / sum(n)))
    }

    df %>% quick_pct_tab(FavColour)

    # Output:
    # A tibble: 2 x 3
      FavColour     n Percent
      <chr>     <int>   <dbl>
    1 Blue         58      58
    2 Red          42      42

This works great. However, when I…
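
The question is cut off, so this sketch assumes the problem is calling `quick_pct_tab()` from inside another function. With rlang ≥ 0.4.0, the embrace operator `{{ }}` captures and forwards a column name in one step, so it can be passed through any number of functions. `pct_tab_wrapper()` is a hypothetical outer function, and the fixed `df` replaces the random sample so the result is reproducible.

```r
library(dplyr)

df <- tibble(
  Gender    = c("Male", "Female", "Female", "Male"),
  FavColour = c("Red", "Blue", "Blue", "Blue")
)

quick_pct_tab <- function(df, col) {
  df %>%
    count({{ col }}) %>%
    mutate(Percent = 100 * n / sum(n))
}

# Hypothetical wrapper: {{ }} forwards the caller's bare column name unchanged
pct_tab_wrapper <- function(df, col) {
  df %>% quick_pct_tab({{ col }})
}

direct  <- df %>% quick_pct_tab(FavColour)
wrapped <- df %>% pct_tab_wrapper(FavColour)
```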

mutate_at evaluation error when using group_by

荒凉一梦 submitted on 2019-12-07 17:53:48
Question: mutate_at() throws an evaluation error when it is used with group_by() and a numeric vector of column positions is passed as the first (.vars) argument. The issue shows up with R 3.4.2 and dplyr 0.7.4. It works fine with R 3.3.2 and dplyr 0.5.0, and it also works fine if .vars is a character vector (column names). Example:

    # Create example dataframe
    Id <- c('10_1', '10_2', '11_1', '11_2', '11_3', '12_1')
    Month <- c(2, 3, 4, 6, 7, 8)
    RWA <- c(0, 0, 0, 1.579, NA, 0.379)
    dftest = data.frame(Id, Month,…
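
The example is cut off before the grouping variable and the actual transformation, so this sketch assumes both: a hypothetical `Group` column taken from the part of `Id` before the underscore, and an illustrative operation that replaces `NA` in `RWA` with the group mean. The relevant point is selecting the column by name, which the question reports works. In dplyr ≥ 1.0.0, `across()` is the recommended replacement for the superseded `mutate_at()`; `mutate_at(vars(RWA), ...)` is the older equivalent.

```r
library(dplyr)

Id    <- c('10_1', '10_2', '11_1', '11_2', '11_3', '12_1')
Month <- c(2, 3, 4, 6, 7, 8)
RWA   <- c(0, 0, 0, 1.579, NA, 0.379)
dftest <- data.frame(Id, Month, RWA)

# Hypothetical grouping column: the part of Id before the underscore ("10", "11", "12")
dftest$Group <- sub("_.*", "", dftest$Id)

result <- dftest %>%
  group_by(Group) %>%
  # Select RWA by name; illustrative operation: fill NA with the group mean
  mutate(across(RWA, ~ ifelse(is.na(.x), mean(.x, na.rm = TRUE), .x))) %>%
  ungroup()
```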

Non-standard eval in dplyr::mutate

a 夏天 submitted on 2019-12-07 16:32:59
Question: In theory this should work, since I've read the tidyverse guide on NSE, but it throws the error shown at the bottom of this example. Why is this? I understand how to do a simple quasiquotation of an object, but I do not understand how to evaluate a fraction of two quasiquoted objects. Can anyone help with this?

    tmp <- structure(list(
      qa11a = structure(c(1616, 7293, 1528, 1219, 2049, 286),
        label = "Total voters removed from Nov. 2008 to Nov. 2010",
        class = c("labelled","numeric")),
      state_abbv…
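
The full data and the error message are cut off, so this is a general sketch rather than a fix for that exact case. It shows one way to divide two quasiquoted columns inside `mutate()`: capture each argument with `enquo()`, wrap each `!!` in parentheses so each quosure is unquoted on its own, and name the new column with `:=`. `add_ratio()`, `total_registered` and `share_removed` are hypothetical; only the column `qa11a` and its first two values come from the question.

```r
library(dplyr)

# Hypothetical helper: new_col = num / den, with all three given as bare column names
add_ratio <- function(df, num, den, new_col) {
  num     <- enquo(num)
  den     <- enquo(den)
  new_col <- rlang::as_name(enquo(new_col))
  # Parenthesise each !! so the unquoting happens before the division
  df %>% mutate(!!new_col := (!!num) / (!!den))
}

tmp <- tibble(
  qa11a            = c(1616, 7293),    # first two values from the question
  total_registered = c(16160, 72930)   # hypothetical denominator column
)

out <- add_ratio(tmp, qa11a, total_registered, share_removed)
```

With rlang ≥ 0.4.0 the body can also be written as `mutate({{ new_col }} := {{ num }} / {{ den }})`.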

Separate string after last underscore

给你一囗甜甜゛ submitted on 2019-12-07 15:28:57
Question: This is indeed a duplicate of the question r-split-string-using-tidyrseparate, but I cannot use its MWE for my purpose because I do not know how to adjust the regular expression. I basically want the same thing, but splitting the variable after the last underscore. Reason: I have data where some columns show up several times for the same factor/type. I figured I could melt the data, separate the value variable before the type string, and spread it out again to a wide format with fewer columns. My…
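
A common approach is a regular expression that matches only the last underscore: `_(?!.*_)` is an underscore not followed by any later underscore (a negative lookahead, which the regex engine used by `separate()` supports). The column name `key` and its values are hypothetical.

```r
library(dplyr)
library(tidyr)

df <- tibble(key = c("height_first_type1", "width_type2", "depth_x_y_type3"))

out <- df %>%
  # Split only at the last underscore; everything before it stays in `variable`
  separate(key, into = c("variable", "type"), sep = "_(?!.*_)")
```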

Struggling to Create a Pivot Table in R

耗尽温柔 submitted on 2019-12-07 14:54:33
Question: I am very, very new to any kind of coding language. I am used to pivot tables in Excel and am trying to replicate one of my Excel pivots in R. I have spent a long time searching the internet and YouTube, but I just can't get it to work. I am looking to produce a table in which the left-hand column lists a number of locations and the top of the table shows the different pages that have been viewed. In the table I want to show the number of views per location for each of these…
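
An Excel-style pivot table of counts can be sketched with `count()` followed by `tidyr::pivot_wider()` (tidyr ≥ 1.0.0), assuming the raw data has one row per page view. The `views` data and its column names `location` and `page` are hypothetical.

```r
library(dplyr)
library(tidyr)

# Hypothetical raw data: one row per page view
views <- tibble(
  location = c("London", "London", "Paris", "Paris", "Paris", "Berlin"),
  page     = c("Home", "Pricing", "Home", "Home", "Blog", "Pricing")
)

wide <- views %>%
  count(location, page) %>%                 # views per location/page pair
  pivot_wider(names_from = page,            # one column per page
              values_from = n,
              values_fill = list(n = 0L))   # pairs with no views become 0
```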

Match character vector in a dataframe with another character vector and trim character

不羁的心 submitted on 2019-12-07 12:42:22
Question: Here is a data frame and a vector.

    df1 <- tibble(var1 = c("abcd", "efgh", "ijkl", "qrst"))
    vec <- c("abcd", "mnop", "ijkl")

Now, for all the values in var1 that match values in vec, I want to keep only the first 3 characters, so that the desired result is:

    df2 <- tibble(var1 = c("abc", "efgh", "ijk", "qrst"))

Since "abcd" matches, we keep only its first 3 characters, i.e. "abc", in df2; but "efgh" doesn't exist in vec, so we keep it as is, i.e. "efgh", in df2. How can I use dplyr and/or stringr to…
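
A dplyr/stringr sketch: `%in%` flags the values of `var1` that appear in `vec`, and `if_else()` keeps the first three characters (`str_sub(var1, 1, 3)`) only for those rows, leaving the others unchanged.

```r
library(dplyr)
library(stringr)

df1 <- tibble(var1 = c("abcd", "efgh", "ijkl", "qrst"))
vec <- c("abcd", "mnop", "ijkl")

result <- df1 %>%
  # Trim to 3 characters only where var1 matches an entry of vec
  mutate(var1 = if_else(var1 %in% vec, str_sub(var1, 1, 3), var1))
```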