dplyr

applying a function to combinations of groups, holding 1 group fixed

偶尔善良 posted on 2021-01-29 10:46:43
Question: I have some data which looks like:

       grp    date                id         Y
       <chr>  <dttm>              <chr>      <dbl>
     1 group1 2020-09-01 00:00:00 04003      17039.
     2 group1 2020-09-01 00:00:00 04006      13233.
     3 group1 2020-09-01 00:00:00 04011_AM   7918.
     4 group1 2020-09-01 00:00:00 0401301_AD 22586.
     5 group1 2020-09-01 00:00:00 0401303    20527.
     6 group1 2020-09-01 00:00:00 0401305    29422.
     7 group2 2020-09-01 00:00:00 22017_AM   7088.
     8 group2 2020-09-01 00:00:00 22021_AM   8134.
     9 group2 2020-09-01 00:00:00 22039_AM   15842.
    10 group2 2020-09-01 00:00

Wrapping dplyr filter in function results in “Error: Result must have length 4803, not 3”

心不动则不痛 posted on 2021-01-29 10:44:26
Question: I'm learning R for data analysis and using this Kaggle dataset. Following the movie recommendation script works, but when I try to generalize the dplyr code by wrapping it in a function, I get an error. I've done some troubleshooting, and it looks like the code stops at the filter and mutate functions. The following works and gives the expected output:

    genres <- df %>%
      filter(nchar(genres) > 2) %>%
      mutate(separated = lapply(genres, fromJSON)) %>%
      unnest(separated, .name_repair = "unique") %>%
      select(id,

'mutate' to add two columns with a single fn-call in tidyverse in R

Deadly posted on 2021-01-29 09:21:37
Question: This is an R version 3.4.4 question. A voting function, voteOnBase, takes 2 arguments and returns a 2-element list: the WINNER and the VOTE.COUNT. I want to use it to add those two columns to notVotedYet, a tibble. The following code runs correctly:

    library(tidyverse)
    withVotes <- notVotedYet %>%
      group_by(BASE) %>%
      mutate(WINNER = voteOnBase(BASE, CODES)[[1]],
             VOTE.COUNT = voteOnBase(BASE, CODES)[[2]])

However, it calls voteOnBase twice on the same inputs. How can I eliminate the extra
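One possible way to avoid the duplicate call described in this question is to store the function's result once per group in a temporary list-column and then unpack it. The sketch below is only an illustration: the body of voteOnBase and the contents of notVotedYet are hypothetical stand-ins (the original definitions are not shown), and the temporary column name vote is an assumption.

```r
library(dplyr)

# Hypothetical stand-in for the asker's function: returns a 2-element
# list holding the most frequent code and how often it occurs.
voteOnBase <- function(base, codes) {
  counts <- table(codes)
  list(names(counts)[which.max(counts)], as.integer(max(counts)))
}

# Hypothetical example data (the real notVotedYet is not shown).
notVotedYet <- tibble(
  BASE  = c(1, 1, 1, 2, 2),
  CODES = c("A", "A", "C", "G", "T")
)

withVotes <- notVotedYet %>%
  group_by(BASE) %>%
  mutate(
    vote       = list(voteOnBase(BASE, CODES)),  # one call per group
    WINNER     = vote[[1]][[1]],                 # first list element
    VOTE.COUNT = vote[[1]][[2]]                  # second list element
  ) %>%
  ungroup() %>%
  select(-vote)                                  # drop the helper column
```

Because mutate evaluates its arguments in order, the later expressions can reuse the vote column created in the same call, so the function body runs once for each group.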

reactivity in timevis package: passing selectinput variable to subgroup

我与影子孤独终老i posted on 2021-01-29 09:02:25
Question: Using the timevis package (by Dean Attali) in R, I would like to plot the timeline for each group individually, chosen with a selectInput widget in R Shiny, but I get: Error in : Can't subset columns that don't exist. x Column 2 doesn't exist. Can someone help? Thank you. My code:

    library(shiny)
    library(timevis)
    library(dplyr)

    # constructing data frame
    pre_data <- data.frame(
      group = as.integer(c(1,1,2,2)),
      content = c("Item one", "Item two", "Ranged item", "Ranged item two"),
      start = c("2016-01-10", "2016-01-11", "2016

R: removing half of the confidence bands

|▌冷眼眸甩不掉的悲伤 posted on 2021-01-29 08:56:51
Question: I am using the R programming language. I create the following data and graph:

    library(xts)
    library(ggplot2)
    library(dplyr)
    library(plotly)
    library(lubridate)

    set.seed(123)

    #time series 1
    date_decision_made = seq(as.Date("2014/1/1"), as.Date("2016/1/1"), by = "day")
    property_damages_in_dollars <- rnorm(731, 100, 10)
    final_data <- data.frame(date_decision_made, property_damages_in_dollars)

    #####aggregate
    final_data$year_month <- format(as.Date(final_data$date_decision_made), "%Y-%m")
    final_data$year

Group (collapse?) rows without summarising to fill in NAs

和自甴很熟 posted on 2021-01-29 08:07:52
Question: I have an issue that should be straightforward with dplyr (I think), but I can't seem to find a resolution. My dataframe comprises numbers and factors. Each observation is represented by two rows, each of which has either a value or NA in one of two columns (Agg_Entropy and Av_Amplitude). I want to combine each observation's rows into a single row (without summarising), so that the NAs are replaced with the relevant values. A simple excerpt of the dataframe:

    Selection Low High Agg_Entropy Av_Amplitude
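A minimal sketch of one way to collapse such row pairs follows. It assumes that Selection, Low and High together identify an observation, and the data values are invented for illustration because the original excerpt is cut off. Although it uses summarise, nothing is aggregated: each group simply keeps its first non-NA value per column.

```r
library(dplyr)

# Hypothetical data in the shape described: two rows per observation,
# each row holding a value in only one of the two measurement columns.
df <- tibble(
  Selection    = c(1, 1, 2, 2),
  Low          = c(10, 10, 20, 20),
  High         = c(15, 15, 25, 25),
  Agg_Entropy  = c(0.5, NA, 0.7, NA),
  Av_Amplitude = c(NA, 30, NA, 40)
)

collapsed <- df %>%
  group_by(Selection, Low, High) %>%
  # For every non-grouping column keep the first non-NA entry;
  # a column that is entirely NA within a group stays NA.
  summarise(across(everything(), ~ .x[!is.na(.x)][1]), .groups = "drop")
```

If a group ever held two different non-NA values in the same column, this sketch would silently keep only the first, so it fits only the one-value-per-pair layout the question describes.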

R ggplot2: geom_area get linetype by group

我们两清 posted on 2021-01-29 07:05:29
Question: I am trying to differentiate the linetype and/or color in a stacked geom_area by group. How can I do that? Simply using geom_area(linetype = type) or color = type does not work. The only thing that works changes the values for both groups: geom_area(color = "white"). How can I modify the color and linetype by group? My dummy example:

    dat <- data.frame(x = c(1:5,1:5), y = c(9:5, 10,7,5,3,1), type = rep(c("a", "b"), each = 5))

My geom_area:

    library(dplyr)
    dat %>% ggplot(aes(fill = type, x = x, y = y)) +

merge .csvs based on common column but of inconsistent length

╄→尐↘猪︶ㄣ posted on 2021-01-29 06:47:17
Question: Afternoon (or morning, evening). I am trying to merge several .csv files that have a similar layout: they have a class in one column (character) and an abundance (num) in another. When imported as a data.frame, an example would be:

    print(one[1:5,])
      X Class                            Abundance_inds
    1 1 Chaetognath                      2
    2 2 Copepod_Calanoid_Acartia_spp     9
    3 3 Copepod_Calanoid_Centropages_spp 4
    4 4 Copepod_Calanoid_Temora_spp      1
    5 5 Copepod_Calanoid_Unknown         55

The class column (number of rows and order) changes every csv

how to print a list-column of ggplots to pdf?

跟風遠走 posted on 2021-01-29 06:47:01
Question: Consider this funny example:

    mydata <- data_frame(group = c('a', 'a', 'a', 'b', 'b', 'b'),
                         x = c(1,2,3,5,6,7),
                         y = c(3,5,6,4,3,2))

    > mydata
    # A tibble: 6 x 3
      group     x     y
      <chr> <dbl> <dbl>
    1 a         1     3
    2 a         2     5
    3 a         3     6
    4 b         5     4
    5 b         6     3
    6 b         7     2

Here I can nest() by group, and store a group-based ggplot into a list-column. Crazy stuff.

    > mydata %>% group_by(group) %>%
    +   nest() %>%
    +   mutate(myplot = map(data, ~ggplot(data = .x, aes(x = x, y = x)) + geom_point()))
    # A tibble: 2 x 3
      group data myplot
      <chr>

R dplyr: Conditional Mutate based on Groups

落爺英雄遲暮 posted on 2021-01-29 06:21:05
Question: Currently, I am working on the following problem: I am trying to split my dataset into groups and create a new variable that captures the mean of all cases that do not belong to a given group, within a specific time frame. Here is a replica of my code using the mpg dataset.

    cars <- mpg
    cars$other_cty_yearly_mean <- 0
    for(i in cars$cyl){
      cars <- cars %>%
        group_by(year) %>%
        mutate(other_cty_yearly_mean = if_else(
          cyl == i,
          mean(cty[cyl != i]),
          other_cty_yearly_mean
        )) %>%
        ungroup() %>%
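The mean over "all other groups within the same year" can probably be computed without a loop by subtracting each group's own sum and count from the yearly totals. The sketch below uses a small invented table with the same column names as mpg (year, cyl, cty) so that it runs without ggplot2; the helper columns year_sum and year_n are assumptions introduced only for this illustration.

```r
library(dplyr)

# Small hypothetical stand-in for mpg with the relevant columns only.
cars <- tibble(
  year = c(1999, 1999, 1999, 2008, 2008, 2008),
  cyl  = c(4, 4, 6, 4, 6, 8),
  cty  = c(20, 22, 16, 21, 17, 13)
)

result <- cars %>%
  group_by(year) %>%
  mutate(year_sum = sum(cty), year_n = n()) %>%   # totals per year
  group_by(year, cyl) %>%
  # Mean of the other cyl groups in the same year:
  # (yearly total - own group's total) / (yearly count - own group's count)
  mutate(other_cty_yearly_mean = (year_sum - sum(cty)) / (year_n - n())) %>%
  ungroup() %>%
  select(-year_sum, -year_n)
```

With these values, the two 4-cylinder rows from 1999 get 16, the only other cty value in that year. If a year contains a single cyl group, the division is 0/0 and the result is NaN, which may or may not be the desired behaviour.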