Subset/filter in dplyr chain with ggplot2

Submitted by 不羁岁月 on 2019-12-22 08:34:36

Question


I'd like to make a slopegraph, along the lines (no pun intended) of this. Ideally, I'd like to do it all in a dplyr-style chain, but I hit a snag when I try to subset the data to add specific geom_text labels. Here's a toy example:

library(tidyverse)  # provides tibble(), %>%, dplyr and ggplot2

# make tbl:

df <- tibble(
  area = rep(c("Health", "Education"), 6),
  sub_area = rep(c("Staff", "Projects", "Activities"), 4),
  year = c(rep(2016, 6), rep(2017, 6)),
  value = rep(c(15000, 12000, 18000), 4)
) %>% arrange(area)


# plot: 

df %>% filter(area == "Health") %>% 
  ggplot() + 
  geom_line(aes(x = as.factor(year), y = value, 
            group = sub_area, color = sub_area), size = 2) + 
  geom_point(aes(x = as.factor(year), y = value, 
            group = sub_area, color = sub_area), size = 2) +
  theme_minimal(base_size = 18) + 
  geom_text(data = dplyr::filter(., year == 2016 & sub_area == "Activities"), 
            aes(x = as.factor(year), y = value, 
                color = sub_area, label = area), size = 6, hjust = 1)

But this gives me the error: Error in filter_(.data, .dots = lazyeval::lazy_dots(...)) : object '.' not found. Using subset instead of dplyr::filter gives a similar error. The closest thing I've found on SO/Google is this question, which addresses a slightly different problem.

What is the correct way to subset the data in a chain like this?

Edit: My reprex is a simplified example; in my real work I have one long chain. Mike's comment below works for the first case, but not for the second.


Answer 1:


If you wrap the plotting code in {...}, you can use . to specify exactly where the previously calculated results are inserted:

library(tidyverse)

df <- tibble(
  area = rep(c("Health", "Education"), 6),
  sub_area = rep(c("Staff", "Projects", "Activities"), 4),
  year = c(rep(2016, 6), rep(2017, 6)),
  value = rep(c(15000, 12000, 18000), 4)
) %>% arrange(area)

df %>% filter(area == "Health") %>% {
    ggplot(.) +    # add . to specify to insert results here
        # ggplot2 >= 3.4.0 uses linewidth for line thickness (use size on older versions)
        geom_line(aes(x = as.factor(year), y = value, 
                      group = sub_area, color = sub_area), linewidth = 2) + 
        geom_point(aes(x = as.factor(year), y = value, 
                       group = sub_area, color = sub_area), size = 2) +
        theme_minimal(base_size = 18) + 
        geom_text(data = dplyr::filter(., year == 2016 & sub_area == "Activities"),    # and here
                  aes(x = as.factor(year), y = value, 
                      color = sub_area, label = area), size = 6, hjust = 1)
}

While that plot is probably not what you really want, at least it runs so you can edit it.
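One way to check that the dot inside geom_text() now refers to the filtered (Health-only) data is to store the plot and inspect the label layer with ggplot2's layer_data(). This is a minimal sketch assuming dplyr and ggplot2 are installed; the names p and label_data are only for illustration, and the aesthetics are moved into ggplot() to keep it short.

```r
library(dplyr)
library(ggplot2)

df <- tibble(
  area = rep(c("Health", "Education"), 6),
  sub_area = rep(c("Staff", "Projects", "Activities"), 4),
  year = c(rep(2016, 6), rep(2017, 6)),
  value = rep(c(15000, 12000, 18000), 4)
) %>% arrange(area)

# Inside the braces, every `.` is the Health-only data coming down the pipe
p <- df %>% filter(area == "Health") %>% {
  ggplot(., aes(x = as.factor(year), y = value, color = sub_area)) +
    geom_line(aes(group = sub_area)) +
    geom_point() +
    geom_text(data = dplyr::filter(., year == 2016 & sub_area == "Activities"),
              aes(label = area), hjust = 1)
}

# The third layer is geom_text(); layer_data() returns its built data
label_data <- layer_data(p, 3)
print(label_data[, c("x", "y", "label")])
```

The label layer should contain a single row (Health, Activities, 2016), which confirms the dot was resolved against the piped data rather than the full df.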

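What's happening: %>% binds more tightly than +, so in the original code the pipe ends at ggplot(). The geom_text() call, including its ., is evaluated on its own, outside the pipe, where no . exists; hence "object '.' not found". Normally %>% passes the result of the left-hand side (LHS) to the first argument of the right-hand side (RHS). However, if you wrap the RHS in braces, %>% no longer inserts the result as a first argument; it is available only wherever you explicitly write a dot (.). This form is useful for nested sub-pipelines or otherwise complicated calls (like a ggplot chain) that can't be sorted out just by redirecting with a dot. See help('%>%', 'magrittr') for more details and options.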
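The precedence issue can be seen without building a plot: quote() returns the parse tree of the question's call without evaluating it, and its outermost function is +, not %>%. The second half shows the brace form on a plain vector. This sketch assumes only that magrittr is installed; the call inside quote() is never run, so df and ggplot2 are not needed.

```r
library(magrittr)

# Parse (but do not evaluate) a call shaped like the question's code
expr <- quote(df %>% ggplot() + geom_text(data = filter(., year == 2016)))

# The outermost call is `+`; the pipe only covers `df %>% ggplot()`
print(expr[[1]])
print(expr[[2]][[1]])

# With braces, `.` is bound to the left-hand side everywhere in the block
avg <- 1:10 %>% { sum(.) / length(.) }
print(avg)  # 5.5
```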




Answer 2:


Writing:

geom_text(data = df[df$year == 2016 & df$sub_area == "Activities",],...

instead of

geom_text(data = dplyr::filter(., year == 2016 & sub_area == "Activities"),...

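makes it work. Note, though, that this subsets the original, unfiltered df rather than the piped data, so the label data also contains the Education row (an extra "Education" label drawn at the same spot on the Health plot). You also still have to deal with the position of the text; you should easily find help on SO for that.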
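The sketch below, assuming dplyr and ggplot2 are installed, first shows that the base-R subset from this answer returns one row per area. It then shows a different technique that keeps everything in one chain without braces: a layer's data argument in ggplot2 may be a function, which ggplot2 calls with the plot's own data (here, the Health-only rows). The helper label_rows and the name p2 are hypothetical.

```r
library(dplyr)
library(ggplot2)

df <- tibble(
  area = rep(c("Health", "Education"), 6),
  sub_area = rep(c("Staff", "Projects", "Activities"), 4),
  year = c(rep(2016, 6), rep(2017, 6)),
  value = rep(c(15000, 12000, 18000), 4)
) %>% arrange(area)

# This answer's subset works on the unfiltered df, so both areas match
labels_all <- df[df$year == 2016 & df$sub_area == "Activities", ]
print(labels_all)  # two rows: Education and Health

# Hypothetical helper: ggplot2 calls a function given as `data`
# with the plot's data, i.e. whatever came down the pipe
label_rows <- function(d) dplyr::filter(d, year == 2016, sub_area == "Activities")

p2 <- df %>%
  filter(area == "Health") %>%
  ggplot(aes(x = as.factor(year), y = value, color = sub_area)) +
  geom_line(aes(group = sub_area)) +
  geom_point() +
  geom_text(data = label_rows, aes(label = area), hjust = 1)

# The label layer now sees only the filtered Health row
print(nrow(layer_data(p2, 3)))  # 1
```

Because the function receives the plot's data at build time, this form keeps working in a longer chain where filtering happens several steps before ggplot().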



Source: https://stackoverflow.com/questions/44007998/subset-filter-in-dplyr-chain-with-ggplot2
