Bar charts connected by lines / How to connect two graphs arranged with grid.arrange in R / ggplot2

我是研究僧i 提交于 2019-12-03 16:51:07

This is a really interesting problem. I approximated it using the patchwork library, which lets you add ggplots together and gives you an easy way to control their layout—I much prefer it to doing anything grid.arrange-based, and for some things it works better than cowplot.

I expanded on the dataset just to get some more values in the two data frames.

library(tidyverse)
library(patchwork)

set.seed(1017)

state1 <- data_frame(
  state = rep(state.name[1:5], each = 3),
  value = floor(runif(15, 1, 100)),
  type = rep(c("state", "local", "fed"), times = 5)
)

state2 <- data_frame(
  state = rep(state.name[1:5], each = 3),
  value = floor(runif(15, 1, 100)),
  type = rep(c("state", "local", "fed"), times = 5)
)

Then I made a data frame that assigns ranks to each state based on other values in their original data frame (state1 or state2).

ranks <- bind_rows(
  state1 %>% mutate(position = 1),
  state2 %>% mutate(position = 2)
)  %>%
  group_by(position, state) %>%
  summarise(state_total = sum(value)) %>%
  mutate(rank = dense_rank(state_total)) %>%
  ungroup()

I made a quick theme to keep things very minimal and drop axis marks:

theme_min <- function(...) theme_minimal(...) +
  theme(panel.grid = element_blank(), legend.position = "none", axis.title = element_blank())

The bump chart (the middle one) is based on the ranks data frame, and has no labels. Using factors instead of numeric variables for position and rank gave me a little more control over spacing, and lets the ranks line up with discrete 1 through 5 values in a way that will match the state names in the bar charts.

p_ranks <- ggplot(ranks, aes(x = as.factor(position), y = as.factor(rank), group = state)) +
  geom_path() +
  scale_x_discrete(breaks = NULL, expand = expand_scale(add = 0.1)) +
  scale_y_discrete(breaks = NULL) +
  theme_min()
p_ranks

For the left bar chart, I sort the states by value and turn the values negative to point to the left, then give it the same minimal theme:

p_left <- state1 %>%
  mutate(state = as.factor(state) %>% fct_reorder(value, sum)) %>%
  arrange(state) %>%
  mutate(value = value * -1) %>%
  ggplot(aes(x = state, y = value, fill = type)) +
    geom_col(position = "stack") +
    coord_flip() +
    scale_y_continuous(breaks = NULL) +
    theme_min() +
    scale_fill_brewer()
p_left

The right bar chart is pretty much the same, except the values stay positive and I moved the x-axis to the top (becomes right when I flip the coordinates):

p_right <- state2 %>%
  mutate(state = as.factor(state) %>% fct_reorder(value, sum)) %>%
  arrange(state) %>%
  ggplot(aes(x = state, y = value, fill = type)) +
    geom_col(position = "stack") +
    coord_flip() +
    scale_x_discrete(position = "top") +
    scale_y_continuous(breaks = NULL) +
    theme_min() +
    scale_fill_brewer()

Then because I've loaded patchwork, I can add the plots together and specify the layout.

p_left + p_ranks + p_right +
  plot_layout(nrow = 1)

You may want to adjust spacing and margins some more, such as with the expand_scale call with the bump chart. I haven't tried this with axis marks along the y-axes (i.e. bottoms after flipping), but I have a feeling things might get thrown out of whack if you don't add a dummy axis to the ranks. Plenty still to mess around with, but it's a cool visualization project you posed!

Here's a pure ggplot2 solution, which combines the underlying data frames into one & plots everything in a single plot:

Data manipulation:

library(dplyr)    
bar.width <- 0.9

# combine the two data sources
df <- rbind(state1 %>% mutate(source = "state1"),
            state2 %>% mutate(source = "state2")) %>%

  # calculate each state's rank within each data source
  group_by(source, state) %>%
  mutate(state.sum = sum(value)) %>%
  ungroup() %>%
  group_by(source) %>%
  mutate(source.rank = as.integer(factor(state.sum))) %>%
  ungroup() %>%

  # calculate the dimensions for each bar
  group_by(source, state) %>%
  arrange(type) %>% 
  mutate(xmin = lag(cumsum(value), default = 0),
         xmax = cumsum(value),
         ymin = source.rank - bar.width / 2,
         ymax = source.rank + bar.width / 2) %>% 
  ungroup() %>%

  # shift each data source's coordinates away from point of origin,
  # in order to create space for plotting lines
  mutate(x = ifelse(source == "state1", -max(xmax) / 2, max(xmax) / 2)) %>%
  mutate(xmin = ifelse(source == "state1", x - xmin, x + xmin),
         xmax = ifelse(source == "state1", x - xmax, x + xmax)) %>%

  # calculate label position for each data source
  group_by(source) %>%
  mutate(label.x = max(abs(xmax))) %>%
  ungroup() %>%
  mutate(label.x = ifelse(source == "state1", -label.x, label.x),
         hjust = ifelse(source == "state1", 1.1, -0.1))

Plot:

ggplot(df, 
       aes(x = x, y = source.rank,
           xmin = xmin, xmax = xmax, 
           ymin = ymin, ymax = ymax,
           fill = type)) +
  geom_rect() +
  geom_line(aes(group = state)) +
  geom_text(aes(x = label.x, label = state, hjust = hjust),
            check_overlap = TRUE) +

  # allow some space for the labels; this may be changed
  # depending on plot dimensions
  scale_x_continuous(expand = c(0.2, 0)) +
  scale_fill_manual(values = fill) +

  theme_void() +
  theme(legend.position = "top")

Data source (same as @camille's):

set.seed(1017)

state1 <- data_frame(
  state = rep(state.name[1:5], each = 3),
  value = floor(runif(15, 1, 100)),
  type = rep(c("state", "local", "fed"), times = 5)
)

state2 <- data_frame(
  state = rep(state.name[1:5], each = 3),
  value = floor(runif(15, 1, 100)),
  type = rep(c("state", "local", "fed"), times = 5)
)
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!