Question
I'm working with a data frame of categorical variables in case form. It has three variables (Color, Shape and Size) plus the frequency of each combination. An example of the data frame:
    Color  Shape    Size   Freq
1   Yellow Square   Big      10
2   Yellow Square   Medium    6
3   Yellow Square   Small     3
4   Yellow Triangle Big       4
5   Yellow Triangle Medium    6
6   Yellow Triangle Small     8
7   Red    Square   Big       2
8   Red    Square   Medium    6
9   Red    Square   Small     5
10  Red    Triangle Big      12
.......
The "Color" variable is crossed with the "Shape" and "Size" variables, and each combination (case) has a frequency.
From this data frame I'm struggling to create a heatmap-like plot that shows only the relation between "Color" and "Shape", with each cell filled according to the "Size" that has the highest frequency. A bit tricky, isn't it?
For example, the "Yellow" - "Square" cell should display only "Big", since "Big" is the size with the highest frequency there. Each size should have its own fill color (e.g. "red" for Big, "green" for Medium and "orange" for Small).
Frank
Answer 1:
How about this?
library(dplyr)
library(ggplot2)
# For each Color/Shape pair, keep the row with the highest Freq
df_max <- df %>%
  group_by(Color, Shape) %>%
  slice(which.max(Freq))
head(df_max)
# Source: local data frame [4 x 4]
# Groups: Color, Shape [4]
#
#    Color    Shape   Size  Freq
#    (chr)    (chr)  (chr) (int)
# 1    Red   Square Medium     6
# 2    Red Triangle    Big    12
# 3 Yellow   Square    Big    10
# 4 Yellow Triangle  Small     8
# One tile per Color/Shape pair, filled by the most frequent Size
ggplot(df_max, aes(x = Color, y = Shape, fill = Size)) +
  geom_tile()
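The question also asks for a fixed color per size, which the default fill scale does not guarantee. A minimal sketch of one way to do that is to add `scale_fill_manual()` with a named vector. The data frame below is rebuilt from only the ten rows visible in the question (the rest are elided there), and the name `size_colors` is an illustrative assumption, not part of the original answer.

```r
library(dplyr)
library(ggplot2)

# Assumption: only the ten rows shown in the question; the remaining rows are elided there.
df <- data.frame(
  Color = c(rep("Yellow", 6), rep("Red", 4)),
  Shape = c(rep("Square", 3), rep("Triangle", 3), rep("Square", 3), "Triangle"),
  Size  = c(rep(c("Big", "Medium", "Small"), 3), "Big"),
  Freq  = c(10, 6, 3, 4, 6, 8, 2, 6, 5, 12),
  stringsAsFactors = FALSE
)

# Keep the most frequent Size within each Color/Shape pair.
df_max <- df %>%
  group_by(Color, Shape) %>%
  slice(which.max(Freq)) %>%
  ungroup()

# Hypothetical palette following the question: red = Big, green = Medium, orange = Small.
size_colors <- c(Big = "red", Medium = "green", Small = "orange")

p <- ggplot(df_max, aes(x = Color, y = Shape, fill = Size)) +
  geom_tile(color = "white") +              # white borders separate the tiles
  scale_fill_manual(values = size_colors)   # map each Size to its requested color

p
```

Because the vector is named, each color stays attached to its size even if a size never wins in any cell. Note that `which.max()` returns the first maximum, so ties within a Color/Shape pair resolve to whichever row comes first.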
Source: https://stackoverflow.com/questions/32851208/heatmap-like-plot-for-three-categorical-variables