Question
I have a data frame with 10k rows and 3 columns: xpos, ypos and cluster (cluster is a number from 0 to 9) here: http://pastebin.com/NyQw29tb
I would like to show a hex plot with each hexagon colored according to the most-frequent cluster within that hexagon.
So far I've got:
library(ggplot2)
library(hexbin)
ggplot(clusters, aes(x=xpos, y=ypos, z=cluster)) + stat_summary_hex(fun.x=mode)
Which I think is giving me what I want (i.e. is filling in every hexagon with a color from 0 to 9), but the color scale appears continuous, and I can't figure out how to make it use a discrete one.

For extra context, here's the underlying, messier view of the data, which I'm trying to smooth out by using hexagons:
qplot(data=clusters, xpos, ypos, color=factor(cluster))

Answer 1:
I don't know what your stat_summary_hex(fun.x=mode) is doing, but I'm pretty sure it's not what you think (mode gives the storage mode of an object, not the statistical mode, and fun.x doesn't match any formal argument of stat_summary_hex). Try this. It tabulates the observations in each bin and pulls out the label with the maximum count.
ggplot(clusters, aes(x=xpos, y=ypos, z=cluster)) +
  stat_summary_hex(fun = function(x) {
    tab <- table(x)
    names(tab)[which.max(tab)]
  })
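
To see why mode is the wrong tool, here is a small base-R sketch on a made-up vector (the values in x are purely illustrative, not taken from the question's data). It contrasts what mode() reports with what the tabulate-and-pick step returns:

```r
# Made-up illustration data, not from the question's data set.
x <- c(3, 7, 7, 2, 7, 3)

# mode() reports how the object is stored, not its most frequent value.
storage <- mode(x)
print(storage)        # "numeric"

# Tabulate the values, then take the label with the highest count.
tab <- table(x)
most_frequent <- names(tab)[which.max(tab)]
print(most_frequent)  # "7" (a character string, because it is a name)
```

Because names() returns character strings, the summary value is already non-numeric, which is what lets the fill be treated as discrete.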

Answer 2:
I believe there are two problems here. First, mode is not the function you want (check the help: it's to "Get or set the type or storage mode of an object"). Second, the parameter is fun= rather than fun.x= for stat_summary_hex.
There's a nice discussion of mode functions here. The recommended function is:
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}
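
As a quick check of how this function behaves, here is a base-R sketch on made-up vectors (the inputs are illustrative only). It shows that Mode keeps the input's type and that ties go to the value that appears first, whereas a table()-based approach breaks ties by sorted label:

```r
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# 4 and 1 both occur twice; Mode returns the one seen first.
tie_result <- Mode(c(4, 1, 1, 4, 9))
print(tie_result)   # 4

# Works on character input too, and keeps the type.
chr_result <- Mode(c("b", "a", "a"))
print(chr_result)   # "a"

# For comparison: table() sorts its labels, so the same tie resolves to "1".
tab <- table(c(4, 1, 1, 4, 9))
by_table <- names(tab)[which.max(tab)]
print(by_table)     # "1"
```

For a hex plot this only matters in bins where two clusters are exactly tied, but it does mean the two approaches can color such a bin differently.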
Finally, you want to make sure that the fill for the hexagons is treated as a discrete value. You can modify the function passed to fun so that its return value is a character (as in the code below).
Here is a reproducible example:
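library(ggplot2)
library(hexbin)

# Statistical mode: the most frequent value in x
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

set.seed(1)  # fixed seed so the example is reproducible
clusters <- data.frame(xpos = rnorm(1000), ypos = rnorm(1000),
                       cluster = rep(1:9, length.out = 1000))

# Returning a character makes ggplot2 use a discrete fill scale
ggplot(clusters, aes(x = xpos, y = ypos, z = cluster)) +
  stat_summary_hex(fun = function(x) as.character(Mode(x)))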
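
Once the summary value is a character, you can also pick the discrete palette yourself. The following is only a sketch under assumptions: the data are simulated with cluster labels 0 to 9 to mirror the question, and the "Set3" palette and the legend title are arbitrary choices, not something the original post specifies.

```r
library(ggplot2)
library(hexbin)  # stat_summary_hex() needs the hexbin package installed

# Statistical mode: the most frequent value in x
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Simulated stand-in for the question's data: cluster labels 0 to 9.
set.seed(1)
clusters <- data.frame(
  xpos = rnorm(1000),
  ypos = rnorm(1000),
  cluster = sample(0:9, 1000, replace = TRUE)
)

# Character summary values give a discrete fill; the palette is an assumption.
p <- ggplot(clusters, aes(x = xpos, y = ypos, z = cluster)) +
  stat_summary_hex(fun = function(x) as.character(Mode(x))) +
  scale_fill_brewer(palette = "Set3", name = "Most frequent cluster")

print(p)
```

Note that character labels sort alphabetically in the legend, so with labels beyond 9 (for example "10") the order would no longer be numeric.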
I hope this helps.
Source: https://stackoverflow.com/questions/17371591/using-stat-summary-hex-to-show-most-frequent-value-with-discrete-color-scale