ggplot2: add conditional density curves describing both dimensions of scatterplot

Anonymous (unverified), submitted 2019-12-03 03:06:01

Question:

I have scatterplots of 2D data from two categories. I want to add density lines for each dimension -- not outside the plot (cf. Scatterplot with marginal histograms in ggplot2) but right on the plotting surface. I can get this for the x-axis dimension, like this:

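    library(ggplot2)

    set.seed(123)
    dim1 <- c(rnorm(100, mean=1), rnorm(100, mean=4))
    dim2 <- rnorm(200, mean=1)
    cat <- factor(c(rep("a", 100), rep("b", 100)))
    # note: cbind() turns the factor into its integer codes (1 and 2)
    mydf <- data.frame(cbind(dim2, dim1, cat))

    ggplot(data=mydf, aes(x=dim1, y=dim2, colour=as.factor(cat))) +
      geom_point() +
      # density of dim1 per category, rescaled to a maximum of 1 and shifted to start at y = -2
      stat_density(aes(x=dim1, y=(-2+(..scaled..))),
                   position="identity", geom="line")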

It looks like this:

But I want an analogous pair of density curves running vertically, showing the distribution of points in the y-dimension. I tried

    stat_density(aes(y=dim2, x=0+(..scaled..)), position="identity", geom="line")

but I receive the error "stat_density requires the following missing aesthetics: x".

Any ideas? Thanks.

Answer 1:

You can compute the densities of dim2 for each category yourself, swap the axes, and store the result in a new data.frame. After that, it is simply a matter of plotting them on top of the other graph.

    p <- ggplot(data=mydf, aes(x=dim1, y=dim2, colour=as.factor(cat))) +
      geom_point() +
      stat_density(aes(x=dim1, y=(-2+(..scaled..))),
                   position="identity", geom="line")

    stuff <- ggplot_build(p)
    # extract the x range, to make the new densities align with the y-axis
    # (this path relies on ggplot2's internal structure and may differ between versions)
    xrange <- stuff[[2]]$ranges[[1]]$x.range

    ## Get densities of dim2, one curve per category
    ds <- do.call(rbind, lapply(unique(mydf$cat), function(lev) {
        dens <- with(mydf, density(dim2[cat==lev]))
        # swap the axes: density height goes on x, the dim2 grid goes on y
        data.frame(x=dens$y+xrange[1], y=dens$x, cat=lev)
    }))

    p + geom_path(data=ds, aes(x=x, y=y, color=factor(cat)))
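Note that curves built this way use the raw density values, whereas the horizontal curves in the question use ..scaled.., which rescales each curve to a maximum of 1. A minimal sketch, assuming you want the vertical curves to match: divide each density by its own maximum before shifting it. The helper name `scaled_density_df` and its `x_offset` argument are made up for this illustration, and the data are regenerated (with cat kept as a factor) so the snippet runs on its own using base R only.

```r
# Recreate data like the question's, as a plain data.frame
set.seed(123)
mydf <- data.frame(
  dim1 = c(rnorm(100, mean = 1), rnorm(100, mean = 4)),
  dim2 = rnorm(200, mean = 1),
  cat  = factor(rep(c("a", "b"), each = 100))
)

# Hypothetical helper: one density curve of `values` per level of `groups`,
# rescaled to a maximum of 1 and shifted so that it starts at `x_offset`.
scaled_density_df <- function(values, groups, x_offset = 0) {
  do.call(rbind, lapply(unique(groups), function(lev) {
    dens <- density(values[groups == lev])
    data.frame(
      x   = x_offset + dens$y / max(dens$y),  # curve height, drawn along x
      y   = dens$x,                           # evaluation grid, drawn along y
      cat = lev
    )
  }))
}

ds_scaled <- scaled_density_df(mydf$dim2, mydf$cat, x_offset = -2)
```

The result could then be added to a scatterplot built from this same mydf with geom_path(data = ds_scaled, aes(x = x, y = y, colour = cat)), so that every curve spans exactly one unit along the x-axis.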



Answer 2:

So far I can produce:

    distrib_horiz <- stat_density(aes(x=dim1, y=(-2+(..scaled..))),
                                  position="identity", geom="line")

    ggplot(data=mydf, aes(x=dim1, y=dim2, colour=as.factor(cat))) +
      geom_point() + distrib_horiz

And:

    distrib_vert <- stat_density(data=mydf, aes(x=dim2, y=(-2+(..scaled..))),
                                 position="identity", geom="line")

    ggplot(data=mydf, aes(x=dim2, y=dim1, colour=as.factor(cat))) +
      geom_point() + distrib_vert + coord_flip()

But combining them is proving tricky.
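One possible way to combine the two, sketched under the assumption that ggplot2 3.3.0 or later is installed: those versions give stat_density() an orientation argument and provide after_stat() as the replacement for the ..scaled.. notation, so the density of dim2 can be computed along y inside the original plot, without coord_flip(). The offsets of -2, the object name `p_both`, and the use of geom = "path" for the vertical curves are choices made for this illustration, and the snippet has not been checked against every ggplot2 release.

```r
library(ggplot2)  # assumes ggplot2 >= 3.3.0 (after_stat(), orientation)

# Recreate data like the question's, keeping cat as a factor
set.seed(123)
mydf <- data.frame(
  dim1 = c(rnorm(100, mean = 1), rnorm(100, mean = 4)),
  dim2 = rnorm(200, mean = 1),
  cat  = factor(rep(c("a", "b"), each = 100))
)

p_both <- ggplot(mydf, aes(x = dim1, y = dim2, colour = cat)) +
  geom_point() +
  # horizontal curves: density of dim1, drawn upward from y = -2
  stat_density(aes(x = dim1, y = -2 + after_stat(scaled)),
               position = "identity", geom = "line") +
  # vertical curves: density of dim2, drawn rightward from x = -2;
  # geom = "path" keeps the points in the order the stat returns them
  stat_density(aes(y = dim2, x = -2 + after_stat(scaled)),
               orientation = "y", position = "identity", geom = "path")

p_both
```

Because both layers inherit the colour mapping from the main plot, each one should draw a separate curve per category.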



Answer 3:

So far I have only a partial solution, since I didn't manage to obtain a vertical stat_density line for each individual category, only for the total set. Maybe this can nevertheless serve as a starting point for finding a better solution. My suggestion is to try the ggMarginal() function from the ggExtra package.

    p <- ggplot(data=mydf, aes(x=dim1, y=dim2, colour=as.factor(cat))) +
      geom_point() +
      stat_density(aes(x=dim1, y=(-2+(..scaled..))),
                   position="identity", geom="line")

    library(ggExtra)
    ggMarginal(p, type = "density", margins = "y", size = 4)

This is what I obtain:

I know it's not perfect, but maybe it's a step in a helpful direction. At least I hope so. Looking forward to seeing other answers.


