scatterplotMatrix with group histograms

好久不见. Submitted on 2019-12-03 21:45:13

Is this what you had in mind?

Using the iris dataset:

library(ggplot2)
library(data.table)  # also provides melt(...) for data.tables

xx <- with(iris, data.table(id=1:nrow(iris), group=Species,
           Sepal.Length, Sepal.Width, Petal.Length, Petal.Width))
# reshape for facetting with ggplot: one row per (id, variable)
yy <- melt(xx, id.vars=1:2, variable.name="H", value.name="xval")
setkey(yy, id, group)
# pair every variable (H) with every variable (V) for each observation
ww <- yy[, list(V=H, yval=xval), keyby="id,group"]
zz <- yy[ww, allow.cartesian=TRUE]
setkey(zz, H, V, group)
zz <- zz[, list(id, group, xval, yval, min.x=min(xval), min.y=min(yval),
                range.x=diff(range(xval)), range.y=diff(range(yval))), by="H,V"]
# density curves for each variable by group, rescaled to the panel's y range
d <- zz[H==V, {dens <- density(xval)
               list(x=dens$x,
                    y=mean(min.y)+mean(range.y)*dens$y/max(dens$y))},
        by="H,V,group"]
# off-diagonal panels: points colored by group (=species)
# the subset= argument was removed from ggplot2 layers, so filter the data instead
ggp <- ggplot(zz)
ggp <- ggp + geom_point(data=zz[H!=V],
                        aes(x=xval, y=yval, color=group),
                        size=3, alpha=0.5)
# diagonal panels: density curves
ggp <- ggp + geom_line(data=d, aes(x=x, y=y, color=group))
ggp <- ggp + facet_grid(V~H, scales="free")
ggp <- ggp + scale_color_discrete(name="Species")
ggp <- ggp + labs(x="", y="")
ggp
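
To make the reshaping step easier to follow, here is a minimal sketch on a made-up three-row table. The names toy, a, b and pairs_dt are placeholders and not part of the iris example. It uses merge(...) instead of the keyed join above; that is a swapped-in way to build the same variable-by-variable pairing, not the code from the answer.

```r
library(data.table)

# Toy input: 3 observations, 2 measured variables (hypothetical names a and b)
toy <- data.table(id = 1:3, group = factor(c("g1", "g1", "g2")),
                  a = c(1, 2, 3), b = c(10, 20, 30))

# Long format: one row per (id, variable)
long <- melt(toy, id.vars = c("id", "group"),
             variable.name = "H", value.name = "xval")

# Self-join on id: each variable H of an observation is paired with each variable V
pairs_dt <- merge(long, long[, .(id, V = H, yval = xval)],
                  by = "id", allow.cartesian = TRUE)

# 3 observations x 2 x 2 variable pairs
nrow(pairs_dt)
```

Each (H, V) combination then becomes one panel of facet_grid(V~H): rows with H != V feed the scatterplots, and rows with H == V feed the diagonal density curves.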

I keep hearing that the same thing is possible using ggpairs(...) in package GGally. I would love to see an actual example of it. The documentation is inscrutable. Also, ggpairs(...) is extremely slow (in my hands), especially with large datasets.

bright-star

For later reference, the GGally way to do it is as follows:

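library(data.table)
library(ggplot2)  # for aes(...)
library(GGally)   # ggpairs(...) lives in GGally; there is no package named ggpairs
tmp <- data.table(a = runif(30), b = runif(30), c = runif(30)+1,
                  d = as.factor(sample(0:1, size=30, replace=TRUE)))

# recent GGally releases take the grouping via mapping=aes(...) instead of colour="d",
# and name the diagonal density panel "densityDiag"
ggpairs(data=tmp, mapping=aes(colour=d), columns=1:3,
        diag=list(continuous="densityDiag"), axisLabels="show")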

This intrepid asker figured out that you have to enable axisLabels, which is somewhat silly given the aesthetic emphasis of ggplot and friends.

Now I want to know how to parallelize this, because it's a monster with high numbers of variables.
