scatterplotMatrix with group histograms

Submitted by 别等时光非礼了梦想 on 2020-01-12 10:54:51

Question


It's pretty easy to build a nice huge scatterplot matrix with histograms down the diagonal for multivariate data as follows:

library(car)  # provides scatterplotMatrix()
scatterplotMatrix(somedata[1:points.count,],groups=somedata[1:points.count,class],
                by.groups=TRUE,diagonal="histogram")

According to the documentation though, it doesn't seem possible to divide up the histogram by the group labels as is done in this question. How would you do that using scatterplotMatrix or a similar function?


Answer 1:


Is this what you had in mind?

Using the iris dataset:

library(ggplot2)
library(data.table)
library(reshape2)  # for melt(...)
library(plyr)      # for .(...)

xx <- with(iris, data.table(id=1:nrow(iris), group=Species, 
           Sepal.Length, Sepal.Width,Petal.Length, Petal.Width))
# reshape for facetting with ggplot
yy <- melt(xx,id=1:2, variable.name="H", value.name="xval")
yy <- data.table(yy,key="id,group")
ww <- yy[,list(V=H,yval=xval),keyby="id,group"]
zz <- yy[ww,allow.cartesian=TRUE]
setkey(zz,H,V,group)
zz <- zz[,list(id, group, xval, yval, min.x=min(xval), min.y=min(yval),
               range.x=diff(range(xval)),range.y=diff(range(yval))),by="H,V"]
# points colored by group (=species)
# density plots for each variable by group
d  <-  zz[H==V, list(x=density(xval)$x,
          y=mean(min.y)+mean(range.y)*density(xval)$y/max(density(xval)$y)),
          by="H,V,group"]
ggp = ggplot(zz)
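# recent ggplot2 versions no longer accept subset=, so pass pre-filtered data instead
ggp = ggp + geom_point(data=zz[H!=V], 
                       aes(x=xval, y=yval, color=factor(group)), 
                       size=3, alpha=0.5)
# d already contains only the diagonal (H==V) panels
ggp = ggp + geom_line(data=d, aes(x=x, y=y, color=factor(group)))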
ggp = ggp + facet_grid(V~H, scales="free")
ggp = ggp + scale_color_discrete(name="Species")
ggp = ggp + labs(x="", y="")
ggp
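
Two steps in the code above are easier to follow in isolation: pairing every variable with every other variable, and squeezing each group's density curve into the y-range of its diagonal panel. The following is only a base-R sketch of those two ideas, not the answer's data.table implementation. The names pairs_df, rescale_density and curve_df are made up for illustration, and merge() stands in for the keyed self-join.

```r
# Base-R sketch of two steps from the answer above (all helper names are hypothetical).

# Step 1: give every observation one row per (H, V) variable pair,
# which is what the keyed self-join with allow.cartesian produces.
vars <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
long <- data.frame(
  id    = rep(seq_len(nrow(iris)), times = length(vars)),
  group = rep(iris$Species, times = length(vars)),
  H     = rep(vars, each = nrow(iris)),
  xval  = unlist(iris[vars], use.names = FALSE)
)
other    <- setNames(long, c("id", "group", "V", "yval"))
pairs_df <- merge(long, other, by = c("id", "group"))
nrow(pairs_df)  # observations x variables x variables

# Step 2: rescale a density curve so it fits the panel's y-range.
# The curve's peak is mapped to the top of the panel, its zero to the bottom.
rescale_density <- function(x, panel_min, panel_range) {
  d <- density(x)
  data.frame(x = d$x,
             y = panel_min + panel_range * d$y / max(d$y))
}

# Diagonal panel for Sepal.Length: here xval and yval are the same variable.
diag_sl  <- subset(pairs_df, H == "Sepal.Length" & V == "Sepal.Length")
curve_df <- rescale_density(
  diag_sl$xval[diag_sl$group == "setosa"],
  panel_min   = min(diag_sl$yval),
  panel_range = diff(range(diag_sl$yval))
)
range(curve_df$y)  # lies inside the range of Sepal.Length
```

Because each group's curve is divided by its own maximum, every curve peaks at the top of its panel. The curves show each group's shape and location, but their heights are not comparable across groups.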

I keep hearing that the same thing is possible using ggpairs(...) in package GGally. I would love to see an actual example of it. The documentation is inscrutable. Also, ggpairs(...) is extremely slow (in my hands), especially with large datasets.




Answer 2:


For later reference, the GGally way to do it is as follows:

library(GGally)      # ggpairs() is a function in the GGally package
library(data.table)  # for data.table(...)
tmp <- data.table(a = runif(30),b = runif(30), c = runif(30)+1, 
                  d = as.factor(sample(0:1,size=30, replace=TRUE)))

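# Recent GGally versions take the grouping through mapping= rather than colour="d",
# and name the diagonal density panel "densityDiag".
ggpairs(data=tmp, mapping=ggplot2::aes(colour=d), columns=1:3,
        diag=list(continuous="densityDiag"), axisLabels="show")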

This intrepid asker figured out that you have to enable axisLabels, which is somewhat silly given the aesthetic emphasis of ggplot and friends.

Now I want to know how to parallelize this, because it's a monster with high numbers of variables.



Source: https://stackoverflow.com/questions/21080986/scatterplotmatrix-with-group-histograms
