Question
It's pretty easy to build a nice huge scatterplot matrix with histograms down the diagonal for multivariate data as follows:
scatterplotMatrix(somedata[1:points.count,],groups=somedata[1:points.count,class],
by.groups=TRUE,diagonal="histogram")
According to the documentation though, it doesn't seem possible to divide up the histogram by the group labels as is done in this question. How would you do that using scatterplotMatrix or a similar function?
Answer 1:
Is this what you had in mind?
Using the iris dataset:
library(ggplot2)
library(data.table) # also provides melt(...) for data.tables
xx <- with(iris, data.table(id=1:nrow(iris), group=Species,
                            Sepal.Length, Sepal.Width, Petal.Length, Petal.Width))
# reshape to long format for faceting with ggplot
yy <- melt(xx, id.vars=1:2, variable.name="H", value.name="xval")
setkey(yy, id, group)
ww <- yy[, list(V=H, yval=xval), keyby="id,group"]
# pair every variable (H) with every variable (V) for each observation
zz <- yy[ww, allow.cartesian=TRUE]
setkey(zz, H, V, group)
zz <- zz[, list(id, group, xval, yval, min.x=min(xval), min.y=min(yval),
                range.x=diff(range(xval)), range.y=diff(range(yval))), by="H,V"]
# density curves for each variable by group, rescaled to the y range of the diagonal panel
d <- zz[H==V, list(x=density(xval)$x,
                   y=mean(min.y)+mean(range.y)*density(xval)$y/max(density(xval)$y)),
        by="H,V,group"]
ggp <- ggplot(zz)
# off-diagonal panels: points colored by group (=species)
ggp <- ggp + geom_point(data=zz[H!=V],
                        aes(x=xval, y=yval, color=factor(group)),
                        size=3, alpha=0.5)
# diagonal panels: the per-group density curves computed above
ggp <- ggp + geom_line(data=d, aes(x=x, y=y, color=factor(group)))
ggp <- ggp + facet_grid(V~H, scales="free")
ggp <- ggp + scale_color_discrete(name="Species")
ggp <- ggp + labs(x="", y="")
ggp

I keep hearing that the same thing is possible using ggpairs(...) in package GGally. I would love to see an actual example of it. The documentation is inscrutable. Also, ggpairs(...) is extremely slow (in my hands), especially with large datasets.
Answer 2:
For later reference, the GGally way to do it is as follows:
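library(GGally) # ggpairs() lives in GGally; loading it also attaches ggplot2
tmp <- data.frame(a = runif(30), b = runif(30), c = runif(30)+1,
                  d = as.factor(sample(0:1, size=30, replace=TRUE)))
# group colours go through mapping; "densityDiag" puts per-group densities on the diagonal
# (argument names as in recent GGally releases; older ones took colour="d" and continuous="density")
ggpairs(data=tmp, mapping=aes(colour=d), columns=1:3,
        diag=list(continuous="densityDiag"), axisLabels="show")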

This intrepid asker figured out that you have to enable axisLabels which is somewhat silly, given the aesthetic emphasis of ggplot and friends.
Now I want to know how to parallelize this, because it's a monster with high numbers of variables.
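
Since the original question asked for histograms on the diagonal rather than density curves, here is a minimal sketch of that variant. It assumes a recent GGally release in which "barDiag" is the diagonal option that draws histograms for continuous variables; the toy data and the name p are only for illustration.

```r
library(GGally)  # attaches ggplot2 as well

set.seed(1)  # hypothetical toy data, same shape as the example above
tmp <- data.frame(a = runif(30), b = runif(30), c = runif(30) + 1,
                  d = as.factor(sample(0:1, size = 30, replace = TRUE)))

# "barDiag" draws a histogram in each diagonal panel;
# colour = d splits the bars (and the scatterplot points) by group
p <- ggpairs(tmp, columns = 1:3,
             mapping = ggplot2::aes(colour = d),
             diag = list(continuous = "barDiag"),
             axisLabels = "show")
p
```

With only colour mapped, the groups show up as differently outlined bars. Mapping fill to d as well would probably give filled bars, but I have not checked how the other panel types react to that.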
Source: https://stackoverflow.com/questions/21080986/scatterplotmatrix-with-group-histograms