(R) Visualizing a data set with a large number of variables using PCA (ggbiplot)

坚强是说给别人听的谎言 submitted on 2021-01-29 04:58:45

Question


My dataset has 100 samples and 17,000 variables. I would like to run PCA and visualize the data, but the resulting plot is unreadable because every variable gets its own arrow. How can I control the number of arrows in ggbiplot or biplot, that is, show only the variables that contribute the most? Sample code is below:

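library(ggbiplot)  # provides ggbiplot(); see https://github.com/vqv/ggbiplot

# 100 samples (rows) x 17000 variables (columns) of random data
data <- matrix(rnorm(1700000), nrow = 100, ncol = 17000)
colnames(data) <- paste("X", 1:ncol(data), sep = "")
pca <- prcomp(data, scale. = TRUE, center = TRUE)

biplot(pca)
print(ggbiplot(pca, obs.scale = 1, var.scale = 1,
               groups = c(rep('a', 30), rep('b', 70))))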


Answer 1:


I assume you have a recent version of ggbiplot from GitHub (19 Jun 2015, https://github.com/vqv/ggbiplot). In that version, I don't think there is a clean way to reduce the number of arrows. You would have to modify the original function by subsetting df.v in the two plotting calls that draw the arrows and their labels:

around line 89:

g <- g + geom_segment(data = df.v[1:5, ],  # SUBSET HERE
                      aes(x = 0, y = 0, xend = xvar, yend = yvar),
                      arrow = arrow(length = unit(1/2, "picas")),
                      color = muted("red"))

and around line 127:

g <- g + geom_text(data = df.v[1:5, ],  # SUBSET HERE
                   aes(label = varname, x = xvar, y = yvar,
                       angle = angle, hjust = hjust),
                   color = "darkred", size = varname.size)
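
Note that df.v[1:5, ] keeps the first five variables, not the five that contribute most. A minimal sketch of one way to rank variables is shown here, using the squared length of each loading vector in the PC1-PC2 plane. This ranking criterion, the cutoff k, and the use of base R's biplot() on a subset of loadings are assumptions for illustration, not part of ggbiplot:

```r
# Sketch: rank variables by their loadings on PC1 and PC2,
# then draw a base-R biplot that shows only the top k arrows.
set.seed(1)
data <- matrix(rnorm(100 * 17000), nrow = 100, ncol = 17000)
colnames(data) <- paste0("X", seq_len(ncol(data)))
pca <- prcomp(data, center = TRUE, scale. = TRUE)

k <- 5  # hypothetical number of arrows to keep

# Squared length of each variable's loading vector in the PC1-PC2 plane
# (an assumed ranking criterion; other measures are possible).
contrib <- rowSums(pca$rotation[, 1:2]^2)

# Row indices of the k variables with the largest values.
top <- order(contrib, decreasing = TRUE)[seq_len(k)]

# biplot() on plain matrices dispatches to biplot.default(), which takes
# a two-column score matrix and a two-column loading matrix directly.
# Scores and loadings are passed unscaled here, so the axes differ from
# what biplot(pca) would show.
biplot(pca$x[, 1:2], pca$rotation[top, 1:2],
       xlabs = rep(".", nrow(pca$x)), cex = c(1.5, 0.8))
```

If df.v inside the modified ggbiplot function keeps the variables in the same row order as pca$rotation (which appears to be the case, but is worth checking in your copy of the source), the same index vector could replace 1:5 in both calls above, for example by passing it in as an extra function argument.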



Source: https://stackoverflow.com/questions/35917067/r-visualizing-a-data-set-with-large-number-of-variables-using-pca-ggbiplot
