What's a good strategy to get a decent overview of big correlation matrices or pairs?

后悔当初 2020-12-23 14:10

Often some data.frame contains 20+ variables and you want to get a first overview (of the correlation structure). Even on a 30" screen you quickly run out of s

5 Answers
  • 2020-12-23 14:43

    The columns of a matrix often arrive in an arbitrary order, so I'd start with some reordering. For visualizing and working with sparse matrices, I often use Reverse Cuthill-McKee or some other form of bandwidth-reducing reordering, and the same idea can be applied in other contexts to make visualizations easier.

    For a correlation matrix, you can squash low-magnitude correlations (e.g. those within (-eps, +eps)) to zero to create sparsity, then reorder to examine the structure.

    It is especially nice if you can find blocks of related objects. This reordering combined with heatmaps (using one color gradient for negative correlations and another for positive correlations) can be very helpful.
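
    A minimal sketch of the squash-then-reorder idea in base R. Note the assumptions: the data are simulated, eps = 0.3 is an arbitrary threshold, and the ordering comes from hierarchical clustering (hclust) as a stand-in for Reverse Cuthill-McKee, which base R does not provide.

    ```r
    set.seed(1)
    # Simulated data with two hidden blocks of correlated variables (illustrative only)
    n  <- 200
    f1 <- rnorm(n)
    f2 <- rnorm(n)
    X <- cbind(sapply(1:5, function(i) f1 + rnorm(n, sd = 0.5)),
               sapply(1:5, function(i) f2 + rnorm(n, sd = 0.5)),
               matrix(rnorm(n * 5), nrow = n))
    X <- X[, sample(ncol(X))]            # shuffle the columns to hide the blocks
    colnames(X) <- paste0("V", seq_len(ncol(X)))

    R <- cor(X)
    eps <- 0.3                           # assumed threshold; tune for your data
    R_sparse <- R
    R_sparse[abs(R_sparse) < eps] <- 0   # squash weak correlations to zero

    # Reorder so that strongly correlated variables end up next to each other
    ord   <- hclust(as.dist(1 - abs(R)))$order
    R_ord <- R_sparse[ord, ord]

    # Diverging palette: blue for negative, white near zero, red for positive
    pal <- colorRampPalette(c("blue", "white", "red"))(101)
    k   <- ncol(R_ord)
    image(seq_len(k), seq_len(k), R_ord, zlim = c(-1, 1), col = pal,
          axes = FALSE, xlab = "", ylab = "")
    axis(1, at = seq_len(k), labels = colnames(R_ord), las = 2)
    axis(2, at = seq_len(k), labels = rownames(R_ord), las = 2)
    ```

    With data like this, the two groups of related variables should show up as solid blocks along the diagonal once the columns are reordered.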

  • 2020-12-23 14:49

    I have used heatmap() (or, rather, the underlying image() function) for that purpose. I do not have the code handy anymore, and as I recall I had to fiddle with the colormap to get something that made sense for the [-1, 1] range.

    Here is a simple example:

    set.seed(42)
    X <- matrix(rnorm(100 * 20), nrow = 100)
    XC <- cor(X)
    image(XC)        # color range could do with improvements here
    

    correlation as image() plot

    You can play further tricks by blanking the lower or upper triangle and putting text there. The PerformanceAnalytics package has a function chart.Correlation() that does that (from the raw data matrix), but it is much slower and will not scale to large matrices as per your original question. I am sure there are others...
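
    As a rough sketch of the colormap fiddling and the triangle trick mentioned above (not the original code, which is no longer at hand): the palette colors and its length of 201 are arbitrary choices, and zlim is pinned to [-1, 1] so that zero correlation lands on the middle of the palette.

    ```r
    set.seed(42)
    X  <- matrix(rnorm(100 * 20), nrow = 100)
    XC <- cor(X)

    # Blank the lower triangle; image() leaves NA cells undrawn
    XC_upper <- XC
    XC_upper[lower.tri(XC_upper)] <- NA

    # Symmetric diverging palette, fixed to the full correlation range
    pal <- colorRampPalette(c("darkred", "white", "darkblue"))(201)
    image(XC_upper, zlim = c(-1, 1), col = pal, axes = FALSE)
    ```

    The blanked half is then free for text() labels, for example the rounded correlation values.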

  • 2020-12-23 14:56

    With pairs() you can generate scatterplot matrices quickly. If too many variables are present, you could use one of the tools in Rattle.

    Examples at: http://rattle.togaware.com/rattle-screenshots.html

    In fact, Rattle itself does not do most of the analysis (as dwin correctly pointed out), but it offers (imho) easy tools to quickly run a PCA, a correlation tree, or a correlation matrix plot without having to manipulate your dataset to make sure that, e.g., only numeric variables are present in the dataset, ...
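
    For comparison, the manual step of keeping only the numeric variables is short in base R. This is just a sketch; iris (which ships with R) stands in for your own data frame.

    ```r
    df <- iris                                   # stand-in for a mixed-type data frame
    num_cols <- vapply(df, is.numeric, logical(1))

    # Correlations of the numeric columns only; pairwise deletion tolerates NAs
    R <- cor(df[, num_cols, drop = FALSE], use = "pairwise.complete.obs")
    round(R, 2)
    ```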

  • 2020-12-23 14:58

    What about doing a PCA on the correlation matrix? The angles between the variables then show their correlations.

    library(HSAUR)
    data("heptathlon", package = "HSAUR")
    head(heptathlon)
    round(cor(heptathlon[, -8]), 2)   # correlations [without score]
    
    library(vegan)
    PCA <- rda(heptathlon[, -8], scale = TRUE)   # run a PCA
    biplot(PCA, display = "species")   # correlation biplot
    # The angles between descriptors in the biplot reflect their correlations
    

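
    To see why the angles carry the correlations, here is a small check in base R. It is a sketch that uses prcomp() and the built-in USArrests data instead of vegan and heptathlon. With all components retained, the cosine of the angle between two variable vectors equals their correlation; in a 2-D biplot it is only an approximation.

    ```r
    X <- scale(USArrests)                    # standardise, so the PCA works on correlations
    p <- prcomp(X)

    # Scale each eigenvector by the standard deviation of its component
    L <- p$rotation %*% diag(p$sdev)
    rownames(L) <- colnames(X)

    cosine <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))

    # All components: the cosine reproduces the correlation
    cos_full <- cosine(L["Murder", ], L["Assault", ])
    # First two components only: what a 2-D biplot can show
    cos_2d <- cosine(L["Murder", 1:2], L["Assault", 1:2])

    c(correlation = cor(USArrests$Murder, USArrests$Assault),
      full = cos_full, biplot_2d = cos_2d)
    ```

    How close the 2-D value comes depends on how much variance the first two components capture.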

  • 2020-12-23 15:00

    Well I just have to post about my own package here:)

    You can use qgraph to visualize a correlation matrix as a network. This will plot variables as nodes and correlations as edges connecting the nodes. Green edges indicate positive correlations and red edges indicate negative correlations. The wider and more saturated the edges the stronger the absolute correlation.

    For example (this is the first example from the help page), the following code will plot the correlation matrix of a 240-variable dataset.

    library("qgraph")
    data(big5)
    data(big5groups)
    qgraph(cor(big5),minimum=0.25,cut=0.4,vsize=2,groups=big5groups,legend=TRUE,borders=FALSE)
    title("Big 5 correlations",line=-2,cex.main=2)
    


    You can also cluster strongly correlated nodes together (this uses Fruchterman-Reingold), which gives quite a clear picture of what the structure of your correlation matrix actually looks like.
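
    A sketch of how that clustered layout can be requested: in qgraph, layout = "spring" selects the force-directed (Fruchterman-Reingold) placement. The simulated data and the minimum and vsize values here are illustrative assumptions, and the plot is only drawn if qgraph is installed.

    ```r
    # Small simulated correlation matrix (any correlation matrix works here)
    set.seed(7)
    f <- rnorm(300)
    X <- cbind(sapply(1:4, function(i) f + rnorm(300)),   # four related variables
               matrix(rnorm(300 * 4), nrow = 300))        # four unrelated ones
    colnames(X) <- paste0("V", 1:8)
    R <- cor(X)

    if (requireNamespace("qgraph", quietly = TRUE)) {
      # Strongly correlated nodes are pulled together by the spring layout
      qgraph::qgraph(R, layout = "spring", minimum = 0.25, vsize = 5)
    }
    ```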

    And a lot more. For more examples, take a look at my site:

    http://sachaepskamp.com/?page_id=73
