I have a data.frame of 10 variables in R. Let's call them var1, var2, ... var10.
I want to find correlatio
My package corrr, which helps to explore correlations, has a simple solution for this. I'll use the mtcars data set as an example, and say we want to focus on the correlation of mpg with all other variables.
install.packages("corrr") # though keep eye out for new version coming soon
library(corrr)
mtcars %>% correlate() %>% focus(mpg)
#> rowname mpg
#> <chr> <dbl>
#> 1 cyl -0.8521620
#> 2 disp -0.8475514
#> 3 hp -0.7761684
#> 4 drat 0.6811719
#> 5 wt -0.8676594
#> 6 qsec 0.4186840
#> 7 vs 0.6640389
#> 8 am 0.5998324
#> 9 gear 0.4802848
#> 10 carb -0.5509251
Here, correlate() produces a correlation data frame, and focus() lets you focus on the correlations of certain variables with all others.
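For comparison, the same numbers can be reproduced in base R without corrr. This is only a minimal sketch: the names cors and mpg_cor are made up for illustration, and it assumes every column is numeric (true for mtcars).

```r
# Correlate mpg with every other column of mtcars using base R only.
# cor(x, y) with a vector x and a data frame y returns a 1 x k matrix.
cors <- cor(mtcars$mpg, mtcars[, names(mtcars) != "mpg"])

# Drop the matrix dimension to get a named vector, one entry per variable.
mpg_cor <- cors[1, ]

# Sorted from most negative to most positive correlation.
round(sort(mpg_cor), 3)
```

The values should match the mpg column returned by focus(mpg) above, just as a named vector instead of a data frame.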
FYI, focus() works similarly to select() from the dplyr package, except that it alters rows as well as columns. So if you're familiar with select(), you should find it easy to use focus(). E.g.:
mtcars %>% correlate() %>% focus(mpg:drat)
#> rowname mpg cyl disp hp drat
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 wt -0.8676594 0.7824958 0.8879799 0.6587479 -0.71244065
#> 2 qsec 0.4186840 -0.5912421 -0.4336979 -0.7082234 0.09120476
#> 3 vs 0.6640389 -0.8108118 -0.7104159 -0.7230967 0.44027846
#> 4 am 0.5998324 -0.5226070 -0.5912270 -0.2432043 0.71271113
#> 5 gear 0.4802848 -0.4926866 -0.5555692 -0.1257043 0.69961013
#> 6 carb -0.5509251 0.5269883 0.3949769 0.7498125 -0.09078980
Better still, I think, you could get the correlations not just of one variable with all the others, but of every variable with every other. You can do that with one line of code, using the built-in mtcars data set.
library(dplyr)
cor(select(mtcars, mpg, wt, disp, drat, qsec, hp ))
Another way would be to use the Hmisc and corrplot libraries to get the correlations among all pairs, their significance, and a pretty plot, like so:
# Your data frame (5 variables instead of 10)
df <- data.frame(a = 1:100, b = rpois(100, .2), c = rpois(100, .4), d = rpois(100, .8), e = 2 * (1:100))
# Setup
library(Hmisc)
library(corrplot)
df <- scale(df) # normalize the data frame; this also converts df to a matrix
corr <- rcorr(df) # compute Pearson's (or Spearman's) correlations with rcorr() from Hmisc. I like rcorr() because it gives separate access to the correlations, the number of observations and the p-values. ?rcorr is worth a read.
corr_r <- corr$r # the correlation matrix
corr_r[, 1] # the correlations of "a" (= var1) with the rest, if you want them
pval <- corr$P # the p-values
corrplot(corr_r, method = "circle", type = "lower", diag = FALSE, tl.col = "black", tl.cex = 1, tl.offset = 0.1, tl.srt = 45) # plot all pairs
corrplot(corr_r, p.mat = pval, sig.level = 0.05, insig = "blank", method = "circle", type = "lower", diag = FALSE, tl.col = "black", tl.cex = 1, tl.offset = 0.1, tl.srt = 45) # plot only the pairs that pass the significance cutoff, using the p-values in "p.mat"
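If you only need the correlation and p-value of one variable against each of the others, and would rather not install extra packages, something like the following base R sketch might do. It uses cor.test() from the stats package on mtcars; the names others and res are just illustrative, and the p-values are not adjusted for multiple testing.

```r
# Pearson correlation and p-value of mpg against every other mtcars column.
others <- setdiff(names(mtcars), "mpg")

res <- t(sapply(others, function(v) {
  ct <- cor.test(mtcars$mpg, mtcars[[v]])  # Pearson by default
  c(r = unname(ct$estimate), p = ct$p.value)
}))

# One row per variable, with columns "r" and "p".
res
```

Passing method = "spearman" to cor.test() would give rank correlations instead, though it may warn about ties in data like mtcars.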