Correlation between multiple variables of a data frame


长发绾君心

I have a data.frame of 10 variables in R. Let's call them var1, var2, ..., var10.

I want to find correlatio

3 Answers

别跟我提以往

2020-12-31 22:11

My package corrr, which helps to explore correlations, has a simple solution for this. I'll use the mtcars data set as an example, and say we want to focus on the correlation of mpg with all other variables.

install.packages("corrr")  # keep an eye out for the new version coming soon
library(corrr)
mtcars %>% correlate() %>% focus(mpg)


#>    rowname        mpg
#>      <chr>      <dbl>
#> 1      cyl -0.8521620
#> 2     disp -0.8475514
#> 3       hp -0.7761684
#> 4     drat  0.6811719
#> 5       wt -0.8676594
#> 6     qsec  0.4186840
#> 7       vs  0.6640389
#> 8       am  0.5998324
#> 9     gear  0.4802848
#> 10    carb -0.5509251

Here, correlate() produces a correlation data frame, and focus() lets you focus on the correlations of certain variables with all others.
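
For readers who do not have corrr installed, roughly the same numbers can be obtained in base R. This is only a minimal sketch, not how corrr works internally; the names `cor_mat` and `mpg_cor` are made up for illustration.

```r
# Full Pearson correlation matrix of all columns in mtcars
cor_mat <- cor(mtcars)

# Keep the mpg column and drop mpg's correlation with itself
mpg_cor <- cor_mat[rownames(cor_mat) != "mpg", "mpg"]

# Named numeric vector: one correlation per remaining variable
round(mpg_cor, 3)
```

The result is a plain named vector rather than a data frame, so it does not pipe as neatly, but it is handy for a quick check.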

FYI, focus() works similarly to select() from the dplyr package, except that it alters rows as well as columns. So if you're familiar with select(), you should find it easy to use focus(). E.g.:

mtcars %>% correlate() %>% focus(mpg:drat)

#>   rowname        mpg        cyl       disp         hp        drat
#>     <chr>      <dbl>      <dbl>      <dbl>      <dbl>       <dbl>
#> 1      wt -0.8676594  0.7824958  0.8879799  0.6587479 -0.71244065
#> 2    qsec  0.4186840 -0.5912421 -0.4336979 -0.7082234  0.09120476
#> 3      vs  0.6640389 -0.8108118 -0.7104159 -0.7230967  0.44027846
#> 4      am  0.5998324 -0.5226070 -0.5912270 -0.2432043  0.71271113
#> 5    gear  0.4802848 -0.4926866 -0.5555692 -0.1257043  0.69961013
#> 6    carb -0.5509251  0.5269883  0.3949769  0.7498125 -0.09078980
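
The way focus() keeps the chosen variables as columns and drops them from the rows can be mimicked on an ordinary correlation matrix. The following is a base R approximation for illustration only; `focus_vars` and `focused` are hypothetical names, not part of corrr.

```r
cor_mat <- cor(mtcars)

# Variables to keep as columns (mpg through drat, in mtcars column order)
focus_vars <- c("mpg", "cyl", "disp", "hp", "drat")

# Rows: every variable NOT in focus_vars; columns: only focus_vars
focused <- cor_mat[!rownames(cor_mat) %in% focus_vars, focus_vars]

dim(focused)  # 6 remaining variables by 5 focused variables
```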


温柔的废话

2020-12-31 22:32
Better still, you can get the correlation of every variable with every other variable, not just one variable against the rest. That takes a single line of code. Using the built-in mtcars dataset:
```
library(dplyr)

cor(select(mtcars, mpg, wt, disp, drat, qsec, hp ))
```
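
The same idea works without dplyr and carries over to a data frame shaped like the one in the question. A sketch, assuming a hypothetical data frame `dat` with columns var1 to var10; the values here are random numbers generated purely for illustration.

```r
set.seed(1)  # make the illustrative data reproducible

# Hypothetical stand-in for the question's data frame: 10 numeric columns
dat <- as.data.frame(matrix(rnorm(200), ncol = 10))
names(dat) <- paste0("var", 1:10)

# All pairwise Pearson correlations; rows with NAs are dropped pair by pair
cor_all <- cor(dat, use = "pairwise.complete.obs")

# Correlations of var1 with every other variable
cor_all["var1", -1]
```

Passing `method = "spearman"` to cor() switches to rank correlations if the relationships are not expected to be linear.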

梦毁少年i

2020-12-31 22:32

Another way is to use the Hmisc and corrplot packages to get the correlations among all pairs, their significance, and a pretty plot, like so:

# Example data frame (5 variables instead of 10)
df <- data.frame(a = 1:100, b = rpois(100, 0.2), c = rpois(100, 0.4), d = rpois(100, 0.8), e = 2 * (1:100))

# Setup
library(Hmisc)
library(corrplot)

# Standardize the columns. scale() also converts the data frame to a matrix, which rcorr() expects.
df <- scale(df)

# Compute Pearson's (or Spearman's) correlations with rcorr() from the Hmisc package.
# I like rcorr() because it gives separate access to the correlations, the number of
# observations and the p-values. ?rcorr is worth a read.
corr <- rcorr(df)
corr_r <- as.matrix(corr[[1]])  # the correlation matrix
corr_r[, 1]                     # correlations of "a" (= var1) with the rest, if you want them
pval <- as.matrix(corr[[3]])    # the p-values

# Plot all pairs
corrplot(corr_r, method = "circle", type = "lower", diag = FALSE, tl.col = "black", tl.cex = 1, tl.offset = 0.1, tl.srt = 45)

# Plot only the pairs that pass the cutoff in sig.level, using the p-values passed as p.mat
corrplot(corr_r, p.mat = pval, sig.level = 0.05, insig = "blank", method = "circle", type = "lower", diag = FALSE, tl.col = "black", tl.cex = 1, tl.offset = 0.1, tl.srt = 45)
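
To check a single pair without Hmisc, base R's cor.test() reports both the correlation estimate and its p-value. A small sketch using mtcars; the choice of mpg and wt is only for illustration, and `ct` is an arbitrary name.

```r
# Pearson correlation test between two columns (Pearson is the default method)
ct <- cor.test(mtcars$mpg, mtcars$wt)

ct$estimate  # the correlation coefficient
ct$p.value   # the two-sided p-value
```

For Pearson correlations on complete data, these values should agree closely with the corresponding entries returned by rcorr().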
