Correlation between multiple variables of a data frame

Submitted by 杀马特。学长 韩版系。学妹 on 2019-12-09 13:29:58

Question


I have a data.frame with 10 variables in R. Let's call them var1, var2, ..., var10.

I want to find the correlation of one of them, say var1, with each of var2, var3, ..., var10.

How can I do that?

As far as I can tell, the cor function finds the correlation between two variables at a time, so I have had to write a separate cor call for each pair.


Answer 1:


My package corrr, which helps to explore correlations, has a simple solution for this. I'll use the mtcars data set as an example, and say we want to focus on the correlation of mpg with all other variables.

install.packages("corrr")  # though keep an eye out for a new version coming soon
library(corrr)
mtcars %>% correlate() %>% focus(mpg)


#>    rowname        mpg
#>      <chr>      <dbl>
#> 1      cyl -0.8521620
#> 2     disp -0.8475514
#> 3       hp -0.7761684
#> 4     drat  0.6811719
#> 5       wt -0.8676594
#> 6     qsec  0.4186840
#> 7       vs  0.6640389
#> 8       am  0.5998324
#> 9     gear  0.4802848
#> 10    carb -0.5509251

Here, correlate() produces a correlation data frame, and focus() lets you focus on the correlations of certain variables with all others.

FYI, focus() works similarly to select() from the dplyr package, except that it alters rows as well as columns. So if you're familiar with select(), you should find it easy to use focus(). E.g.:

mtcars %>% correlate() %>% focus(mpg:drat)

#>   rowname        mpg        cyl       disp         hp        drat
#>     <chr>      <dbl>      <dbl>      <dbl>      <dbl>       <dbl>
#> 1      wt -0.8676594  0.7824958  0.8879799  0.6587479 -0.71244065
#> 2    qsec  0.4186840 -0.5912421 -0.4336979 -0.7082234  0.09120476
#> 3      vs  0.6640389 -0.8108118 -0.7104159 -0.7230967  0.44027846
#> 4      am  0.5998324 -0.5226070 -0.5912270 -0.2432043  0.71271113
#> 5    gear  0.4802848 -0.4926866 -0.5555692 -0.1257043  0.69961013
#> 6    carb -0.5509251  0.5269883  0.3949769  0.7498125 -0.09078980
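
For comparison, the same numbers can be obtained without any extra package. The following is a minimal base R sketch, not part of the corrr answer; it relies on the fact that cor() accepts a whole data frame (or a vector plus a data frame), and the variable names `cor_matrix`, `mpg_cor` and `mpg_cor_direct` are chosen here purely for illustration.

```r
# Base R sketch (no extra packages): correlate mpg with every other column.
# cor() accepts a whole numeric data frame and returns the full correlation matrix.
cor_matrix <- cor(mtcars)

# Take the "mpg" column and drop mpg's correlation with itself (always 1).
mpg_cor <- cor_matrix[rownames(cor_matrix) != "mpg", "mpg"]

# Alternative: pass the target as x and the remaining columns as y.
# This returns a 1-row matrix holding the same values.
mpg_cor_direct <- cor(mtcars$mpg, mtcars[, names(mtcars) != "mpg"])

print(round(mpg_cor, 3))
```

The first form is convenient when the full matrix is needed anyway. The second avoids computing correlations between pairs that are not of interest.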



Answer 2:


Another way is to use the Hmisc and corrplot libraries to get the correlations among all pairs, their significance, and a pretty plot, like so:

# Your data frame (5 variables instead of 10)
df <- data.frame(a = 1:100, b = rpois(100, 0.2), c = rpois(100, 0.4), d = rpois(100, 0.8), e = 2 * (1:100))

#setup 
library(Hmisc) 
library(corrplot)

df <- scale(df)  # standardize the columns (mean 0, sd 1). This also converts df to a matrix, which rcorr expects.

corr <- rcorr(df)  # compute Pearson's (or Spearman's) correlations with rcorr from the Hmisc package. I like rcorr because it gives separate access to the correlations, the number of observations, and the p-values. ?rcorr is worth a read.
corr_r <- as.matrix(corr$r)  # access the correlation matrix (same as corr[[1]])
corr_r[, 1]  # subset the correlations of "a" (= var1) with the rest, if you want
pval <- as.matrix(corr$P)  # get the p-values (same as corr[[3]])

corrplot(corr_r, method = "circle", type = "lower", diag = FALSE, tl.col = "black", tl.cex = 1, tl.offset = 0.1, tl.srt = 45)  # plot all pairs

corrplot(corr_r, p.mat = pval, sig.level = 0.05, insig = "blank", method = "circle", type = "lower", diag = FALSE, tl.col = "black", tl.cex = 1, tl.offset = 0.1, tl.srt = 45)  # plot all pairs, blanking those whose p-value (from p.mat) exceeds sig.level
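
If only the correlations of one variable with the rest are needed, together with their p-values, a small table can also be built in base R. This is a hedged sketch rather than the approach of the answer above: it swaps rcorr for stats::cor.test, and the names `dat`, `others` and `res` as well as the seed are assumptions made for illustration.

```r
# Sketch: correlation and p-value of one variable ("a") against each of the others,
# using base R's cor.test() instead of Hmisc::rcorr().
set.seed(1)  # arbitrary seed so the simulated data are reproducible
dat <- data.frame(a = 1:100,
                  b = rpois(100, 0.2),
                  c = rpois(100, 0.4),
                  d = rpois(100, 0.8))

# Every column except the target variable.
others <- setdiff(names(dat), "a")

# Run one Pearson test per column and stack the results into a data frame.
res <- do.call(rbind, lapply(others, function(v) {
  ct <- cor.test(dat$a, dat[[v]])  # method = "pearson" by default
  data.frame(variable = v,
             r = unname(ct$estimate),
             p_value = ct$p.value)
}))

print(res)
```

Passing method = "spearman" to cor.test would give rank correlations instead.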


Source: https://stackoverflow.com/questions/38548943/correlation-between-multiple-variables-of-a-data-frame
