问题
I have a data.frame
of 10 Variables in R
. Lets call them var1
var2
...var10
I want to find correlation of one of var1
with respect to
var2
, var3
... var10
How can we do that?
cor
function can find correlation between 2 variables at a time. By using that I had to write cor
function for each Analysis
回答1:
My package corrr
, which helps to explore correlations, has a simple solution for this. I'll use the mtcars data set as an example, and say we want to focus on the correlation of mpg
with all other variables.
install.packages("corrr") # though keep eye out for new version coming soon
library(corrr)
mtcars %>% correlate() %>% focus(mpg)
#> rowname mpg
#> <chr> <dbl>
#> 1 cyl -0.8521620
#> 2 disp -0.8475514
#> 3 hp -0.7761684
#> 4 drat 0.6811719
#> 5 wt -0.8676594
#> 6 qsec 0.4186840
#> 7 vs 0.6640389
#> 8 am 0.5998324
#> 9 gear 0.4802848
#> 10 carb -0.5509251
Here, correlate()
produces a correlation data frame, and focus()
lets you focus on the correlations of certain variables with all others.
FYI, focus()
works similarly to select()
from the dplyr
package, except that it alters rows as well as columns. So if you're familiar with select()
, you should find it easy to use focus()
. E.g.:
mtcars %>% correlate() %>% focus(mpg:drat)
#> rowname mpg cyl disp hp drat
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 wt -0.8676594 0.7824958 0.8879799 0.6587479 -0.71244065
#> 2 qsec 0.4186840 -0.5912421 -0.4336979 -0.7082234 0.09120476
#> 3 vs 0.6640389 -0.8108118 -0.7104159 -0.7230967 0.44027846
#> 4 am 0.5998324 -0.5226070 -0.5912270 -0.2432043 0.71271113
#> 5 gear 0.4802848 -0.4926866 -0.5555692 -0.1257043 0.69961013
#> 6 carb -0.5509251 0.5269883 0.3949769 0.7498125 -0.09078980
回答2:
Another way would be to use libraries Hmisc and corrplot to get correlations amongst all pairs, significance and a pretty plot like so :
#Your data frame (4 variables instead of 10)
df<-data.frame(a=c(1:100),b=rpois(1:100,.2),c=rpois(1:100,.4),d=rpois(1:100,.8),e=2*c(1:100))
#setup
library(Hmisc)
library(corrplot)
df<-scale(df)# normalize the data frame. This will also convert the df to a matrix.
corr<-rcorr(df) # compute Pearson's (or spearman's corr) with rcorr from Hmisc package. I like rcorr as it allows to separately access the correlations, the # or observations and the p-value. ?rcorr is worth a read.
corr_r<-as.matrix(corr[[1]])# Access the correlation matrix.
corr_r[,1]# subset the correlation of "a" (=var1 ) with the rest if you want.
pval<-as.matrix(corr[[3]])# get the p-values
corrplot(corr_r,method="circle",type="lower",diag=FALSE,tl.col="black",tl.cex=1,tl.offset=0.1,tl.srt=45)# plot all pairs
corrplot(corr_r,p.mat = pval,sig.level=0.05,insig = "blank",method="circle",type="lower",diag=FALSE,tl.col="black",tl.cex=1,tl.offset=0.1,tl.srt=45)# plot pairs with significance cutoff defined by "p.mat"
来源:https://stackoverflow.com/questions/38548943/correlation-between-multiple-variables-of-a-data-frame