Correlation between multiple variables of a data frame

长发绾君心 2020-12-31 21:45

I have a data.frame of 10 variables in R. Let's call them var1, var2, ..., var10.

I want to find correlatio

3 answers
  •  梦毁少年i
    2020-12-31 22:32

    Another way is to use the Hmisc and corrplot libraries to get the correlations among all pairs, their significance, and a pretty plot, like so:

    # Your data frame (5 variables instead of 10)
    df <- data.frame(a = 1:100, b = rpois(100, 0.2), c = rpois(100, 0.4), d = rpois(100, 0.8), e = 2 * (1:100))

    # Setup
    library(Hmisc)
    library(corrplot)

    df <- scale(df)  # Standardize the columns (this does not change the correlations). It also converts the data frame to a matrix, which rcorr() expects.

    corr <- rcorr(df)  # Compute Pearson's correlations (or Spearman's, with type = "spearman") using rcorr() from the Hmisc package. I like rcorr() because it gives separate access to the correlations, the number of observations, and the p-values. ?rcorr is worth a read.
    corr_r <- corr$r  # Access the correlation matrix.
    corr_r[, 1]  # Subset the correlations of "a" (= var1) with the rest, if you want.
    pval <- corr$P  # Get the p-values (the diagonal is NA).

    # Plot all pairs
    corrplot(corr_r, method = "circle", type = "lower", diag = FALSE, tl.col = "black", tl.cex = 1, tl.offset = 0.1, tl.srt = 45)

    # Plot only the pairs that pass the significance cutoff: p-values come from "p.mat", the threshold from "sig.level"
    corrplot(corr_r, p.mat = pval, sig.level = 0.05, insig = "blank", method = "circle", type = "lower", diag = FALSE, tl.col = "black", tl.cex = 1, tl.offset = 0.1, tl.srt = 45)
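
If only the correlations of one variable with the rest are needed, and the plots are not, base R may be enough. The following is a minimal sketch rather than a definitive solution: the data frame and its column names (var1 to var4) are made-up placeholders, and it assumes every column is numeric with no missing values. It uses only cor() and cor.test() from the stats package that ships with R.

```r
# Hypothetical example data: the column names are placeholders, not from the question
set.seed(42)
df <- data.frame(
  var1 = 1:100,
  var2 = rpois(100, 0.2),
  var3 = rpois(100, 0.4),
  var4 = 2 * (1:100) + rnorm(100)
)

target <- "var1"                       # the variable to compare against all others
others <- setdiff(names(df), target)   # every remaining column name

# Pearson correlation matrix of all columns, then the target's column
# without its self-correlation on the diagonal
r_all <- cor(df, method = "pearson")
r_target <- r_all[others, target]

# p-value of each pairwise Pearson correlation test against the target
p_target <- sapply(others, function(v) cor.test(df[[target]], df[[v]])$p.value)

# One row per remaining variable: correlation with the target and its p-value
result <- data.frame(r = r_target, p = p_target)
print(result)
```

Swapping method = "pearson" for method = "spearman" in cor(), and passing the same method argument to cor.test(), should give rank correlations instead.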
    
