Correlation between multiple variables of a data frame

长发绾君心 2020-12-31 21:45

I have a data.frame of 10 variables in R. Let's call them var1, var2, ..., var10.

I want to find correlatio

3 Answers
  • 2020-12-31 22:11

    My package corrr, which helps to explore correlations, has a simple solution for this. I'll use the mtcars data set as an example, and say we want to focus on the correlation of mpg with all other variables.

    install.packages("corrr")  # though keep an eye out for a new version coming soon
    library(corrr)
    mtcars %>% correlate() %>% focus(mpg)
    
    
    #>    rowname        mpg
    #>      <chr>      <dbl>
    #> 1      cyl -0.8521620
    #> 2     disp -0.8475514
    #> 3       hp -0.7761684
    #> 4     drat  0.6811719
    #> 5       wt -0.8676594
    #> 6     qsec  0.4186840
    #> 7       vs  0.6640389
    #> 8       am  0.5998324
    #> 9     gear  0.4802848
    #> 10    carb -0.5509251
    

    Here, correlate() produces a correlation data frame, and focus() lets you focus on the correlations of certain variables with all others.

    FYI, focus() works similarly to select() from the dplyr package, except that it alters rows as well as columns. So if you're familiar with select(), you should find it easy to use focus(). E.g.:

    mtcars %>% correlate() %>% focus(mpg:drat)
    
    #>   rowname        mpg        cyl       disp         hp        drat
    #>     <chr>      <dbl>      <dbl>      <dbl>      <dbl>       <dbl>
    #> 1      wt -0.8676594  0.7824958  0.8879799  0.6587479 -0.71244065
    #> 2    qsec  0.4186840 -0.5912421 -0.4336979 -0.7082234  0.09120476
    #> 3      vs  0.6640389 -0.8108118 -0.7104159 -0.7230967  0.44027846
    #> 4      am  0.5998324 -0.5226070 -0.5912270 -0.2432043  0.71271113
    #> 5    gear  0.4802848 -0.4926866 -0.5555692 -0.1257043  0.69961013
    #> 6    carb -0.5509251  0.5269883  0.3949769  0.7498125 -0.09078980
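
    For comparison, the mpg column shown earlier can also be reproduced without corrr. This is only a minimal base-R sketch, not part of the package's approach; the names cor_mat and mpg_cor are made up for illustration.

    ```r
    # Correlation matrix of every column in mtcars (all columns are numeric)
    cor_mat <- cor(mtcars)

    # Keep the mpg column and drop mpg's correlation with itself
    mpg_cor <- cor_mat[rownames(cor_mat) != "mpg", "mpg"]

    round(mpg_cor, 3)
    ```

    With the asker's data, swapping mtcars for the data frame and "mpg" for "var1" should give var1 against var2 through var10, assuming all ten columns are numeric.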
    
  • 2020-12-31 22:32

    Better still, you can get the correlation of every variable with every other variable, not just of one variable with the rest. It takes a single line of code, shown here with a few columns of the built-in mtcars dataset.

    library(dplyr)
    
    cor(select(mtcars, mpg, wt, disp, drat, qsec, hp ))
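
    If the columns should not be listed by hand, or if some values are missing, a variation along these lines may help. It is a sketch that assumes only base R; num_cols and cor_all are illustrative names, and use = "pairwise.complete.obs" is just one of the ways cor() can treat NA values.

    ```r
    # Pick out the numeric columns instead of naming them one by one
    num_cols <- sapply(mtcars, is.numeric)

    # All-pairs Pearson correlations; each pair uses the rows where both values are present
    cor_all <- cor(mtcars[, num_cols], use = "pairwise.complete.obs")

    # One column of the matrix gives a single variable against all the others
    round(cor_all[, "mpg"], 3)
    ```

    Passing method = "spearman" to cor() would give rank correlations instead, if that suits the data better.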
    
  • 2020-12-31 22:32

    Another way is to use the Hmisc and corrplot packages to get the correlations among all pairs, their significance, and a pretty plot, like so:

    # Your data frame (5 variables here instead of 10)
    df <- data.frame(a = 1:100, b = rpois(100, 0.2), c = rpois(100, 0.4), d = rpois(100, 0.8), e = 2 * (1:100))

    # Setup
    library(Hmisc)
    library(corrplot)

    # Standardize the columns. This does not change the correlations, but scale()
    # also converts the data frame to a matrix, which rcorr() expects.
    df <- scale(df)

    # Compute Pearson's (or Spearman's) correlations with rcorr() from the Hmisc package.
    # I like rcorr() because it gives separate access to the correlations, the number of
    # observations, and the p-values. ?rcorr is worth a read.
    corr <- rcorr(df)
    corr_r <- as.matrix(corr$r)  # the correlation matrix
    corr_r[, 1]                  # correlations of "a" (= var1) with the rest, if that is all you want
    pval <- as.matrix(corr$P)    # the p-values

    # Plot all pairs
    corrplot(corr_r, method = "circle", type = "lower", diag = FALSE, tl.col = "black", tl.cex = 1, tl.offset = 0.1, tl.srt = 45)

    # Plot only the pairs that pass the significance cutoff "sig.level", using the p-values passed as "p.mat"
    corrplot(corr_r, p.mat = pval, sig.level = 0.05, insig = "blank", method = "circle", type = "lower", diag = FALSE, tl.col = "black", tl.cex = 1, tl.offset = 0.1, tl.srt = 45)
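
    When only one variable against the rest is needed and Hmisc is not installed, the correlations and p-values can be approximated with cor.test() from base R. The following is a rough sketch under that assumption, not a replacement for rcorr(); the seed and the names dat, others and res are illustrative choices, and the data frame is rebuilt so the snippet runs on its own.

    ```r
    set.seed(1)  # assumption: fixed seed so the simulated data are reproducible

    # Same shape of example data as in this answer, rebuilt as a plain data frame
    dat <- data.frame(a = 1:100, b = rpois(100, 0.2), c = rpois(100, 0.4),
                      d = rpois(100, 0.8), e = 2 * (1:100))

    # Every column except the one of interest
    others <- setdiff(names(dat), "a")

    # Pearson correlation and p-value of "a" against each remaining column
    res <- t(sapply(others, function(v) {
      ct <- cor.test(dat$a, dat[[v]])
      c(r = unname(ct$estimate), p = ct$p.value)
    }))

    res
    ```

    Each row of res corresponds to one of the other variables, with its correlation in column r and its p-value in column p.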
    