Extract p values and r values for all pairwise variables

Posted by 孤街浪徒 on 2021-01-29 17:22:12

Question


I have multiple variables for multiple countries over multiple years. I would like to generate a dataframe containing both an r value (correlation coefficient) and a p value for each pair of variables. I have a minimum working example and an idea of what the end product should look like, but I am having difficulty implementing it. Any help would be much appreciated.

Please note, I would like to do this more manually rather than with packages like Hmisc, as that has created a number of other issues. I have looked around for similar solutions as well, but haven't had much luck.

# Code to generate minimum working example (country year pairs).  

library(tidyindexR)
library(tidyverse)
library(dplyr)
library(reshape2)
 
# Function to generate minimum working example data 

simulateCountryData = function(N=200, NEACH = 20, SEED=100){
        # Seed the RNG so the simulated data are reproducible
        set.seed(SEED)

        variableOne<-rnorm(N,sample(1:100, NEACH),0.5)
        variableOne[variableOne<0]<-0

        variableTwo<-rnorm(N,sample(1:100, NEACH),0.5)
        variableTwo[variableTwo<0]<-0
        
        variableThree<-rnorm(N,sample(1:100, NEACH),0.5)
        variableThree[variableThree<0]<-0
        
        geocodeNum<-factor(rep(seq(1,N/NEACH),each=NEACH))
        
        year<-rep(seq(2000,2000+NEACH-1,1),N/NEACH)
        
        # Putting it all together
        AllData<-data.frame(geocodeNum,
                            year,
                            variableOne,
                            variableTwo,
                            variableThree)
        
        return(AllData)
}

 
# This runs the function and generates the data 
mySimData = simulateCountryData()

I have a reasonable idea of how to get correlations (both p values and r values) between 2 manually selected variables, but am having some trouble implementing it on the entire dataset and on a country level (rather than all at once).

# Example p value
corrP = cor.test(mySimData$variableOne,mySimData$variableTwo)$p.value
# Example r value
corrEst = cor(mySimData$variableOne,mySimData$variableTwo) 

Finally, the end result should look something like this :

myVariables = colnames(mySimData[3:ncol(mySimData)])
myMatrix = expand.grid(myVariables,myVariables)

# I'm having trouble actually trying to get the r values and p values in the dataframe
myMatrix = as.data.frame(myMatrix)
myMatrix$Pval = runif(9,0.01,1) 
myMatrix$Rval = runif(9,0.2,1) 
myMatrix

Thanks again :)


Answer 1:


This will compute r and p for all the unique pairs.

# matrix of unique pairs coded as numeric
mx_combos <- combn(1:length(myVariables), 2)
# list of unique pairs coded as numeric
ls_combos <- split(mx_combos, rep(1:ncol(mx_combos), each = nrow(mx_combos)))
# for each pair in the list, create a 1 x 4 dataframe
ls_rows <- lapply(ls_combos, function(p) {
  # lookup names of variables
  v1 <- myVariables[p[1]]
  v2 <- myVariables[p[2]]
  # perform the cor.test()
  htest <- cor.test(mySimData[[v1]], mySimData[[v2]])
  # record pertinent info in a dataframe
  data.frame(Var1 = v1, 
             Var2 = v2, 
             Pval = htest$p.value, 
             Rval = unname(htest$estimate))
  })
# row bind the list of dataframes
dplyr::bind_rows(ls_rows)
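
The question also asks for results per country rather than over the pooled data. A minimal sketch of one way to extend the approach above: wrap the pairwise loop in a helper, split the data by geocodeNum, and apply the helper to each piece. The helper name pairwise_cor is hypothetical, and the data frame built here is a simplified stand-in for mySimData (same column names, plain rnorm values), so treat it as an illustration rather than a drop-in solution.

```r
# Simplified stand-in for mySimData: 10 countries x 20 years, 3 variables
set.seed(100)
n_each <- 20
n_countries <- 10
mySimData <- data.frame(
  geocodeNum    = factor(rep(seq_len(n_countries), each = n_each)),
  year          = rep(2000:(2000 + n_each - 1), times = n_countries),
  variableOne   = rnorm(n_each * n_countries, 50, 10),
  variableTwo   = rnorm(n_each * n_countries, 50, 10),
  variableThree = rnorm(n_each * n_countries, 50, 10)
)

# every column except the identifiers is treated as a variable
myVariables <- setdiff(names(mySimData), c("geocodeNum", "year"))

# hypothetical helper: one row (Var1, Var2, Pval, Rval) per unique pair
pairwise_cor <- function(df, vars) {
  pairs <- combn(vars, 2, simplify = FALSE)
  rows <- lapply(pairs, function(p) {
    htest <- cor.test(df[[p[1]]], df[[p[2]]])
    data.frame(Var1 = p[1],
               Var2 = p[2],
               Pval = htest$p.value,
               Rval = unname(htest$estimate),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# run the helper once per country, then label and stack the results
by_country <- lapply(split(mySimData, mySimData$geocodeNum),
                     pairwise_cor, vars = myVariables)
result <- do.call(rbind, Map(function(code, df) data.frame(geocodeNum = code, df),
                             names(by_country), by_country))
rownames(result) <- NULL
head(result)
```

With 3 variables there are 3 unique pairs, so result has one row per country per pair. Each country contributes only 20 observations here, so the per-country p values are based on far less data than the pooled version.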


Source: https://stackoverflow.com/questions/65590563/extract-p-values-and-r-values-for-all-pairwise-variables
