In R, how do I find the optimal variable to maximize or minimize correlation between several datasets?

自闭症患者 2020-12-10 22:00

I am able to do this easily in Excel, but my dataset has gotten too large. In Excel, I would use Solver.

Column A,B,C,D = random numbers 
Column E = random          


        
3 answers
  •  天涯浪人
    2020-12-10 22:37

    It sounds like what you want is a linear regression. It finds coefficients to multiply your predictors by (here A, B, C and D) so that the fitted values have the smallest possible sum of squared differences from the actual values. This is not quite the same problem as maximising the correlation between the fitted and actual values, but it does the same job: when an intercept is included, the least-squares fitted values also have the highest correlation with the actual values that any linear combination of the predictors can achieve. Here's an example; the coefficients of a, b, c and d are equivalent to your x, y, z and j.

    > a <- rnorm(10)
    > b <- rnorm(10)
    > c <- rnorm(10)
    > d <- rnorm(10)
    > e <- rnorm(10)
    > lm(e~ a + b + c +d)
    
    Call:
    lm(formula = e ~ a + b + c + d)
    
    Coefficients:
    (Intercept)            a            b            c            d  
        -0.2881      -0.1898      -0.7282       0.2121       0.2758  
    

    However, this linear model has an extra parameter, the intercept. The intercept is a constant added to every fitted value, so the fitted values are actually: fitted = constant + a*x + b*y + c*z + d*j
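
    As a rough sketch of what the intercept does, the snippet below rebuilds the fitted values by hand from the coefficients and compares them with what `fitted()` returns. The seed and the sample size of 10 are assumptions chosen only to make the example reproducible; they are not from the question.

    ```r
    set.seed(1)  # assumed seed, only for reproducibility
    a <- rnorm(10); b <- rnorm(10); c <- rnorm(10); d <- rnorm(10); e <- rnorm(10)

    fit <- lm(e ~ a + b + c + d)
    cf  <- coef(fit)  # named vector: (Intercept), a, b, c, d

    # fitted = constant + a*x + b*y + c*z + d*j, written out by hand
    by_hand <- cf["(Intercept)"] + cf["a"] * a + cf["b"] * b +
      cf["c"] * c + cf["d"] * d

    # Largest gap between the hand-built values and lm's own fitted values
    max_diff <- max(abs(by_hand - fitted(fit)))

    # Correlation between fitted and actual values, and the model's R-squared
    fit_cor <- cor(fitted(fit), e)
    r2      <- summary(fit)$r.squared
    ```

    With an intercept in the model, `fit_cor^2` should match `r2` up to rounding, which is the link between the regression and the correlation you are trying to maximise.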

    You can run a linear regression without fitting an intercept by running:

    lm(e ~ -1 + a + b + c + d)
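
    If you want something closer to what Solver does, one option is to maximise the correlation directly with `optim()`. The following is a minimal sketch on made-up data (random columns, an assumed seed and sample size), and `neg_cor` is a hypothetical helper name. Because correlation does not change when the weighted sum is rescaled or shifted, the weights are only determined up to a positive multiple, so it makes more sense to compare the correlation achieved than the weights themselves.

    ```r
    set.seed(1)  # assumed seed, only for reproducibility
    n <- 50      # assumed sample size
    X <- cbind(a = rnorm(n), b = rnorm(n), c = rnorm(n), d = rnorm(n))
    e <- rnorm(n)

    # optim() minimises, so return the negative correlation
    neg_cor <- function(w) -cor(as.vector(X %*% w), e)

    res      <- optim(par = rep(1, ncol(X)), fn = neg_cor, method = "BFGS")
    best_cor <- -res$value  # highest correlation the optimiser found
    weights  <- res$par     # one of many equivalent weight vectors

    # Reference value: correlation achieved by least squares with an intercept
    lm_cor <- cor(fitted(lm(e ~ X)), e)
    ```

    The two correlations should agree closely. To minimise the correlation instead, flip the sign of the weights: the most negative correlation attainable is simply `-best_cor`.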
