In R, how do I find the optimal variable to maximize or minimize correlation between several datasets

自闭症患者 2020-12-10 22:00

I can do this easily in Excel with Solver, but my dataset has grown too large for that.

Column A,B,C,D = random numbers 
Column E = random          


        
3 Answers
  •  鱼传尺愫
    2020-12-10 22:46

    Since most optimization routines work best with no constraints, you can transform (reparametrize) the problem of finding four numbers, x, y, z, j, constrained to be between 0 and 1 and to sum up to 1, into the problem of finding three real numbers q1, q2, q3 (with no constraints). For instance, if we have a function s that maps the real line R to the interval (0,1), the following does the trick:

      x = s(q1)
      y = (1-x) * s(q2)
      z = (1-x-y) * s(q3)
      j = 1-x-y-z
    

    It is probably easier to understand with three weights instead of four: in this case, the set of points (x,y,z) with coordinates between 0 and 1 and summing to 1 is a triangle (a two-dimensional region), and s(q1), s(q2) form a coordinate system for points in that triangle.
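
    To make the triangle picture concrete, here is a minimal sketch in R. It assumes the logistic function (base R's `plogis`) as the map s; any smooth map from the real line onto (0,1) would do. `to_triangle` is a hypothetical helper name used only for this illustration. The sketch maps arbitrary real pairs (q1, q2) to weights (x, y, z) and checks that every result lands inside the triangle.

    ```r
    # Assumption: s is the logistic function, available in base R as plogis().
    s <- plogis

    # Hypothetical helper: map two unconstrained reals to three weights.
    to_triangle <- function(q) {
      x <- s(q[1])
      y <- (1 - x) * s(q[2])
      z <- 1 - x - y
      c(x = x, y = y, z = z)
    }

    set.seed(1)                                  # assumed seed, for reproducibility
    q <- matrix(rnorm(2000, sd = 3), ncol = 2)   # 1000 arbitrary points in the plane
    w <- t(apply(q, 1, to_triangle))             # one row of weights per point

    # Every weight lies strictly between 0 and 1, and each row sums to 1
    # (up to floating-point rounding).
    stopifnot(all(w > 0), all(w < 1), all(abs(rowSums(w) - 1) < 1e-12))
    ```

    The four-weight case works the same way, with one more "remaining share" step for the third weight.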

    # Sample data
    set.seed(1) # fixed seed so the results are reproducible
    A <- rnorm(100)
    B <- rnorm(100)
    C <- rnorm(100)
    D <- rnorm(100)
    E <- rnorm(100)
    f <- function(p) cor(p[1]*A + p[2]*B + p[3]*C + p[4]*D, E)
    
    # Unconstrained optimization
    optim(
      c(1,1,1,1)/4, # Starting values
      f,            # Function to maximize
      control=list(fnscale=-1) # Maximize (default is to minimize)
    )
    
    # Transform the parameters
    sigmoid <- function(x) 1 / ( 1 + exp(-x) ) # same as plogis(x); avoids Inf/Inf = NaN for large x
    convert <- function(p) {
      q1 <- sigmoid(p[1])
      q2 <- (1-q1) * sigmoid(p[2])
      q3 <- (1-q1-q2) * sigmoid(p[3])
      q4 <- 1-q1-q2-q3 
      c(q1,q2,q3,q4)
    }
    
    # Optimization
    g <- function(p) f(convert(p))
    p <- optim(c(0,0,0), g, control=list(fnscale=-1)) # only three free parameters: convert() uses p[1], p[2], p[3]
    convert(p$par)
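
    As a hedged follow-up, the sketch below reruns the constrained fit end to end and checks the result: the recovered weights should be non-negative and sum to 1, and the correlation they achieve should be no worse than at the starting point `convert(c(0,0,0))`. It also covers the "minimize" half of the question by simply dropping `fnscale=-1`. The seed, the use of `plogis` in place of the hand-written sigmoid, and the names `fit_max`, `fit_min`, `w_max` and `w_min` are assumptions made for illustration.

    ```r
    set.seed(42)                       # assumed seed, only for reproducibility
    A <- rnorm(100); B <- rnorm(100); C <- rnorm(100); D <- rnorm(100)
    E <- rnorm(100)

    # Correlation between the weighted sum of A..D and E
    f <- function(w) cor(w[1]*A + w[2]*B + w[3]*C + w[4]*D, E)

    # Three unconstrained reals -> four weights in [0,1] summing to 1
    convert <- function(p) {
      q1 <- plogis(p[1])
      q2 <- (1 - q1) * plogis(p[2])
      q3 <- (1 - q1 - q2) * plogis(p[3])
      c(q1, q2, q3, 1 - q1 - q2 - q3)
    }
    g <- function(p) f(convert(p))

    # Maximize the correlation (fnscale = -1 turns optim into a maximizer)
    fit_max <- optim(c(0, 0, 0), g, control = list(fnscale = -1))
    w_max <- convert(fit_max$par)

    # Minimize the correlation (optim minimizes by default)
    fit_min <- optim(c(0, 0, 0), g)
    w_min <- convert(fit_min$par)

    print(round(w_max, 3)); print(fit_max$value)
    print(round(w_min, 3)); print(fit_min$value)
    ```

    Note that `optim` defaults to Nelder-Mead, which is a local search, so it is prudent to try several starting values and keep the best result.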
    
