In R, how do I find the optimal variable to maximize or minimize correlation between several datasets?


自闭症患者 2020-12-10 22:00

I am able to do this easily in Excel, but my dataset has gotten too large. In Excel, I would use Solver.

Column A,B,C,D = random numbers 
Column E = random


      
      
        
3 answers
鱼传尺愫 (original poster) 2020-12-10 22:46
              

            
            
                        
Since most optimization routines work best without constraints,
you can reparametrize the problem: instead of finding
four numbers x, y, z, j,
each constrained to lie between 0 and 1 and together summing to 1,
find three real numbers q1, q2, q3
with no constraints.
For instance, given a function s that maps the real line R
to the interval (0,1), such as the logistic function,
the following does the trick:

  x = s(q1)
  y = (1-x) * s(q2)
  z = (1-x-y) * s(q3)
  j = 1-x-y-z


It is probably easier to understand in two dimensions:
with three weights instead of four, the set of points (x,y,z)
with coordinates between 0 and 1 and summing to 1
is a triangle, and s(q1), s(q2) form a coordinate system
for points in that triangle.
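The three-weight case can be checked numerically. The following is a minimal sketch, not part of the original answer: the helper name to_simplex3 is hypothetical, and it uses base R's plogis() (the logistic function) as one possible choice of s.

```r
# Hypothetical helper: map two unconstrained reals (q1, q2)
# to three weights (x, y, z) in (0,1) that sum to 1.
# plogis() is base R's logistic function, used here as s.
to_simplex3 <- function(q) {
  x <- plogis(q[1])
  y <- (1 - x) * plogis(q[2])
  z <- 1 - x - y
  c(x = x, y = y, z = z)
}

set.seed(1)                                  # arbitrary seed, for reproducibility
q <- matrix(rnorm(2000, sd = 3), ncol = 2)   # 1000 random (q1, q2) pairs
w <- t(apply(q, 1, to_simplex3))             # one row of weights per pair

all(w > 0 & w < 1)                 # every weight lies strictly between 0 and 1
all(abs(rowSums(w) - 1) < 1e-12)   # every row sums to 1 (up to rounding)
```

Plotting w[, "x"] against w[, "y"] should show the points filling the triangle below the line x + y = 1.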

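# Sample data
set.seed(1)  # for reproducibility
A <- rnorm(100)
B <- rnorm(100)
C <- rnorm(100)
D <- rnorm(100)
E <- rnorm(100)
# Correlation between a weighted sum of A, B, C, D and E
f <- function(p) cor(p[1]*A + p[2]*B + p[3]*C + p[4]*D, E)

# Unconstrained optimization
# (cor() is scale-invariant, so only the direction of the weights is determined)
optim(
  c(1,1,1,1)/4, # Starting values
  f,            # Function to maximize
  control=list(fnscale=-1) # Maximize (default is to minimize)
)

# Transform the parameters: map three unconstrained reals
# to four weights in (0,1) that sum to 1
sigmoid <- function(x) 1 / (1 + exp(-x))  # this form avoids NaN for large x
convert <- function(p) {
  q1 <- sigmoid(p[1])
  q2 <- (1-q1) * sigmoid(p[2])
  q3 <- (1-q1-q2) * sigmoid(p[3])
  q4 <- 1-q1-q2-q3 
  c(q1,q2,q3,q4)
}

# Optimization over the three unconstrained parameters
# (convert() only uses p[1], p[2], p[3], so three starting values suffice)
g <- function(p) f(convert(p))
p <- optim(c(0,0,0), g, control=list(fnscale=-1))
convert(p$par)  # Optimal weights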
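
As a sanity check on the unconstrained case only, the largest achievable correlation can also be obtained without a numerical optimizer: a linear combination of A, B, C, D that maximizes the correlation with E is given by the least-squares regression coefficients, and the maximum equals the square root of the regression R-squared. The sketch below is an addition, not part of the original answer; the seed is arbitrary and the data are regenerated so the block runs on its own. It says nothing about the constrained optimum, which can be lower.

```r
set.seed(42)  # arbitrary seed, for reproducibility
A <- rnorm(100); B <- rnorm(100); C <- rnorm(100); D <- rnorm(100)
E <- rnorm(100)

# Same objective as above: correlation of a weighted sum with E
f <- function(p) cor(p[1]*A + p[2]*B + p[3]*C + p[4]*D, E)

fit   <- lm(E ~ A + B + C + D)
w_ols <- coef(fit)[-1]                  # regression weights, intercept dropped
r_max <- sqrt(summary(fit)$r.squared)   # multiple correlation coefficient

all.equal(f(w_ols), r_max)     # the OLS weights attain the maximum
all.equal(f(-w_ols), -r_max)   # flipping the sign gives the minimum
```

Comparing r_max with the value returned by the unconstrained optim() call is a quick way to see whether the optimizer has converged.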

    
             
                                                        
            
            
              
                