I can do this easily in Excel, but my dataset has grown too large. In Excel, I would use Solver.
Columns A, B, C, D = random numbers
Column E = random numbers
Since most optimization routines work best without constraints, you can reparametrize the problem: instead of finding four numbers x, y, z, j constrained to lie between 0 and 1 and to sum to 1, find three real numbers q1, q2, q3 with no constraints. For instance, if s is a function that maps the real line R to the interval (0,1), the following does the trick:
x = s(q1)
y = (1-x) * s(q2)
z = (1-x-y) * s(q3)
j = 1-x-y-z
It is probably easier to understand in two dimensions: in this case, the set of points (x,y,z) with coordinates between 0 and 1 and summing to 1 is a triangle, and s(q1), s(q2) form a coordinate system for the points in that triangle.
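To make the triangle picture concrete, here is a minimal sketch in R for the three-weight case. It assumes the logistic function for s (any map from R onto (0,1) would do), and the name to_simplex is hypothetical, used only for this illustration.

```r
# Logistic function: one possible choice for s, mapping R to (0,1)
s <- function(q) 1 / (1 + exp(-q))

# Hypothetical helper: map two unconstrained numbers to three weights
# that lie in (0,1) and sum to 1 (a point inside the triangle)
to_simplex <- function(q) {
  x <- s(q[1])
  y <- (1 - x) * s(q[2])
  z <- 1 - x - y
  c(x = x, y = y, z = z)
}

w <- to_simplex(c(0, 0))
print(w)       # x = 0.5, y = 0.25, z = 0.25
print(sum(w))  # 1
```

Any pair of real numbers passed to to_simplex lands inside the triangle, which is why an unconstrained optimizer can search over q1, q2 freely.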
# Sample data
A <- rnorm(100)
B <- rnorm(100)
C <- rnorm(100)
D <- rnorm(100)
E <- rnorm(100)
f <- function(p) cor(p[1]*A + p[2]*B + p[3]*C + p[4]*D, E)
# Unconstrained optimization (weights may be negative and need not sum to 1)
optim(
c(1,1,1,1)/4, # Starting values
f, # Function to maximize
control=list(fnscale=-1) # Maximize (default is to minimize)
)
# Transform the parameters: three unconstrained values p[1], p[2], p[3]
# become four weights in (0,1) that sum to 1
sigmoid <- function(x) 1 / ( 1 + exp(-x) ) # Same function; avoids Inf/Inf = NaN for large x
convert <- function(p) {
q1 <- sigmoid(p[1])
q2 <- (1-q1) * sigmoid(p[2])
q3 <- (1-q1-q2) * sigmoid(p[3])
q4 <- 1-q1-q2-q3
c(q1,q2,q3,q4)
}
# Optimization
g <- function(p) f(convert(p))
p <- optim(c(0,0,0), g, control=list(fnscale=-1)) # convert() uses only three parameters
convert(p$par)
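
As a quick sanity check of the approach above, the following self-contained sketch reruns the constrained fit on simulated data and verifies that the recovered weights satisfy the constraints. The seed and the names fit and w are assumptions added for reproducibility and illustration, not part of the original answer.

```r
set.seed(1)  # Assumed seed, only so that the run is reproducible

# Simulated data, as in the sample above
A <- rnorm(100); B <- rnorm(100); C <- rnorm(100); D <- rnorm(100)
E <- rnorm(100)

# Objective: correlation between the weighted sum and E
f <- function(p) cor(p[1]*A + p[2]*B + p[3]*C + p[4]*D, E)

# Map three unconstrained values to four weights in [0,1] summing to 1
sigmoid <- function(x) 1 / (1 + exp(-x))
convert <- function(p) {
  q1 <- sigmoid(p[1])
  q2 <- (1 - q1) * sigmoid(p[2])
  q3 <- (1 - q1 - q2) * sigmoid(p[3])
  q4 <- 1 - q1 - q2 - q3
  c(q1, q2, q3, q4)
}

g   <- function(p) f(convert(p))
fit <- optim(c(0, 0, 0), g, control = list(fnscale = -1))
w   <- convert(fit$par)

print(w)                             # Fitted weights
print(isTRUE(all.equal(sum(w), 1)))  # Weights sum to 1
print(all(w >= 0 & w <= 1))          # Weights stay within [0,1]
print(fit$value >= g(c(0, 0, 0)))    # No worse than the starting point
```

The default Nelder-Mead method only finds a local optimum, so trying a few different starting values is a reasonable precaution.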