Calculating optimized weights to maximize correlation

Question

I have two time series, stored in columns A and B.

I am computing rolling averages of column A over several window lengths, for example 5, 10, 15 and 20.

I want to assign a weight to each of these average columns so that their weighted sum (the sumproduct of the weights and the average columns) has maximum correlation with column B. In other words, how do I implement Excel-like optimization in Python?

Please have a look at the sample code and suggest the way forward.

import pandas as pd
import numpy as np

dates = pd.date_range('20130101', periods=100)

df = pd.DataFrame(np.random.randn(100, 2), index=dates, columns=list('AB'))

df['sma_5']=df['A'].rolling(5).mean()

df['sma_10']=df['A'].rolling(10).mean()

df['sma_15']=df['A'].rolling(15).mean()

df['sma_20']=df['A'].rolling(20).mean()

w=[0.25,0.25,0.25,0.25]

df['B_friend'] = w[0]*df['sma_5']+w[1]*df['sma_10']+w[2]*df['sma_15']+w[3]*df['sma_20']

I need to optimize the weights w to maximize this correlation:

df['B'].corr(df['B_friend'])

Thanks in advance.

Answer 1:

The scipy.optimize.minimize function looks like what you need: https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html#scipy.optimize.minimize

The code would look something like this:

import pandas as pd
import numpy as np
import scipy.optimize as opt

dates = pd.date_range('20130101', periods=100)
df = pd.DataFrame(np.random.randn(100, 2), index=dates, columns=list('AB'))
df['sma_5']=df['A'].rolling(5).mean()
df['sma_10']=df['A'].rolling(10).mean()
df['sma_15']=df['A'].rolling(15).mean()
df['sma_20']=df['A'].rolling(20).mean()

def fun(x):
    w = x
    B_friend=w[0]*df['sma_5']+w[1]*df['sma_10']+w[2]*df['sma_15']+w[3]*df['sma_20']
    # minimize() only minimizes, so return the negated correlation.
    # np.abs treats a strong negative correlation as equally good;
    # flipping the sign of every weight then makes it positive.
    return -np.abs(df['B'].corr(B_friend))

w=[0.25,0.25,0.25,0.25]
res = opt.minimize(fun, w)
print(res.x)     # optimized weights
print(-res.fun)  # absolute correlation achieved
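As a cross-check on the numerical search above, note that a correlation does not change when all weights are multiplied by the same positive constant, so the weights are only determined up to scale. For a plain weighted sum with no constraints, an ordinary least-squares fit of B on the centred average columns should reach the same maximum without an iterative optimizer. The following is a minimal sketch under assumptions that are not in the original post: a fixed random seed, dropping the rows where the rolling means are still NaN, and the illustrative names clean, w_ls and corr_for.

```python
import numpy as np
import pandas as pd

# Fixed seed is an assumption, used only to make the sketch reproducible.
rng = np.random.default_rng(0)
dates = pd.date_range("20130101", periods=100)
df = pd.DataFrame(rng.standard_normal((100, 2)), index=dates, columns=list("AB"))

windows = [5, 10, 15, 20]
for n in windows:
    df[f"sma_{n}"] = df["A"].rolling(n).mean()

# Rolling means are NaN until each window fills; keep only complete rows.
clean = df.dropna()
cols = [f"sma_{n}" for n in windows]
X = clean[cols].to_numpy()
y = clean["B"].to_numpy()

# Centre both sides so the fit behaves as if it had an intercept.
Xc = X - X.mean(axis=0)
yc = y - y.mean()

# Least-squares weights; any positive multiple gives the same correlation.
w_ls, *_ = np.linalg.lstsq(Xc, yc, rcond=None)


def corr_for(w):
    """Pearson correlation between the weighted sum of averages and B."""
    return np.corrcoef(X @ w, y)[0, 1]


best_corr = corr_for(w_ls)
equal_corr = corr_for(np.full(len(windows), 0.25))

print(dict(zip(cols, w_ls)))
print("least-squares weights:", best_corr)
print("equal weights:", equal_corr)
```

If the weights must also sum to 1 or stay non-negative, as is common in an Excel-style setup, the closed form no longer applies directly and a constrained call to scipy.optimize.minimize would be the more natural route.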

Source: https://stackoverflow.com/questions/54686052/calculating-optimized-weights-to-maximize-correlation

Tags

python

pandas

optimization