Python scipy.optimize.fmin_l_bfgs_b error occurs


Question:

My code implements an active learning algorithm using L-BFGS optimization. I want to optimize four parameters: alpha, beta, W, and gamma.

However, when I run the code below, I get an error:

optimLogitLBFGS = sp.optimize.fmin_l_bfgs_b(func, x0 = x0, args = (X,Y,Z), fprime = func_grad)
  File "C:\Python27\lib\site-packages\scipy\optimize\lbfgsb.py", line 188, in fmin_l_bfgs_b
    **opts)
  File "C:\Python27\lib\site-packages\scipy\optimize\lbfgsb.py", line 311, in _minimize_lbfgsb
    isave, dsave)
_lbfgsb.error: failed in converting 7th argument `g' of _lbfgsb.setulb to C/Fortran array
 0-th dimension must be fixed to 22 but got 4

My code is:

# -*- coding: utf-8 -*-
import numpy as np
import scipy as sp
import scipy.stats as sps

num_labeler = 3
num_instance = 5

X = np.array([[1,1,1,1],[2,2,2,2],[3,3,3,3],[4,4,4,4],[5,5,5,5]])
Z = np.array([1,0,1,0,1])
Y = np.array([[1,0,1],[0,1,0],[0,0,0],[1,1,1],[1,0,0]])

W = np.array([[1,1,1,1],[2,2,2,2],[3,3,3,3]])
gamma = np.array([1,1,1,1,1])
alpha = np.array([1,1,1,1])
beta = 1
para = np.array([1,1,1,1,1,1,1,1,1,2,2,2,2,3,3,3,3,1,1,1,1,1])

def get_params(para):
    # extract parameters from 1D parameter vector
    assert len(para) == 22
    alpha = para[0:4]
    beta = para[4]
    W = para[5:17].reshape(3, 4)
    gamma = para[17:]
    return alpha, beta, gamma, W

def log_p_y_xz(yit, zi, sigmati):  # log P(y_it|x_i,z_i)
    return np.log(sps.norm(zi, sigmati).pdf(yit))  # tested

def log_p_z_x(alpha, beta, xi):  # log P(z_i=1|x_i)
    return -np.log(1 + np.exp(-np.dot(alpha, xi) - beta))  # tested

def sigma_eta_ti(xi, w_t, gamma_t):  # (1+exp(-w_t x_i - gamma_t))^-1
    return 1 / (1 + np.exp(-np.dot(xi, w_t) - gamma_t))  # tested

def df_alpha(X, Y, Z, W, alpha, beta, gamma):  # df/dalpha
    return np.sum((2/(1+np.exp(-np.dot(alpha,X[i])-beta))-1)*np.exp(-np.dot(alpha,X[i])-beta)*X[i]/(1+np.exp(-np.dot(alpha,X[i])-beta))**2 for i in range(num_instance))  # tested

def df_beta(X, Y, Z, W, alpha, beta, gamma):  # df/dbeta
    return np.sum((2/(1+np.exp(-np.dot(alpha,X[i])-beta))-1)*np.exp(-np.dot(alpha,X[i])-beta)/(1+np.exp(-np.dot(alpha,X[i])-beta))**2 for i in range(num_instance))

def df_w(X, Y, Z, W, alpha, beta, gamma):  # df/dsigma * dsigma/dw
    return np.sum(np.sum((-3)*(Y[i][t]**2-(-np.log(1+np.exp(-np.dot(alpha,X[i])-beta)))*(2*Y[i][t]-1))*(1/(1/(1+np.exp(-np.dot(X[i],W[t])-gamma[t])))**4)*(1/(1+np.exp(-np.dot(X[i],W[t])-gamma[t])))*(1-(1/(1+np.exp(-np.dot(X[i],W[t])-gamma[t]))))*X[i]+(1/(1/(1+np.exp(-np.dot(X[i],W[t])-gamma[t])))**2)*(1/(1+np.exp(-np.dot(X[i],W[t])-gamma[t])))*(1-(1/(1+np.exp(-np.dot(X[i],W[t])-gamma[t]))))*X[i] for t in range(num_labeler)) for i in range(num_instance))

def df_gamma(X, Y, Z, W, alpha, beta, gamma):  # df/dsigma * dsigma/dgamma
    return np.sum(np.sum((-3)*(Y[i][t]**2-(-np.log(1+np.exp(-np.dot(alpha,X[i])-beta)))*(2*Y[i][t]-1))*(1/(1/(1+np.exp(-np.dot(X[i],W[t])-gamma[t])))**4)*(1/(1+np.exp(-np.dot(X[i],W[t])-gamma[t])))*(1-(1/(1+np.exp(-np.dot(X[i],W[t])-gamma[t]))))+(1/(1/(1+np.exp(-np.dot(X[i],W[t])-gamma[t])))**2)*(1/(1+np.exp(-np.dot(X[i],W[t])-gamma[t])))*(1-(1/(1+np.exp(-np.dot(X[i],W[t])-gamma[t])))) for t in range(num_labeler)) for i in range(num_instance))

def func(para, *args):
    alpha, beta, gamma, W = get_params(para)
    # args
    X = args[0]
    Y = args[1]
    Z = args[2]
    return np.sum(np.sum(log_p_y_xz(Y[i][t], Z[i], sigma_eta_ti(X[i], W[t], gamma[t])) + log_p_z_x(alpha, beta, X[i]) for t in range(num_labeler)) for i in range(num_instance))  # tested

def func_grad(para, *args):
    alpha, beta, gamma, W = get_params(para)
    # args
    X = args[0]
    Y = args[1]
    Z = args[2]
    # gradients
    d_f_a = df_alpha(X, Y, Z, W, alpha, beta, gamma)
    d_f_b = df_beta(X, Y, Z, W, alpha, beta, gamma)
    d_f_w = df_w(X, Y, Z, W, alpha, beta, gamma)
    d_f_g = df_gamma(X, Y, Z, W, alpha, beta, gamma)
    return np.array([d_f_a, d_f_b, d_f_w, d_f_g])

x0 = np.concatenate([np.ravel(alpha), np.ravel(beta), np.ravel(W), np.ravel(gamma)])

optimLogitLBFGS = sp.optimize.fmin_l_bfgs_b(func, x0=x0, args=(X, Y, Z), fprime=func_grad)

I am not sure what the problem is. Maybe func_grad causes the problem? Could anyone have a look? Thanks.

Answer 1:

You need to take the derivative of func with respect to each of the elements in your concatenated array of alpha, beta, W, gamma parameters, so func_grad ought to return a single 1D array of the same length as x0 (i.e. 22). Instead it returns a jumble of two arrays and two scalar floats nested inside an np.object array:

In [1]: func_grad(x0, X, Y, Z)
Out[1]:
array([array([ 0.00681272,  0.00681272,  0.00681272,  0.00681272]),
       0.006684719133999417,
       array([-0.01351227, -0.01351227, -0.01351227, -0.01351227]),
       -0.013639910534587798], dtype=object)
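For reference, fmin_l_bfgs_b hands the output of fprime to the Fortran routine _lbfgsb.setulb, which expects one flat gradient vector of the same length as x0 (hence the "0-th dimension must be fixed to 22 but got 4" message). A minimal sketch with a toy quadratic objective (not the poster's model) illustrates the expected shapes:

import numpy as np
import scipy.optimize as spo

def f(x):
    # toy quadratic objective, just to illustrate the calling convention
    return np.sum((x - 1.0) ** 2)

def f_grad(x):
    # the gradient must come back as a flat 1D array with len(x) entries
    return 2.0 * (x - 1.0)

x0 = np.zeros(22)
xopt, fval, info = spo.fmin_l_bfgs_b(f, x0=x0, fprime=f_grad)
# f_grad(x0).shape == (22,) == x0.shape, which is exactly what setulb checks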

Part of the problem is that np.array([d_f_a, d_f_b,d_f_w,d_f_g]) is not concatenating those objects into a single 1D array since some are numpy arrays and some are Python floats. That part is easily solved by using np.hstack([d_f_a, d_f_b,d_f_w,d_f_g]) instead.
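To see the difference, compare the two on a small mix of an array and a scalar (toy stand-ins for the actual gradients):

import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])       # stand-in for d_f_a
b = 0.5                                  # stand-in for d_f_b

ragged = np.array([a, b], dtype=object)  # 2-element object array, not a flat gradient
flat = np.hstack([a, b])                 # shape (5,): the scalar is appended after the array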

However, the combined sizes of these objects are still only 10, whereas the output of func_grad needs to be a 22-long vector. You will need to take another look at your df_* functions. In particular, W is a (3, 4) array, but df_w only returns a (4,) vector, and gamma is a (5,) vector whereas df_gamma only returns a scalar.
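Assuming the df_* functions are rewritten so that df_w returns the full (3, 4) gradient block and df_gamma a (5,) vector (matching the layout in get_params), func_grad could then be assembled as below, and a finite-difference comparison with scipy.optimize.approx_fprime makes a handy sanity check. This is a sketch of the shape bookkeeping only, not of the derivatives themselves:

# sketch only: assumes df_alpha -> (4,), df_beta -> scalar,
# df_w -> (3, 4) and df_gamma -> (5,), matching get_params
def func_grad(para, *args):
    alpha, beta, gamma, W = get_params(para)
    X, Y, Z = args
    d_f_a = df_alpha(X, Y, Z, W, alpha, beta, gamma)   # (4,)
    d_f_b = df_beta(X, Y, Z, W, alpha, beta, gamma)    # scalar
    d_f_w = df_w(X, Y, Z, W, alpha, beta, gamma)       # (3, 4), flattened below
    d_f_g = df_gamma(X, Y, Z, W, alpha, beta, gamma)   # (5,)
    return np.hstack([d_f_a, d_f_b, np.ravel(d_f_w), d_f_g])  # shape (22,)

# sanity check against a numerical gradient
from scipy.optimize import approx_fprime
num_grad = approx_fprime(x0, func, 1e-6, X, Y, Z)
print(np.allclose(func_grad(x0, X, Y, Z), num_grad, atol=1e-4))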


