Matrix Differentiation

混江龙づ霸主, submitted on 2019-12-10 07:07:07


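Variable: in elementary mathematics, a variable (also called an indeterminate or an unknown) is a symbol that stands for a value. That value may be arbitrary, unspecified, or not yet determined.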

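$\mathrm{vec}(\cdot)$ turns a matrix into a column vector, and $\mathrm{rvec}(\cdot)$ turns a matrix into a row vector. $\mathrm{vec}(\cdot)$ is called the vectorization operator, and it can expand the matrix either row by row or column by column. Unless stated otherwise, every vector is a column vector. $\mathrm{unvec}(\cdot)$ turns a column vector back into a matrix, and $\mathrm{unrvec}(\cdot)$ turns a row vector back into a matrix.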

Let $A=(a_{ij})\in R^{m\times n}$. Arranging the elements of $A$ row by row into a column vector gives
$\mathrm{vec}A=(a_{11},a_{12},\cdots,a_{1n},a_{21},a_{22},\cdots,a_{2n},\cdots,a_{m1},a_{m2},\cdots,a_{mn})^T$, and $\mathrm{vec}A$ is called the row-wise expansion of $A$ into a column vector.

Let $A=(a_{ij})\in R^{m\times n}$. Arranging the elements of $A$ column by column into a column vector gives
$\mathrm{vec}A=(a_{11},a_{21},\cdots,a_{m1},a_{12},a_{22},\cdots,a_{m2},\cdots,a_{1n},a_{2n},\cdots,a_{mn})^T$, and $\mathrm{vec}A$ is called the column-wise expansion of $A$ into a column vector.
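The two expansion orders are easy to mix up, so here is a minimal sketch in plain Python. The helper names `vec_by_rows`, `vec_by_cols` and `unvec_by_cols` are made up for this illustration, and a matrix is assumed to be stored as a list of rows.

```python
# Minimal sketch: a matrix is a plain list of rows (an assumption for illustration).

def vec_by_rows(A):
    """Stack the rows of A into one flat list (row-wise expansion)."""
    return [a for row in A for a in row]

def vec_by_cols(A):
    """Stack the columns of A into one flat list (column-wise expansion)."""
    m, n = len(A), len(A[0])
    return [A[i][j] for j in range(n) for i in range(m)]

def unvec_by_cols(v, m, n):
    """Rebuild the m x n matrix whose column-wise expansion is v."""
    return [[v[j * m + i] for j in range(n)] for i in range(m)]

A = [[1, 2, 3],
     [4, 5, 6]]                           # m = 2, n = 3

row_vec = vec_by_rows(A)                  # a11, a12, a13, a21, a22, a23
col_vec = vec_by_cols(A)                  # a11, a21, a12, a22, a13, a23
restored = unvec_by_cols(col_vec, 2, 3)   # undoes the column-wise expansion
```

The flat lists stand in for column vectors. The only difference between the two expansions is which index runs fastest.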

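The choice between numerator layout and denominator layout only comes into play when a vector is differentiated with respect to a vector.
Numerator layout (also called the Jacobian form). For $y_{m\times 1}$ and $x_{n\times 1}$, the Jacobian form is $\frac{\partial y}{\partial x^T}$. Multiplying the dimensions of $y$ and $x^T$ gives $(m\times 1)(1\times n)=m\times n$. The numerator is left unchanged (not transposed), hence the name numerator layout (my own mnemonic).
Denominator layout (also called the Hessian form). For $y_{m\times 1}$ and $x_{n\times 1}$, the Hessian form is $\frac{\partial y^T}{\partial x}$. Multiplying the dimensions of $x$ and $y^T$ gives $(n\times 1)(1\times m)=n\times m$. The denominator is left unchanged (not transposed), hence the name denominator layout (my own mnemonic). Some authors call this layout the gradient, to distinguish it from the Jacobian form.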
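To see the two shapes side by side, the following sketch estimates $\frac{\partial y}{\partial x^T}$ by central differences and transposes it to get $\frac{\partial y^T}{\partial x}$. The map `f`, the step size `h` and the helper names are assumptions chosen only for illustration.

```python
def f(x):
    """Hypothetical map from R^3 to R^2, chosen only for illustration."""
    x1, x2, x3 = x
    return [x1 * x2, x2 + 3.0 * x3]

def numerator_layout(f, x, h=1e-6):
    """Central-difference estimate of dy/dx^T: rows follow y, columns follow x."""
    m, n = len(f(x)), len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += h
        x_minus[j] -= h
        f_plus, f_minus = f(x_plus), f(x_minus)
        for i in range(m):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J

def transpose(M):
    """Swap rows and columns of a list-of-rows matrix."""
    return [list(col) for col in zip(*M)]

x0 = [1.0, 2.0, 3.0]
J_num = numerator_layout(f, x0)   # m x n = 2 x 3 (numerator layout)
J_den = transpose(J_num)          # n x m = 3 x 2 (denominator layout)
```

Both matrices hold the same partial derivatives. The layout only decides whether the components of $y$ index the rows or the columns.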


Cogradient matrix


$\frac{\partial\,\mathrm{Scalar}}{\partial\,\mathrm{Vector}/\mathrm{Matrix}}$

Partial-derivative operator for a $1\times m$ row vector:
$$D_x \overset{def}{=} \frac{\partial}{\partial x^T} = \left[ \frac{\partial}{\partial x_1}, \cdots, \frac{\partial}{\partial x_m} \right]$$
Row-vector partial derivative of a scalar function $f(x)$:
$$D_x f(x) = \frac{\partial f(x)}{\partial x^T} = \left[ \frac{\partial f(x)}{\partial x_1}, \cdots, \frac{\partial f(x)}{\partial x_m} \right]$$
Row-vector partial derivative of a real-valued scalar function $f(X)$ of a matrix argument $X\in R^{m\times n}$:
$$D_{\mathrm{vec}^T(X)} f(X) = \frac{\partial f(X)}{\partial\, \mathrm{vec}^T(X)} = \left[ \frac{\partial f(X)}{\partial X_{11}}, \cdots, \frac{\partial f(X)}{\partial X_{m1}}, \cdots, \frac{\partial f(X)}{\partial X_{1n}}, \cdots, \frac{\partial f(X)}{\partial X_{mn}} \right]$$
Partial derivative at $X$ of a real-valued scalar function $f(X)$ of a matrix argument:
$$D_X f(X) = \left[ \begin{matrix} \frac{\partial f(X)}{\partial X_{11}} & \cdots & \frac{\partial f(X)}{\partial X_{m1}} \\ \vdots & \ddots & \vdots \\ \frac{\partial f(X)}{\partial X_{1n}} & \cdots & \frac{\partial f(X)}{\partial X_{mn}} \end{matrix} \right] \in R^{n\times m}$$
$$D_X f(X) = \frac{\partial f(X)}{\partial X^T} = \left[ \frac{\partial f(X)}{\partial X_{ji}} \right]_{j=1,i=1}^{m,n}$$
Remark: $\mathrm{vec}^T(X) = \left[ \mathrm{vec}(X) \right]^T$. $D_{\mathrm{vec}^T(X)} f(X)$ and $D_X f(X)$ are, respectively, the row-vector partial derivative and the Jacobian matrix of the real-valued scalar function $f(X)$ at $X$.
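As a concrete check of the index pattern, the sketch below evaluates both objects numerically for one small case. The function $f(X)=a^TXb$, the vectors `a` and `b`, the point `X0` and the helper `partial` are all assumptions made up for the example. For this $f$, $\frac{\partial f}{\partial X_{ij}}=a_ib_j$.

```python
A_VEC = [1.0, 2.0]          # hypothetical a, length m = 2
B_VEC = [3.0, 4.0, 5.0]     # hypothetical b, length n = 3

def f(X):
    """Hypothetical scalar function f(X) = a^T X b for a 2 x 3 matrix X."""
    return sum(A_VEC[i] * X[i][j] * B_VEC[j]
               for i in range(2) for j in range(3))

def partial(f, X, i, j, h=1e-6):
    """Central-difference estimate of df/dX_ij (0-based i, j)."""
    X_plus = [row[:] for row in X]
    X_minus = [row[:] for row in X]
    X_plus[i][j] += h
    X_minus[i][j] -= h
    return (f(X_plus) - f(X_minus)) / (2 * h)

X0 = [[0.5, -1.0, 2.0],
      [1.5, 0.0, -0.5]]
m, n = 2, 3

# Row vector D_{vec^T(X)} f: partials ordered column by column of X.
row_partial = [partial(f, X0, i, j) for j in range(n) for i in range(m)]

# Jacobian matrix D_X f: n x m, entry (i, j) holds df/dX_ji.
jacobian = [[partial(f, X0, j, i) for j in range(m)] for i in range(n)]

# Reading the Jacobian row by row reproduces the row vector above.
flattened = [entry for row in jacobian for entry in row]
```

In this example the $n\times m$ Jacobian is the transpose of the $m\times n$ array of partials $\frac{\partial f}{\partial X_{ij}}$, which is why reading it row by row matches the column-wise ordering of $\mathrm{vec}(X)$.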

Gradient matrix


$\frac{\partial\,\mathrm{Scalar}}{\partial\,\mathrm{Vector}/\mathrm{Matrix}}$

Partial-derivative operator for an $m\times 1$ column vector, customarily called the gradient operator:
$$\nabla_x \overset{def}{=} \frac{\partial}{\partial x} = \left[ \frac{\partial}{\partial x_1}, \cdots, \frac{\partial}{\partial x_m} \right]^T$$
Column-vector partial derivative of a scalar function $f(x)$, customarily called the gradient vector of $f(x)$:
$$\nabla_x f(x) \overset{def}{=} \frac{\partial f(x)}{\partial x} = \left[ \frac{\partial f(x)}{\partial x_1}, \cdots, \frac{\partial f(x)}{\partial x_m} \right]^T$$
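The gradient is just the cogradient row vector $D_xf(x)$ written as a column. A small numerical sketch makes the two shapes explicit. The function `f`, the point `x0` and the helper `partials` are assumptions picked for illustration.

```python
def f(x):
    """Hypothetical scalar function f(x) = x1^2 + 3*x1*x2 + 2*x3."""
    x1, x2, x3 = x
    return x1 * x1 + 3.0 * x1 * x2 + 2.0 * x3

def partials(f, x, h=1e-6):
    """Central-difference estimates of df/dx_1, ..., df/dx_m as a flat list."""
    result = []
    for k in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[k] += h
        x_minus[k] -= h
        result.append((f(x_plus) - f(x_minus)) / (2 * h))
    return result

x0 = [1.0, 2.0, 3.0]
p = partials(f, x0)

cogradient = [p]                  # 1 x m row vector, D_x f(x)
gradient = [[pk] for pk in p]     # m x 1 column vector, the gradient of f
```

Analytically the partials are $2x_1+3x_2$, $3x_1$ and $2$, so at `x0` the gradient should be close to $(8, 3, 2)^T$.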