ML: Gradient Descent for Logistic Regression

末鹿安然, posted 2019-11-30 09:42:13

ML: Maximum Likelihood Estimation

Probability density (mass) function: a function that gives, for each value a random variable can take, the probability (density) associated with that value.

Probability: given a known probability distribution, infer how likely a sample value is.

Likelihood: given observed samples, find the parameters that best explain the observed data.

Likelihood function: $\mathcal{L}(\mu, \sigma \mid X)=\prod_{i=1}^{N} P\left(x_{i} \mid \mu, \sigma\right)$

Log-likelihood function: $\log \mathcal{L}(\mu, \sigma \mid X)=\sum_{i=1}^{N} \log P\left(x_{i} \mid \mu, \sigma\right)$
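To make the likelihood/log-likelihood distinction concrete, here is a minimal Python sketch (the function name `gaussian_log_likelihood` is just for illustration) that evaluates the log-likelihood of observed samples under a Gaussian $P(x_i \mid \mu, \sigma)$; parameters close to the ones that generated the data yield a larger value:

```python
import numpy as np

def gaussian_log_likelihood(x, mu, sigma):
    """Sum of log P(x_i | mu, sigma) under a Gaussian N(mu, sigma^2)."""
    x = np.asarray(x, dtype=float)
    log_probs = -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)
    return log_probs.sum()

# Samples drawn from N(2, 1): the log-likelihood is largest near the true parameters
samples = np.random.normal(loc=2.0, scale=1.0, size=1000)
print(gaussian_log_likelihood(samples, mu=2.0, sigma=1.0))   # larger (less negative)
print(gaussian_log_likelihood(samples, mu=0.0, sigma=1.0))   # smaller (more negative)
```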

Loss function: $J(\theta)=-\sum_{i}^{m}\left[Y \log (\hat{Y})+(1-Y) \log (1-\hat{Y})\right]$. We need the derivative of $J(\theta)$ with respect to each $\theta_{j}$, where $\hat{Y}=\frac{1}{1+e^{-\theta^{T} X}}$.
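This loss maps directly to code. A minimal sketch, assuming $X$ is stored with one sample per row (the usual NumPy convention) and with illustrative function names:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy_loss(theta, X, y):
    """J(theta) = -sum_i [ y_i*log(y_hat_i) + (1 - y_i)*log(1 - y_hat_i) ]."""
    y_hat = sigmoid(X @ theta)      # y_hat_i = 1 / (1 + exp(-theta^T x_i))
    eps = 1e-12                     # guard against log(0)
    return -np.sum(y * np.log(y_hat + eps) + (1 - y) * np.log(1 - y_hat + eps))
```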

Using $\frac{d}{d x} \log _{a}(f(x))=\frac{1}{f(x) \ln a} f^{\prime}(x)$, substitute $\hat{Y}=\frac{1}{1+e^{-\theta^{T} x}}$ into $\log(\hat{Y})$:

$\frac{\partial}{\partial \theta_{j}} \log (\hat{Y})=\frac{\partial}{\partial \theta_{j}} \log \left(\frac{1}{1+e^{-\theta^{T} x}}\right)=\frac{\partial}{\partial \theta_{j}}\left(\log (1)-\log \left(1+e^{-\theta^{T} x}\right)\right)$

$\frac{\partial}{\partial \theta_{j}} \log (\hat{Y})=\frac{\partial}{\partial \theta_{j}}\left(-\log \left(1+e^{-\theta^{T} x}\right)\right)=-\frac{1}{1+e^{-\theta^{T} x}} \cdot e^{-\theta^{T} x} \cdot\left(-x_{j}\right)=\left(1-\frac{1}{1+e^{-\theta^{T} x}}\right) x_{j}$

$\frac{\partial}{\partial \theta_{j}} \log (1-\hat{Y})=\frac{\partial}{\partial \theta_{j}} \log \left(\frac{e^{-\theta^{T} x}}{1+e^{-\theta^{T} x}}\right)=\frac{\partial}{\partial \theta_{j}}\left(-\theta^{T} x-\log \left(1+e^{-\theta^{T} x}\right)\right)$

$\frac{\partial}{\partial \theta_{j}} \log (1-\hat{Y})=-x_{j}+x_{j}\left(1-\frac{1}{1+e^{-\theta^{T} x}}\right)=-\frac{1}{1+e^{-\theta^{T} x}} x_{j}$

Combining the above, $\frac{\partial}{\partial \theta_{j}} J(\theta)=-\sum_{i}^{m}\left[y_{i} x_{i j}\left(1-\frac{1}{1+e^{-\theta^{T} x_{i}}}\right)-\left(1-y_{i}\right) x_{i j} \frac{1}{1+e^{-\theta^{T} x_{i}}}\right]$

where $i$ indexes the data points and $j$ indexes the features. The input $X$ can be written as:

$X=\left[\begin{array}{lll}{x_{i=1, j=1}} & {x_{i=2, j=1}} & {x_{i=3, j=1}} \\ {x_{i=1, j=2}} & {x_{i=2, j=2}} & {x_{i=3, j=2}} \\ {x_{i=1, j=3}} & {x_{i=2, j=3}} & {x_{i=3, j=3}}\end{array}\right]$, where each column is one sample. For example, for a batch of images, $x_{ij}$ is the $j$-th pixel of the $i$-th image.

Expanding and simplifying: $\frac{\partial}{\partial \theta_{j}} J(\theta)=\sum_{i}^{m}\left(\frac{1}{1+e^{-\theta^{T} x_{i}}}-y_{i}\right) x_{i j}=\sum_{i}^{m}\left(\hat{y}_{i}-y_{i}\right) x_{i j}$, where $\hat{y}_{i}=\frac{1}{1+e^{-\theta^{T} x_{i}}}$.
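In vectorized form this final expression is a single matrix product. A minimal sketch, using the same one-sample-per-row convention for $X$ as above (i.e. the transpose of the layout shown earlier):

```python
import numpy as np

def gradient(theta, X, y):
    """dJ/dtheta_j = sum_i (y_hat_i - y_i) * x_ij, computed for every j at once."""
    y_hat = 1.0 / (1.0 + np.exp(-(X @ theta)))   # predictions y_hat_i
    return X.T @ (y_hat - y)                     # one entry per theta_j
```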

Previously we wrote $\theta$ and $X$ in vector form as $\theta^{T}=\left[\begin{array}{lll}{\text{bias}} & {\theta_{1}} & {\theta_{2}}\end{array}\right]$ and $X=\left[\begin{array}{c}{1} \\ {x_{1}} \\ {x_{2}}\end{array}\right]$.

Since the bias term in $\theta$ multiplies the constant $1$ in $X$, we get $\frac{\partial}{\partial \text{bias}} J(\theta)=\sum_{i}^{m}\left(\hat{y}_{i}-y_{i}\right)$.

Choose a learning rate $\eta$ and iterate the following updates until convergence:

$\theta_{j} \leftarrow \theta_{j}-\eta \frac{\partial}{\partial \theta_{j}} J(\theta)$

$\text{bias} \leftarrow \text{bias}-\eta \frac{\partial}{\partial \text{bias}} J(\theta)$
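Putting the pieces together, here is a minimal batch gradient-descent sketch (function name and default hyperparameters are illustrative, not a reference implementation); the bias is folded into $\theta$ by prepending a column of ones to $X$, exactly as described above:

```python
import numpy as np

def fit_logistic_regression(X, y, eta=0.01, n_iters=1000):
    """Batch gradient descent for logistic regression.
    X: (m, n) samples, one per row; y: (m,) labels in {0, 1}."""
    m, n = X.shape
    Xb = np.hstack([np.ones((m, 1)), X])   # column of ones -> theta[0] plays the role of bias
    theta = np.zeros(n + 1)
    for _ in range(n_iters):
        y_hat = 1.0 / (1.0 + np.exp(-(Xb @ theta)))
        grad = Xb.T @ (y_hat - y)          # grad[0] is dJ/d bias, grad[1:] are dJ/d theta_j
        theta -= eta * grad                # theta_j <- theta_j - eta * dJ/dtheta_j
    return theta
```

Because the gradient here is a sum rather than a mean over the batch, $\eta$ usually needs to stay small for the updates to remain stable.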
