Soptq

Mathematical Derivation of the Backpropagation Algorithm

This post draws heavily on the CSDN article 反向传播算法(过程及公式推导) (Backpropagation: Process and Formula Derivation).

## Basic Definitions

An example of a simple neural network

In the simple neural network shown above, layer 1 is the input layer, layer 2 is the hidden layer, and layer 3 is the output layer. Using this figure, we define the following notation:

| Symbol | Meaning |
| --- | --- |
| $b_i^l$ | bias of the $i$-th neuron in layer $l$ |
| $w_{ji}^l$ | weight connecting the $i$-th neuron in layer $l-1$ to the $j$-th neuron in layer $l$ |
| $z_i^l$ | weighted input to the $i$-th neuron in layer $l$ |
| $a_i^l$ | output (activation) of the $i$-th neuron in layer $l$ |
| $\sigma$ | activation function |

From these definitions we have:

$$z_{j}^{l} = \sum_{i} w_{ji}^{l} a_{i}^{l-1} + b_{j}^{l}$$

$$a_{j}^{l} = \sigma\left(z_{j}^{l}\right) = \sigma\left( \sum_{i} w_{ji}^{l} a_{i}^{l-1} + b_{j}^{l} \right)$$
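As a concrete illustration, the forward pass defined by the two equations above can be sketched in plain Python. The sigmoid activation and the 2-3 layer sizes here are arbitrary choices for the example, not part of the derivation:

```python
import math

def sigmoid(x):
    # the activation function sigma
    return 1.0 / (1.0 + math.exp(-x))

def forward_layer(a_prev, W, b):
    # z_j^l = sum_i w_ji^l * a_i^{l-1} + b_j^l  and  a_j^l = sigma(z_j^l)
    z = [sum(w_ji * a_i for w_ji, a_i in zip(row, a_prev)) + b_j
         for row, b_j in zip(W, b)]
    a = [sigmoid(z_j) for z_j in z]
    return z, a

# a hypothetical layer with 2 inputs and 3 neurons
W2 = [[0.1, 0.4], [-0.3, 0.2], [0.5, -0.1]]
b2 = [0.0, 0.1, -0.2]
z2, a2 = forward_layer([1.0, 0.5], W2, b2)
```

Each row of `W` holds the weights $w_{ji}^l$ feeding one neuron $j$, matching the index convention in the table above.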

Let the loss function be the quadratic cost function:

$$J = \frac{1}{2n} \sum_{x} \lVert y(x) - a^{L}(x) \rVert^{2}$$

where $x$ ranges over the $n$ input samples, $y(x)$ is the true label, $a^{L}(x)$ is the network's prediction, and $L$ is the index of the last layer. For a single input sample the loss reduces to:

$$J = \frac{1}{2} \lVert y - a^{L} \rVert^{2}$$
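The quadratic cost can be computed directly from its definition; a minimal sketch (the sample and prediction values are made up for illustration):

```python
def quadratic_cost(ys, als):
    # J = 1/(2n) * sum over samples x of ||y(x) - a^L(x)||^2
    n = len(ys)
    total = sum(sum((yi - ai) ** 2 for yi, ai in zip(y, a))
                for y, a in zip(ys, als))
    return total / (2.0 * n)

# one two-dimensional sample: squared distance 1.0, so J = 0.5
J = quadratic_cost([[1.0, 0.0]], [[0.0, 0.0]])
```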

Finally, we define the error produced at the $i$-th neuron of layer $l$ as:

$$\delta_{i}^{l} \equiv \frac{\partial J}{\partial z_{i}^{l}}$$

## Derivation

The error of the loss function at the last layer is:

$$\begin{aligned}\delta_{i}^{L} &= \frac{\partial J}{\partial z_{i}^{L}}\\&=\frac{\partial J}{\partial a_{i}^{L}} \cdot \frac{\partial a_{i}^{L}}{\partial z_{i}^{L}}\\&=\nabla J(a_{i}^{L}) \, \sigma'(z_{i}^{L})\end{aligned}$$

In vectorized form:

$$\delta^{L} = \nabla J(a^{L}) \odot \sigma'(z^{L})$$

Propagating backward, the error at the $j$-th neuron of an earlier layer $l$ is:

$$\begin{aligned}\delta_{j}^{l} &= \frac{\partial J}{\partial z_{j}^{l}} \\ &= \frac{\partial J}{\partial a_{j}^{l}} \cdot \frac{\partial a_{j}^{l}}{\partial z_{j}^{l}} \\ &= \sum_{i} \frac{\partial J}{\partial z_{i}^{l+1}} \cdot \frac{\partial z_{i}^{l+1}}{\partial a_{j}^{l}} \cdot \frac{\partial a_{j}^{l}}{\partial z_{j}^{l}} \\ &= \sum_{i} \delta_{i}^{l+1} \cdot \frac{\partial \left( w_{ij}^{l+1} a_{j}^{l} + b_{i}^{l+1} \right)}{\partial a_{j}^{l}} \cdot \sigma'(z_{j}^{l}) \\ &= \sum_{i} \delta_{i}^{l+1} \, w_{ij}^{l+1} \, \sigma'(z_{j}^{l}) \end{aligned}$$

In vectorized form:

$$\delta^{l} = \left( \left( w^{l+1} \right)^{T} \delta^{l+1} \right) \odot \sigma'(z^{l})$$
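A minimal sketch of these two error equations in plain Python, assuming a sigmoid activation (so $\sigma'(z)$ is $\sigma(z)(1-\sigma(z))$ and, for the quadratic cost on one sample, $\nabla J(a^L) = a^L - y$):

```python
import math

def sigmoid_prime(z):
    # sigma'(z) for the sigmoid activation
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)

def output_delta(a_L, y, z_L):
    # delta^L = (a^L - y) ⊙ sigma'(z^L)   (quadratic cost: grad J(a^L) = a^L - y)
    return [(a - t) * sigmoid_prime(z) for a, t, z in zip(a_L, y, z_L)]

def backprop_delta(W_next, delta_next, z):
    # delta^l = ((w^{l+1})^T delta^{l+1}) ⊙ sigma'(z^l)
    # W_next[i][j] is the weight from neuron j in layer l to neuron i in layer l+1
    return [sum(W_next[i][j] * delta_next[i] for i in range(len(delta_next)))
            * sigmoid_prime(z_j)
            for j, z_j in enumerate(z)]
```

Starting from `output_delta` at layer $L$, repeated calls to `backprop_delta` walk the error back one layer at a time, which is exactly the recurrence above.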

From the errors, the gradient of the loss with respect to the weights is:

$$\begin{aligned} \frac{\partial J}{\partial w_{ji}^{l}} &= \frac{\partial J}{\partial z_{j}^{l}} \cdot \frac{\partial z_{j}^{l}}{\partial w_{ji}^{l}} \\ &= \delta_{j}^{l} \cdot \frac{\partial \left( w_{ji}^{l} a_{i}^{l-1} + b_{j}^{l} \right)}{\partial w_{ji}^{l}} \\ &= \delta_{j}^{l} \, a_{i}^{l-1} \end{aligned}$$

$$\frac{\partial J}{\partial w_{ji}^{l}} = \delta_{j}^{l} \, a_{i}^{l-1}$$

Finally, the gradient of the loss with respect to the biases is:

$$\begin{aligned} \frac{\partial J}{\partial b_{j}^{l}} &= \frac{\partial J}{\partial z_{j}^{l}} \cdot \frac{\partial z_{j}^{l}}{\partial b_{j}^{l}} \\ &= \delta_{j}^{l} \cdot \frac{\partial \left( w_{ji}^{l} a_{i}^{l-1} + b_{j}^{l} \right)}{\partial b_{j}^{l}} \\ &= \delta_{j}^{l} \end{aligned}$$
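Both gradient formulas can be sanity-checked numerically. The sketch below, for a single hypothetical sigmoid neuron with quadratic cost on one sample, compares the analytic weight gradient $\delta_j^l \, a_i^{l-1}$ against a central finite difference:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def loss(w, b, a_prev, y):
    # quadratic cost for one sample and one output neuron: J = 1/2 (y - a)^2
    z = sum(wi * ai for wi, ai in zip(w, a_prev)) + b
    return 0.5 * (y - sigmoid(z)) ** 2

# a toy neuron: two inputs, one output (values are arbitrary)
w, b = [0.3, -0.2], 0.1
a_prev, y = [0.6, 0.9], 1.0

# analytic gradients: dJ/dw_i = delta * a_prev_i and dJ/db = delta
z = sum(wi * ai for wi, ai in zip(w, a_prev)) + b
a = sigmoid(z)
delta = (a - y) * a * (1.0 - a)          # sigma'(z) = a (1 - a) for the sigmoid
grad_w = [delta * ai for ai in a_prev]
grad_b = delta

# central finite-difference estimate of dJ/dw_0
eps = 1e-6
num_grad = (loss([w[0] + eps, w[1]], b, a_prev, y)
            - loss([w[0] - eps, w[1]], b, a_prev, y)) / (2 * eps)
```

If the derivation is right, `grad_w[0]` and `num_grad` agree to within the finite-difference error, which is a standard way to test a backpropagation implementation.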
