Derivation of the BP Neural Network Algorithm

Abstract: The BP algorithm is one of the most effective learning methods for multilayer neural networks. Its main feature is that signals propagate forward while errors propagate backward; by repeatedly adjusting the network weights, the final output is brought as close as possible to the desired output, which is the goal of training.

The BP (backpropagation) algorithm is one of the most effective learning methods for multilayer neural networks. Its main feature is that signals are propagated forward while errors are propagated backward; by repeatedly adjusting the network's weights, the network's final output is driven as close as possible to the desired output, thereby achieving the training objective.

1. Structure and Description of a Multilayer Neural Network

The figure below shows a typical multilayer neural network.

A multilayer neural network typically consists of $L$ layers of neurons: layer 1 is the input layer, layer $L$ is the output layer, and the layers in between are hidden layers. Layer $l$ contains $s_l$ neurons; $h_j^{(l)}$ denotes the output of neuron $j$ in layer $l$, $net_j^{(l)}$ its weighted input, $W_{kj}^{(l)}$ the weight connecting neuron $j$ of layer $l-1$ to neuron $k$ of layer $l$, $b_k^{(l)}$ the bias of neuron $k$ in layer $l$, and $f$ the activation function.
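To make the notation concrete, here is a minimal NumPy sketch of the forward pass (not from the original article; it assumes a sigmoid activation $f$, and stores weights so that `W[k, j]` matches $W_{kj}$ in the text):

```python
import numpy as np

def f(x):
    # sigmoid activation (an assumed choice; the derivation only
    # requires f to be differentiable)
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, weights, biases):
    """Forward pass through an L-layer network.

    weights[l] has shape (s_{l+1}, s_l): W[k, j] connects unit j of
    layer l to unit k of layer l+1, matching W_kj in the text.
    Returns the activations h of every layer (h[0] is the input x).
    """
    h = [x]
    for W, b in zip(weights, biases):
        net = W @ h[-1] + b   # net_k = sum_j W_kj * h_j + b_k
        h.append(f(net))
    return h

# a tiny 2-3-1 network with random weights, purely for illustration
rng = np.random.default_rng(0)
weights = [rng.normal(size=(3, 2)), rng.normal(size=(1, 3))]
biases = [np.zeros(3), np.zeros(1)]
h = forward(np.array([0.5, -0.2]), weights, biases)
```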

Using batch updating, for a given set of $m$ training samples the error function is defined as:

$$E = \frac{1}{m}\sum_{i=1}^{m} E(i)$$

where $E(i)$ is the training error of a single sample:

$$E(i) = \frac{1}{2}\sum_{k=1}^{n}\bigl(d_k(i) - y_k(i)\bigr)^2$$

Therefore,

$$E = \frac{1}{2m}\sum_{i=1}^{m}\sum_{k=1}^{n}\bigl(d_k(i) - y_k(i)\bigr)^2$$
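The batch error above is straightforward to compute; as a minimal sketch (not from the original article):

```python
import numpy as np

def batch_error(D, Y):
    """E = 1/(2m) * sum_i sum_k (d_k(i) - y_k(i))^2.

    D, Y: arrays of shape (m, n) holding the desired and actual
    outputs for m samples and n output units.
    """
    m = D.shape[0]
    return np.sum((D - Y) ** 2) / (2 * m)

# two samples, two output units (illustrative values)
D = np.array([[1.0, 0.0], [0.0, 1.0]])
Y = np.array([[0.8, 0.2], [0.4, 0.6]])
E = batch_error(D, Y)  # (0.04 + 0.04 + 0.16 + 0.16) / 4 = 0.1
```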

In each iteration, the BP algorithm updates the weights and biases as follows:

$$W_{ij}^{(l)} = W_{ij}^{(l)} - \alpha \frac{\partial E}{\partial W_{ij}^{(l)}}$$

$$b_{i}^{(l)} = b_{i}^{(l)} - \alpha \frac{\partial E}{\partial b_{i}^{(l)}}$$

where $\alpha$ is the learning rate, with values in $(0, 1)$. The key to the BP algorithm is computing the partial derivatives of $E$ with respect to $W_{ij}^{(l)}$ and $b_i^{(l)}$.
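The update rule itself is a single gradient-descent step; a minimal sketch with made-up gradient values:

```python
import numpy as np

def gd_step(W, b, dE_dW, dE_db, alpha=0.1):
    """One BP update: W <- W - alpha * dE/dW, b <- b - alpha * dE/db."""
    return W - alpha * dE_dW, b - alpha * dE_db

# illustrative values, not computed from a real network
W = np.array([[1.0, 2.0]])
b = np.array([0.5])
W2, b2 = gd_step(W, b,
                 dE_dW=np.array([[0.2, -0.4]]),
                 dE_db=np.array([0.1]),
                 alpha=0.5)
# W2 = [[0.9, 2.2]], b2 = [0.45]
```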

For a single training sample, the partial derivative of the error with respect to an output-layer weight is computed as follows:

$$
\begin{aligned}
\frac{\partial E(i)}{\partial W_{kj}^{(L)}}
&= \frac{\partial}{\partial W_{kj}^{(L)}}\left(\frac{1}{2}\sum_{k=1}^{n}\bigl(d_k(i)-y_k(i)\bigr)^2\right) \\
&= \frac{\partial}{\partial W_{kj}^{(L)}}\left(\frac{1}{2}\bigl(d_k(i)-y_k(i)\bigr)^2\right) \\
&= -\bigl(d_k(i)-y_k(i)\bigr)\frac{\partial y_k(i)}{\partial W_{kj}^{(L)}} \\
&= -\bigl(d_k(i)-y_k(i)\bigr)\frac{\partial y_k(i)}{\partial net_k^{(L)}}\frac{\partial net_k^{(L)}}{\partial W_{kj}^{(L)}} \\
&= -\bigl(d_k(i)-y_k(i)\bigr)f'\bigl(net_k^{(L)}\bigr)\frac{\partial net_k^{(L)}}{\partial W_{kj}^{(L)}} \\
&= -\bigl(d_k(i)-y_k(i)\bigr)f'\bigl(net_k^{(L)}\bigr)h_j^{(L-1)}
\end{aligned}
$$

That is,

$$\frac{\partial E(i)}{\partial W_{kj}^{(L)}} = -\bigl(d_k(i)-y_k(i)\bigr)f'\bigl(net_k^{(L)}\bigr)h_j^{(L-1)}$$

Similarly,

$$\frac{\partial E(i)}{\partial b_{k}^{(L)}} = -\bigl(d_k(i)-y_k(i)\bigr)f'\bigl(net_k^{(L)}\bigr)$$

Let:

$$\delta_k^{(L)} = -\bigl(d_k(i)-y_k(i)\bigr)f'\bigl(net_k^{(L)}\bigr)$$

Then:

$$\frac{\partial E(i)}{\partial W_{kj}^{(L)}} = \delta_k^{(L)}h_j^{(L-1)}$$

$$\frac{\partial E(i)}{\partial b_k^{(L)}} = \delta_k^{(L)}$$
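The output-layer formulas above translate directly into code. A minimal sketch (not from the original article; it assumes a sigmoid $f$, whose derivative is $f(x)(1-f(x))$):

```python
import numpy as np

def f(x):
    # sigmoid activation (an assumed choice)
    return 1.0 / (1.0 + np.exp(-x))

def f_prime(x):
    s = f(x)
    return s * (1.0 - s)

def output_layer_grads(d, net_L, h_prev):
    """Output-layer gradients for one sample.

    delta_k = -(d_k - y_k) * f'(net_k^(L))
    dE/dW_kj = delta_k * h_j^(L-1),  dE/db_k = delta_k
    """
    y = f(net_L)
    delta = -(d - y) * f_prime(net_L)
    dE_dW = np.outer(delta, h_prev)   # shape (n, s_{L-1})
    return delta, dE_dW, delta        # dE/db equals delta

# hand-checkable example: y = f(0) = 0.5, f'(0) = 0.25
delta, dE_dW, dE_db = output_layer_grads(
    d=np.array([1.0]), net_L=np.array([0.0]),
    h_prev=np.array([2.0, -1.0]))
# delta = [-0.125], dE_dW = [[-0.25, 0.125]]
```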

For hidden layer $L-1$:

$$
\begin{aligned}
\frac{\partial E(i)}{\partial W_{ji}^{(L-1)}}
&= \frac{\partial}{\partial W_{ji}^{(L-1)}}\left(\frac{1}{2}\sum_{k=1}^{n}\bigl(d_k(i)-y_k(i)\bigr)^2\right) \\
&= \frac{\partial}{\partial W_{ji}^{(L-1)}}\left(\frac{1}{2}\sum_{k=1}^{n}\Bigl(d_k(i)-f\Bigl(\sum_{j=1}^{s_{L-1}}W_{kj}^{(L)}h_j^{(L-1)}+b_k^{(L)}\Bigr)\Bigr)^2\right) \\
&= \frac{\partial}{\partial W_{ji}^{(L-1)}}\left(\frac{1}{2}\sum_{k=1}^{n}\Bigl(d_k(i)-f\Bigl(\sum_{j=1}^{s_{L-1}}W_{kj}^{(L)}f\Bigl(\sum_{i=1}^{s_{L-2}}W_{ji}^{(L-1)}h_i^{(L-2)}+b_j^{(L-1)}\Bigr)+b_k^{(L)}\Bigr)\Bigr)^2\right) \\
&= -\sum_{k=1}^{n}\bigl(d_k(i)-y_k(i)\bigr)f'\bigl(net_k^{(L)}\bigr)\frac{\partial net_k^{(L)}}{\partial W_{ji}^{(L-1)}}
\end{aligned}
$$

Since

$$net_k^{(L)} = \sum_{j=1}^{s_{L-1}}W_{kj}^{(L)}h_j^{(L-1)}+b_k^{(L)} = \sum_{j=1}^{s_{L-1}}W_{kj}^{(L)}f\Bigl(\sum_{i=1}^{s_{L-2}}W_{ji}^{(L-1)}h_i^{(L-2)}+b_j^{(L-1)}\Bigr)+b_k^{(L)} = \sum_{j=1}^{s_{L-1}}W_{kj}^{(L)}f\bigl(net_j^{(L-1)}\bigr)$$

it follows that

$$
\begin{aligned}
\frac{\partial E(i)}{\partial W_{ji}^{(L-1)}}
&= -\sum_{k=1}^{n}\bigl(d_k(i)-y_k(i)\bigr)f'\bigl(net_k^{(L)}\bigr)\frac{\partial net_k^{(L)}}{\partial W_{ji}^{(L-1)}} \\
&= -\sum_{k=1}^{n}\bigl(d_k(i)-y_k(i)\bigr)f'\bigl(net_k^{(L)}\bigr)\frac{\partial net_k^{(L)}}{\partial f\bigl(net_j^{(L-1)}\bigr)}\frac{\partial f\bigl(net_j^{(L-1)}\bigr)}{\partial net_j^{(L-1)}}\frac{\partial net_j^{(L-1)}}{\partial W_{ji}^{(L-1)}} \\
&= -\sum_{k=1}^{n}\bigl(d_k(i)-y_k(i)\bigr)f'\bigl(net_k^{(L)}\bigr)W_{kj}^{(L)}f'\bigl(net_j^{(L-1)}\bigr)h_i^{(L-2)}
\end{aligned}
$$

Similarly,

$$\frac{\partial E(i)}{\partial b_j^{(L-1)}} = -\sum_{k=1}^{n}\bigl(d_k(i)-y_k(i)\bigr)f'\bigl(net_k^{(L)}\bigr)W_{kj}^{(L)}f'\bigl(net_j^{(L-1)}\bigr)$$

Let:

$$\delta_j^{(L-1)} = -\sum_{k=1}^{n}\bigl(d_k(i)-y_k(i)\bigr)f'\bigl(net_k^{(L)}\bigr)W_{kj}^{(L)}f'\bigl(net_j^{(L-1)}\bigr) = \sum_{k=1}^{n}W_{kj}^{(L)}\delta_k^{(L)}\,f'\bigl(net_j^{(L-1)}\bigr)$$

$$\frac{\partial E(i)}{\partial W_{ji}^{(L-1)}} = \delta_j^{(L-1)}h_i^{(L-2)}$$

$$\frac{\partial E(i)}{\partial b_j^{(L-1)}} = \delta_j^{(L-1)}$$
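The hidden-layer step has the same shape in code: the sum $\sum_k W_{kj}^{(L)}\delta_k^{(L)}$ is a matrix-vector product with the transposed weight matrix. A minimal sketch (not from the original article; sigmoid $f$ assumed):

```python
import numpy as np

def f(x):
    # sigmoid activation (an assumed choice)
    return 1.0 / (1.0 + np.exp(-x))

def f_prime(x):
    s = f(x)
    return s * (1.0 - s)

def hidden_layer_grads(delta_next, W_next, net, h_prev):
    """delta_j^(L-1) = (sum_k W_kj^(L) delta_k^(L)) * f'(net_j^(L-1)).

    W_next[k, j] = W_kj^(L), so the sum over k is W_next.T @ delta_next.
    """
    delta = (W_next.T @ delta_next) * f_prime(net)
    dE_dW = np.outer(delta, h_prev)
    return delta, dE_dW, delta        # dE/db equals delta

# hand-checkable example: f'(0) = 0.25
delta, dE_dW, dE_db = hidden_layer_grads(
    delta_next=np.array([-0.125]),
    W_next=np.array([[2.0, -1.0]]),
    net=np.array([0.0, 0.0]),
    h_prev=np.array([1.0]))
# delta = [-0.0625, 0.03125]
```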

From the above, the pattern generalizes to any layer $l$: the error term $\delta_j^{(l)}$ is obtained from the error terms of layer $l+1$, and the gradients follow the same form:

$$\delta_j^{(l)} = \Bigl(\sum_{k=1}^{s_{l+1}} W_{kj}^{(l+1)}\delta_k^{(l+1)}\Bigr)f'\bigl(net_j^{(l)}\bigr),\qquad \frac{\partial E(i)}{\partial W_{ji}^{(l)}} = \delta_j^{(l)}h_i^{(l-1)},\qquad \frac{\partial E(i)}{\partial b_j^{(l)}} = \delta_j^{(l)}$$

Applying this recursion from the output layer back to the first hidden layer yields all the gradients needed for the weight and bias updates, which is exactly the backpropagation of errors that gives the BP algorithm its name.

Original article: https://93wg.com/12724.html