The BP (backpropagation) algorithm is one of the most effective learning methods for multilayer neural networks. Its key characteristic is that signals propagate forward while errors propagate backward; by repeatedly adjusting the network weights, it drives the actual network output as close as possible to the desired output, thereby achieving the goal of training.
1. Multilayer neural network structure and its description
(Article source: 微观生活, 93wg.com, https://93wg.com/12724.html)
The figure below shows a typical multilayer neural network.
Typically a multilayer neural network consists of $L$ layers of neurons: layer 1 is called the input layer, the last layer (layer $L$) is called the output layer, and the layers in between are called hidden layers.
Using batch updating, for a given set of $m$ training samples the error function is defined as:
$$E = \frac{1}{m}\sum_{i=1}^{m} E(i)$$
where $E(i)$ is the training error on a single sample:
$$E(i) = \frac{1}{2}\sum_{k=1}^{n}\bigl(d_k(i) - y_k(i)\bigr)^2$$
Therefore,
$$E = \frac{1}{2m}\sum_{i=1}^{m}\sum_{k=1}^{n}\bigl(d_k(i) - y_k(i)\bigr)^2$$
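As a quick sanity check, this batch error can be computed directly. The following sketch uses NumPy with made-up values (the arrays `d` and `y` are illustrative, not from a real network):

```python
import numpy as np

# Illustrative mini-batch: m = 3 samples, n = 2 output units (values made up).
d = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])  # desired outputs d_k(i)
y = np.array([[0.9, 0.2], [0.1, 0.8], [0.7, 0.3]])  # network outputs y_k(i)

m = d.shape[0]
# E = 1/(2m) * sum over samples i and output units k of (d_k(i) - y_k(i))^2
E = np.sum((d - y) ** 2) / (2 * m)
```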
In each iteration, the BP algorithm updates the weights and biases as follows:
$$W_{ij}^{(l)} = W_{ij}^{(l)} - \alpha\,\frac{\partial E}{\partial W_{ij}^{(l)}}$$
$$b_{i}^{(l)} = b_{i}^{(l)} - \alpha\,\frac{\partial E}{\partial b_{i}^{(l)}}$$
where $\alpha$ is the learning rate, taking values in $(0, 1)$. The key to the BP algorithm is how to compute the partial derivatives with respect to $W_{ij}^{(l)}$ and $b_{i}^{(l)}$.
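The update rule itself is a one-liner in code. A minimal sketch, assuming the gradients are already available (all values below are placeholders for illustration):

```python
import numpy as np

alpha = 0.1                      # learning rate, chosen in (0, 1)
W = np.array([[0.5, -0.3]])      # W^{(l)}: weights of some layer l
b = np.array([0.1])              # b^{(l)}: biases of that layer
dE_dW = np.array([[0.2, 0.4]])   # placeholder for dE/dW^{(l)}
dE_db = np.array([0.05])         # placeholder for dE/db^{(l)}

# Gradient-descent update of weights and biases
W = W - alpha * dE_dW
b = b - alpha * dE_db
```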
For a single training sample, the computation of the output-layer weight derivatives proceeds as follows:
$$\begin{aligned}
\frac{\partial E(i)}{\partial W_{kj}^{(L)}}
&= \frac{\partial}{\partial W_{kj}^{(L)}}\left(\frac{1}{2}\sum_{k=1}^{n}\bigl(d_k(i)-y_k(i)\bigr)^2\right)
 = \frac{\partial}{\partial W_{kj}^{(L)}}\left(\frac{1}{2}\bigl(d_k(i)-y_k(i)\bigr)^2\right)\\
&= -\bigl(d_k(i)-y_k(i)\bigr)\,\frac{\partial y_k(i)}{\partial W_{kj}^{(L)}}
 = -\bigl(d_k(i)-y_k(i)\bigr)\,\frac{\partial y_k(i)}{\partial \mathrm{net}_k^{(L)}}\,\frac{\partial \mathrm{net}_k^{(L)}}{\partial W_{kj}^{(L)}}\\
&= -\bigl(d_k(i)-y_k(i)\bigr)\,f'\bigl(\mathrm{net}_k^{(L)}\bigr)\,\frac{\partial \mathrm{net}_k^{(L)}}{\partial W_{kj}^{(L)}}
 = -\bigl(d_k(i)-y_k(i)\bigr)\,f'\bigl(\mathrm{net}_k^{(L)}\bigr)\,h_j^{(L-1)}
\end{aligned}$$
That is,
$$\frac{\partial E(i)}{\partial W_{kj}^{(L)}} = -\bigl(d_k(i)-y_k(i)\bigr)\,f'\bigl(\mathrm{net}_k^{(L)}\bigr)\,h_j^{(L-1)}$$
Similarly,
$$\frac{\partial E(i)}{\partial b_{k}^{(L)}} = -\bigl(d_k(i)-y_k(i)\bigr)\,f'\bigl(\mathrm{net}_k^{(L)}\bigr)$$
Let:
$$\delta_k^{(L)} = -\bigl(d_k(i)-y_k(i)\bigr)\,f'\bigl(\mathrm{net}_k^{(L)}\bigr)$$
Then:
$$\frac{\partial E(i)}{\partial W_{kj}^{(L)}} = \delta_k^{(L)}\,h_j^{(L-1)}$$
$$\frac{\partial E(i)}{\partial b_k^{(L)}} = \delta_k^{(L)}$$
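The output-layer formulas above can be checked numerically. This sketch assumes a sigmoid activation for $f$ (so $f'(x) = f(x)(1-f(x))$); the weights, biases, and inputs are illustrative values:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

h_prev = np.array([0.3, 0.9])        # h^{(L-1)}: previous-layer activations
W_L = np.array([[0.2, -0.5]])        # W^{(L)}: n x s_{L-1} output weights
b_L = np.array([0.1])                # b^{(L)}: output biases
d = np.array([1.0])                  # desired output d_k(i)

net_L = W_L @ h_prev + b_L           # net_k^{(L)}
y = sigmoid(net_L)                   # y_k(i) = f(net_k^{(L)})
fprime = y * (1.0 - y)               # f'(net) for the sigmoid

delta_L = -(d - y) * fprime          # delta_k^{(L)}
dE_dW_L = np.outer(delta_L, h_prev)  # dE(i)/dW^{(L)} = delta_k^{(L)} h_j^{(L-1)}
dE_db_L = delta_L                    # dE(i)/db^{(L)} = delta_k^{(L)}
```

A finite-difference check on $E(i) = \frac{1}{2}\sum_k (d_k - y_k)^2$ confirms these gradients.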
For hidden layer $L-1$:
$$\begin{aligned}
\frac{\partial E(i)}{\partial W_{ji}^{(L-1)}}
&= \frac{\partial}{\partial W_{ji}^{(L-1)}}\left(\frac{1}{2}\sum_{k=1}^{n}\bigl(d_k(i)-y_k(i)\bigr)^2\right)\\
&= \frac{\partial}{\partial W_{ji}^{(L-1)}}\left(\frac{1}{2}\sum_{k=1}^{n}\Bigl(d_k(i)-f\Bigl(\sum_{j=1}^{s_{L-1}} W_{kj}^{(L)}\,h_j^{(L-1)}+b_k^{(L)}\Bigr)\Bigr)^{2}\right)\\
&= \frac{\partial}{\partial W_{ji}^{(L-1)}}\left(\frac{1}{2}\sum_{k=1}^{n}\Bigl(d_k(i)-f\Bigl(\sum_{j=1}^{s_{L-1}} W_{kj}^{(L)}\,f\Bigl(\sum_{i=1}^{s_{L-2}} W_{ji}^{(L-1)}\,h_i^{(L-2)}+b_j^{(L-1)}\Bigr)+b_k^{(L)}\Bigr)\Bigr)^{2}\right)\\
&= -\sum_{k=1}^{n}\bigl(d_k(i)-y_k(i)\bigr)\,f'\bigl(\mathrm{net}_k^{(L)}\bigr)\,\frac{\partial \mathrm{net}_k^{(L)}}{\partial W_{ji}^{(L-1)}}
\end{aligned}$$
Since
$$\mathrm{net}_k^{(L)} = \sum_{j=1}^{s_{L-1}} W_{kj}^{(L)}\,h_j^{(L-1)} + b_k^{(L)}
= \sum_{j=1}^{s_{L-1}} W_{kj}^{(L)}\,f\Bigl(\sum_{i=1}^{s_{L-2}} W_{ji}^{(L-1)}\,h_i^{(L-2)} + b_j^{(L-1)}\Bigr) + b_k^{(L)}
= \sum_{j=1}^{s_{L-1}} W_{kj}^{(L)}\,f\bigl(\mathrm{net}_j^{(L-1)}\bigr) + b_k^{(L)}$$
it follows that
$$\begin{aligned}
\frac{\partial E(i)}{\partial W_{ji}^{(L-1)}}
&= -\sum_{k=1}^{n}\bigl(d_k(i)-y_k(i)\bigr)\,f'\bigl(\mathrm{net}_k^{(L)}\bigr)\,\frac{\partial \mathrm{net}_k^{(L)}}{\partial W_{ji}^{(L-1)}}\\
&= -\sum_{k=1}^{n}\bigl(d_k(i)-y_k(i)\bigr)\,f'\bigl(\mathrm{net}_k^{(L)}\bigr)\,\frac{\partial \mathrm{net}_k^{(L)}}{\partial f\bigl(\mathrm{net}_j^{(L-1)}\bigr)}\,\frac{\partial f\bigl(\mathrm{net}_j^{(L-1)}\bigr)}{\partial \mathrm{net}_j^{(L-1)}}\,\frac{\partial \mathrm{net}_j^{(L-1)}}{\partial W_{ji}^{(L-1)}}\\
&= -\sum_{k=1}^{n}\bigl(d_k(i)-y_k(i)\bigr)\,f'\bigl(\mathrm{net}_k^{(L)}\bigr)\,W_{kj}^{(L)}\,f'\bigl(\mathrm{net}_j^{(L-1)}\bigr)\,h_i^{(L-2)}
\end{aligned}$$
Similarly,
$$\frac{\partial E(i)}{\partial b_j^{(L-1)}} = -\sum_{k=1}^{n}\bigl(d_k(i)-y_k(i)\bigr)\,f'\bigl(\mathrm{net}_k^{(L)}\bigr)\,W_{kj}^{(L)}\,f'\bigl(\mathrm{net}_j^{(L-1)}\bigr)$$
Let:
$$\delta_j^{(L-1)} = -\sum_{k=1}^{n}\bigl(d_k(i)-y_k(i)\bigr)\,f'\bigl(\mathrm{net}_k^{(L)}\bigr)\,W_{kj}^{(L)}\,f'\bigl(\mathrm{net}_j^{(L-1)}\bigr)
= \Bigl(\sum_{k=1}^{n} W_{kj}^{(L)}\,\delta_k^{(L)}\Bigr)\,f'\bigl(\mathrm{net}_j^{(L-1)}\bigr)$$
$$\frac{\partial E(i)}{\partial W_{ji}^{(L-1)}} = \delta_j^{(L-1)}\,h_i^{(L-2)}$$
$$\frac{\partial E(i)}{\partial b_j^{(L-1)}} = \delta_j^{(L-1)}$$
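The full backward step from layer $L$ to layer $L-1$ can be sketched the same way (sigmoid activation assumed; the layer sizes and all numeric values are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

h_in = np.array([0.5, -0.2, 0.8])        # h^{(L-2)}
W_hid = np.array([[0.1, 0.4, -0.3],
                  [0.7, -0.6, 0.2]])     # W^{(L-1)}
b_hid = np.array([0.0, 0.1])             # b^{(L-1)}
W_out = np.array([[0.2, -0.5]])          # W^{(L)}
b_out = np.array([0.1])                  # b^{(L)}
d = np.array([1.0])                      # desired output

# Forward pass
net_hid = W_hid @ h_in + b_hid           # net^{(L-1)}
h_hid = sigmoid(net_hid)                 # h^{(L-1)}
y = sigmoid(W_out @ h_hid + b_out)       # y = f(net^{(L)})

# Backward pass: propagate deltas from layer L to layer L-1
delta_out = -(d - y) * y * (1.0 - y)                      # delta^{(L)}
delta_hid = (W_out.T @ delta_out) * h_hid * (1.0 - h_hid) # delta^{(L-1)}
dE_dW_hid = np.outer(delta_hid, h_in)    # delta_j^{(L-1)} h_i^{(L-2)}
dE_db_hid = delta_hid
```

Note that `delta_hid` implements exactly the sum $\bigl(\sum_k W_{kj}^{(L)}\delta_k^{(L)}\bigr) f'(\mathrm{net}_j^{(L-1)})$ as a matrix-vector product.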
Proceeding in the same way layer by layer, for a general layer $l$ the error terms satisfy the backward recursion

$$\delta_j^{(l)} = \Bigl(\sum_{k=1}^{s_{l+1}} W_{kj}^{(l+1)}\,\delta_k^{(l+1)}\Bigr)\,f'\bigl(\mathrm{net}_j^{(l)}\bigr),\qquad
\frac{\partial E(i)}{\partial W_{ji}^{(l)}} = \delta_j^{(l)}\,h_i^{(l-1)},\qquad
\frac{\partial E(i)}{\partial b_j^{(l)}} = \delta_j^{(l)}$$

That concludes the detailed derivation of the BP neural network; I hope it helps!
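Putting the forward pass, the backward recursion, and the batch update together gives a complete, if minimal, training loop. This sketch trains a 2-4-1 sigmoid network on XOR; the architecture, random seed, learning rate, and iteration count are illustrative choices, not from the original article:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# XOR training set: m = 4 samples (rows of X), targets in D
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
D = np.array([[0], [1], [1], [0]], dtype=float)
m = X.shape[0]

rng = np.random.default_rng(0)
W1 = rng.normal(0.0, 1.0, (4, 2)); b1 = np.zeros(4)   # hidden layer
W2 = rng.normal(0.0, 1.0, (1, 4)); b2 = np.zeros(1)   # output layer
alpha = 0.5                                           # learning rate
errors = []                                           # batch error E per iteration

for _ in range(5000):
    # Forward pass over the whole batch (rows are samples)
    H = sigmoid(X @ W1.T + b1)                        # hidden activations
    Y = sigmoid(H @ W2.T + b2)                        # network outputs
    errors.append(np.sum((D - Y) ** 2) / (2 * m))     # E = 1/(2m) sum (d - y)^2
    # Backward pass: deltas per sample, then batch-averaged gradients
    delta2 = -(D - Y) * Y * (1.0 - Y)                 # output-layer deltas
    delta1 = (delta2 @ W2) * H * (1.0 - H)            # hidden-layer deltas
    W2 -= alpha * (delta2.T @ H) / m; b2 -= alpha * delta2.mean(axis=0)
    W1 -= alpha * (delta1.T @ X) / m; b1 -= alpha * delta1.mean(axis=0)
```

With batch averaging as in the definition of $E$, each iteration performs exactly one update using the mean gradient over all $m$ samples.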