Deep Residual Learning for Image Recognition
- Published Year: 2015
- Paper URL
What
Deeper neural networks are more difficult to train. The authors present a residual learning framework to ease the training of networks that are substantially deeper than those used previously.
Why
- From the figure above we can see that the deeper network has higher training error.
- Suppose we have a shallower architecture and we add layers to it to make a deeper architecture. We can hypothesize that the deeper model should produce no higher training error than its shallower counterpart, since the added layers could carry the same or more information than the shallower model (see the sketch below).
- But experiments show that the deeper model performs worse, which is attributed to the vanishing gradient problem: gradient signals from the error function shrink exponentially as they are backpropagated to earlier layers.
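A quick sketch of the constructed-solution argument above (my own illustration, not an experiment from the paper): a deeper model built by appending identity layers to a shallow model computes exactly the same function, so in principle a solution with training error no higher than the shallow model's exists for the deeper net. The layer sizes below are arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A small "shallow" model (layer sizes are illustrative).
shallow = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 10))

# A "deeper" model built by appending identity layers to the shallow one.
# By construction it computes exactly the same function, so a solution with
# training error no higher than the shallow model's exists for the deeper net.
deeper = nn.Sequential(shallow, nn.Identity(), nn.Identity())

x = torch.randn(4, 16)
print(torch.allclose(shallow(x), deeper(x)))  # True: same outputs, same error
```

In practice, however, plain deep networks trained from scratch fail to find such a solution, which is the degradation problem ResNet addresses.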
How
- Consider \(\mathcal{H}(x)\) as an underlying mapping to be fit by a few stacked layers, with \(x\) denoting the inputs.
- If we hypothesize that multiple nonlinear layers can approximate complicated functions, then it is equivalent to hypothesize that they can approximate the residual functions, i.e., \(\mathcal{H}(x)-x\) (assuming that the input and output are of the same dimensions).
- Rather than approximate \(\mathcal{H}(x)\), we let these layers approximate a residual function \(\mathcal{F}(x):=\mathcal{H}(x)-x\). The original function thus becomes \(\mathcal{F}(x)+x\).
- This gives rise to the famous ResNet block you've probably seen (a minimal code sketch follows this list):
- Network architecture:
- This module allows very deep neural networks, such as the 50-, 101-, and 152-layer variants, to still train and perform well.
- Result:
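A minimal sketch of the residual block described above, written in PyTorch. The class name, channel count, and the two-3×3-convolution layout are assumptions for a basic (non-bottleneck) block rather than the paper's exact configuration; the point is that the stacked layers learn \(\mathcal{F}(x)\) while the shortcut adds \(x\) back.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BasicResidualBlock(nn.Module):
    """Basic residual block: output = relu(F(x) + x), with F as two 3x3 convs.

    Channel count and layer sizes are illustrative, not the paper's exact config.
    """

    def __init__(self, channels: int):
        super().__init__()
        # F(x): two 3x3 convolutions with batch norm, preserving the spatial size
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))  # F(x)
        return F.relu(out + x)           # F(x) + x via the identity shortcut


# The block preserves the input shape, so the identity shortcut needs no projection.
if __name__ == "__main__":
    block = BasicResidualBlock(channels=64)
    x = torch.randn(1, 64, 56, 56)
    print(block(x).shape)  # torch.Size([1, 64, 56, 56])
```

The deeper variants mentioned above (50/101/152 layers) stack many such blocks and use a bottleneck design (1×1, 3×3, 1×1 convolutions); when the dimensions of \(x\) and \(\mathcal{F}(x)\) differ, the shortcut uses a projection instead of the identity.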
And?
- This idea works extremely well in practice. Much of computer vision still uses the ResNet concept of residual (skip) connections today.
- I think this network architecture is beautiful, since it uses a relatively simple module to solve the problem of deep networks not performing well.