Deep Residual Learning for Image Recognition
- Published Year: 2015
- Paper URL
What
Deeper neural networks are more difficult to train. The authors present a residual learning framework to ease the training of networks that are substantially deeper than those used previously.
Why
- From the figure above we can see that the deeper network has higher training error.
- Suppose we have a shallower architecture and we add layers to it to make a deeper architecture. We can hypothesize that the deeper model should produce no higher training error than its shallower counterpart, since the added layers could carry the same or more information than the shallower model (see the sketch below).
- But experiments show that the deeper model performs worse, which is attributed to the vanishing gradient problem: gradient signals from the error function shrink exponentially as they are backpropagated to earlier layers.
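A quick sketch of the constructed-solution argument above (my own illustration, not an experiment from the paper): a deeper model built by appending identity layers to a shallow model computes exactly the same function, so in principle a solution with training error no higher than the shallow model's exists for the deeper net. The layer sizes below are arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A small "shallow" model (layer sizes are illustrative).
shallow = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 10))

# A "deeper" model built by appending identity layers to the shallow one.
# By construction it computes exactly the same function, so a solution with
# training error no higher than the shallow model's exists for the deeper net.
deeper = nn.Sequential(shallow, nn.Identity(), nn.Identity())

x = torch.randn(4, 16)
print(torch.allclose(shallow(x), deeper(x)))  # True: same outputs, same error
```

In practice, however, plain deep networks trained from scratch fail to find such a solution, which is the degradation problem ResNet addresses.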
How
- Consider \(\mathcal{H}(x)\) as an underlying mapping to be fit by a few stacked layers, with \(x\) denoting the inputs.
- If we hypothesize that multiple nonlinear layers can approximate complicated functions, then it is equivalent to hypothesize that they can approximate the residual functions, i.e., \(\mathcal{H}(x)-x\) (assuming that the input and output are of the same dimensions).
- Rather than approximate \(\mathcal{H}(x)\), we let these layers approximate a residual function \(\mathcal{F}(x):=\mathcal{H}(x)-x\). The original function thus becomes \(\mathcal{F}(x)+x\).
- This gives rise to the famous ResNet block you've probably seen (a minimal code sketch follows this list):
- Network architecture:
- This module allows very deep neural networks, such as the 50-, 101-, and 152-layer variants, to still train and perform well.
- Result:
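A minimal sketch of the residual block described above, written in PyTorch. The class name, channel count, and the two-3×3-convolution layout are assumptions for a basic (non-bottleneck) block rather than the paper's exact configuration; the point is that the stacked layers learn \(\mathcal{F}(x)\) while the shortcut adds \(x\) back.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BasicResidualBlock(nn.Module):
    """Basic residual block: output = relu(F(x) + x), with F as two 3x3 convs.

    Channel count and layer sizes are illustrative, not the paper's exact config.
    """

    def __init__(self, channels: int):
        super().__init__()
        # F(x): two 3x3 convolutions with batch norm, preserving the spatial size
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))  # F(x)
        return F.relu(out + x)           # F(x) + x via the identity shortcut


# The block preserves the input shape, so the identity shortcut needs no projection.
if __name__ == "__main__":
    block = BasicResidualBlock(channels=64)
    x = torch.randn(1, 64, 56, 56)
    print(block(x).shape)  # torch.Size([1, 64, 56, 56])
```

The deeper variants mentioned above (50/101/152 layers) stack many such blocks and use a bottleneck design (1×1, 3×3, 1×1 convolutions); when the dimensions of \(x\) and \(\mathcal{F}(x)\) differ, the shortcut uses a projection instead of the identity.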
And?
- This idea works extremely well in practice. Much of computer vision still uses the ResNet concept of residual (skip) connections today.
- I think this network architecture is beautiful, since it uses a relatively simple module to solve the problem of deep networks not performing well.