Going deeper with convolutions (Inception, GoogLeNet)
Published Year: 2014 | Paper URL
What
This architecture improves the utilization of computing resources inside the network through a carefully crafted design that increases the depth and width of the network while keeping the computational budget constant.
Why
- The most straightforward way of improving the performance of deep neural networks is to increase their depth and width, but this approach comes with two drawbacks:
  - Overfitting
  - High computational complexity
- Although the ultimate way of solving both problems is to move from fully connected to sparsely connected architectures, today's computing infrastructures are very inefficient at numerical calculation on non-uniform sparse data structures.

## How
- In order to avoid patch-alignment issues, current incarnations of the Inception architecture are restricted to the filter sizes 1x1, 3x3, and 5x5.
- The problem with this naive version is that even a modest number of 5x5 convolutions can be prohibitively expensive on top of a convolutional layer with a large number of filters.
- To reduce the computational cost, 1x1 convolutional layers are added as dimensionality reductions that shrink the number of input channels before the expensive 3x3 and 5x5 convolutions (see the Inception-module sketch after this list).
- The network is 22 layers deep (27 layers if pooling layers are counted).
- Auxiliary classifiers connected to intermediate layers encourage discrimination in the lower stages of the network, increase the gradient signal that gets propagated back, and provide additional regularization (see the auxiliary-classifier sketch after this list).
- The architecture achieved a top-5 error of 6.67% in the ILSVRC 2014 Classification Challenge.
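To make the module structure concrete, here is a minimal sketch of a GoogLeNet-style Inception module with the 1x1 dimensionality reductions described above. PyTorch is my choice of framework (the paper predates it), and the channel counts follow the paper's inception(3a) block; everything else is an illustrative assumption, not the authors' code.

```python
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    """Four parallel branches whose outputs are concatenated along the
    channel dimension. The 1x1 convolutions before the 3x3/5x5 branches
    (and after pooling) act as the dimensionality reductions."""

    def __init__(self, in_ch, ch1x1, ch3x3_red, ch3x3, ch5x5_red, ch5x5, pool_proj):
        super().__init__()
        # Branch 1: plain 1x1 convolution
        self.branch1 = nn.Sequential(
            nn.Conv2d(in_ch, ch1x1, kernel_size=1), nn.ReLU(inplace=True))
        # Branch 2: 1x1 reduction, then 3x3 convolution
        self.branch2 = nn.Sequential(
            nn.Conv2d(in_ch, ch3x3_red, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch3x3_red, ch3x3, kernel_size=3, padding=1), nn.ReLU(inplace=True))
        # Branch 3: 1x1 reduction, then 5x5 convolution
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_ch, ch5x5_red, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch5x5_red, ch5x5, kernel_size=5, padding=2), nn.ReLU(inplace=True))
        # Branch 4: 3x3 max pooling, then 1x1 projection
        self.branch4 = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, pool_proj, kernel_size=1), nn.ReLU(inplace=True))

    def forward(self, x):
        # All branches preserve the spatial size, so outputs can be concatenated.
        return torch.cat([self.branch1(x), self.branch2(x),
                          self.branch3(x), self.branch4(x)], dim=1)

# Channel numbers of inception(3a): output has 64 + 128 + 32 + 32 = 256 channels.
module = InceptionModule(192, ch1x1=64, ch3x3_red=96, ch3x3=128,
                         ch5x5_red=16, ch5x5=32, pool_proj=32)
out = module(torch.randn(1, 192, 28, 28))
print(out.shape)  # torch.Size([1, 256, 28, 28])
```

Note how the 96- and 16-channel reductions keep the 3x3 and 5x5 branches cheap even though the module takes 192 input channels.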
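And here is a sketch of an auxiliary classifier head, following the shape described in the paper (5x5 average pooling with stride 3, a 128-filter 1x1 convolution, a 1024-unit fully connected layer, and 70% dropout). Again, PyTorch and the exact class layout are my assumptions for illustration.

```python
import torch
import torch.nn as nn

class AuxiliaryClassifier(nn.Module):
    """Small classifier head attached to an intermediate Inception output.
    Its loss is added to the main loss (weighted by 0.3 in the paper) during
    training, and the head is discarded at inference time."""

    def __init__(self, in_ch, num_classes=1000):
        super().__init__()
        self.head = nn.Sequential(
            nn.AvgPool2d(kernel_size=5, stride=3),   # 14x14 -> 4x4
            nn.Conv2d(in_ch, 128, kernel_size=1),    # 1x1 reduction to 128 channels
            nn.ReLU(inplace=True),
            nn.Flatten(),
            nn.Linear(128 * 4 * 4, 1024),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.7),
            nn.Linear(1024, num_classes),
        )

    def forward(self, x):
        return self.head(x)

# Example: the feature map after inception(4a) is 14x14 with 512 channels.
aux = AuxiliaryClassifier(in_ch=512)
logits = aux(torch.randn(8, 512, 14, 14))
print(logits.shape)  # torch.Size([8, 1000])

# During training the total loss is roughly:
# loss = main_loss + 0.3 * (aux1_loss + aux2_loss)
```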
And?
- There are v2, v3, and v4 versions of the Inception network. We can review these state-of-the-art models in later blogs.