# Rethinking the Inception Architecture for Computer Vision (Inception v2 & v3)
Published Year: 2015 | Paper URL
## What

- Inception v2 and v3 improve computational efficiency and model accuracy, reaching 3.5% top-5 error and 17.3% top-1 error on the ILSVRC 2012 classification challenge.

## Why
- The problem with Inception v1:
  - It is hard to adapt to new use-cases while maintaining its efficiency.
- Two ideas to improve performance:
  - Avoiding representational bottlenecks.
  - Using smart factorization methods, convolutions can be made more efficient in terms of computational complexity.

## How

### Inception V1
- Here is the Inception v1 module.
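Since the figure is not reproduced here, below is a minimal sketch of the v1 module, assuming PyTorch; the channel counts are the ones GoogLeNet uses for its "3a" block, and the ReLU placement is simplified, so this is only illustrative, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class InceptionV1Block(nn.Module):
    """Four parallel branches (1x1, 1x1->3x3, 1x1->5x5, pool->1x1),
    concatenated along the channel dimension.
    Channel counts are borrowed from GoogLeNet's 3a block for illustration."""
    def __init__(self, c_in=192, c1=64, c3_red=96, c3=128, c5_red=16, c5=32, c_pool=32):
        super().__init__()
        self.branch1 = nn.Conv2d(c_in, c1, kernel_size=1)
        self.branch3 = nn.Sequential(
            nn.Conv2d(c_in, c3_red, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(c3_red, c3, kernel_size=3, padding=1),
        )
        self.branch5 = nn.Sequential(
            nn.Conv2d(c_in, c5_red, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(c5_red, c5, kernel_size=5, padding=2),
        )
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(c_in, c_pool, kernel_size=1),
        )

    def forward(self, x):
        # All branches preserve spatial size, so outputs can be concatenated on channels.
        return torch.cat(
            [self.branch1(x), self.branch3(x), self.branch5(x), self.branch_pool(x)],
            dim=1,
        )

x = torch.randn(1, 192, 28, 28)
print(InceptionV1Block()(x).shape)   # torch.Size([1, 256, 28, 28])
```

Because every branch keeps the spatial resolution, the outputs concatenate cleanly along channels (64 + 128 + 32 + 32 = 256 channels in this configuration).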
### Inception V2
- Factorization
  - Factorization into smaller convolutions
    - From the picture below we can see that one 5x5 conv. layer can be replaced by two 3x3 layers.
    - Below is the Inception v2 module, where the 5x5 layer is replaced by two 3x3 layers, in contrast to Figure 4 above (see the parameter-count sketch after this list).
  - Spatial Factorization into Asymmetric Convolutions
    - Using asymmetric convolutions can be 33% cheaper than the symmetric conv. For example, a 3x1 conv. followed by a 1x3 conv. is equivalent to sliding a two-layer network with the same receptive field as a 3x3 conv., as in the picture below.
- The overall v2 architecture:
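As a rough check on the factorization claims above, here is a minimal sketch, assuming PyTorch and arbitrary channel counts (192 in/out, chosen only for illustration), that compares parameter counts for a 5x5 conv. versus two stacked 3x3 convs., and for a 3x3 conv. versus the asymmetric 3x1 + 1x3 pair.

```python
import torch
import torch.nn as nn

def n_params(module: nn.Module) -> int:
    return sum(p.numel() for p in module.parameters())

c_in, c_out = 192, 192   # illustrative channel counts, not the paper's exact values

# 5x5 conv vs. two stacked 3x3 convs (same receptive field).
conv5 = nn.Conv2d(c_in, c_out, kernel_size=5, padding=2)
conv3x3_twice = nn.Sequential(
    nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(c_out, c_out, kernel_size=3, padding=1),
)

# 3x3 conv vs. asymmetric 3x1 followed by 1x3 (same receptive field).
conv3 = nn.Conv2d(c_in, c_out, kernel_size=3, padding=1)
conv_asym = nn.Sequential(
    nn.Conv2d(c_in, c_out, kernel_size=(3, 1), padding=(1, 0)),
    nn.ReLU(inplace=True),
    nn.Conv2d(c_out, c_out, kernel_size=(1, 3), padding=(0, 1)),
)

x = torch.randn(1, c_in, 35, 35)   # arbitrary feature-map size
assert conv3x3_twice(x).shape == conv5(x).shape
assert conv_asym(x).shape == conv3(x).shape

print(f"5x5: {n_params(conv5):,} params vs two 3x3: {n_params(conv3x3_twice):,}")
print(f"3x3: {n_params(conv3):,} params vs 3x1+1x3: {n_params(conv_asym):,}")
```

With equal input/output channels, the two 3x3 layers cost roughly 18/25 ≈ 72% of the 5x5 layer, and the 3x1 + 1x3 pair costs 6/9 ≈ 67% of the 3x3 layer, matching the ~33% saving quoted above.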
### Inception V3
- The Inception v3 network incorporated all of the above upgrades stated for Inception v2, and in addition used the following:
  - RMSProp Optimizer
  - Factorized 7x7 conv.
  - Batch Norm in the Auxiliary classifier
    - The authors noted that the auxiliary classifiers didn't contribute much until near the end of the training process, when accuracies were near saturation. They argued that the auxiliary classifiers function as regularizers, especially with dropout or BN operations.
  - Label Smoothing Regularization
    - This regularization prevents the model from becoming too confident about a class, which helps prevent overfitting.
    - Before regularization, the ground-truth distribution \(q(k|x)\) is one-hot, i.e. either zero or one for every class. Training against this target pushes the probability of the correct label up and the probabilities of all other labels down without bound; intuitively, the model becomes too confident about its predictions.
    - Thus the authors proposed a new ground-truth distribution \(q'(k|x)=(1-\epsilon)\delta_{k,y}+\epsilon u(k)\), where \(\delta_{k,y}=1\) if \(k=y\) and \(0\) otherwise; since there are 1000 labels in ILSVRC, \(u(k)=1/1000\) and \(\epsilon=0.1\) in this model (see the sketch below).
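A minimal sketch of that smoothed target, assuming NumPy; the helper name `smooth_labels` and the tiny 5-class example are my own for illustration, while \(\epsilon=0.1\) and the uniform \(u(k)=1/K\) follow the paper.

```python
import numpy as np

def smooth_labels(y: int, num_classes: int, eps: float = 0.1) -> np.ndarray:
    """Return the smoothed ground-truth distribution q'(k|x) for label y."""
    q = np.full(num_classes, eps / num_classes)   # eps * u(k), with u(k) = 1/K
    q[y] += 1.0 - eps                             # (1 - eps) * delta_{k,y}
    return q

q = smooth_labels(y=2, num_classes=5, eps=0.1)
print(q)          # [0.02 0.02 0.92 0.02 0.02]
print(q.sum())    # ~1.0

# Cross-entropy is then computed against the smoothed target instead of the one-hot one:
logits = np.array([1.0, 0.5, 3.0, -1.0, 0.2])
log_probs = logits - np.log(np.exp(logits).sum())
loss = -(q * log_probs).sum()
print(loss)
```

Training against \(q'(k|x)\) keeps the optimal gap between the correct-class logit and the others finite, which is exactly the "don't be too confident" effect described above.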
## And?
- There is a v4 version of the Inception module, which is combined with residual connections. I would like to review it later.
- This paper introduced some convolution factorization methods. That's pretty cool; I hadn't come across this in the online ML courses I've taken until now.
- I found that my reading skill is still pretty weak; reading this paper took me a lot of time. I hope it will get better with more practice.