Blog: Notes on Deep Learning — Advanced CNN
We finally made it to advanced CNN. Yay!
The journey has been long, and if you have followed the series this far, I am proud of you.
Today we talk about advanced convolutional neural networks (CNNs).
Most of the hard part was covered in the earlier basic CNN post. Today we simply build on top of the building blocks from basic CNN.
Let’s recap the full picture of a CNN:
1. The convolution step
2. The subsampling (pooling) step
(Steps 1 and 2 can be repeated.)
3. The linear layer
(Step 3 is also called the fully connected layer.)
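To make the recap concrete, here is a small shape trace through the three steps. It is a sketch that assumes a hypothetical basic MNIST model (two 5 x 5 convolutions with 10 and 20 filters, each followed by 2 x 2 max pooling). Those layer sizes are illustrative, not taken from the earlier post. The trace uses the standard output-size formula out = (n + 2p - k) // s + 1 for input size n, padding p, kernel size k and stride s.

```python
def conv_out(size, kernel, padding=0, stride=1):
    """Output height/width of a convolution or pooling window on a square input."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_basic_cnn(size=28):
    # Hypothetical layer sizes, for illustration only.
    shapes = [("input", 1, size)]
    size = conv_out(size, kernel=5)             # 1. convolution, 10 filters
    shapes.append(("conv1", 10, size))
    size = conv_out(size, kernel=2, stride=2)   # 2. subsampling (2 x 2 pooling)
    shapes.append(("pool1", 10, size))
    size = conv_out(size, kernel=5)             # step 1 again, 20 filters
    shapes.append(("conv2", 20, size))
    size = conv_out(size, kernel=2, stride=2)   # step 2 again
    shapes.append(("pool2", 20, size))
    flat = 20 * size * size                     # 3. number of inputs to the linear layer
    return shapes, flat


shapes, flat = trace_basic_cnn()
for name, channels, side in shapes:
    print(f"{name}: {channels} x {side} x {side}")
print("linear layer inputs:", flat)  # 320
```

The last number is what the linear layer has to accept, which is why its input size depends on every convolution and pooling step before it.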
In the convolution step, a filter of a fixed size slides over the image.
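A minimal sketch of that sliding window in plain Python, assuming a single-channel image, no padding and a stride of 1 (the function name `conv2d_valid` is just for illustration):

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and take a weighted sum at each stop.

    Like PyTorch's Conv2d, this computes cross-correlation: the kernel is not flipped.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(kernel[a][b] * image[i + a][j + b] for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
print(conv2d_valid(image, [[1, 1], [1, 1]]))  # [[12, 16], [24, 28]]
```

Each output value is the sum of the 2 x 2 patch under the filter at that position, and the 3 x 3 image shrinks to 2 x 2 because no padding is used.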
The advanced CNN idea is that we can apply filters of different sizes side by side and stack their outputs along the channel dimension into a single feature map.
For example: (1 x 1 filter in green) + (3 x 3 filter in blue) + (5 x 5 filter in dark blue).
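Concatenation only works if every branch produces the same height and width, which is why each branch is padded. The sketch below assumes stride 1, odd kernel sizes and "same" padding of k // 2. The branch widths (16, 24 and 8 channels) and the helper names are hypothetical. Feature maps are nested lists in channels x height x width order.

```python
def zeros(channels, height, width):
    """A dummy feature map filled with zeros."""
    return [[[0.0] * width for _ in range(height)] for _ in range(channels)]


def same_padding(kernel):
    # With stride 1 and an odd kernel size, padding kernel // 2 keeps height and width unchanged.
    return kernel // 2


def branch_size(size, kernel, stride=1):
    return (size + 2 * same_padding(kernel) - kernel) // stride + 1


def concat_channels(*feature_maps):
    """Join feature maps along the channel axis; all must share height and width."""
    height, width = len(feature_maps[0][0]), len(feature_maps[0][0][0])
    for fmap in feature_maps:
        for channel in fmap:
            if len(channel) != height or any(len(row) != width for row in channel):
                raise ValueError("every branch must produce the same height and width")
    return [channel for fmap in feature_maps for channel in fmap]


# Hypothetical branches on a 28 x 28 input: 1x1 -> 16, 3x3 -> 24, 5x5 -> 8 channels.
sizes = [branch_size(28, k) for k in (1, 3, 5)]
merged = concat_channels(zeros(16, 28, 28), zeros(24, 28, 28), zeros(8, 28, 28))
print(sizes, len(merged))  # [28, 28, 28] 48
```

The channel counts simply add up (16 + 24 + 8 = 48), while the height and width stay at 28.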
Keep doing this and we end up recreating Inception (the module behind GoogLeNet)…
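There is a fascinating trick: the 1 x 1 filter. It sounds dumb at first. Why use a window that just gives back the image itself? Ah ah ah… remember that we have multiple channels! A 1 x 1 filter looks at one pixel position across all channels at once and combines them into a weighted sum. With fewer 1 x 1 filters than input channels, we compress many channels into a few without changing the height or width.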
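Here is a hedged sketch of what a 1 x 1 filter computes, in plain Python with made-up numbers (bias omitted). Each output channel is a weighted sum of the input channels at the same pixel, so the spatial size stays the same while the number of channels changes. The function name `conv1x1` is illustrative.

```python
def conv1x1(fmap, weights):
    """1 x 1 convolution on a channels x height x width nested list.

    weights[o][c] is the weight output channel o gives to input channel c,
    so every pixel is mixed across channels but never with its neighbours.
    """
    c_in, height, width = len(fmap), len(fmap[0]), len(fmap[0][0])
    out = []
    for w_row in weights:
        if len(w_row) != c_in:
            raise ValueError("each filter needs one weight per input channel")
        out.append([[sum(w_row[c] * fmap[c][i][j] for c in range(c_in))
                     for j in range(width)]
                    for i in range(height)])
    return out


# Three 2 x 2 input channels filled with 1s, 2s and 3s.
image = [[[v] * 2 for _ in range(2)] for v in (1, 2, 3)]
# Two 1 x 1 filters: the first copies channel 0, the second adds channels 1 and 2.
reduced = conv1x1(image, [[1, 0, 0], [0, 1, 1]])
print(len(reduced), reduced)  # 2 [[[1, 1], [1, 1]], [[5, 5], [5, 5]]]
```

Three channels went in and two came out, each still 2 x 2. In a trained network the weights are learned rather than hand-picked.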
Combining these different filters gives us the inception module shown below. Note that each branch is built by stacking the tricks we learned previously.
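One reason 1 x 1 filters show up inside the branches is cost: placed before a 3 x 3 or 5 x 5 filter, they shrink the number of channels the larger filter has to read. The arithmetic below counts multiplications for a stride-1, "same"-padded convolution. The sizes (a 28 x 28 x 192 input, 16 bottleneck channels, 32 output channels) are hypothetical, picked only to show the effect.

```python
def conv_multiplies(height, width, c_in, c_out, kernel):
    """Multiplications for a stride-1, 'same'-padded kernel x kernel convolution."""
    return height * width * c_in * c_out * kernel * kernel


# Hypothetical sizes, chosen only for illustration.
H, W, C_IN = 28, 28, 192

direct = conv_multiplies(H, W, C_IN, 32, 5)        # 5x5 applied straight to 192 channels
bottleneck = (conv_multiplies(H, W, C_IN, 16, 1)   # 1x1 squeezes 192 -> 16 channels
              + conv_multiplies(H, W, 16, 32, 5))  # then 5x5 reads only 16 channels

print(direct, bottleneck, round(direct / bottleneck, 1))  # 120422400 12443648 9.7
```

With these numbers the bottleneck branch needs roughly a tenth of the multiplications while producing the same 28 x 28 x 32 output shape.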
What is the next step forward? Keep going deeper with inception…
Deeper and deeper…? How deep?
But can we just keep going deeper? Unfortunately not.
Then we run into the vanishing gradient problem (explained earlier in the series) and get stuck. Look at the error graph…
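A toy illustration of why gradients vanish, under loudly simplified assumptions: a chain of scalar sigmoid units with weight 1, where every pre-activation is taken to be z (z = 0 is the best case, since the sigmoid's derivative peaks at 0.25 there). By the chain rule, the gradient reaching the first layer is the product of one factor per layer, so it shrinks geometrically with depth. Real networks are more complicated; the compounding product is the point here.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)  # never larger than 0.25, reached at z = 0


def gradient_scale(depth, z=0.0, weight=1.0):
    """Chain-rule factor from the output back to the input of a toy chain of
    `depth` scalar sigmoid units, assuming every pre-activation equals z."""
    scale = 1.0
    for _ in range(depth):
        scale *= weight * sigmoid_grad(z)  # one factor per layer
    return scale


for depth in (1, 5, 10, 20):
    print(depth, gradient_scale(depth))
```

Even in this best case the factor is 0.25 raised to the depth, so a 20-layer chain passes back less than a trillionth of the original gradient.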
So if we are stuck with vanishing gradients, what can we do?
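We take the output from n steps earlier and feed it forward, adding it back in a few steps later. This shortcut is called a skip (or residual) connection, and it is the key idea behind ResNet. Because of the addition, the gradient gets a direct path back through the shortcut instead of having to pass through every layer in between.
Say we have steps 1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 7 -> 8.
For example, we feed the output of step 3 into step 6.
The tricky part is the size: when we skip ahead and add the earlier output back a few steps later, it must have the same shape as the input to the layer it joins.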
These modules can be combined and connected in many other ways to improve learning and accuracy :)
Let’s jump into the notebook and rebuild our MNIST model from the previous post, this time with an advanced CNN instead of a basic one.
About the Author
I am Venali Sonone, a data scientist by profession and also a management graduate.
This series is inspired by failures.
Whether you are thinking about the next 5 years or the next 50, the latter indeed requires something challenging enough to keep the spark in your eyes.