Blog: Quick video-summary on Image Classification at Supercomputer Scale
In December 2018, a small team of Google researchers (Chris Ying, Sameer Kumar, Dehao Chen, Tao Wang, Youlong Cheng) published their results on a set of techniques and clever proposals for processing massive volumes of images faster without sacrificing accuracy.
They obtained a roughly 4X performance improvement when training the large-scale ResNet-50 model, achieving a whopping throughput of 1.05 million images/second and a total training time of 2.2 minutes. Yep, that is the whole training run, on a massive cluster of 1,024 TPU v3 chips. That broke the 8.7-minute record obtained by Jia et al. earlier that same year.
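As a quick back-of-the-envelope check of those figures, here is a minimal Python sketch. It assumes both times are in minutes (the paper reports 2.2 minutes) and that the 8.7 figure is the baseline for the quoted speedup. The variable names are mine, and the image count is only an upper bound because it applies the peak throughput to the whole run.

```python
# Back-of-the-envelope check of the figures quoted in this post.
# Assumption: both training times are expressed in minutes.
previous_record_min = 8.7        # Jia et al., as quoted above
new_record_min = 2.2             # Ying et al., as quoted above
throughput_img_per_s = 1.05e6    # peak throughput, as quoted above

# Ratio of the two training times.
speedup = previous_record_min / new_record_min

# Upper bound on images processed if the peak rate held for the whole run.
images_upper_bound = throughput_img_per_s * new_record_min * 60

print(f"speedup: {speedup:.2f}x")
print(f"images at peak rate: {images_upper_bound / 1e6:.1f} million")
```

The ratio comes out just under 4, which is consistent with the roughly 4X improvement if Jia et al. is the intended baseline.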
I’ve prepared this 4-minute video as a quick walk-through of this remarkable work. It is worth mentioning that these techniques can be applied individually, on CPUs, GPUs or TPUs, and to any DL architecture that is amenable to parallelization. Please like, share or comment on the video on YouTube: https://youtu.be/JvssZESVcjI
I hope this helps you or anyone you work with. Feel free to share.