Blog: Classifying MNIST using a Convolutional Neural Network with MXNet

Introduction:
In this article, I’m going to show you how to classify the MNIST dataset, a collection of images of handwritten digits (0–9) that is often considered the “Hello, World” of deep learning. If you have no prior experience in deep learning or know nothing about convolutional neural networks, this is a perfect place to start.
I am going to implement a CNN with Apache MXNet. This article will help you understand the basic structure and implementation of neural networks in MXNet.
MXNet — Brief Introduction


Apache MXNet is an open-source deep learning software framework, used to train and deploy deep neural networks. It is scalable, allowing for fast model training, and supports a flexible programming model and multiple programming languages (including C++, Python, Julia, Matlab, JavaScript, Go, R, Scala, Perl, and Wolfram Language).
The MXNet library is portable and can scale to multiple GPUs and multiple machines. MXNet is supported by public cloud providers including Amazon Web Services (AWS) and Microsoft Azure. Amazon has chosen MXNet as its deep learning framework of choice at AWS. Currently, MXNet is supported by Intel, Dato, Baidu, Microsoft, Wolfram Research, and research institutions such as Carnegie Mellon, MIT, the University of Washington, and the Hong Kong University of Science and Technology.
Source: Wikipedia
Let’s get started…
The code used in this article can be found in my GitHub repository.
Installation:
MXNet can be installed using pip:
pip install mxnet
If you have a GPU, you can install the GPU build of MXNet instead:
pip install mxnet-cuXX
where XX is your CUDA version. For example, I have CUDA 10 installed on my PC, so I run:
pip install mxnet-cu100
You will also need the following packages to follow along with me:
pip install numpy scikit-learn pandas seaborn matplotlib
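To verify that everything installed correctly, you can print the MXNet version from the command line:
python -c "import mxnet; print(mxnet.__version__)"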
The dataset used here can be downloaded from Kaggle.
Let’s start by importing the required packages:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn
# MXNet imports used throughout the rest of the article.
import mxnet as mx
from mxnet import nd, gluon
Load our train dataset:
# Load our train dataset.
train = pd.read_csv("~/datasets/mnist/train.csv")
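In the Kaggle CSV, each row is one image: a label column followed by 784 pixel columns (a flattened 28×28 image). A quick shape check confirms this (the row count shown is the standard Kaggle train split):
print(train.shape)  # e.g. (42000, 785): label + 784 pixel columns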
Split the training data into X (image data) and Y (labels), reshaping each image into 28×28 and normalizing pixel values to [0, 1] by dividing by 255:
# Split the train dataset into X and Y; normalize by dividing by 255.
X = train.iloc[:, 1:].values.reshape(-1, 28, 28) / 255
Y = train.iloc[:, 0].values
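As an optional sanity check (not in the original listing), we can plot the first image with matplotlib to confirm the reshape worked:
plt.imshow(X[0], cmap="gray")
plt.title("label: {}".format(Y[0]))
plt.show()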
Split our data into train and validation sets:
from sklearn.model_selection import train_test_split
trn_x, val_x, trn_y, val_y = train_test_split(X, Y, test_size=0.2)
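Optionally, verify the split sizes:
print(trn_x.shape, val_x.shape)  # e.g. (33600, 28, 28) (8400, 28, 28)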
The Network:
This is the structure of our CNN in MXNet, as printed by Gluon:
Net(
  (cnn1): Conv2D(None -> 20, kernel_size=(4, 4), stride=(1, 1))
  (mxp1): MaxPool2D(size=(2, 2), stride=(2, 2), padding=(0, 0), ceil_mode=False)
  (cnn2): Conv2D(None -> 25, kernel_size=(2, 2), stride=(1, 1))
  (mxp2): MaxPool2D(size=(2, 2), stride=(2, 2), padding=(0, 0), ceil_mode=False)
  (cnn3): Conv2D(None -> 30, kernel_size=(2, 2), stride=(1, 1))
  (mxp3): MaxPool2D(size=(2, 2), stride=(2, 2), padding=(0, 0), ceil_mode=False)
  (flat): Flatten
  (fc): Dense(None -> 128, linear)
  (out): Dense(None -> 10, linear)
)
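The listing above is Gluon’s printed summary of the network, not runnable code. Below is a minimal sketch of a Gluon Block that produces this structure; the ReLU activations in forward are my assumption, since the summary does not show activations:
import mxnet as mx
from mxnet import nd, gluon
from mxnet.gluon import nn

class Net(nn.Block):
    def __init__(self, **kwargs):
        super(Net, self).__init__(**kwargs)
        # Three conv + max-pool stages, then two dense layers.
        self.cnn1 = nn.Conv2D(channels=20, kernel_size=(4, 4))
        self.mxp1 = nn.MaxPool2D(pool_size=(2, 2), strides=(2, 2))
        self.cnn2 = nn.Conv2D(channels=25, kernel_size=(2, 2))
        self.mxp2 = nn.MaxPool2D(pool_size=(2, 2), strides=(2, 2))
        self.cnn3 = nn.Conv2D(channels=30, kernel_size=(2, 2))
        self.mxp3 = nn.MaxPool2D(pool_size=(2, 2), strides=(2, 2))
        self.flat = nn.Flatten()
        self.fc = nn.Dense(128)
        self.out = nn.Dense(10)

    def forward(self, x):
        # ReLU between layers is an assumption not shown in the summary.
        x = self.mxp1(nd.relu(self.cnn1(x)))
        x = self.mxp2(nd.relu(self.cnn2(x)))
        x = self.mxp3(nd.relu(self.cnn3(x)))
        x = self.flat(x)
        x = nd.relu(self.fc(x))
        return self.out(x)

cnn = Net()
print(cnn)  # prints the summary shown above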
The block of code below selects the GPU if one is available (otherwise the CPU) and initializes our network:
device = mx.gpu(0) if mx.context.num_gpus() > 0 else mx.cpu(0)
cnn.initialize(mx.init.Xavier(), ctx=device)
We define a ‘trainer’, which manages the optimization of our network:
trainer = gluon.Trainer(
    params=cnn.collect_params(),
    optimizer='adam',
    optimizer_params={'learning_rate': 0.01},
)
We define a few variables that we will use during training to calculate loss and accuracy, and we also define the loss function for our model:
accuracy_fn = mx.metric.Accuracy()
loss_function = gluon.loss.SoftmaxCrossEntropyLoss()
ce_loss = mx.metric.CrossEntropy()
Convert the NumPy arrays into MXNet NDArrays, reshaping the images to (batch, channel, height, width):
# Convert our NumPy arrays into mxnet.nd.array.
trn_x = nd.array(trn_x).reshape(-1,1,28,28)
trn_y = nd.array(trn_y)
val_x = nd.array(val_x).reshape(-1,1,28,28)
val_y = nd.array(val_y)
Now, let’s train our model.
First, define a small function to reset the metrics after we have calculated them for an epoch:
def reset_metrics():
    accuracy_fn.reset()
    ce_loss.reset()
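The training loop itself is not shown in the original listing. Below is a minimal sketch of what it could look like; the batch size of 100 and the manual slicing (instead of a gluon DataLoader) are assumptions made here:
from mxnet import autograd

batch_size = 100  # assumption; the original batch size is not stated
epochs = 30

for epoch in range(1, epochs + 1):
    reset_metrics()
    cum_loss, n_batches = 0.0, 0
    for i in range(0, trn_x.shape[0], batch_size):
        data = trn_x[i:i + batch_size].as_in_context(device)
        label = trn_y[i:i + batch_size].as_in_context(device)
        with autograd.record():      # record the forward pass
            output = cnn(data)
            loss = loss_function(output, label)
        loss.backward()              # back-propagate
        trainer.step(data.shape[0])  # update the weights
        accuracy_fn.update(label, output)
        cum_loss += loss.mean().asscalar()
        n_batches += 1
    # Validation loss over the whole validation set.
    val_out = cnn(val_x.as_in_context(device))
    ce_loss.update(val_y.as_in_context(device), nd.softmax(val_out))
    print("epoch:", epoch,
          "| trn_loss:", cum_loss / n_batches,
          "| trn_acc:", accuracy_fn.get()[1],
          "| val_loss:", ce_loss.get()[1])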
For every epoch, we print metrics to see how well our model is performing. You should see something like this as output:
epoch: 1 | trn_loss: 1.091759033203125 | trn_acc: 0.654 | val_loss: 1.0010985107421875
epoch: 2 | trn_loss: 0.434418212890625 | trn_acc: 0.871 | val_loss: 0.44930450439453123
epoch: 3 | trn_loss: 0.27909747314453126 | trn_acc: 0.911 | val_loss: 0.24485662841796876
epoch: 4 | trn_loss: 0.20936044311523438 | trn_acc: 0.931 | val_loss: 0.20494595336914062
epoch: 5 | trn_loss: 0.1736647491455078 | trn_acc: 0.937 | val_loss: 0.16943218994140624
epoch: 6 | trn_loss: 0.15489833068847655 | trn_acc: 0.949 | val_loss: 0.14828227233886718
epoch: 7 | trn_loss: 0.14918743896484374 | trn_acc: 0.952 | val_loss: 0.1260661087036133
epoch: 8 | trn_loss: 0.12303134155273437 | trn_acc: 0.961 | val_loss: 0.11447244262695312
epoch: 9 | trn_loss: 0.07636952972412109 | trn_acc: 0.975 | val_loss: 0.07003825378417969
epoch: 10 | trn_loss: 0.08552651977539062 | trn_acc: 0.971 | val_loss: 0.07812557983398437
..............
..............
..............
epoch: 30 | trn_loss: 0.020007244110107424 | trn_acc: 0.994 | val_loss: 0.02010601806640625
You can calculate the validation accuracy by running the following lines:
pred = cnn(val_x.as_in_context(device))
predictions = []
for p in pred.asnumpy():
    predictions.append(np.argmax(p, axis=0))
from sklearn.metrics import accuracy_score
acc = accuracy_score(val_y.asnumpy(), predictions)
print("Accuracy:", acc * 100, "%")
Output:
Accuracy: 98.1785714286 %
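As an optional extra (not part of the original walkthrough), we can use seaborn, which we imported earlier, to plot a confusion matrix and see which digits get mistaken for which:
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(val_y.asnumpy(), predictions)
seaborn.heatmap(cm, annot=True, fmt="d", cmap="Blues")
plt.xlabel("predicted")
plt.ylabel("actual")
plt.show()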
Conclusion:
You just learned how to define and train a convolutional neural network with MXNet and classify MNIST with an accuracy of over 98%.