Building a Handwritten Digit Recognizer in Java

By Packt_Pub

In this article, we’ll build a handwritten digit recognizer in a Java application using the open source Java framework Deeplearning4j. The dataset is the classic MNIST database of handwritten digits: the training set contains 60,000 images and the test set 10,000, each 28 x 28 pixels in grayscale.

As a part of the application that we will be creating in this article, we will implement a graphical user interface, where you can draw digits and get a neural network to recognize the digit.

Jumping straight into the code, let’s observe how to implement a neural network in Java. We begin with the parameters; the first one is the number of output classes. Since we have the digits 0 to 9, we have 10 classes:

/**
 * Number of prediction classes.
 * We have the digits 0-9, so 10 classes in total.
 */
private static final int OUTPUT = 10;

We have the mini-batch size, which is the number of images we see before updating the weights or the number of images we’ll process in parallel:

/**
 * Mini-batch gradient descent size, or the number of matrices processed in parallel.
 * For a Core i7 CPU, 16 is good; for a GPU, please change to 128 and up.
 */
private static final int MINI_BATCH_SIZE = 16;
/**
 * Number of training epochs, i.e. total traverses through the data.
 * With 5 epochs we will have 5 * (60,000 / MINI_BATCH_SIZE) iterations, or weight updates.
 */
private static final int EPOCHS = 5;

For a CPU, a batch size of 16 is fine, but for GPUs this should change according to the GPU’s power. One epoch is complete once we have traversed all of the training data.
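The arithmetic behind the iteration count follows directly from the constants above; a small self-contained sketch (the class name here is just for illustration):

```java
public class BatchMath {
    public static void main(String[] args) {
        int trainImages = 60_000;   // MNIST training set size
        int miniBatchSize = 16;     // images per weight update on a CPU
        int epochs = 5;

        // One weight update per mini-batch, so:
        int updatesPerEpoch = trainImages / miniBatchSize;
        int totalUpdates = updatesPerEpoch * epochs;

        System.out.println(updatesPerEpoch); // 3750
        System.out.println(totalUpdates);    // 18750
    }
}
```

So with these settings the network’s weights are updated 3,750 times per epoch, and 18,750 times over the full training run.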

The learning rate is quite important: a very low value will slow down learning, while a value that is too big will cause the neural network to diverge:

/**
 * The alpha learning rate, defining the size of each step toward the minimum.
 */
private static final double LEARNING_RATE = 0.01;

To understand this in detail, later in this section we will simulate a case where the network diverges by changing the learning rate. Fortunately, as part of this example, we need not handle the reading, transforming, or normalizing of the pixels from the MNIST dataset, nor the flattening of the data into a one-dimensional vector to fit the neural network, because all of this is encapsulated and offered by Deeplearning4j.

Using the dataset iterator object, we need to specify the batch size and whether we are going to use it for training or testing; this determines whether we load the 60,000 images from the training dataset or the 10,000 from the testing dataset:

public void train() throws Exception {
    /*
     * Create an iterator using the batch size for one iteration.
     */
    log.info("Load data....");
    DataSetIterator mnistTrain = new MnistDataSetIterator(MINI_BATCH_SIZE, true, SEED);
    /*
     * Construct the neural network.
     */
    log.info("Build model....");
    MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
            // NESTEROVS refers to gradient descent with momentum

Let’s get started with building the neural network. We’ve specified the learning rate and initialized the weights according to Xavier, which we learned about in the previous sections. The updater in the code is simply the optimization algorithm for updating the weights with gradient descent; NESTEROVS is the gradient descent with momentum that we’re already familiar with.

Let’s look into the code to understand the updater better, starting from the two momentum formulas, which are not different from what we have already explored.
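The two formulas are the velocity update and the weight update. A toy one-dimensional sketch of them (the function f(w) = (w - 3)^2 and all constants here are illustrative, not taken from the book’s code):

```java
public class NesterovDemo {
    // Gradient of f(w) = (w - 3)^2, which has its minimum at w = 3
    static double grad(double w) { return 2 * (w - 3); }

    public static void main(String[] args) {
        double alpha = 0.1; // learning rate
        double mu = 0.9;    // momentum coefficient
        double w = 0.0;     // weight
        double v = 0.0;     // velocity

        for (int i = 0; i < 200; i++) {
            // Formula 1: Nesterov "look-ahead" velocity update,
            // evaluating the gradient at w + mu * v rather than at w
            v = mu * v - alpha * grad(w + mu * v);
            // Formula 2: weight update using the new velocity
            w = w + v;
        }
        System.out.println(w); // very close to 3.0, the minimum
    }
}
```

The velocity accumulates past gradients, which is what lets momentum keep moving through small bumps in the cost surface.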

We configure the input layer, the hidden layers, and the output. Configuring the input layer is quite easy: we just multiply the width and the height, giving the size of the one-dimensional input vector (28 x 28 = 784). The next step in the code is to define the hidden layers. We have two hidden layers: one with 128 neurons and one with 64 neurons, both using the ReLU activation function because of its high efficiency.

Just to switch things up a bit, we could try out different values, especially those suggested on the MNIST dataset web page. That said, the values chosen here are quite efficient, giving less training time and good accuracy.
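Putting the layer description together, the configuration might look roughly like this in a recent Deeplearning4j API (a sketch under that assumption, not the book’s exact code; builder and updater names vary slightly between DL4J versions, and the class name is illustrative):

```java
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.weights.WeightInit;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.learning.config.Nesterovs;
import org.nd4j.linalg.lossfunctions.LossFunctions;

public class DigitNetConfig {
    private static final int OUTPUT = 10;
    private static final int SEED = 123;
    private static final int IMAGE_WIDTH = 28;
    private static final int IMAGE_HEIGHT = 28;
    private static final double LEARNING_RATE = 0.01;

    public static MultiLayerConfiguration build() {
        return new NeuralNetConfiguration.Builder()
                .seed(SEED)
                .updater(new Nesterovs(LEARNING_RATE, 0.9)) // gradient descent with momentum
                .weightInit(WeightInit.XAVIER)
                .list()
                // Input: 28 x 28 pixels flattened to 784; first hidden layer: 128 neurons
                .layer(new DenseLayer.Builder()
                        .nIn(IMAGE_WIDTH * IMAGE_HEIGHT).nOut(128)
                        .activation(Activation.RELU).build())
                // Second hidden layer: 64 neurons
                .layer(new DenseLayer.Builder()
                        .nIn(128).nOut(64)
                        .activation(Activation.RELU).build())
                // Output layer: softmax over the 10 digit classes
                .layer(new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                        .nIn(64).nOut(OUTPUT)
                        .activation(Activation.SOFTMAX).build())
                .build();
    }
}
```

Note how each layer’s nIn matches the previous layer’s nOut, so the data flows 784 → 128 → 64 → 10.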

The output layer uses softmax, because we need ten classes and not two, and it also carries the cost function. The details of this may vary from what we have seen previously: this function measures the performance of the predicted values against the real values.
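Softmax simply turns the ten raw output scores into probabilities that sum to one, so the largest score becomes the most probable digit. A self-contained sketch (class and method names are illustrative):

```java
public class SoftmaxDemo {
    // Turn raw scores into probabilities that sum to 1
    static double[] softmax(double[] scores) {
        double max = Double.NEGATIVE_INFINITY;
        for (double s : scores) max = Math.max(max, s); // subtract max for numerical stability
        double sum = 0;
        double[] out = new double[scores.length];
        for (int i = 0; i < scores.length; i++) {
            out[i] = Math.exp(scores[i] - max);
            sum += out[i];
        }
        for (int i = 0; i < out.length; i++) out[i] /= sum;
        return out;
    }

    public static void main(String[] args) {
        double[] probs = softmax(new double[]{1.0, 2.0, 5.0});
        double total = 0;
        for (double p : probs) total += p;
        System.out.println(total); // 1.0, up to floating-point error
    }
}
```

The cost function then compares this probability distribution against the true label, penalizing the network when it assigns low probability to the correct digit.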

We then initialize the network and attach a listener, as we want to see the cost function every 100 iterations. The fit(mnistTrain) call is very important, because it works iteration by iteration until it traverses all the data. After this, we have executed one epoch and the neural network has learned some good weights.

Testing the performance of the neural network

To test the accuracy of the network, we construct another dataset for testing. We evaluate this model with what we’ve learned so far and print the statistics. If the accuracy of the network is more than 97%, we stop there and save the model to use for the graphical user interface that we will study later on. Execute the following code:

if (mnistTest == null) {
    mnistTest = new MnistDataSetIterator(MINI_BATCH_SIZE, false, SEED);
}

If you observe the printed cost function closely, it gradually decreases through the iterations, with an occasional peak in value from time to time. This is a characteristic of mini-batch gradient descent. The final output of the first epoch shows that the model reaches 96% accuracy after just one epoch, which is great: the neural network is learning fast.

In most cases, it does not work like this and we need to tune our network for a long time before we obtain the output we want. Let’s look at the output of the second epoch:

We obtain an accuracy of more than 97% in just two epochs.
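The stop-at-97% condition is just a threshold on the fraction of correctly classified test images. Independently of DL4J’s evaluation classes, that check amounts to this (the sample labels below are made up for illustration):

```java
public class AccuracyCheck {
    // Fraction of predictions that match the true labels
    static double accuracy(int[] predicted, int[] actual) {
        int correct = 0;
        for (int i = 0; i < predicted.length; i++) {
            if (predicted[i] == actual[i]) correct++;
        }
        return (double) correct / predicted.length;
    }

    public static void main(String[] args) {
        int[] predicted = {3, 7, 7, 1, 0, 9, 4, 4, 8, 2};
        int[] actual    = {3, 7, 1, 1, 0, 9, 4, 4, 8, 2};
        double acc = accuracy(predicted, actual);
        System.out.println(acc);        // 0.9 (9 of 10 correct)
        System.out.println(acc > 0.97); // false -> keep training
    }
}
```

On the real test set the same ratio is computed over all 10,000 images, and once it exceeds 0.97 we save the model.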

Another aspect worth drawing our attention to is how such a simple model achieves really great results. This is part of the reason deep learning is taking off: it is easy to obtain good results, and it is easy to work with.

As mentioned before, let’s look at a case of divergence by increasing the learning rate to 0.6:

private static final double LEARNING_RATE = 0.6;
private static final int SEED = 123;
private static final int IMAGE_WIDTH = 28;
private static final int IMAGE_HEIGHT = 28;

If we now run the network, we will observe that the cost function continues to increase with no signs of decreasing, and the accuracy is greatly affected. The cost function for one epoch stays almost the same, despite 3,000 iterations. The final accuracy of the model is approximately 10%, no better than randomly guessing one of the ten digits, which is a clear indication that this learning rate does not work.
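The mechanism behind divergence can be sketched in one dimension: when the step size is too large relative to the curvature, each update overshoots the minimum by more than it gained. The constants here are illustrative (on this toy function the divergence threshold differs from the real network’s, so a rate above 1 is used to show the effect):

```java
public class DivergenceDemo {
    // Gradient of f(w) = w^2, minimum at w = 0
    static double grad(double w) { return 2 * w; }

    // Plain gradient descent from w = 1.0
    static double descend(double lr, int steps) {
        double w = 1.0;
        for (int i = 0; i < steps; i++) w -= lr * grad(w);
        return w;
    }

    public static void main(String[] args) {
        System.out.println(Math.abs(descend(0.01, 100))); // shrinks toward 0: converging
        System.out.println(Math.abs(descend(1.10, 100))); // grows without bound: diverging
    }
}
```

With the small rate each step multiplies the error by 0.98; with the large one it multiplies it by -1.2, so the weight oscillates with ever-growing magnitude, exactly the behavior seen in the rising cost function above.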

Let’s run the application for various digits and see how it works. Let’s begin with the number 3:

The output is accurate. Run this for any of the digits between zero and nine and check whether your model is working accurately.

Also, keep in mind that the model is not perfect yet; we shall improve it with CNN architectures in the next chapter. They offer state-of-the-art techniques and high accuracy, and with them we should be able to achieve an accuracy of 99%.

Hope you found this article interesting. If you’d like to implement more such deep learning projects in Java, you must check out Hands-On Java Deep Learning for Computer Vision. Written by Klevis Ramo, Hands-On Java Deep Learning for Computer Vision will take you through the process of efficiently training deep neural networks in Java for Computer Vision-related tasks.

Source: Artificial Intelligence on Medium
