Blog: Creating your own deep learning network using TensorFlow 2.0 — for absolute beginners
Do you really want to understand how to create your own deep learning network? Have you tried and not succeeded yet? Let me take you on the journey of creating your own deep learning network using TensorFlow 2.0.
I was really excited about TensorFlow 2.0 when it was announced by Google. TensorFlow is Google's core open source library to help you develop and train ML models. In TensorFlow 2.0, Google has really focused on some much-needed features like eager execution, intuitive higher-level APIs, and flexible model building on any platform. However, we shall not go that deep and will keep it simple for absolute beginners. This tutorial will make use of
tf.keras, TensorFlow’s high-level Python API for building and training deep learning models for image classification.
Setting up the deep learning environment
I used Ubuntu 18.04 Linux for creating the deep learning models. I also took advantage of NVIDIA's GeForce® GTX 1080 Ti GPU, which made the task of training a new deep learning model effortless.
On Linux, Docker is the easiest way to enable TensorFlow GPU support since only the NVIDIA® GPU driver is required on the host machine (the NVIDIA® CUDA® Toolkit does not need to be installed). However, if you want to setup the deep learning environment yourself, here are the step-by-step instructions:
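If you go the Docker route, a single command gets you a ready-made environment. A minimal sketch follows; the image tag shown is the 2.0 alpha GPU image and may differ by release, so check Docker Hub for the current TensorFlow GPU tag:

```shell
# Start a TensorFlow 2.0 GPU container (requires the NVIDIA driver and
# nvidia-docker on the host); the tag below is illustrative
docker run --runtime=nvidia -it tensorflow/tensorflow:2.0.0a0-gpu-py3 bash
```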
# Download the Anaconda installer for Linux and run it to install
# Anaconda for Python 3.7, e.g. (the installer filename may differ
# depending on the release you download):
bash Anaconda3-2019.03-Linux-x86_64.sh
# Continue to follow the instructions of the Anaconda installer. Once
# finished, it will activate conda's base environment on startup of
# the Linux terminal. To stop the base environment from activating on
# startup, use the following command:
conda config --set auto_activate_base false
# Add NVIDIA package repositories (download the two .deb packages from
# developer.download.nvidia.com first)
sudo dpkg -i cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo apt-get update
sudo apt install ./nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt-get update
# Install NVIDIA driver
sudo apt-get install --no-install-recommends nvidia-driver-410
# Reboot. Check that GPUs are visible using the command: nvidia-smi
# Install development and runtime libraries (~4GB)
sudo apt-get install --no-install-recommends \
    cuda-10-0 \
    libcudnn7=7.4.1.5-1+cuda10.0 \
    libcudnn7-dev=7.4.1.5-1+cuda10.0
# Install TensorRT. Requires that libcudnn7 is installed above.
sudo apt-get update && \
sudo apt-get install nvinfer-runtime-trt-repo-ubuntu1804-5.0.2-ga-cuda10.0 && sudo apt-get update \
&& sudo apt-get install -y --no-install-recommends libnvinfer-dev=5.0.2-1+cuda10.0
# Install TensorFlow 2.0 Alpha with GPU support
pip install tensorflow-gpu==2.0.0-alpha0
# Install Keras (optional — this tutorial uses tf.keras, which ships
# with TensorFlow itself)
pip install keras
Developing the deep learning model
Deep learning is a machine learning technique that teaches computers to do what comes naturally to humans, i.e. learn by example and progressively learn to solve a given problem accurately. Deep learning is a key technology behind vehicle, pedestrian, and landmark identification in self-driving vehicles, speech recognition, NLP, medical imaging, and more.
Large data sets are needed to make sure that a deep learning model delivers the desired results. Just as the human brain needs a lot of experiences to learn and deduce information, the analogous artificial neural network requires a substantial amount of data. The more abstraction needed, the more parameters have to be tuned, and more parameters require more data. Hence, developing the most accurate deep model is an ongoing activity and requires a lot of computing resources.
Here are the steps involved in developing the deep learning model:
Acquiring the data
Data collection is the first and foremost requirement for creating a deep learning model. It is the process of gathering and measuring information from different sources.
For a beginner, collecting such a huge amount of image data is next to impossible. There are a variety of organizations that provide such data for free. One such image dataset (Quick, Draw! — a dataset of 50 million drawings in 345 categories) is provided by Google. These doodles are a unique data set that can help developers train new neural networks. The public dataset of Quick, Draw! is available here. To create the sample model, I would recommend downloading at least 10 of the category files (.npy) and storing them in a dataset folder. Here is how it will look:
import tensorflow as tf
import numpy as np
from sklearn.model_selection import train_test_split
from os import walk
data_path = "./dataset/"
# filenames accumulate in the list 'imgFiles'
imgFiles = []
for (dirpath, dirnames, filenames) in walk(data_path):
    imgFiles.extend(filenames)
    break
print(imgFiles)
# Output: ['The Great Wall of China.npy', 'airplane.npy', 'ant.npy',
# 'The Eiffel Tower.npy', 'bicycle.npy', 'cat.npy', 'bathtub.npy',
# 'camel.npy', 'backpack.npy', 'baseball.npy']
Preprocessing the data
The data must be preprocessed before training the network. The first step in preprocessing is to split the data into training and testing segments. We have used 720,000 images to train the network and 200,000 images to test how accurately the network learned to classify images.
# Creating the image categories from the image dataset
num_images = 1000000  # total samples across all categories
num_img_files = len(imgFiles)
images_per_category = num_images // num_img_files
i = 0
for file in imgFiles:
    file_path = data_path + file
    x = np.load(file_path)
    # normalize images
    x = x.astype('float32')
    x /= 255.0
    # get the sample of images and labels
    y = [i] * len(x)
    x = x[:images_per_category]
    y = y[:images_per_category]
    if i == 0:
        x_all = x
        y_all = y
    else:
        x_all = np.concatenate((x, x_all), axis=0)
        y_all = np.concatenate((y, y_all), axis=0)
    i += 1
# split data into train and test segments
x_train, x_test, y_train, y_test = train_test_split(x_all, y_all, test_size=0.2, random_state=42)
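As a self-contained sketch of what this 80/20 split does (using dummy data, so only the shapes matter):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 100 dummy "images" (flattened 28x28 pixels) with dummy labels 0..9
x_all = np.zeros((100, 784), dtype='float32')
y_all = np.repeat(np.arange(10), 10)

# test_size=0.2 holds out 20% of the samples for testing
x_train, x_test, y_train, y_test = train_test_split(
    x_all, y_all, test_size=0.2, random_state=42)

print(x_train.shape, x_test.shape)  # (80, 784) (20, 784)
```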
The next task is to reshape x_train and x_test from flat 784-element vectors into 28×28 single-channel images.
# defining image dimensions
img_rows, img_cols = 28, 28
x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
input_shape = (img_rows, img_cols, 1)
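To see what this reshape does on its own, here is a minimal sketch with dummy data: each flat 784-element row becomes a 28×28 image with a single (grayscale) channel, which is the input format the convolutional layers expect.

```python
import numpy as np

# 5 dummy flattened images, one row of 784 pixels each
batch = np.zeros((5, 784), dtype='float32')

# add height, width, and a single channel dimension
img_rows, img_cols = 28, 28
batch = batch.reshape(batch.shape[0], img_rows, img_cols, 1)

print(batch.shape)  # (5, 28, 28, 1)
```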
Splitting the dataset gives us four NumPy arrays:
- x_train and y_train form the training set, the data the model uses to learn.
- x_test and y_test form the test set, against which the model will be evaluated.
The images are 28×28 NumPy arrays, with pixel values ranging from 0 to 255. The labels are an array of integers, ranging from 0 to 9.
Our labels are currently categorical data. FYI, categorical data are variables that contain label values rather than numeric values. We shall be using a Convolutional Neural Network (CNN) to create our model. CNN algorithms cannot operate on label data directly; they require all input and output variables to be numeric. This means that the categorical labels must be converted to a numerical form.
We shall use one-hot encoding to convert the categorical data to numerical data.
# one-hot encoding the y_train and y_test labels
y_train = tf.keras.utils.to_categorical(y_train, num_img_files)
y_test = tf.keras.utils.to_categorical(y_test, num_img_files)
# split the training set further into training and validation sets
x_train, x_valid, y_train, y_valid = train_test_split(x_train, y_train, test_size=0.1, random_state=42)
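To make the encoding concrete, here is a minimal NumPy sketch of what one-hot encoding produces for integer labels; this is a conceptual equivalent of tf.keras.utils.to_categorical, not its actual implementation:

```python
import numpy as np

def one_hot(labels, num_classes):
    # each label becomes a row with a 1 at the label's index, 0 elsewhere
    encoded = np.zeros((len(labels), num_classes), dtype='float32')
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

print(one_hot([0, 2, 1], 3))
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]]
```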
Creating the model
As stated earlier, we shall be creating a Convolutional Neural Network (CNN). The basic building block of a neural network is the layer. Layers extract representations from the data fed into them.
Most deep learning networks consist of chaining together simple layers. Most layers, such as
tf.keras.layers.Conv2D and tf.keras.layers.Dense, have parameters that are learned during training.
# Creating the model (filter counts and dropout rates are typical
# starting values; tune them for your data)
model = tf.keras.Sequential()
model.add(tf.keras.layers.Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape))
model.add(tf.keras.layers.Conv2D(64, (3, 3), activation='relu'))
model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(tf.keras.layers.Dropout(0.25))
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(128, activation='relu'))
model.add(tf.keras.layers.Dropout(0.5))
# one output unit per image category (10 here)
model.add(tf.keras.layers.Dense(num_img_files, activation='softmax'))
There are two convolutional layers (each with a ReLU activation), followed by max-pooling and dropout layers and a layer that flattens the output of the convolutional layers into one dimension. These are followed by a dense (fully connected) layer (again with a ReLU activation), a final dropout layer, and lastly a softmax layer with 10 units. The activation of each output unit in the softmax layer gives the probability that the image belongs to one of the image categories.
Before the model is ready for training, a few more settings need to be added during the model's compile step:
- Loss function — This measures how accurate the model is during training; training seeks to minimize this value, and there are many loss functions to choose from.
- Optimizer — This determines how the model is updated based on the data it sees and its loss function. Again, there are many optimizers available.
- Metrics — Used to monitor the training and testing steps.
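To illustrate what these settings mean, here is a small NumPy sketch of the categorical cross-entropy loss (the usual choice for a softmax output with one-hot labels) and the accuracy metric. This is a conceptual sketch, not Keras's actual implementation:

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred):
    # average over samples of -sum(true * log(predicted))
    eps = 1e-7  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred + eps), axis=1)))

def accuracy(y_true, y_pred):
    # fraction of samples where the highest-probability class is correct
    return float(np.mean(np.argmax(y_true, axis=1) == np.argmax(y_pred, axis=1)))

y_true = np.array([[1.0, 0.0], [0.0, 1.0]])  # one-hot labels
y_pred = np.array([[0.9, 0.1], [0.4, 0.6]])  # softmax outputs
print(accuracy(y_true, y_pred))  # 1.0
```

Notice that the loss keeps shrinking as the predicted probabilities approach the one-hot labels, even after accuracy saturates, which is why training monitors both.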
Training the model
Training the model requires feeding the training data to the model. This is done through the
model.fit method which “fits” the model to the training data.
batch_size = 128
tbCallBack = tf.keras.callbacks.TensorBoard(log_dir="./logs")
# the model must be compiled before fitting; the optimizer and loss
# shown here are typical choices for this kind of classifier
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=10,  # adjust the number of epochs as needed
          verbose=1,
          validation_data=(x_valid, y_valid),
          callbacks=[tbCallBack])
This is what the output looks like:
Train on 720000 samples, validate on 80000 samples
720000/720000 [==============================] - 22s 31us/sample - loss: 2.2367 - accuracy: 0.1790 - val_loss: 2.1397 - val_accuracy: 0.4587
720000/720000 [==============================] - 21s 29us/sample - loss: 2.0212 - accuracy: 0.3232 - val_loss: 1.8199 - val_accuracy: 0.5348
720000/720000 [==============================] - 21s 29us/sample - loss: 1.7841 - accuracy: 0.3905 - val_loss: 1.5667 - val_accuracy: 0.5567
720000/720000 [==============================] - 21s 29us/sample - loss: 0.9274 - accuracy: 0.6978 - val_loss: 0.7598 - val_accuracy: 0.7632
720000/720000 [==============================] - 21s 29us/sample - loss: 0.9115 - accuracy: 0.7040 - val_loss: 0.7465 - val_accuracy: 0.7672
720000/720000 [==============================] - 21s 29us/sample - loss: 0.8976 - accuracy: 0.7082 - val_loss: 0.7346 - val_accuracy: 0.7699
Depending on the hardware configuration, this model trains quite quickly on a GPU and slowly on a CPU. On NVIDIA's GeForce® GTX 1080 Ti GPU, each epoch took around 22 seconds; on a GeForce® GTX 1060 GPU, each epoch took around 160 seconds.
We can test the developed model in the following manner (model.evaluate returns the loss followed by the metrics):
score = model.evaluate(x_test, y_test, verbose=1)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
Congratulations! You made it to the end. The process of creating these models is much the same for different use cases. However, you will need to learn different deep learning algorithms and techniques to fine-tune the parameters for minimum loss and maximum accuracy. I do hope you enjoyed developing your own deep learning model from scratch.
Stay tuned for more and follow me here.
Thanks for reading!