Blog: Python & Vectorization
Modern computer hardware speeds up computation through parallelism, in particular through SIMD (Single Instruction, Multiple Data) architectures. SIMD is a class of parallel computing in which a processor applies a single instruction to multiple data points simultaneously. We need to vectorize our deep learning code so that we can harness all the computing power that our system provides. The faster the computation, the faster our neural network is trained and the faster we get our results. Therefore, the ability to vectorize a piece of code has become a crucial skill for a deep learning practitioner. In this story, I will explain the basics of vectorization using Python.
What exactly is Vectorization?
In the context of logistic regression, let us try to understand what vectorization exactly means with the help of the equation below:
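The linear step of logistic regression can be written as shown here. This is a reconstruction that matches the np.dot(W.T,X) call used later in this post, so treat the exact form as an assumption. X is the input matrix and W is the weight matrix, both described in the next paragraph, and the bias b is only added later, in the logistic regression section.

```latex
Z = W^{T} X
```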
The convention for storing the inputs and weights is not standard, but I prefer to store them the following way. Let X be the input matrix of dimensions (n,m), where n is the number of features in X and m is the number of training examples in our training dataset, i.e. each column of X stores one data point. Each feature has a weight associated with it, so let W be the weight matrix of dimensions (n,1).
Now, if we were to code the example equation using for loops, we would come up with something like:
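Here is a minimal sketch of such a loop, assuming the equation is Z = W.T times X with the shapes described above. The small sizes and the fixed random seed are illustrative choices of mine so that the loop finishes instantly.

```python
import numpy as np

# Small illustrative sizes (assumptions, chosen so the loops run quickly)
n, m = 3, 4
rng = np.random.default_rng(0)
X = rng.random((n, m))   # one training example per column
W = rng.random((n, 1))   # one weight per feature

# Explicit loops: for every example i, sum W[j] * X[j][i] over the features j
Z = np.zeros((1, m))
for i in range(m):
    for j in range(n):
        Z[0, i] += W[j, 0] * X[j, i]

print(Z.shape)   # (1, 4)
```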
The above snippet of code consists of explicit for loops and will not make use of parallelization. Therefore, we need to convert it into a vectorized version. This can be done easily by leveraging the built-in NumPy functions in the following way:
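A sketch of the vectorized form, again assuming small illustrative shapes and a fixed seed (both are my choices, not part of the timing example further down):

```python
import numpy as np

n, m = 3, 4
rng = np.random.default_rng(0)
X = rng.random((n, m))   # one training example per column
W = rng.random((n, 1))   # one weight per feature

# (1, n) times (n, m) gives (1, m): one value per training example
Z = np.dot(W.T, X)
print(Z.shape)   # (1, 4)
```

A single np.dot call replaces both nested loops, and the result has one entry per training example.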
Z will be a (1,m) matrix as per matrix multiplication rules, since W.T has shape (1,n) and X has shape (n,m). The np.dot() function performs matrix multiplication of the given input matrices. It not only makes the code more readable, but it also runs in NumPy's optimized, compiled routines, which can exploit parallelization for faster computation. The code snippet below shows how fast the vectorized implementation is compared to the non-vectorized one.
import time
import numpy as np
# Number of features
n = 1000
# Number of training examples
m = 10000
# Initialize X and W
X = np.random.rand(n,m)
W = np.random.rand(n,1)
# Vectorized code
t1 = time.time()
Z = np.dot(W.T,X)
print("Time taken for vectorized code is : ",(time.time()-t1)*1000,"ms")
# Non Vectorized code
Z1 = np.zeros((1,m))
t2 = time.time()
# Outer loop over the m training examples (columns of X)
for i in range(X.shape[1]):
    # Inner loop over the n features (rows of X)
    for j in range(X.shape[0]):
        Z1[0][i] += W[j][0]*X[j][i]
print("Time taken for non vectorized code is : ",(time.time()-t2)*1000,"ms")
Time taken for vectorized code is : 5.964040756225586 ms
Time taken for non vectorized code is : 40915.54665565491 ms
The above implementation considers only 10k training examples and 1k features. Although the loop version could be optimized further, the vectorized implementation is clearly much faster than the non-vectorized one: roughly 6 ms versus about 41 seconds in the run above. NumPy is a Python library used for scientific computation. It offers various built-in functions that make it easy to write vectorized code.
As a rule of thumb, we should use built-in NumPy functions to write vectorized code in any future implementation.
Vectorizing the Logistic Regression
Now that we have seen how beneficial it is to write vectorized code, let us delve deeper and write a vectorized version of logistic regression. It is not possible to vectorize each and every case, but we should try to follow the rule of thumb wherever possible. Let us look at a non-vectorized version of logistic regression and try to figure out the parts that can be vectorized. This way we will understand how to convert a given piece of code into its vectorized version. For simplicity, we will consider only 2 features in X and therefore only 2 weights.
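Here is a sketch of one gradient descent step written with an explicit loop over the training examples, in the style commonly used to teach logistic regression. The toy data, the variable names x1, x2, y and the learning rate are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data (an assumption for illustration): m examples with 2 features each
rng = np.random.default_rng(0)
m = 5
x1 = rng.random(m)
x2 = rng.random(m)
y = np.array([0, 1, 0, 1, 1])

w1, w2, b = 0.0, 0.0, 0.0
alpha = 0.1   # learning rate (hypothetical value)

# One pass over the training examples, accumulating cost and derivatives
J, dw1, dw2, db = 0.0, 0.0, 0.0, 0.0
for i in range(m):
    z = w1 * x1[i] + w2 * x2[i] + b
    a = sigmoid(z)
    J += -(y[i] * np.log(a) + (1 - y[i]) * np.log(1 - a))
    dz = a - y[i]
    dw1 += x1[i] * dz
    dw2 += x2[i] * dz
    db += dz

# Average over the m examples
J, dw1, dw2, db = J / m, dw1 / m, dw2 / m, db / m

# Gradient descent update
w1 -= alpha * dw1
w2 -= alpha * dw2
b -= alpha * db
print(J)
```

Every weight needs its own derivative variable and its own update line here, which is exactly the part that does not scale.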
In the above example, we have considered only 2 weights, i.e. w1 and w2, but in real-life scenarios there will be many more weights, and handling them one by one becomes a complex task. Therefore, we will vectorize the calculation and update of the weight derivatives dw1 and dw2 by:
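# Inside the loop over training examples i; dZ[i] is the scalar a - y for example i
dW += X[:, i].reshape(n, 1) * dZ[i]
# Once, after the loop
dW /= m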
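To make this step concrete, here is a runnable sketch of the partially vectorized version. The weights are handled as one (n,1) matrix, while the loop over training examples is still there. The toy data and the local names x_i and dz are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n, m = 2, 5
X = rng.random((n, m))          # one training example per column
Y = np.array([0, 1, 0, 1, 1])   # toy labels
W = np.zeros((n, 1))
b = 0.0

dW = np.zeros((n, 1))
dB = 0.0
for i in range(m):
    x_i = X[:, i].reshape(n, 1)         # column i as an (n, 1) vector
    a = sigmoid(np.dot(W.T, x_i) + b)   # (1, 1) activation for example i
    dz = a.item() - Y[i]                # scalar error for example i
    dW += x_i * dz                      # all weight derivatives at once
    dB += dz
dW /= m
dB /= m
print(dW.shape)   # (2, 1)
```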
With the help of the above changes, we have managed to vectorize only a small part of the code. Most of it still depends on the for loop that iterates over all the training examples. Let us see how we can remove that for loop and vectorize it as well:
The activation A for all training examples can be computed by:
A = sigmoid(np.dot(W.T,X)+b)
The cost J, averaged over the m training examples, can be computed by:
J = -(np.dot(Y,np.log(A).T)+np.dot((1-Y),np.log(1-A).T))/m
The derivatives dZ, dW and dB for all training examples can be computed by:
dZ = A - Y
dW = np.dot(X,dZ.T)/m
dB = (np.sum(dZ))/m
The weight matrix W and bias b can be updated by:
W = W - alpha*dW
b = b - alpha*dB
These conversions may seem perplexing at first. Therefore, I urge the readers to track how the dimensions of each matrix change after each calculation. This will help in understanding things better. Let us apply the changes and see what the vectorized code looks like when everything is put together.
The above code is much cleaner, more readable, shorter and computationally faster.
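Here is a sketch of the fully vectorized version, assembled from the expressions above, with the cost averaged over the m examples. The toy data, learning rate and iteration count are assumptions for illustration. The only remaining loop runs over gradient descent iterations, not over training examples, and the shape of each matrix is noted in the comments.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n, m = 2, 5
X = rng.random((n, m))             # (n, m), one example per column
Y = np.array([[0, 1, 0, 1, 1]])    # (1, m) toy labels
W = np.zeros((n, 1))               # (n, 1)
b = 0.0
alpha = 0.1                        # learning rate (hypothetical value)
num_iterations = 100               # hypothetical value

costs = []
for _ in range(num_iterations):
    A = sigmoid(np.dot(W.T, X) + b)    # (1, m)
    J = -(np.dot(Y, np.log(A).T)
          + np.dot(1 - Y, np.log(1 - A).T)) / m    # (1, 1)
    dZ = A - Y                         # (1, m)
    dW = np.dot(X, dZ.T) / m           # (n, 1)
    dB = np.sum(dZ) / m                # scalar
    W = W - alpha * dW
    b = b - alpha * dB
    costs.append(J.item())

print(costs[0], costs[-1])
```

With this setup the last recorded cost is lower than the first, which is a quick sanity check that the updates point in the right direction.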
In the code, you may have noticed that matrices of seemingly incompatible dimensions are added, subtracted, multiplied and divided. NumPy has a great feature called broadcasting. Under some constraints, the smaller matrix is broadcast across the bigger one so that they have compatible dimensions for element-wise mathematical operations. Let us see how broadcasting works with the help of some examples. Let A and B be input matrices and C be the output matrix resulting from an element-wise operation on A and B.
Shape of A : 5 x 4
Shape of B : 4
Shape of C : 5 x 4

Shape of A : 15 x 3 x 5
Shape of B : 15 x 1 x 5
Shape of C : 15 x 3 x 5

Shape of A : 8 x 1 x 6 x 1
Shape of B : 7 x 1 x 5
Shape of C : 8 x 7 x 6 x 5

Shape of A : 2 x 3 x 3
Shape of B : 1 x 5
Shape of C : Error
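It follows that broadcasting compares the shapes of A and B dimension by dimension, starting from the trailing (rightmost) dimension, and two dimensions are compatible when:
- they are equal, or
- one of them is 1.
If one matrix has fewer dimensions than the other, its missing leading dimensions are treated as 1.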
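The four shape examples above can be checked directly with a short sketch. The matrix contents are arbitrary, and the variable names c1, c2, c3 and failed are mine.

```python
import numpy as np

# Shapes taken from the examples above; the values themselves do not matter
c1 = np.ones((5, 4)) + np.ones(4)
c2 = np.ones((15, 3, 5)) * np.ones((15, 1, 5))
c3 = np.ones((8, 1, 6, 1)) + np.ones((7, 1, 5))
print(c1.shape)   # (5, 4)
print(c2.shape)   # (15, 3, 5)
print(c3.shape)   # (8, 7, 6, 5)

# Trailing dimensions 3 and 5 are neither equal nor 1, so this raises ValueError
try:
    np.ones((2, 3, 3)) + np.ones((1, 5))
    failed = False
except ValueError:
    failed = True
print(failed)   # True
```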
The best way to understand vectorization and convert a given piece of code into vectorized form is to keep track of the dimensions of the various matrices involved.
Writing a vectorized version of the code can be daunting at first, but with NumPy’s built-in functions, broadcasting and some practice, it becomes really easy. It makes the code more readable and immensely faster.
I would like to thank the readers for reading the story. If you have any questions or doubts, feel free to ask them in the comments section below. I’ll be more than happy to answer them and help you out. If you like the story, please follow me to get regular updates when I publish a new story. I welcome any suggestions that will improve my stories.