## Blog: SVM and Neural Networks

This article belongs to a series of articles, so if you have not read the previous ones here is the first one: https://medium.com/@agus.bignu97/quantum-machine-learning-a7d8d135bc58

In the previous article we introduced the types of learning used in ML and AI. In this article we will focus on two supervised learning algorithms, both of which will be useful in future articles.

The algorithms we are going to discuss are the Support Vector Machine (SVM) and Neural Networks (NN).

#### Support Vector Machine

The SVM is a classification algorithm that belongs to the branch of supervised learning [1]. The classification is done by finding the best hyperplane in an N-dimensional space (where N is the number of features) that separates the data.

To separate two classes of data there are many possible hyperplanes. The goal of the algorithm is to find the hyperplane with the greatest separation between the data of both classes (figure 1). The hyperplane delimits where one class ends and the other begins. When making predictions, if a new data point falls on the lower side of the hyperplane, it will be classified as the red class; if it falls on the other side, it will be classified as blue. The hyperplane can have different dimensions, depending on the data we have.

That is, if we have a dataset {𝑥1, 𝑥2}, as in Figure 1, our hyperplane will be a line; if the data depend on three different attributes, {𝑥1, 𝑥2, 𝑥3}, the hyperplane will have dimension two. In other words, the hyperplane always has one dimension less than our dataset.

The support vectors are the data points that lie closest to the hyperplane and influence its orientation and position.

In SVM we seek to maximize the margin between the points and the hyperplane. The function that will help us maximize the margin is the following:
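The formula referenced as expression (1) appears to have been an image that did not survive; a standard form of the hinge loss, consistent with the description that follows, is:

```latex
c(x, y, f(x)) =
\begin{cases}
0 & \text{if } y \cdot f(x) \geq 1 \\
1 - y \cdot f(x) & \text{otherwise}
\end{cases}
\tag{1}
```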

Expression (1) is called the loss function. In this expression, x is the input data, y is the known label and f(x) is the prediction we make. If these last two have the same sign, that is, if the prediction is correct, the loss function is zero. We also add a parameter (λ) that helps regularize the loss function; its goal is to balance margin maximization against the loss itself. After adding the regularization, expression (1) transforms as follows:
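The image for expression (2) is missing; it is presumably the standard regularized SVM objective:

```latex
\min_{w} \;\; \lambda \lVert w \rVert^2 + \sum_{i=1}^{n} \bigl(1 - y_i \langle x_i, w \rangle\bigr)_{+}
\tag{2}
```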

In expression (2), w are the weights. Once the general loss function is obtained, we update the weights with the gradient descent method. The update works as follows: if the classification is correct, we update the weights using only the regularization term; otherwise, we update them using both the loss gradient and the regularization term.
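The image for expression (3) is also missing; a standard form of the gradient descent update rules matching the description above is:

```latex
\begin{aligned}
w &\leftarrow w - \alpha \cdot (2\lambda w) && \text{if } y_i \langle x_i, w \rangle \geq 1 \text{ (correct classification)} \\
w &\leftarrow w + \alpha \cdot (y_i x_i - 2\lambda w) && \text{otherwise}
\end{aligned}
\tag{3}
```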

In expression (3), α is the learning rate, which is usually a small value.

This algorithm is used for classification problems and to make predictions. For example, to distinguish between three types of flowers from three different features: 𝑥 = {color, petal width, height}. Then, by learning from the data, the model must be able to predict which kind of flower (y) a sample that the algorithm has never seen belongs to.
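The training procedure described above can be sketched in Python. This is a minimal illustration, not the article's own code; the toy data, learning rate, and regularization values are assumptions chosen for the example:

```python
import numpy as np

def train_svm(X, y, lr=0.001, lam=0.01, epochs=1000):
    """Linear SVM trained by (sub)gradient descent on the regularized hinge loss."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * np.dot(xi, w) >= 1:
                # correct side of the margin: only the regularization gradient
                w -= lr * (2 * lam * w)
            else:
                # inside the margin or misclassified: hinge-loss gradient too
                w += lr * (yi * xi - 2 * lam * w)
    return w

# Toy 2-D data with labels in {-1, +1}; a constant 1 is appended as a bias feature.
X = np.array([[2.0, 3.0, 1.0], [1.0, 2.5, 1.0], [-1.0, -2.0, 1.0], [-2.0, -1.5, 1.0]])
y = np.array([1, 1, -1, -1])

w = train_svm(X, y)
preds = np.sign(X @ w)  # which side of the hyperplane each point falls on
```

Note how the two branches of the loop mirror the two update rules of expression (3).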

#### Neural networks

This type of algorithm also belongs to supervised learning [3]. It consists of a succession of layers of neurons connected to one another: an input layer, a succession of internal or hidden layers, and an output layer:

We will start by analyzing how a single neuron works. The neuron acts as a function that returns a value between 0 and 1. A widely used function is the following:
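The image for expression (4) is missing; given the description that follows, it is presumably the sigmoid activation:

```latex
f(x) = \frac{1}{1 + e^{-w \cdot x}}
\tag{4}
```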

It is a non-linear (sigmoid) function. In the previous expression, x is the input data of the neuron and w are the weights. Each neuron of the neural network updates its weights, as in the SVM algorithm, using the gradient descent method together with a learning scheme called backpropagation. We compute the error between the prediction made by the network and the real value; then, by applying successive derivatives (the chain rule) to this error, each weight is told how much it should change. This means that if the error is small, the weights change little or not at all. Here the function to be optimized is the one that measures the error; in other words, we seek to minimize the prediction error. A widely used error function is the mean squared error:
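The image for expression (5) is missing; the mean squared error has the standard form:

```latex
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \bigl(y'_i - y_i\bigr)^2
\tag{5}
```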

Where *y‘* is the prediction and *y* is the real value.
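A single sigmoid neuron and its error can be sketched as follows; the input and weight values are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    """The sigmoid of expression (4): squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w):
    """A single neuron: a sigmoid applied to the weighted sum of its inputs."""
    return sigmoid(np.dot(w, x))

def mse(y_pred, y_true):
    """Mean squared error of expression (5)."""
    return np.mean((y_pred - y_true) ** 2)

x = np.array([0.5, -1.0, 2.0])   # example inputs to the neuron
w = np.array([0.1, 0.4, 0.2])    # example weights

out = neuron(x, w)                               # always strictly between 0 and 1
err = mse(np.array([out]), np.array([1.0]))      # error against a target of 1.0
```

In training, backpropagation would differentiate `err` with respect to `w` via the chain rule to obtain the weight updates.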

Let’s now focus on a special type of neural network called the Boltzmann Machine (BM), in particular the so-called restricted Boltzmann machine (RBM) [5]. This type of network has the following form:

It consists of a visible layer and a hidden one. This neural network tries to minimize the following energy function:
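The image for expression (6) is missing; given the description that follows, it is presumably the standard RBM energy function:

```latex
E(v, h) = -\sum_{i=1}^{n} b_i v_i \;-\; \sum_{j=1}^{m} c_j h_j \;-\; \sum_{i=1}^{n}\sum_{j=1}^{m} v_i w_{ij} h_j
\tag{6}
```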

Where n and m are the number of nodes (neurons) of the visible and hidden layer, respectively. This function depends on the states of the visible units (v) and hidden units (h), on the weights (w), and on the biases (b, c).

Boltzmann machines are probabilistic. At each moment in time, the RBM is in a state given by the values of the neurons in the visible and hidden layers (*v, h*). The probability of observing this state is given by
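The image for expression (7) is missing; the Boltzmann distribution referenced in the next paragraph has the standard form:

```latex
P(v, h) = \frac{e^{-E(v, h)}}{Z}, \qquad Z = \sum_{v, h} e^{-E(v, h)}
\tag{7}
```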

In the previous expression, Z is the partition function. The expression (7) is the Boltzmann distribution. The conditional probabilities between (*v, h*) are
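The images for expressions (8) are missing; the standard RBM conditional probabilities are:

```latex
P(h_j = 1 \mid v) = \mathrm{sigm}\Bigl(c_j + \sum_{i=1}^{n} w_{ij} v_i\Bigr), \qquad
P(v_i = 1 \mid h) = \mathrm{sigm}\Bigl(b_i + \sum_{j=1}^{m} w_{ij} h_j\Bigr)
\tag{8}
```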

It should be noted that each neuron can be in state 1 or 0, that is, activated or not activated. In expressions (8), sigm refers to the sigmoid function (4). This is a type of neuron different from the one explained before (the perceptron); it is called a stochastic neuron.
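The stochastic behavior described above can be sketched in Python: each unit fires with the probability given by expressions (8). The network sizes, random weights, and starting state are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sample_hidden(v, w, c):
    """Stochastic hidden neurons: each fires (1) with probability sigm(c + v.W)."""
    p = sigmoid(c + v @ w)
    return (rng.random(p.shape) < p).astype(int), p

def sample_visible(h, w, b):
    """Stochastic visible neurons, sampled from the hidden state."""
    p = sigmoid(b + h @ w.T)
    return (rng.random(p.shape) < p).astype(int), p

n, m = 4, 3                                  # visible and hidden layer sizes
w = rng.normal(scale=0.1, size=(n, m))       # small random weights
b = np.zeros(n)                              # visible biases
c = np.zeros(m)                              # hidden biases

v = np.array([1, 0, 1, 0])                   # an example visible state
h, p_h = sample_hidden(v, w, c)              # sample the hidden layer
v_new, p_v = sample_visible(h, w, b)         # sample the visible layer back
```

Alternating these two sampling steps (Gibbs sampling) is the basis of standard RBM training procedures such as contrastive divergence.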

This gives us a general idea of this type of algorithm; an example of how these networks can be applied to quantum annealing will be explained in another article.

A final case of interest within Boltzmann machines is the deep Boltzmann machine (DBM) [7]; its structure can be seen in figure 4.

It is a type of network that, like the RBM, has a visible layer (in figure 4 it is called the state layer), but instead of a single hidden layer it can have multiple hidden layers. In this way, it extends expression (6) to the following:
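The image of this extended energy function is missing; a standard form for a DBM with k hidden layers, extending expression (6), is:

```latex
E(v, h^{1}, \dots, h^{k}) =
-\sum_{i} b_i v_i
\;-\; \sum_{i,j} v_i w^{1}_{ij} h^{1}_j
\;-\; \sum_{l=1}^{k-1} \sum_{j,j'} h^{l}_j w^{l+1}_{jj'} h^{l+1}_{j'}
\;-\; \sum_{l=1}^{k} \sum_{j} c^{l}_j h^{l}_j
```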

Where k is the number of hidden layers. In Figure 4, the action layer represents another visible layer; it is shown explicitly because this network will be used in a future article for reinforcement learning tasks.

#### Conclusions

To sum up, we have discussed two types of algorithms that belong to supervised learning. These algorithms will be useful in future articles.

If you want to know more about these algorithms, do not hesitate to contact me, or you can go to the references.

In the next article we will discuss Bayesian Networks and reinforcement learning algorithms applied to Boltzmann machines.

Keep it up!

#### References

[1] Support Vector Machine, https://towardsdatascience.com/support-vector-machine-introduction-to-machine-learning-algorithms-934a444fca47

[2] Support Vector Machine, https://towardsdatascience.com/support-vector-machine-introduction-to-machine-learning-algorithms-934a444fca47 (Fig. 1)

[3] David J.C. MacKay, *Information Theory, Inference, and Learning Algorithms*. Cambridge University Press, 2003.

[4] Artificial Neural Network, https://en.wikipedia.org/wiki/Artificial_neural_network#/media/File:Colored_neural_network.svg

[5] Restricted Boltzmann Machines, https://towardsdatascience.com/deep-learning-meets-physics-restricted-boltzmann-machines-part-i-6df5c4918c15

[6] Steven H. Adachi, Maxwell P. Henderson. *Application of Quantum Annealing to Training of Deep Neural Networks*. Lockheed Martin Corporation, 2015. (Fig. 1)

[7] Daniel Crawford, Anna Levit, Navid Ghadermarzy, Jaspreet S. Oberoi and Pooya Ronagh. *Reinforcement Learning Using Quantum Boltzmann Machines*. arXiv:1612.05695v2, 2016.

[8] Daniel Crawford, Anna Levit, Navid Ghadermarzy, Jaspreet S. Oberoi and Pooya Ronagh. *Reinforcement Learning Using Quantum Boltzmann Machines*. arXiv:1612.05695v2, 2016. (Fig. 1b)
