Gradient Descent is one of the most popular optimization algorithms in use today, and Momentum is its greatest friend: it gave optimization research a whole new perspective. For months, I tried to understand backpropagation, and when I finally did, it felt totally overwhelming.

**The teacher of computers — Gradient Descent**

*Machine Learning gives machines the ability to learn so that they can improve themselves.* (medium.com)

This article is a tribute to the Mother of Artificial Intelligence: Gradient Descent.

Here’s a more mathematical explanation of Gradient Descent and its variants → https://colab.research.google.com/drive/1lNhdf4TwPvQrN3CKyGhPmtxC9uOGGKZW#scrollTo=N9jb8SnWyDx1&forceEdit=true&offline=true&sandboxMode=true

### The Birth of the Mother

The first question that comes to most of our minds is:

Why do we need optimization algorithms? Do they really have uses in economics and mathematics?

This is where Mathematical Optimization comes in. Wikipedia says,

In the simplest case, an optimization problem consists of maximizing or minimizing a real function by systematically choosing input values from within an allowed set and computing the value of the function.

In simpler words, a mathematical function is optimized in order to get the minimum or maximum result out of it. Suppose we have a function *f( x ) = x²*.

Now, the minimum of this function, the point on the curve with the smallest value, is *( 0, 0 )*: if *x = 0*, then *y = 0*, and you can’t get a value smaller than 0.
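A quick numerical check (a minimal sketch in Python; the sample points are my own arbitrary choices) confirms that no input produces a value below 0:

```python
# Evaluate f(x) = x**2 at a few sample points; the smallest value occurs at x = 0.
def f(x):
    return x ** 2

samples = [-3, -1, -0.5, 0, 0.5, 1, 3]
values = [f(x) for x in samples]
print(min(values))  # 0, attained at x = 0
```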

#### Gradient Descent

Initially, random optimization was used: randomly pick a set of input values, plug them into the function, and record the results. The smallest result was taken as the minimum.
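That random approach can be sketched like this (a hypothetical sketch; the trial count and search range are arbitrary choices, not from the article):

```python
import random

def f(x):
    return x ** 2

def random_search(num_trials=1000, low=-10.0, high=10.0):
    """Randomly sample inputs and keep the one with the smallest f(x)."""
    best_x, best_y = None, float("inf")
    for _ in range(num_trials):
        x = random.uniform(low, high)
        y = f(x)
        if y < best_y:
            best_x, best_y = x, y
    return best_x, best_y

best_x, best_y = random_search()
print(best_x, best_y)  # best_x lands near 0, but only by luck
```

Notice that nothing guides the search: finding the minimum depends entirely on chance, which is exactly the inefficiency Gradient Descent fixes.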

Gradient Descent proposed a newer, more efficient way to reach the minimum or maximum ( the opposite of the minimum ). It makes use of the *gradient*, or *slope*, of the function being optimized.

### What is the gradient of a function? ( No calculus guaranteed! )

In simpler words, the gradient is the slope of the tangent line to a curve at a specific point. For example, take our *f( x ) = x²* function. The gradient can be calculated by taking the *derivative* of the function: for *x²*, the derivative is *2x*.
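The derivative of *x²* is *2x*, and a finite-difference approximation (a standard numerical trick, not something from the article) lets us check that without any calculus:

```python
def f(x):
    return x ** 2

def numerical_gradient(func, x, h=1e-6):
    """Approximate the slope of func at x using a central difference."""
    return (func(x + h) - func(x - h)) / (2 * h)

# The analytic derivative of x**2 is 2x; the numerical estimate agrees.
print(numerical_gradient(f, 3.0))  # ≈ 6.0, since 2 * 3 = 6
```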

For a multivariable function, the gradient is a vector of all the partial derivatives of a function.

∇ symbol stands for the gradient.
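As a concrete sketch (my own example, not from the article), take *f( x, y ) = x² + y²*. Its gradient is the vector *∇f = ( 2x, 2y )*, which we can verify one partial derivative at a time:

```python
def f(x, y):
    return x ** 2 + y ** 2

def partial(func, point, index, h=1e-6):
    """Central-difference estimate of one partial derivative at `point`."""
    up = list(point)
    down = list(point)
    up[index] += h
    down[index] -= h
    return (func(*up) - func(*down)) / (2 * h)

point = (1.0, 2.0)
gradient = [partial(f, point, i) for i in range(2)]
print(gradient)  # ≈ [2.0, 4.0], matching (2x, 2y) at (1, 2)
```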

### What’s in Artificial Intelligence?

In machine learning, or AI, we try to express every possible relationship as a function that takes some parameters and produces some sort of result. For example, we give the AI an image of a cat and train it to label the image as a cat. We can express this classification task as a function along the lines of *f( image ) = label*.

But how do we find this function? For the example function *f( x ) = x²*, we knew that if *y* is the output and *x* is the input, then the relationship between them is *y = x²*.

How does a computer find such a relationship between the image and its label *“cat”*?

#### Universal Function Approximation Theorem

This theorem explains why Artificial Neural Networks can represent such functions. I have written a story on it.

**Artificial Neural Networks. Universal Function Approximators?**

*Artificial Neural Networks are the most intricate and delicate deep learning algorithms which exist in the AI world…* (medium.com)

### Enter Gradient Descent Optimization

Gradient Descent repeatedly updates the parameters in the direction opposite to the gradient: *w ← w - α∇f( w )*, where *α* is the learning rate. Hence, step by step, we approach the minimum, and our NN learns more efficiently.
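Putting it all together, here is a minimal sketch of the update rule on our *f( x ) = x²* example (the learning rate of 0.1 and the starting point of 5.0 are arbitrary choices of mine):

```python
def gradient(x):
    # Analytic gradient of f(x) = x**2.
    return 2 * x

x = 5.0             # arbitrary starting point
learning_rate = 0.1
for _ in range(100):
    x = x - learning_rate * gradient(x)  # step opposite to the slope

print(x)  # very close to 0, the minimum of x**2
```

Unlike random search, every step is guaranteed to move downhill (for a small enough learning rate), which is why the method converges so reliably on this function.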

### That’s all!

Lots of math, right? That’s Gradient Descent: the teacher, the mother, of intelligence. Thank you, and happy AI learning.