## Gradient Descent :

Gradient descent is an optimization algorithm used to find the values of parameters that minimize a given cost function. It is commonly used in machine learning to find the optimal weights for a model.

To understand gradient descent, let’s consider a simple example of a cost function that depends on two variables, x and y. The cost function, in this case, could be the sum of the squares of the differences between the predicted values and the actual values. The goal of gradient descent is to find the values of x and y that minimize this cost function.

To do this, we start with an initial guess for the values of x and y, and then compute the partial derivatives of the cost function with respect to x and y. These partial derivatives give us the direction in which the cost function is increasing the most. We then move in the opposite direction by a small step, called the learning rate, to reduce the cost function. This process is repeated until we reach a minimum value for the cost function.

Let’s consider another example of gradient descent in the context of linear regression. In linear regression, we try to fit a line to a given set of data points such that the sum of the squares of the differences between the predicted values and the actual values is minimized. This can be represented by the following cost function:

J(w) = ∑ (y – ŷ)^2

Here, w is the vector of weights that we need to find. To use gradient descent to find the optimal values of w, we start with an initial guess for w and then compute the partial derivatives of the cost function with respect to each weight in w. We then move in the opposite direction of the partial derivatives by a small step, called the learning rate, to reduce the cost function. This process is repeated until we reach a minimum value for the cost function.

In summary, gradient descent is an optimization algorithm that is commonly used in machine learning to find the values of parameters that minimize a given cost function. It does this by starting with an initial guess for the values of the parameters and then iteratively moving in the opposite direction of the partial derivatives of the cost function to reduce the cost function.