The steps involved in training a neural network are:

We take the input equation Z = W0 + W1X1 + W2X2 + … + WnXn and calculate the output, which gives the predicted value of Y, or Ypred.
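This forward pass can be sketched in a few lines of NumPy; the weights and inputs below are made-up values chosen purely for illustration:

```python
import numpy as np

# Hypothetical weights and inputs, for illustration only.
W0 = 0.5                         # bias term
W = np.array([0.2, -0.1, 0.4])   # weights W1..Wn
X = np.array([1.0, 2.0, 3.0])    # inputs X1..Xn

# Forward pass: Z = W0 + W1*X1 + ... + Wn*Xn
Z = W0 + W @ X
Y_pred = Z  # for a linear output unit, the prediction is Z itself
```

Here `W @ X` computes the weighted sum W1X1 + … + WnXn in one step.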

Calculate the error. It tells us how much the model's predictions deviate from the actual observed values, and is computed as Ypred − Yactual.

In short, we need to compute the cost function, which is built from the residuals. A residual is the difference between a predicted outcome and the corresponding actual outcome.
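The residuals and a cost built from them can be computed as below; the predicted and actual values are hypothetical, and the mean squared error is used here as one common choice of cost function:

```python
import numpy as np

Y_pred = np.array([1.7, 0.3, 2.1])    # hypothetical predictions
Y_actual = np.array([1.5, 0.0, 2.0])  # hypothetical observed values

residual = Y_pred - Y_actual          # per-sample error: Ypred - Yactual
cost = np.mean(residual ** 2)         # mean squared error cost
```

Squaring the residuals keeps positive and negative errors from cancelling out, and averaging gives a single number to minimize.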

As in any machine learning model, the main goal is to minimize this error.

Propagating the computed loss value back through each layer lets us update the weights in a way that minimizes the loss. For this to work, the loss function must have two qualities: it must be continuous and differentiable at every point.

When we try to compute the weights by searching over every possible combination, the number of permutations and combinations explodes and training takes far too long. Gradient descent handles this problem well: instead of trying every combination, it updates the weights using the magnitude and direction of the gradient of the loss.
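A minimal sketch of gradient descent on a single-weight linear model is shown below; the synthetic data (y = 2x + 1), learning rate, and iteration count are all assumptions chosen for illustration:

```python
import numpy as np

# Tiny synthetic dataset following y = 2x + 1 (assumed for illustration)
X = np.array([0.0, 1.0, 2.0, 3.0])
Y = 2 * X + 1

w, b = 0.0, 0.0   # initial weight and bias
lr = 0.05         # learning rate (step size)

for _ in range(2000):
    Y_pred = w * X + b
    residual = Y_pred - Y
    # Gradients of the mean squared error cost with respect to w and b
    grad_w = 2 * np.mean(residual * X)
    grad_b = 2 * np.mean(residual)
    # Step in the direction opposite the gradient, i.e. downhill
    w -= lr * grad_w
    b -= lr * grad_b
```

After enough iterations, w and b approach the true values 2 and 1: each update moves the weights a small step in the direction that most reduces the cost.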

As part of this process we evaluate numerical gradient expressions, which tell us which way is downhill. "Downhill" refers to the direction toward the lowest point of the cost surface, and finding it is exactly our task, since we want to minimize the cost function.
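A numerical gradient can be approximated with a central difference and compared against the closed-form derivative; the one-weight model and data below are illustrative assumptions:

```python
import numpy as np

def cost(w, X, Y):
    # Mean squared error for a one-weight model (illustrative)
    return np.mean((w * X - Y) ** 2)

X = np.array([1.0, 2.0, 3.0])
Y = 2 * X          # true relationship y = 2x (assumed)
w = 0.5            # current weight, well below the optimum
eps = 1e-6

# Central-difference numerical gradient: nudge w slightly in each
# direction and see how the cost changes
numerical_grad = (cost(w + eps, X, Y) - cost(w - eps, X, Y)) / (2 * eps)

# Closed-form derivative of the same cost, for comparison
analytic_grad = 2 * np.mean((w * X - Y) * X)
```

The gradient here is negative, which means the cost decreases as w increases: downhill is toward larger w. Comparing the numerical estimate against the analytic value is also a standard sanity check on hand-derived gradients.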

Adopting the gradient descent algorithm is therefore the most practical way to solve this otherwise complex problem.