- It determines the error, using some error metric, usually the Mean Squared Error (see Regression Performance)
- After determining the error, the weights are adjusted to find the global minimum, i.e. the least difference between predicted & true values, via an algorithm called Gradient Descent
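The steps above can be sketched in a few lines; the sample values here are hypothetical, just to show the metric in action:

```python
# A minimal sketch of Mean Squared Error: the average of the
# squared differences between true and predicted values.
def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Example with made-up values: errors of 0.5, 0.0, and 2.0
print(mean_squared_error([3.0, 5.0, 2.0], [2.5, 5.0, 4.0]))
```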
Convergence Function
This algorithm describes how to reduce the cost function by moving each weight slightly toward the minimum.
Here lambda is the learning rate, i.e. how much a weight can adjust in one update.
A higher learning rate can mean fewer iterations, but too high a value can overshoot and never reach the minimum.
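A minimal sketch of this trade-off, using a toy one-weight cost function cost(w) = (w - 3)^2 (chosen here only for illustration, with its minimum at w = 3):

```python
# Gradient descent on cost(w) = (w - 3)**2; lr is the learning rate (lambda).
def gradient_descent(lr, steps=50, w=0.0):
    for _ in range(steps):
        grad = 2 * (w - 3)   # derivative of (w - 3)**2
        w -= lr * grad       # nudge the weight toward the minimum
    return w

print(gradient_descent(lr=0.1))  # converges near the minimum w = 3
print(gradient_descent(lr=1.1))  # too large: each step overshoots, so w diverges
```

With lr=0.1 each step shrinks the distance to the minimum; with lr=1.1 each step overshoots by more than it corrects, so the weight oscillates away and never reaches the minimum.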