Calculate model predictions
ID | f1 | f2 | target | prediction | error | delta w0 | delta w1 | delta w2 |
---|---|---|---|---|---|---|---|---|
1 | 501 | 4 | 320 | | | | | |
2 | 551 | 7 | 380 | | | | | |
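The loss function is the sum of squared errors (SSE). It is written here with the conventional factor $\tfrac{1}{2}$, which cancels when differentiating and does not change which weights minimise it:
$$L(\mathbf{w}) = \frac{1}{2}\sum_{i}\left(t_i - y_i\right)^2$$
where $t_i$ is the target and $y_i$ the model prediction for instance $i$. The loss measures the model's fit, guides the optimisation, and makes it possible to compare models.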
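The derivative of the loss with respect to each weight $w_j$ is:
$$\frac{\partial L}{\partial w_j} = -\sum_{i}\left(t_i - y_i\right)x_{ij}$$
- $y_i$ is the model output, so it makes sense that we need to compute the model output.
- $t_i - y_i$ is the model error, so it also makes sense that we calculate this.
- $x_{ij}$ is the feature paired with $w_j$: $x_{i1} = f_1$, $x_{i2} = f_2$, and $x_{i0} = 1$ for the bias $w_0$.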
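A minimal Python sketch of the loss and its derivative, assuming the linear model $y = w_0 + w_1 f_1 + w_2 f_2$ used throughout. The function names `predict`, `loss`, `gradient` and `numeric_gradient` are illustrative. The weights are hypothetical values chosen only to check the analytic derivative against a finite-difference estimate.

```python
# Sketch: half-SSE loss and its analytic gradient for y = w0 + w1*f1 + w2*f2,
# checked against a central finite-difference estimate.

def predict(w, f1, f2):
    """Model output y = w0 + w1*f1 + w2*f2."""
    return w[0] + w[1] * f1 + w[2] * f2


def loss(w, X, t):
    """L(w) = 1/2 * sum_i (t_i - y_i)^2."""
    return 0.5 * sum((ti - predict(w, f1, f2)) ** 2 for (f1, f2), ti in zip(X, t))


def gradient(w, X, t):
    """dL/dw_j = -sum_i (t_i - y_i) * x_ij, with x_i0 = 1 for the bias w0."""
    g = [0.0, 0.0, 0.0]
    for (f1, f2), ti in zip(X, t):
        error = ti - predict(w, f1, f2)          # model error for instance i
        for j, xij in enumerate((1.0, f1, f2)):  # x_i0 = 1, x_i1 = f1, x_i2 = f2
            g[j] -= error * xij
    return g


def numeric_gradient(w, X, t, h=1e-4):
    """Central-difference estimate of dL/dw_j, used only as a sanity check."""
    g = []
    for j in range(len(w)):
        up = list(w)
        up[j] += h
        down = list(w)
        down[j] -= h
        g.append((loss(up, X, t) - loss(down, X, t)) / (2 * h))
    return g


X = [(501, 4), (551, 7)]   # (f1, f2) for instances 1 and 2
t = [320, 380]             # targets
w = [0.0, 0.0, 0.0]        # hypothetical weights, chosen only for the check
print(gradient(w, X, t))   # [-700.0, -369700.0, -3940.0]
print(numeric_gradient(w, X, t))
```

Because the loss is quadratic in the weights, the two printed gradients should agree up to floating-point round-off.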
Let's assume we have the following weights:
Our model is therefore: $y = w_0 + w_1 f_1 + w_2 f_2$
1. Calculate model predictions
So we apply $y = w_0 + w_1 f_1 + w_2 f_2$ to each instance:
ID | f1 | f2 | target | prediction | error | delta w0 | delta w1 | delta w2 |
---|---|---|---|---|---|---|---|---|
1 | 501 | 4 | 320 | 92.363 | | | | |
2 | 551 | 7 | 380 | 101.481 | | | | |
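Step 1 can be sketched in Python as a single list comprehension over the instances. The helper name `predict_all` is illustrative, and the weights are hypothetical placeholders chosen so the arithmetic is easy to check by hand. They are not the weights that produced the prediction column above.

```python
# Step 1 sketch: apply y = w0 + w1*f1 + w2*f2 to every instance.
def predict_all(w, X):
    """Return the model output for each (f1, f2) pair in X."""
    return [w[0] + w[1] * f1 + w[2] * f2 for f1, f2 in X]


X = [(501, 4), (551, 7)]    # features (f1, f2) from the table
w = [1.0, 0.25, 2.0]        # hypothetical weights, NOT the ones behind the table
print(predict_all(w, X))    # [134.25, 152.75]
```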
2. Calculate the error
Here we calculate the error $t_i - y_i$ between the correct (target) value and the model prediction for each instance.
ID | f1 | f2 | target | prediction | error | delta w0 | delta w1 | delta w2 |
---|---|---|---|---|---|---|---|---|
1 | 501 | 4 | 320 | 92.363 | 227.637 | | | |
2 | 551 | 7 | 380 | 101.481 | 278.519 | | | |
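A short Python sketch of step 2, taking the target and prediction columns straight from the table. Note the sign convention: the error is target minus prediction, which is why both errors are positive here.

```python
# Step 2 sketch: error = target - prediction, computed element-wise.
targets = [320, 380]
predictions = [92.363, 101.481]    # prediction column from the table
errors = [t - y for t, y in zip(targets, predictions)]
print([round(e, 3) for e in errors])    # [227.637, 278.519]
```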
3. Calculate $\Delta w_0$ values for the 1st variable
i.e. compute $(t_i - y_i)\,x_{i0}$ for $w_0$ on each instance.
But we know that $x_{i0} = 1$, so the expression is simply the error. You can also think of it as an element-wise multiplication of the error and a column of ones ($x_0$).
x0 | error |
---|---|
1 | 227.637 |
1 | 278.519 |
Which results in:
ID | f1 | f2 | target | prediction | error | delta w0 | delta w1 | delta w2 |
---|---|---|---|---|---|---|---|---|
1 | 501 | 4 | 320 | 92.363 | 227.637 | 227.637 | | |
2 | 551 | 7 | 380 | 101.481 | 278.519 | 278.519 | | |
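As a Python sketch, step 3 is an element-wise product of the error column with a column of ones. The ones stand in for the constant feature paired with the bias $w_0$. Multiplying by 1.0 leaves the errors unchanged, which is exactly what the table shows.

```python
# Step 3 sketch: delta for w0 = error * x0, where x0 is a column of ones.
errors = [227.637, 278.519]     # error column from step 2
x0 = [1.0] * len(errors)        # constant "feature" paired with the bias w0
delta_w0 = [e * x for e, x in zip(errors, x0)]
print(delta_w0)                 # [227.637, 278.519]
```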
4. Calculate $\Delta w_1$ values for the 2nd variable
i.e. compute $(t_i - y_i)\,x_{i1}$ for $w_1$ on each instance.
So we perform an element-wise multiplication of the error with the feature corresponding to $w_1$, which is $f_1$:
f1 | error |
---|---|
501 | 227.637 |
551 | 278.519 |
Which results in:
ID | f1 | f2 | target | prediction | error | delta w0 | delta w1 | delta w2 |
---|---|---|---|---|---|---|---|---|
1 | 501 | 4 | 320 | 92.363 | 227.637 | 227.637 | 114046.137 | |
2 | 551 | 7 | 380 | 101.481 | 278.519 | 278.519 | 153463.969 | |
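The same element-wise product in Python, this time with the $f_1$ column. The results are rounded to three decimals only so they match the table.

```python
# Step 4 sketch: delta for w1 = error * f1, element-wise.
errors = [227.637, 278.519]
f1 = [501, 551]
delta_w1 = [round(e * x, 3) for e, x in zip(errors, f1)]  # 3 d.p. to match the table
print(delta_w1)                 # [114046.137, 153463.969]
```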
5. Calculate $\Delta w_2$ values for the 3rd variable
i.e. compute $(t_i - y_i)\,x_{i2}$ for $w_2$ on each instance.
So we perform an element-wise multiplication of the error with the feature corresponding to $w_2$, which is $f_2$:
f2 | error |
---|---|
4 | 227.637 |
7 | 278.519 |
Which results in:
ID | f1 | f2 | target | prediction | error | delta w0 | delta w1 | delta w2 |
---|---|---|---|---|---|---|---|---|
1 | 501 | 4 | 320 | 92.363 | 227.637 | 227.637 | 114046.137 | 910.548 |
2 | 551 | 7 | 380 | 101.481 | 278.519 | 278.519 | 153463.969 | 1949.633 |
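Steps 3 to 5 are the same operation applied to different columns. One way to see this, sketched in Python under the assumption that each instance is stored as a row $(1, f_1, f_2)$, is to multiply every entry of that row by the instance's error in a single pass:

```python
# Steps 3-5 in one pass: multiply each row (1, f1, f2) by that instance's error.
errors = [227.637, 278.519]
rows = [(1, 501, 4), (1, 551, 7)]   # x_i0 = 1, x_i1 = f1, x_i2 = f2
per_instance = [[round(e * x, 3) for x in row] for e, row in zip(errors, rows)]
for values in per_instance:
    print(values)
# [227.637, 114046.137, 910.548]
# [278.519, 153463.969, 1949.633]
```

Each printed row matches the delta w0, delta w1 and delta w2 columns of the table.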
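6. Calculate the actual $\Delta w$ values
- Now calculate the actual delta values used to perform the weight updates.
- For each of $\Delta w_0$, $\Delta w_1$, and $\Delta w_2$, this is the sum of the per-instance values in its column: $\Delta w_j = \sum_i (t_i - y_i)\,x_{ij}$.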
ID | f1 | f2 | target | prediction | error | delta w0 | delta w1 | delta w2 |
---|---|---|---|---|---|---|---|---|
1 | 501 | 4 | 320 | 92.363 | 227.637 | 227.637 | 114046.137 | 910.548 |
2 | 551 | 7 | 380 | 101.481 | 278.519 | 278.519 | 153463.969 | 1949.633 |
Sum | | | | | | 506.156 | 267510.106 | 2860.181 |
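In Python, step 6 is a column-wise sum over the per-instance values. `zip(*per_instance)` transposes the rows into columns. The numbers are copied from the table.

```python
# Step 6 sketch: sum each delta column over all instances.
per_instance = [
    [227.637, 114046.137, 910.548],     # instance 1: delta w0, delta w1, delta w2
    [278.519, 153463.969, 1949.633],    # instance 2
]
delta_w = [round(sum(column), 3) for column in zip(*per_instance)]
print(delta_w)    # [506.156, 267510.106, 2860.181]
```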
7. Optional: compute sum of squared errors
- Compute the sum of squared errors by squaring each of the error values in the error column, and computing the sum.
- This would provide you with a single metric to determine how good the current model is with respect to its current weights.
- Remember that we are trying to minimise the loss function.
- For time step 0, the loss would typically be high (i.e. bad) because we randomly initialised the weights.
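A sketch of the optional step 7 in Python, using the error column from the table. This is the plain SSE. Some texts include a factor of ½ in the loss; that halves the number but does not change which weights minimise it.

```python
# Step 7 sketch: sum of squared errors for the current weights.
errors = [227.637, 278.519]
sse = sum(e ** 2 for e in errors)
print(round(sse, 3))    # 129391.437
```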
8. Weight updates
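- Now that we have the $\Delta w$ values, we can perform the weight updates:
$$w_j^{(\text{new})} = w_j^{(\text{old})} + \eta\,\Delta w_j$$
where $\eta$ is the learning rate. Because $\Delta w_j = \sum_i (t_i - y_i)\,x_{ij}$ points in the direction of the negative gradient of the loss, adding it moves each weight downhill.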
Assume the learning rate is
We already know the current weights for time step 0, and we know all the $\Delta w$ values.
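Substituting the values for $w_0$: $w_0^{(1)} = w_0^{(0)} + \eta \times 506.156$
Substituting the values for $w_1$: $w_1^{(1)} = w_1^{(0)} + \eta \times 267510.106$
Substituting the values for $w_2$: $w_2^{(1)} = w_2^{(0)} + \eta \times 2860.181$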
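A minimal Python sketch of the update rule. The helper name `update_weights` is illustrative. The time-step-0 weights and the learning rate are hypothetical placeholders, because they are not given here, so the output only demonstrates the mechanics of the substitution. The learning rate has to be small because $f_1$ is in the hundreds, which makes $\Delta w_1$ very large.

```python
# Step 8 sketch: w_j(new) = w_j(old) + learning_rate * delta_w_j.
def update_weights(weights, delta_w, learning_rate):
    """One gradient-descent step; delta_w_j = sum_i (t_i - y_i) * x_ij is the
    negative gradient of 1/2 * SSE, so adding it moves each weight downhill."""
    return [w + learning_rate * d for w, d in zip(weights, delta_w)]


delta_w = [506.156, 267510.106, 2860.181]   # from step 6
w_t0 = [0.5, 0.1, 0.2]       # hypothetical time-step-0 weights (placeholders)
learning_rate = 1e-6         # hypothetical learning rate
w_t1 = update_weights(w_t0, delta_w, learning_rate)
print(w_t1)
```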
- Now we could repeat all the steps again.
- If you compute the sum of squared errors, you should see that this value has decreased, provided the learning rate is small enough (i.e. we have reduced the loss).
- We are trying to optimise the weights in such a way as to minimise the loss.
- Repeat all the steps, compute the weights for time step 2, and check that the loss at time step 1 has indeed decreased compared to time step 0.
ID | f1 | f2 | target | prediction | error | delta w0 | delta w1 | delta w2 |
---|---|---|---|---|---|---|---|---|
1 | 501 | 4 | 320 | Compute again | … | … | … | … |
2 | 551 | 7 | 380 | Compute again | … | … | … | … |
Sum | | | | | | … | … | … |
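Repeating the steps can be sketched as a short loop in Python. The starting weights (all zeros) and the learning rate are hypothetical choices, not values from this example, so the losses will differ from a hand calculation with the original weights. The loop records the SSE at each time step, so you can check that it decreases. For this data, a learning rate much larger than the one used here (for example 1e-5) makes the loss grow instead of shrink.

```python
# Sketch: repeat steps 1-8 for a few time steps and record the SSE each time.
def predict(w, f1, f2):
    """Model output y = w0 + w1*f1 + w2*f2."""
    return w[0] + w[1] * f1 + w[2] * f2


def sse(w, X, t):
    """Sum of squared errors (step 7)."""
    return sum((ti - predict(w, f1, f2)) ** 2 for (f1, f2), ti in zip(X, t))


def train_step(w, X, t, learning_rate):
    """One pass of steps 1-6 followed by the step-8 weight update."""
    delta_w = [0.0, 0.0, 0.0]
    for (f1, f2), ti in zip(X, t):
        error = ti - predict(w, f1, f2)            # steps 1-2
        for j, xij in enumerate((1.0, f1, f2)):    # steps 3-6
            delta_w[j] += error * xij
    return [wj + learning_rate * dj for wj, dj in zip(w, delta_w)]  # step 8


X = [(501, 4), (551, 7)]
t = [320, 380]
w = [0.0, 0.0, 0.0]      # hypothetical starting weights
learning_rate = 1e-6     # hypothetical learning rate

losses = [sse(w, X, t)]  # loss at time step 0
for _ in range(5):
    w = train_step(w, X, t, learning_rate)
    losses.append(sse(w, X, t))
print([round(v, 1) for v in losses])
```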