Calculate model predictions

| ID | f1  | f2 | target | prediction | error | delta w0 | delta w1 | delta w2 |
|----|-----|----|--------|------------|-------|----------|----------|----------|
| 1  | 501 | 4  | 320    |            |       |          |          |          |
| 2  | 551 | 7  | 380    |            |       |          |          |          |

The loss function is the sum of squared errors; for a single instance it is $\frac{1}{2}\left(t-\mathbb{M}_{\mathbf{w}}(\mathbf{d})\right)^2$

It is a measure of the model's fit, it aids optimisation, and it facilitates model comparison.
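
Summing over the whole training set (here $n=2$ instances) gives the dataset-level loss that we actually try to minimise; written with the same $\frac{1}{2}$ convention as above:

$$
L(\mathcal{D}, \mathbf{w}) = \frac{1}{2}\sum_{i=1}^{n}\left(t_i - \mathbb{M}_{\mathbf{w}}(\mathbf{d}_i)\right)^2
$$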

Derivative of loss function: $\left(\left(t_i-\mathbb{M}_{\mathbf{w}}\left(\mathbf{d}_i\right)\right) \times d_{j, i}\right)$

  • $\mathbb{M}_{\mathbf{w}}\left(\mathbf{d}_i\right)$ is the model output, so it makes sense that we need to compute the model output
  • $\left(t_i-\mathbb{M}_{\mathbf{w}}\left(\mathbf{d}_i\right)\right)$ is the model error, so it also makes sense that we calculate this (a short derivation of this expression follows below)
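
As a sketch of where this expression comes from (not spelled out in the notes above): apply the chain rule to the single-instance loss and use the fact that $\mathbb{M}_{\mathbf{w}}(\mathbf{d}_i)=\sum_j w_j\,d_{j,i}$, so $\partial \mathbb{M}_{\mathbf{w}}(\mathbf{d}_i)/\partial w_j = d_{j,i}$:

$$
\frac{\partial}{\partial w_j}\,\frac{1}{2}\left(t_i-\mathbb{M}_{\mathbf{w}}(\mathbf{d}_i)\right)^2
= -\left(t_i-\mathbb{M}_{\mathbf{w}}(\mathbf{d}_i)\right)\times d_{j,i}
$$

Gradient descent moves against the gradient, which is why the weight update in step 8 adds the learning rate times $\left(t_i-\mathbb{M}_{\mathbf{w}}(\mathbf{d}_i)\right)\times d_{j,i}$ rather than subtracting it.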

Let's assume we have the following weights: $W_0(t=0)=-0.146$, $W_1(t=0)=0.185$, $W_2(t=0)=-0.044$

Clearly our model is $W_{0} + W_{1}\cdot d_1+W_{2}\cdot d_2$, so at $t=0$ it is $-0.146+0.185 \cdot d_1-0.044 \cdot d_2$

1. Calculate model predictions

So what we do is apply $W_{0} + W_{1}\cdot d_1+W_{2}\cdot d_2$ to each instance:

$-0.146 + 0.185 \cdot 501 - 0.044 \cdot 4 = 92.363$
$-0.146 + 0.185 \cdot 551 - 0.044 \cdot 7 = 101.481$

| ID | f1  | f2 | target | prediction | error | delta w0 | delta w1 | delta w2 |
|----|-----|----|--------|------------|-------|----------|----------|----------|
| 1  | 501 | 4  | 320    | 92.363     |       |          |          |          |
| 2  | 551 | 7  | 380    | 101.481    |       |          |          |          |
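
The same computation as a minimal NumPy sketch (the names `D`, `w`, and `predictions` are mine, not from the notes); each row of `D` holds the dummy feature $d_0=1$ followed by $f_1$ and $f_2$:

```python
import numpy as np

# One row per instance: d0 = 1 (dummy feature for the bias w0), then f1, f2.
D = np.array([[1.0, 501.0, 4.0],
              [1.0, 551.0, 7.0]])
w = np.array([-0.146, 0.185, -0.044])   # w0, w1, w2 at t = 0

# Prediction per instance: w0 + w1*d1 + w2*d2, i.e. one dot product per row.
predictions = D @ w
print(predictions)   # [ 92.363 101.481]  (up to floating-point rounding)
```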

2. Calculate the error

Here we calculate the error between the correct values and the model predictions, $y-\hat{y}$

| ID | f1  | f2 | target | prediction | error   | delta w0 | delta w1 | delta w2 |
|----|-----|----|--------|------------|---------|----------|----------|----------|
| 1  | 501 | 4  | 320    | 92.363     | 227.637 |          |          |          |
| 2  | 551 | 7  | 380    | 101.481    | 278.519 |          |          |          |
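
Continuing the sketch above (again, the variable names are mine), the error column is just the targets minus the predictions:

```python
import numpy as np

targets = np.array([320.0, 380.0])          # t for each instance
predictions = np.array([92.363, 101.481])   # from step 1

errors = targets - predictions              # t - M_w(d) per instance
print(errors)   # [227.637 278.519]
```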

3. Calculate $\delta$ values for 1st variable

i.e. for $w_{0}$ ($j=0$) on each instance ($i=1,2$): $\delta=\left(\left(t_i-\mathbb{M}_{\mathbf{w}}\left(\mathbf{d}_i\right)\right) \times d_{j, i}\right)$

But we know that $d_{0}=1$, so the equation is simply the error. You can also think of it as an elementwise multiplication of the error and a column of ones ($d_{0}$).

| d0 | error   |
|----|---------|
| 1  | 227.637 |
| 1  | 278.519 |

Which results in:

| ID | f1  | f2 | target | prediction | error   | delta w0 | delta w1 | delta w2 |
|----|-----|----|--------|------------|---------|----------|----------|----------|
| 1  | 501 | 4  | 320    | 92.363     | 227.637 | 227.637  |          |          |
| 2  | 551 | 7  | 380    | 101.481    | 278.519 | 278.519  |          |          |

4. Calculate $\delta$ values for 2nd variable

i.e. for $w_{1}$ ($j=1$) on each instance ($i=1,2$)

So we perform an elementwise multiplication of the error with the feature corresponding to $d_{1}$:

| f1  | error   |
|-----|---------|
| 501 | 227.637 |
| 551 | 278.519 |

Which results in:

| ID | f1  | f2 | target | prediction | error   | delta w0 | delta w1   | delta w2 |
|----|-----|----|--------|------------|---------|----------|------------|----------|
| 1  | 501 | 4  | 320    | 92.363     | 227.637 | 227.637  | 114046.137 |          |
| 2  | 551 | 7  | 380    | 101.481    | 278.519 | 278.519  | 153463.969 |          |

5. Calculate $\delta$ values for 3rd variable

i.e. for $w_{2}$ ($j=2$) on each instance ($i=1,2$)

So we perform an elementwise multiplication of the error with the feature corresponding to $d_{2}$:

| f2 | error   |
|----|---------|
| 4  | 227.637 |
| 7  | 278.519 |

Which results in:

| ID | f1  | f2 | target | prediction | error   | delta w0 | delta w1   | delta w2 |
|----|-----|----|--------|------------|---------|----------|------------|----------|
| 1  | 501 | 4  | 320    | 92.363     | 227.637 | 227.637  | 114046.137 | 910.548  |
| 2  | 551 | 7  | 380    | 101.481    | 278.519 | 278.519  | 153463.969 | 1949.633 |

6. Calculate actual $\delta$ values

  • Now calculate the actual delta values used to perform the weight updates.
  • This is the sum of the per-instance $\delta$ values for $\delta(W_{0})$, $\delta(W_{1})$, and $\delta(W_{2})$ (see the sketch after the table below).
| ID  | f1  | f2 | target | prediction | error   | delta w0 | delta w1   | delta w2 |
|-----|-----|----|--------|------------|---------|----------|------------|----------|
| 1   | 501 | 4  | 320    | 92.363     | 227.637 | 227.637  | 114046.137 | 910.548  |
| 2   | 551 | 7  | 380    | 101.481    | 278.519 | 278.519  | 153463.969 | 1949.633 |
| sum |     |    |        |            |         | 506.156  | 267510.106 | 2860.181 |
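
A compact NumPy sketch of steps 3–6 (variable names are mine): the per-instance $\delta$ columns are an elementwise product of the error with each column of the data (including the column of ones for $d_0$), and the actual $\delta$ values are the column sums:

```python
import numpy as np

D = np.array([[1.0, 501.0, 4.0],        # d0 = 1, f1, f2 for instance 1
              [1.0, 551.0, 7.0]])       # d0 = 1, f1, f2 for instance 2
errors = np.array([227.637, 278.519])   # from step 2

# Steps 3-5: elementwise product of the error with each column of D.
per_instance_deltas = errors[:, None] * D
# [[   227.637 114046.137    910.548]
#  [   278.519 153463.969   1949.633]]

# Step 6: sum over the instances -> one delta value per weight.
deltas = per_instance_deltas.sum(axis=0)
print(deltas)   # [   506.156 267510.106   2860.181]
```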

7. Optional: compute sum of squared errors

  • Compute the sum of squared errors by squaring each of the error values in the error column, and computing the sum.
  • This would provide you with a single metric to determine how good the current model is with respect to its current weights.
  • Remember that we are trying to minimise the loss function.
  • For $t=0$, the loss would typically be high because we randomly initialised the weights (a worked computation follows below).
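
For instance, with the error values from the table (keeping the $\frac{1}{2}$ factor from the loss definition; drop it if you only want the raw sum of squared errors):

$$
\frac{1}{2}\left(227.637^2 + 278.519^2\right) \approx \frac{1}{2}\left(51818.6 + 77572.8\right) \approx 64695.7
$$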

8. Weight updates

  • Now that we have the $\delta$ values we can perform the weight updates

new weight $=w_j+\alpha \underbrace{\sum_{i=1}^n\left(\left(t_i-\mathbb{M}_{\mathbf{w}}\left(\mathbf{d}_i\right)\right) \times \mathbf{d}_{j, i}\right)}_{\delta\left(\mathcal{D}, w_j\right)}$

Assume the learning rate is $\alpha = 0.000001$

We already know the current weights for time step 0, and we know all the δ\delta values.

$W_0(t=0)=-0.146$, $W_1(t=0)=0.185$, $W_2(t=0)=-0.044$

$w_0(t=1) = w_0(t=0) + \alpha \times \delta w_0$. Substituting the values: $-0.146 + 0.000001 \times 506.156 = -0.145493844$

$w_1(t=1) = w_1(t=0) + \alpha \times \delta w_1$. Substituting the values: $0.185 + 0.000001 \times 267510.106 = 0.452510106$

$w_2(t=1) = w_2(t=0) + \alpha \times \delta w_2$. Substituting the values: $-0.044 + 0.000001 \times 2860.181 = -0.041139819$

$W_{0}(t=1)= -0.145493844$, $W_{1}(t=1)= 0.452510106$, $W_{2}(t=1)= -0.041139819$
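
The same update as a short NumPy sketch (names mine), which reproduces the three new weights:

```python
import numpy as np

w = np.array([-0.146, 0.185, -0.044])               # weights at t = 0
deltas = np.array([506.156, 267510.106, 2860.181])  # summed delta values from step 6
alpha = 0.000001                                    # learning rate

w_new = w + alpha * deltas                          # one batch gradient descent step
print(w_new)   # [-0.14549384  0.45251011 -0.04113982]
```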

  • Now we could repeat all the steps again.
  • If you compute the sum of squared errors you will see that this value has decreased (i.e. we have reduced the loss).
  • We are trying to optimise the weights in such a way as to minimise the loss.
  • Repeat all the steps and compute the weights for $t=2$, and also check that the loss at time step $t=1$ has indeed decreased compared to time step $t=0$ (a sketch of the full loop follows after the table below).
| ID | f1  | f2 | target | prediction    | error | delta w0 | delta w1 | delta w2 |
|----|-----|----|--------|---------------|-------|----------|----------|----------|
| 1  | 501 | 4  | 320    | Compute again |       |          |          |          |
| 2  | 551 | 7  | 380    | Compute again |       |          |          |          |
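
Putting everything together, a minimal sketch (not code from the notes) that repeats steps 1–8 for a few iterations and prints the sum of squared errors so you can see it decrease; with this learning rate the loss drops at every step:

```python
import numpy as np

D = np.array([[1.0, 501.0, 4.0],        # d0 = 1, f1, f2
              [1.0, 551.0, 7.0]])
t = np.array([320.0, 380.0])            # targets
w = np.array([-0.146, 0.185, -0.044])   # weights at t = 0
alpha = 0.000001                        # learning rate

for step in range(5):
    predictions = D @ w                         # step 1
    errors = t - predictions                    # step 2
    sse = np.sum(errors ** 2)                   # step 7 (without the 1/2 factor)
    deltas = (errors[:, None] * D).sum(axis=0)  # steps 3-6
    w = w + alpha * deltas                      # step 8
    print(f"t={step}: SSE={sse:.3f}, new weights={w}")
```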
