# QUESTION 1

**Explain why the threshold θ is necessary. What is the effect of θ, and what will the consequences be of not having a threshold?**
The threshold (θ) in an artificial neuron sets a minimum level of activation required for the neuron to produce an output. It allows the neuron to filter out weaker, less meaningful signals and noise, and to respond selectively to more significant inputs.
Without a threshold, the neuron would be too sensitive, responding to any non-zero input.
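A minimal sketch of this effect, assuming a step activation and the convention that the neuron fires when the net input reaches θ (all names and values are illustrative):

```python
def neuron(inputs, weights, theta):
    """Fire (return 1) only if the net input reaches the threshold."""
    net = sum(z * v for z, v in zip(inputs, weights))
    return 1 if net >= theta else 0

weak = [0.1, 0.05]    # weak, noise-like signal
strong = [0.9, 0.8]   # significant signal
w = [1.0, 1.0]

# With theta = 0 every non-zero net input fires, including the noise:
assert neuron(weak, w, 0.0) == 1
# A positive threshold suppresses the weak signal but keeps the strong one:
assert neuron(weak, w, 0.5) == 0
assert neuron(strong, w, 0.5) == 1
```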

# QUESTION 2

**Explain what the effects of weight changes are on the separating hyperplane.**
*The separating hyperplane is the boundary that distinguishes between different classes or states based on the input signals.*
When weights are adjusted, it changes the contribution of each input signal to the net input signal. This adjustment can rotate, shift, or tilt the hyperplane. In the context of linearly separable functions, such as in the case of a perceptron, changing weights allows the algorithm to find the optimal hyperplane that correctly classifies the input patterns.
Thus, weight changes modify the position and orientation of the separating hyperplane, influencing the neuron's ability to classify input patterns correctly.
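This is easy to see in two dimensions: the boundary $v_1 z_1 + v_2 z_2 = \theta$ rewrites as the line $z_2 = (\theta - v_1 z_1)/v_2$, whose slope $-v_1/v_2$ depends only on the weights. A small sketch (illustrative names):

```python
def slope_and_intercept(v1, v2, theta):
    """Slope and z2-intercept of the 2-D boundary v1*z1 + v2*z2 = theta."""
    return -v1 / v2, theta / v2

# Equal weights give slope -1:
assert slope_and_intercept(1.0, 1.0, 0.5) == (-1.0, 0.5)
# Doubling v1 rotates the line to slope -2; the intercept is unchanged:
assert slope_and_intercept(2.0, 1.0, 0.5) == (-2.0, 0.5)
```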

# QUESTION 3

**Explain the effect of changing θ on the hyperplane that forms the decision boundary.**
Changing θ shifts the position of the decision boundary along the range of net input values. It determines the sensitivity of the neuron to different levels of input, influencing which input patterns will trigger a response above the threshold and which will not (influencing how the artificial neuron classifies input patterns).

*The direction of the shift* follows from the boundary equation $\sum_{i=1}^I z_i v_i - \theta = 0$: changing θ translates the hyperplane along the weight vector without changing its orientation. Increasing θ demands a larger net input for an above-threshold response; decreasing θ demands a smaller one.

How changing θ affects the decision boundary:

**Increasing θ:**

- The decision boundary shifts towards larger net input values.
- The region associated with an above-threshold response contracts, while the region associated with a below-threshold response expands.

**Decreasing θ:**

- The decision boundary shifts towards smaller net input values.
- The region associated with an above-threshold response expands, while the region associated with a below-threshold response contracts.

**θ = 0:**

- The decision boundary passes through the origin: the net input must be positive for an above-threshold response and negative for a below-threshold response.
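The contraction of the above-threshold region as θ grows can be checked with a small sketch (an integer input grid and the convention net ≥ θ are assumed):

```python
def above_threshold_count(theta):
    """Count grid points (z1, z2) in 0..10 x 0..10 with z1 + z2 >= theta."""
    return sum(1 for z1 in range(11) for z2 in range(11) if z1 + z2 >= theta)

counts = [above_threshold_count(t) for t in (0, 5, 10, 15)]
# The above-threshold region shrinks monotonically as theta increases:
assert counts == sorted(counts, reverse=True)
assert above_threshold_count(0) == 121   # theta = 0: every point fires
assert above_threshold_count(21) == 0    # theta too high: nothing fires
```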

# QUESTION 4

**Which of the following Boolean functions can be realized with a single neuron that implements a SU? Justify your answer by giving weight and threshold values.**
*where $z_1 z_2$ denotes $(z_1 \text{ AND } z_2)$; $z_1+z_2$ denotes $(z_1 \text{ OR } z_2)$ ; $\bar{z}_1$ denotes $(\text{NOT }z_{1})$*
**(a) $z_1 z_2 \bar{z}_3$
(b) $z_1 \bar{z}_2+\bar{z}_1 z_2$
(c) $z_1+z_2$**

- Consider the conditions for linear separability.
- Find suitable weight and threshold values. An SU can realize linearly separable functions without any error. The decision boundary is a hyperplane that separates the space of input vectors into a region yielding an above-threshold response and a region yielding a below-threshold response.

Let's analyse each Boolean function:

(a) $z_1 z_2 \bar{z}_3$ ($z_1$ AND $z_2$ AND NOT $z_3$)

- This function is linearly separable and can be realized with an SU.
- Suitable weight and threshold values:
- $v_1 = 1, v_2 = 1, v_3 = -1$
- $\theta = 1.5$ (the only true pattern, $(1,1,0)$, gives a net input of $2$; every other pattern gives a net input of at most $1$)

(b) $z_1 \bar{z}_2+\bar{z}_1 z_2$ (XOR operation)

- XOR is not linearly separable and cannot be realized with a single SU.

(c) $z_1 + z_2$ (OR operation)

- This function is linearly separable and can be realized with an SU.
- Suitable weight and threshold values:
- $v_{1}=1,v_{2}=1$
- $\theta = 0.5$ (any pattern with $z_1 = 1$ or $z_2 = 1$ gives a net input of at least $1$, while $(0,0)$ gives $0$)

THEREFORE: (a) yes, (b) no, (c) yes.
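These answers can be verified exhaustively. The sketch below assumes the convention that the SU outputs 1 when the net input reaches θ, and uses $\theta = 1.5$ for (a) and $\theta = 0.5$ for (c), values that strictly separate the above- and below-threshold patterns:

```python
from itertools import product

def su(inputs, weights, theta):
    """Summation unit: output 1 iff the net input reaches theta."""
    return 1 if sum(z * v for z, v in zip(inputs, weights)) >= theta else 0

# (a) z1 AND z2 AND (NOT z3), with v = (1, 1, -1), theta = 1.5
for z1, z2, z3 in product((0, 1), repeat=3):
    assert su((z1, z2, z3), (1, 1, -1), 1.5) == (z1 and z2 and not z3)

# (c) z1 OR z2, with v = (1, 1), theta = 0.5
for z1, z2 in product((0, 1), repeat=2):
    assert su((z1, z2), (1, 1), 0.5) == (z1 or z2)

# (b) XOR: no single SU works -- an exhaustive search over a small
# weight/threshold grid finds no solution.
vals = [x / 2 for x in range(-4, 5)]  # -2.0 .. 2.0 in steps of 0.5
found = any(
    all(su((z1, z2), (v1, v2), t) == (z1 ^ z2)
        for z1, z2 in product((0, 1), repeat=2))
    for v1 in vals for v2 in vals for t in vals
)
assert not found
```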

# QUESTION 5

**Is it possible to use a single PU to learn problems that are not linearly separable?**
Yes.
For linearly separable problems, SUs are often sufficient, as they can create a hyperplane that separates the input space into distinct regions. For problems that are not linearly separable, however, a single PU can capture more complex relationships among the input features: unlike summation units (SUs), product units allow higher-order combinations of inputs, enabling the representation of nonlinear decision boundaries.
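A minimal sketch of a product unit (assuming the formulation $\text{net} = \prod_i z_i^{v_i}$), showing that one PU plus a threshold handles a point set that no single hyperplane can separate:

```python
def pu(inputs, weights):
    """Product unit: net = prod(z_i ** v_i)."""
    net = 1.0
    for z, v in zip(inputs, weights):
        net *= z ** v
    return net

# Classify points by z1*z2 >= 1.  Both (2, 2) and (-2, -2) are positive,
# yet their midpoint (0, 0) is negative -- no single hyperplane can
# separate this, but one PU with a threshold can.
assert pu((2, 2), (1, 1)) >= 1
assert pu((-2, -2), (1, 1)) >= 1
assert pu((0, 0), (1, 1)) < 1
```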

# QUESTION 6

**In the calculation of error, why is the error per pattern squared?**

- Squaring amplifies the effect of larger errors, which is beneficial when training neural networks because it ensures that significant deviations from the target values have a more pronounced impact on the overall error. This is particularly important in situations where large errors should be penalized more heavily than small ones.
- Mathematical convenience: when taking derivatives or performing other mathematical manipulations, having a squared error term leads to simpler expressions. For example, the derivative of the squared error with respect to network parameters often results in cleaner and more tractable equations.
- Squaring ensures that the error is always positive or zero. This is advantageous when dealing with optimization algorithms, as it avoids issues associated with signed errors that could cancel each other out during the learning process.
- Least Squares Optimization: minimizing the sum of squared errors is equivalent to finding the parameters that provide the best fit to the data in a least squares sense.
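The cancellation and mathematical-convenience points can be checked with a small sketch (a linear unit $o = wz$ is assumed for the gradient check; all values are illustrative):

```python
targets = [1.0, -1.0]
outputs = [0.0, 0.0]

# Signed errors can cancel to zero even though the fit is poor;
# squared errors cannot cancel:
signed_sum = sum(t - o for t, o in zip(targets, outputs))
squared_sum = sum((t - o) ** 2 for t, o in zip(targets, outputs))
assert signed_sum == 0.0      # misfit hidden by cancellation
assert squared_sum == 2.0     # misfit remains visible

# For E = (t - w*z)^2 the gradient is the simple expression
# dE/dw = -2*(t - w*z)*z; verify it against a numerical derivative:
t, z, w, eps = 1.0, 2.0, 0.3, 1e-6
analytic = -2 * (t - w * z) * z
numeric = ((t - (w + eps) * z) ** 2 - (t - (w - eps) * z) ** 2) / (2 * eps)
assert abs(analytic - numeric) < 1e-5
```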

# QUESTION 7

**Can errors be calculated as $|t_{p}-o_{p}|$ instead of $(t_{p}-o_{p})^2$ if gradient descent is used to adjust weights?**
Yes, because of the following:

- With absolute error as the loss function, the gradient with respect to the output is either $-1$ or $+1$ (and undefined at zero error), which makes the optimization process less sensitive to the magnitude of errors compared to squared error.
- Absolute error is less sensitive to outliers, making it useful for datasets with significant outliers, since it does not heavily penalize large errors.
- In gradient descent, convergence may sometimes be reached faster with absolute error, but the constant-magnitude gradient can also cause oscillations around the optimal solution.
- Disadvantage to consider: squared error yields a smooth, everywhere-differentiable loss surface, which aids gradient-based optimization; absolute error does not, because it is non-differentiable at zero error.
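The contrast between the two gradients can be sketched directly (a single output $o$ is assumed; the absolute-error derivative is undefined at $t = o$):

```python
def abs_grad(t, o):
    """d|t - o|/do = -sign(t - o); undefined at t == o."""
    return -1.0 if t > o else 1.0

def sq_grad(t, o):
    """d(t - o)^2/do = -2*(t - o)."""
    return -2.0 * (t - o)

# The absolute-error gradient ignores the error magnitude entirely:
assert abs_grad(1.0, 0.9) == abs_grad(1.0, -100.0) == -1.0
# The squared-error gradient grows with the error, so outliers dominate:
assert abs(sq_grad(1.0, -100.0)) > abs(sq_grad(1.0, 0.9))
```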

# QUESTION 8

**Is the following statement true or false: 'A single neuron can be used to approximate the function $f(z)=z^2$'? Justify your answer.**
TRUE.
$f(z) = z^2$ cannot be represented by a summation unit, whose net input is a linear combination of its inputs, but it can be realized by a single product unit (PU). A PU computes $\prod_i z_i^{v_i}$, so a PU with a single input and weight $v = 2$ produces exactly $z^2$.
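Assuming the product-unit formulation $\text{net} = \prod_i z_i^{v_i}$, a single PU with one input and weight $v = 2$ computes $z^2$ exactly; a minimal sketch:

```python
def pu_square(z, v=2.0):
    """Single-input product unit: net = z ** v; with v = 2 this is z^2."""
    return z ** v

for z in (0.5, 1.0, 2.0, 3.0):
    assert pu_square(z) == z * z
```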