- Training is an iterative learning process in which the network gets feedback on its performance, figures out how to improve, and adjusts itself accordingly.
- Forward Pass: Feed input data into the network; it flows through each layer, producing predictions.
- Compare with Reality: Compare the network's prediction with the actual correct answer. Calculate how far off the network is.
- Backward Pass (Backpropagation): This is where the network learns from its mistakes. It works backward through the layers to figure out how much each weight contributed to the error.
- Adjust Weights: Update the weights in the network to reduce the error. If a weight contributed a lot to the error, the network adjusts it more. If a weight didn't contribute much, it's adjusted less.
- Repeat: Repeat this process—forward pass, compare, backward pass, adjust weights—multiple times until the network gets really good at making accurate predictions.
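A minimal NumPy sketch of this loop, assuming a tiny two-layer network with sigmoid activations, a made-up toy dataset, and a fixed step count in place of a real stopping test (all names and sizes here are illustrative, not part of the original description):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 2 input features -> 1 binary target (assumed for illustration)
X = rng.normal(size=(8, 2))
y = (X[:, :1] + X[:, 1:2] > 0).astype(float)

# One hidden layer with sigmoid activations
W1 = rng.normal(scale=0.5, size=(2, 4))
W2 = rng.normal(scale=0.5, size=(4, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

learning_rate = 0.5
for step in range(1000):
    # 1. Forward pass: propagate the inputs through each layer
    h = sigmoid(X @ W1)
    pred = sigmoid(h @ W2)

    # 2. Compare with reality: how far off is the prediction?
    error = pred - y

    # 3. Backward pass: how much did each weight contribute to the error?
    grad_out = error * pred * (1 - pred)          # output-layer delta
    grad_hid = (grad_out @ W2.T) * h * (1 - h)    # hidden-layer delta

    # 4. Adjust weights in proportion to their contribution
    W2 -= learning_rate * h.T @ grad_out
    W1 -= learning_rate * X.T @ grad_hid
# 5. Repeat: here a fixed number of steps stands in for "good enough"
```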
- Initialization (Lines 1-2): The algorithm assumes a training dataset is available; it requires hyper-parameters such as the learning rate and batch size, and a convergence criterion is specified to decide when to stop training.
- Mini-Batch Splitting (Line 1): The data is split into mini-batches, where each mini-batch pairs a matrix of descriptive features with a matrix (or vector) containing the labels for the examples in that mini-batch.
- Weight Initialization (Line 2): A weight matrix is initialized for each layer of the network.
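A sketch of these initialization steps, assuming a toy dataset and illustrative hyper-parameter values and layer sizes (none of these specifics come from the original pseudocode):

```python
import numpy as np

rng = np.random.default_rng(42)

# Assumed toy dataset: 100 examples, 3 descriptive features, 1 label each
X = rng.normal(size=(100, 3))
y = rng.integers(0, 2, size=(100, 1)).astype(float)

# Hyper-parameters (values are illustrative)
learning_rate = 0.1
batch_size = 10

# Split the data into mini-batches: each is a (features, labels) pair
mini_batches = [
    (X[i:i + batch_size], y[i:i + batch_size])
    for i in range(0, len(X), batch_size)
]

# Initialize a weight matrix per layer (3 -> 5 hidden units -> 1 output),
# with small random values to break symmetry
layer_sizes = [3, 5, 1]
weights = [
    rng.normal(scale=0.1, size=(n_in, n_out))
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]
```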
- Epochs and Mini-Batch Processing (Lines 3-33): Each iteration of the repeat loop represents an epoch (a full traversal of the training data), and the inner for loop processes each mini-batch with a forward pass, a backward pass, and weight updates.
- Forward Pass (Lines 5-11): The descriptive features are presented to the input layer, and activations are propagated forward through the network; matrix operations compute the activations at each layer from the previous layer's activations, the layer's weight matrix, and the activation function.
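A sketch of the forward pass for one mini-batch, assuming a 3-input, 5-hidden-unit, 1-output network with sigmoid activations (all sizes and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Assumed mini-batch of 10 examples with 3 descriptive features
X_batch = rng.normal(size=(10, 3))

# Weight matrices for a 3 -> 5 -> 1 network (illustrative sizes)
W1 = rng.normal(scale=0.1, size=(3, 5))
W2 = rng.normal(scale=0.1, size=(5, 1))

# Forward pass: each layer's activations are the matrix product of the
# previous layer's activations and the weight matrix, passed through the
# activation function
hidden = sigmoid(X_batch @ W1)    # hidden-layer activations
output = sigmoid(hidden @ W2)     # predictions for the whole mini-batch
```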
- Backward Pass (Lines 12-30): The algorithm performs backpropagation to calculate error gradients. Separate for loops handle the output-layer neurons (Lines 16-18) and the hidden-layer neurons (Lines 19-23), and the error gradients for each weight are accumulated across all examples in the mini-batch.
- Weight Updates (Lines 28-30): The weights of the network are updated based on the accumulated error gradients.
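A sketch of the backward pass and weight updates for one mini-batch, continuing the assumptions above (sigmoid activations and squared-error-style deltas, which are illustrative choices); the matrix products implicitly accumulate the gradients over every example in the batch:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Mini-batch, labels, and weights as in the forward-pass sketch above
X_batch = rng.normal(size=(10, 3))
y_batch = rng.integers(0, 2, size=(10, 1)).astype(float)
W1 = rng.normal(scale=0.1, size=(3, 5))
W2 = rng.normal(scale=0.1, size=(5, 1))
learning_rate = 0.1

# Forward pass (the gradients depend on these activations)
hidden = sigmoid(X_batch @ W1)
output = sigmoid(hidden @ W2)

# Backward pass: error gradients for the output layer, then the hidden
# layer; each matrix product sums the contributions of all examples
delta_out = (output - y_batch) * output * (1 - output)
delta_hid = (delta_out @ W2.T) * hidden * (1 - hidden)
grad_W2 = hidden.T @ delta_out
grad_W1 = X_batch.T @ delta_hid

# Weight updates: move each weight against its accumulated gradient
W2 -= learning_rate * grad_W2
W1 -= learning_rate * grad_W1
```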
- Shuffling Mini-Batch Sequence (Line 32): Between epochs, the order of the mini-batch sequence is shuffled.
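A minimal illustration of the shuffle step; in practice the shuffled list would be the (features, labels) mini-batch pairs built during initialization rather than the stand-in values used here:

```python
import random

# Stand-in for the list of mini-batches built during initialization
mini_batches = list(range(10))

# Reorder the mini-batch sequence so the next epoch sees the batches
# in a different order
random.shuffle(mini_batches)
```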
- In summary, the network is trained over multiple epochs, and each epoch processes the training data in mini-batches.
- For each mini-batch, the forward pass computes activations, the backward pass calculates error gradients, and weight updates are then applied.
- The process repeats until the convergence criterion is met.
- The mini-batch sequence is shuffled between epochs to introduce randomness in the training process.
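Putting the pieces together, a compact end-to-end sketch of the mini-batch training loop described above, with a fixed epoch count standing in for a real convergence criterion and all dataset, architecture, and hyper-parameter choices assumed for illustration:

```python
import random
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy dataset: 200 examples, 3 features, binary labels (illustrative)
X = rng.normal(size=(200, 3))
y = (X.sum(axis=1, keepdims=True) > 0).astype(float)

# Hyper-parameters and weight initialization
learning_rate = 0.5
batch_size = 20
W1 = rng.normal(scale=0.3, size=(3, 5))
W2 = rng.normal(scale=0.3, size=(5, 1))

# Split the data into (features, labels) mini-batches
mini_batches = [
    (X[i:i + batch_size], y[i:i + batch_size])
    for i in range(0, len(X), batch_size)
]

for epoch in range(50):                   # fixed epochs instead of a convergence test
    for X_batch, y_batch in mini_batches:
        # Forward pass
        hidden = sigmoid(X_batch @ W1)
        output = sigmoid(hidden @ W2)

        # Backward pass: error gradients accumulated over the mini-batch
        delta_out = (output - y_batch) * output * (1 - output)
        delta_hid = (delta_out @ W2.T) * hidden * (1 - hidden)

        # Weight updates
        W2 -= learning_rate * hidden.T @ delta_out
        W1 -= learning_rate * X_batch.T @ delta_hid

    # Shuffle the mini-batch sequence between epochs
    random.shuffle(mini_batches)

# Report training accuracy on the toy data
preds = sigmoid(sigmoid(X @ W1) @ W2) > 0.5
print("training accuracy:", (preds == y.astype(bool)).mean())
```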