Training loss

Introduction

In machine learning, training loss is a measure of how well a model fits its training data. It is the value of the model's loss at a particular training iteration. The aim of training is to find parameters that give the best fit to the given data. The quality of that fit is quantified by a loss function, also known as a cost function or error function.

The training loss is the result of evaluating a loss function on the training set. It measures the difference between the model's predicted values and the actual values (labels) for that set, usually averaged over the examples. Training minimizes this loss in the hope that the model will also predict accurately on new, unseen data, although a low training loss alone does not guarantee that.

Types of Loss Functions

Machine learning uses a variety of loss functions, depending on the problem being solved and the model. Commonly used loss functions include:

Mean squared error (MSE): The most commonly used loss function for regression problems. It measures the average squared difference between the predicted output and the actual output.
Binary cross-entropy (BCE): Used in binary classification problems, where the objective is to predict one of two possible classes. It measures the mismatch between the predicted probability of the positive class and the actual binary label.
Categorical cross-entropy (CCE): Used in multiclass classification problems, where the objective is to predict one of several classes. It measures the mismatch between the predicted probability distribution and the actual one-hot encoded class label.
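Softmax cross-entropy: Used for multiclass classification problems with mutually exclusive classes. It applies the softmax function to the model's raw outputs (logits) to obtain a probability distribution over the classes, then computes the categorical cross-entropy between that distribution and the true label. The result is typically averaged over the examples in a batch.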
KL divergence: Measures how much one probability distribution differs from another. It is not symmetric, so the order of the two distributions matters. It commonly appears in the training objective of generative models such as variational autoencoders (VAEs).
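
The loss functions listed above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function names and the clipping constant EPS are assumptions made for this example, and deep learning libraries ship their own numerically hardened versions.

```python
import numpy as np

EPS = 1e-12  # assumed clipping constant, used only to avoid log(0)

def mse(y_true, y_pred):
    """Mean squared error: average squared difference."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y_true - y_pred) ** 2))

def binary_cross_entropy(y_true, p_pred):
    """BCE between 0/1 labels and predicted positive-class probabilities."""
    y = np.asarray(y_true, float)
    p = np.clip(np.asarray(p_pred, float), EPS, 1.0 - EPS)
    return float(np.mean(-(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))))

def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    z = np.asarray(logits, float)
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def categorical_cross_entropy(y_onehot, p_pred):
    """CCE between one-hot labels and predicted distributions, averaged over examples."""
    y = np.asarray(y_onehot, float)
    p = np.clip(np.asarray(p_pred, float), EPS, 1.0)
    return float(np.mean(-np.sum(y * np.log(p), axis=1)))

def softmax_cross_entropy(y_onehot, logits):
    """Softmax over logits followed by categorical cross-entropy."""
    return categorical_cross_entropy(y_onehot, softmax(logits))

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same outcomes."""
    p = np.clip(np.asarray(p, float), EPS, 1.0)
    q = np.clip(np.asarray(q, float), EPS, 1.0)
    return float(np.sum(p * np.log(p / q)))
```

For instance, binary_cross_entropy([1, 0], [0.5, 0.5]) evaluates to ln 2, the loss of a model that is maximally unsure about every example.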

How Training Loss is Used

The training loss is monitored during training to track how well the model is fitting the training data. To minimize it, optimization algorithms such as stochastic gradient descent (SGD) or Adam compute the gradient of the loss with respect to the model's parameters and repeatedly adjust the parameters in the direction that reduces the loss.
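
As a rough sketch of this loop, the example below fits a one-feature linear model with plain full-batch gradient descent (a simpler relative of SGD and Adam, swapped in here to keep the code short) and records the training loss at every step. The function name, the synthetic data, the learning rate, and the step count are all assumptions chosen for illustration.

```python
import numpy as np

def train_linear_regression(X, y, lr=0.1, steps=200):
    """Full-batch gradient descent on MSE; returns weights, bias, and loss history."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    history = []
    for _ in range(steps):
        err = X @ w + b - y                        # prediction error per example
        history.append(float(np.mean(err ** 2)))   # training loss before this update
        grad_w = 2.0 * (X.T @ err) / n             # d(MSE)/dw
        grad_b = 2.0 * float(np.mean(err))         # d(MSE)/db
        w -= lr * grad_w                           # step against the gradient
        b -= lr * grad_b
    return w, b, history

# Synthetic data (assumed for this example): y = 3x + 1 plus a little noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
y = 3.0 * X[:, 0] + 1.0 + 0.1 * rng.normal(size=100)

w, b, history = train_linear_regression(X, y)
print(f"first loss: {history[0]:.4f}, last loss: {history[-1]:.4f}")
print(f"learned w: {w[0]:.3f}, learned b: {b:.3f}")
```

Plotting history against the step number gives the usual training loss curve, which should fall quickly at first and then flatten as the parameters approach the best fit.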

Overfitting and Underfitting

Overfitting occurs when a model is too complex and fits the training data too closely, including its noise, so it fails to generalize to unseen data. The result is a low training loss but a high validation loss. Regularization techniques such as L1 or L2 regularization add a penalty on the size of the model's weights to the loss, which discourages overly complex fits and reduces overfitting.
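
One way to see the effect of an L2 penalty is ridge regression on polynomial features, sketched below. The helper names, the synthetic data, the polynomial degree, and the two penalty strengths are assumptions for this example. For simplicity the penalty is applied to every weight, including the constant term.

```python
import numpy as np

def poly_features(x, degree):
    """Columns x^0, x^1, ..., x^degree."""
    return np.vander(x, degree + 1, increasing=True)

def fit_ridge(Phi, y, lam):
    """Closed-form minimizer of mean squared error + lam * ||w||^2."""
    n, d = Phi.shape
    A = Phi.T @ Phi / n + lam * np.eye(d)
    return np.linalg.solve(A, Phi.T @ y / n)

def data_loss(Phi, y, w):
    """Plain MSE, without the penalty term."""
    return float(np.mean((Phi @ w - y) ** 2))

# Small noisy dataset (assumed): easy for a degree-9 polynomial to overfit.
rng = np.random.default_rng(0)
x_train = rng.uniform(-1.0, 1.0, size=15)
y_train = np.sin(3.0 * x_train) + 0.2 * rng.normal(size=15)
x_val = rng.uniform(-1.0, 1.0, size=200)
y_val = np.sin(3.0 * x_val) + 0.2 * rng.normal(size=200)

Phi_train = poly_features(x_train, 9)
Phi_val = poly_features(x_val, 9)

w_weak = fit_ridge(Phi_train, y_train, lam=1e-6)    # almost no penalty
w_strong = fit_ridge(Phi_train, y_train, lam=1e-1)  # noticeable penalty

for name, w in (("weak", w_weak), ("strong", w_strong)):
    print(f"{name:>6} penalty: train={data_loss(Phi_train, y_train, w):.4f} "
          f"val={data_loss(Phi_val, y_val, w):.4f} "
          f"weight norm={np.linalg.norm(w):.2f}")
```

The stronger penalty always yields smaller weights and a training loss that is no lower than before. Whether the validation loss improves depends on the data and on the penalty strength, which is why that strength is usually tuned on a validation set.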

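Underfitting occurs when a model is too simple to capture the patterns in the training data, leading to both a high training loss and a high validation loss. To address it, a more complex model can be used, more informative features can be added, or the model can be trained for longer. Collecting more data alone usually does not help, because the model cannot fit the data it already has.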
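
The contrast between the two failure modes can be illustrated by fitting polynomials of increasing degree to the same data and comparing training and validation loss. The sketch below uses NumPy's polyfit; the synthetic data and the chosen degrees are assumptions for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    """Noisy samples of sin(3x) on [-1, 1] (synthetic, for illustration)."""
    x = rng.uniform(-1.0, 1.0, size=n)
    y = np.sin(3.0 * x) + 0.1 * rng.normal(size=n)
    return x, y

def poly_mse(coeffs, x, y):
    """MSE of the polynomial with the given coefficients on (x, y)."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

x_train, y_train = make_data(30)
x_val, y_val = make_data(30)

losses = {}
for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)  # least-squares polynomial fit
    losses[degree] = (poly_mse(coeffs, x_train, y_train),
                      poly_mse(coeffs, x_val, y_val))
    print(f"degree {degree}: train={losses[degree][0]:.4f} "
          f"val={losses[degree][1]:.4f}")
```

A straight line (degree 1) cannot follow the curve, so its training loss stays high, which is the signature of underfitting. Raising the degree can only lower the training loss. A validation loss that stops improving or rises while the training loss keeps falling is the sign of overfitting.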

Explain Like I'm 5 (ELI5)

Training loss is like a score that tells us how well our model is doing at guessing what we're teaching it. When teaching a model, we give it examples and ask it to guess the answer. The training loss tells us how far these guesses are from the right answers, so the lower the loss, the better! It's like golf, where the lowest score wins.