Training loss

#[[Mean squared error]] (MSE) Loss: This loss function is the most commonly employed for [[regression]] problems. It measures the average squared difference between predicted output and actual output.
#[[Binary cross-entropy]] (BCE): Used in [[binary classification]] problems where the objective is to accurately predict one of two possible [[class]]es, this statistic measures the distance between the predicted probabilities and the actual target values.
#[[Categorical cross-entropy]] (CCE): Used in [[multiclass classification]] problems to predict one of several classes, this statistic measures the difference between a [[predicted probability distribution]] and an actual [[one-hot encoded]] class [[label]].
#[[Softmax cross-entropy]]: This approach is used for multiclass classification problems with mutually exclusive classes. It applies the [[softmax]] function to the model's raw outputs to obtain a probability distribution over classes and then computes the [[categorical cross-entropy]] between that distribution and the true class label, typically averaged across the examples in a batch.
#[[KL-divergence]]: This statistic measures the difference between two probability distributions. It's commonly employed when training [[generative models]] such as [[Generative Adversarial Network]]s (GANs). A short code sketch of these loss functions follows this list.
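The sketch below shows one way these losses can be computed with plain NumPy; the function names, the small <code>eps</code> constants used to avoid taking the logarithm of zero, and the array shapes are illustrative assumptions rather than the interface of any particular library.

<syntaxhighlight lang="python">
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: average squared difference between targets and predictions."""
    return np.mean((y_true - y_pred) ** 2)

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Binary cross-entropy: y_true holds 0/1 labels, y_pred holds predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1 - eps)  # keep probabilities away from 0 and 1
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

def categorical_cross_entropy(y_true_one_hot, y_pred_probs, eps=1e-12):
    """Categorical cross-entropy: rows of y_true_one_hot are one-hot labels,
    rows of y_pred_probs are predicted probability distributions."""
    y_pred_probs = np.clip(y_pred_probs, eps, 1.0)
    return -np.mean(np.sum(y_true_one_hot * np.log(y_pred_probs), axis=1))

def softmax_cross_entropy(y_true_one_hot, logits):
    """Softmax cross-entropy: apply softmax to raw outputs (logits), then cross-entropy."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # subtract max for numerical stability
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    return categorical_cross_entropy(y_true_one_hot, probs)

def kl_divergence(p, q, eps=1e-12):
    """KL-divergence between two discrete probability distributions p and q."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return np.sum(p * np.log(p / q))
</syntaxhighlight>

For example, <code>binary_cross_entropy(np.array([1, 0, 1]), np.array([0.9, 0.2, 0.7]))</code> returns roughly 0.23, and the loss grows as the predicted probabilities move away from the true labels.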


==How Training Loss is Used==
The training loss is used to evaluate the performance of a machine learning model during training. To minimize this loss, [[optimization algorithm]]s such as [[stochastic gradient descent]] (SGD) or [[Adam]] are employed. These algorithms iteratively adjust the model's [[parameters]] in the direction that reduces the training loss.
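As a minimal illustration of this process, the loop below fits a linear model with plain batch gradient descent rather than SGD or Adam; the toy data, learning rate, and variable names are illustrative assumptions. Each iteration computes the training loss and then moves the parameters against the gradient of that loss.

<syntaxhighlight lang="python">
import numpy as np

# Toy regression data: targets roughly follow y = 3x + 2.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 3.0 * x + 2.0 + rng.normal(scale=0.1, size=100)

# Model parameters (weight and bias), initialised arbitrarily.
w, b = 0.0, 0.0
learning_rate = 0.1

for epoch in range(200):
    y_pred = w * x + b
    loss = np.mean((y - y_pred) ** 2)        # training loss (MSE)

    # Gradients of the MSE loss with respect to w and b.
    grad_w = -2.0 * np.mean((y - y_pred) * x)
    grad_b = -2.0 * np.mean(y - y_pred)

    # Gradient-descent step: adjust parameters in the direction that lowers the loss.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(w, b, loss)  # w and b approach 3 and 2 as the training loss shrinks
</syntaxhighlight>

In practice, frameworks such as [[TensorFlow]] or [[PyTorch]] compute these gradients automatically, and optimizers such as mini-batch SGD or [[Adam]] replace the hand-written update step.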


==Overfitting and Underfitting==

[[Category:Terms]] [[Category:Machine learning terms]] [[Category:not updated]]