{{see also|Machine learning terms}}
==Introduction==
A [[generalization curve]] is a plot of [[training loss]] and [[validation loss]] as a function of the number of [[iterations]]. Iterations are on the x-axis and [[loss]] is on the y-axis.
[[Machine learning]] strives to build [[models]] that accurately predict unseen [[data]]. To do this, [[machine learning models]] are trained on a [[dataset]] called the [[training set]], which consists of input [[features]] and their corresponding target values ([[labels]]). Unfortunately, good performance on the training set does not guarantee good performance on new data. A model that fits the training set well but performs poorly on new data is said to be [[overfitting]]. To detect this, the model's performance must also be [[evaluation|evaluated]] on a separate dataset called the [[validation set]].
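
The idea of tracking both losses over iterations can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the synthetic data, the linear model, the learning rate, and the function name <code>generalization_curve</code> are all hypothetical and not part of any particular library.

```python
import numpy as np


def generalization_curve(n_iterations=200, learning_rate=0.05, seed=0):
    """Train a 1-D linear model with gradient descent and record
    the training and validation loss (MSE) after every iteration."""
    rng = np.random.default_rng(seed)

    # Synthetic data (an assumption for illustration): y = 3x + 1 + noise.
    x = rng.uniform(-1.0, 1.0, size=200)
    y = 3.0 * x + 1.0 + rng.normal(0.0, 0.3, size=200)

    # First 150 examples form the training set, the rest the validation set.
    x_train, y_train = x[:150], y[:150]
    x_val, y_val = x[150:], y[150:]

    w, b = 0.0, 0.0
    train_losses, val_losses = [], []
    for _ in range(n_iterations):
        # Gradient of the mean squared error with respect to w and b.
        err = w * x_train + b - y_train
        w -= learning_rate * 2.0 * np.mean(err * x_train)
        b -= learning_rate * 2.0 * np.mean(err)

        # Record both losses; these two lists are the two curves.
        train_losses.append(float(np.mean((w * x_train + b - y_train) ** 2)))
        val_losses.append(float(np.mean((w * x_val + b - y_val) ** 2)))
    return train_losses, val_losses


train_losses, val_losses = generalization_curve()
```

Plotting <code>train_losses</code> and <code>val_losses</code> against the iteration index with any plotting library gives the generalization curve described above.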
The [[generalization curve]] can also be plotted as a function of [[model complexity]] rather than iterations. In that form, it can be used to identify the level of [[complexity]] that best balances the [[bias-variance tradeoff]] while avoiding overfitting.
==The Bias-Variance Trade-Off==
The [[bias-variance tradeoff]] is a fundamental concept in machine learning that describes the relationship between a model's ability to fit [[training data]] and its ability to generalize to new data. Model [[bias]] refers to [[error]]s caused by assumptions made in the model, while [[variance]] refers to the model's sensitivity to fluctuations in the training data. A model with high bias does not fit the training data well (underfitting), while one with high variance fits it too closely (overfitting).
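
For squared-error loss, the tradeoff is commonly written as a decomposition of the expected prediction error. The notation here is an assumption added for illustration and does not come from the article: the data are taken to follow <math>y = f(x) + \varepsilon</math> with noise variance <math>\sigma^2</math>, <math>\hat{f}</math> is the trained model, and the expectation is over training sets and noise.

<math>\mathbb{E}\left[(y - \hat{f}(x))^2\right] = \left(\mathbb{E}[\hat{f}(x)] - f(x)\right)^2 + \mathbb{E}\left[\left(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\right)^2\right] + \sigma^2</math>

The first term is the squared bias, the second is the variance, and the third is the irreducible noise that no model can remove.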
The bias-variance trade-off can be illustrated by the generalization curve, which displays the model's performance on training and validation sets as a function of model complexity. Model complexity indicates how flexible the model is; typically, the number of [[parameters]] determines its flexibility.
==Generalization Curve==
The generalization curve plots the model's performance on the training and validation sets against model complexity or iterations. The x-axis represents model complexity or iterations, while the y-axis shows performance on both sets. Performance is typically measured with a metric such as [[accuracy]], [[mean squared error]], or [[area under the curve]].
The generalization curve can be used to identify the model complexity that best balances the bias-variance tradeoff. A model with low complexity has high bias but low variance, while one with high complexity has low bias but high variance. The optimal level of complexity is the point on the generalization curve where [[validation error]] is lowest.
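
Selecting the complexity with the lowest validation error can be sketched as follows. The example assumes polynomial regression, with the polynomial degree standing in for model complexity. The data, the split, and the function name <code>complexity_sweep</code> are hypothetical choices made for illustration.

```python
import numpy as np


def complexity_sweep(max_degree=10, seed=0):
    """Fit polynomials of degree 0..max_degree and return the training
    errors, validation errors, and the degree with lowest validation error."""
    rng = np.random.default_rng(seed)

    # Synthetic data (an assumption for illustration): noisy sine wave.
    x = rng.uniform(-1.0, 1.0, size=60)
    y = np.sin(3.0 * x) + rng.normal(0.0, 0.2, size=60)
    x_train, y_train = x[:40], y[:40]
    x_val, y_val = x[40:], y[40:]

    train_err, val_err = [], []
    for degree in range(max_degree + 1):
        # Least-squares polynomial fit on the training set only.
        coeffs = np.polyfit(x_train, y_train, degree)
        train_err.append(float(np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)))
        val_err.append(float(np.mean((np.polyval(coeffs, x_val) - y_val) ** 2)))

    # Degrees start at 0, so the list index equals the degree.
    best_degree = int(np.argmin(val_err))
    return train_err, val_err, best_degree


train_err, val_err, best_degree = complexity_sweep()
```

Training error can only stay the same or fall as the degree grows, because each higher-degree fit contains the lower-degree ones. Validation error has no such guarantee, which is why the choice is made on the validation curve.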
The generalization curve can also be used to detect overfitting, which occurs when a model exhibits low [[training error]] but high validation error. Overfitting may be caused by a model that is too complex or by a dataset that is too small. By recognizing when your model starts overfitting, you may be able to select a simpler model or collect more data.
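
One simple way to recognize the onset of overfitting from a recorded validation curve is to look for the point after which validation loss stops improving. The helper below is a minimal sketch of that heuristic; the function name <code>overfitting_onset</code> and the <code>patience</code> parameter are hypothetical and chosen for illustration.

```python
def overfitting_onset(val_losses, patience=3):
    """Return the index of the best validation loss once the loss has
    failed to improve for `patience` consecutive iterations after it.
    Return None if that never happens. A flat plateau counts as
    'not improving' under this heuristic."""
    best_index = 0
    for i, loss in enumerate(val_losses):
        if loss < val_losses[best_index]:
            # New best validation loss; reset the reference point.
            best_index = i
        elif i - best_index >= patience:
            # No improvement for `patience` iterations since the best point.
            return best_index
    return None


# Validation loss falls until index 3, then rises.
onset = overfitting_onset([1.0, 0.8, 0.6, 0.5, 0.55, 0.6, 0.7])
```

In this example <code>onset</code> is 3, the iteration with the lowest validation loss before the curve turns upward. A larger <code>patience</code> makes the check more tolerant of noisy curves.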
==Explain Like I'm 5 (ELI5)==