==How Gradient Descent Works==
Gradient descent works by iteratively adjusting the [[parameters]] of a model in the direction of steepest descent of the [[cost function]], which measures how well the model is performing. The goal is to find the parameters that minimize this cost function.
The algorithm starts with an initial set of parameters and iteratively updates them until it reaches a minimum point of the cost function. At each iteration, the gradient of the cost function is computed with respect to the parameters. This gradient is a vector pointing in the direction of the steepest increase in the cost function, so to minimize the cost, the parameters are updated in the opposite direction.
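This update can be written compactly. Using <math>\theta</math> for the parameters, <math>J(\theta)</math> for the cost function, and <math>\eta</math> for the step size (symbols chosen here for illustration), one iteration of gradient descent is:

:<math>\theta \leftarrow \theta - \eta \, \nabla J(\theta)</math>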
The update rule for parameters is determined by a [[learning rate]], which controls the step size of each iteration. A small learning rate may lead to slow [[convergence]], while an excessively high one could cause the model to [[overshoot]] the minimum point.
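As a minimal sketch (not drawn from any particular library), the loop below applies these ideas to the one-dimensional cost <math>J(\theta) = (\theta - 3)^2</math>, whose gradient is <math>2(\theta - 3)</math>; the cost function and hyperparameter values are chosen purely for illustration:

<syntaxhighlight lang="python">
# Minimal gradient descent on J(theta) = (theta - 3)^2.
# Its gradient is dJ/dtheta = 2 * (theta - 3), so the minimum sits at theta = 3.

def gradient(theta):
    return 2.0 * (theta - 3.0)

theta = 0.0          # initial parameter value
learning_rate = 0.1  # step size applied at each iteration
for step in range(100):
    grad = gradient(theta)                # direction of steepest increase
    theta = theta - learning_rate * grad  # move against the gradient

print(theta)  # approaches 3.0, the minimum of the cost
</syntaxhighlight>

With a learning rate of 0.1 the parameter converges smoothly toward 3.0; raising it well above 1.0 in this example makes each step overshoot the minimum and the iterates diverge.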
==Types of Gradient Descent==
Gradient descent can be divided into three types: [[batch gradient descent]], [[stochastic gradient descent]], and [[mini-batch gradient descent]].
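The three variants differ only in how many training examples feed each parameter update. The sketch below illustrates this on a toy linear regression problem; the dataset, the batch size of 16, and the other values are invented here for illustration:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                     # 100 examples, 3 features
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=100)  # noisy linear targets

def mse_gradient(w, X_part, y_part):
    """Gradient of mean squared error for the linear model X_part @ w."""
    return 2.0 * X_part.T @ (X_part @ w - y_part) / len(y_part)

lr = 0.1

# Batch gradient descent: every update uses the full dataset.
w = np.zeros(3)
for _ in range(200):
    w -= lr * mse_gradient(w, X, y)

# Stochastic gradient descent: every update uses a single example.
w = np.zeros(3)
for _ in range(200):
    i = rng.integers(len(y))
    w -= lr * mse_gradient(w, X[i:i+1], y[i:i+1])

# Mini-batch gradient descent: every update uses a small random subset.
w = np.zeros(3)
for _ in range(200):
    idx = rng.choice(len(y), size=16, replace=False)
    w -= lr * mse_gradient(w, X[idx], y[idx])
</syntaxhighlight>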
===Batch Gradient Descent===