Gradient descent: Difference between revisions



==How Gradient Descent Works==
Gradient descent works by iteratively adjusting the [[parameters]] of a model so that each step follows the direction of steepest descent of the [[cost function]], which measures how far the model's output is from the desired result. The goal is to find parameters that minimize this cost function.


The algorithm starts with an initial set of parameters and updates them iteratively until it converges to a minimum of the cost function, which may be a local rather than a global minimum. At each iteration, the gradient of the cost function is computed with respect to the parameters. This gradient is a vector that points in the direction of the steepest increase of the cost function. To reduce the cost, the parameters are therefore moved in the opposite direction.
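
The loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two-parameter cost function, the starting point, the fixed step size (called <code>learning_rate</code> here) and the number of steps are all hypothetical choices made for the example, not part of any particular model.

```python
# Illustrative cost function (an assumption for this sketch):
# f(x, y) = (x - 1)^2 + 2 * (y + 2)^2, which has its minimum at (1, -2).
def cost(params):
    x, y = params
    return (x - 1.0) ** 2 + 2.0 * (y + 2.0) ** 2


def gradient(params):
    # Vector of partial derivatives; it points toward steepest increase.
    x, y = params
    return [2.0 * (x - 1.0), 4.0 * (y + 2.0)]


def gradient_descent(initial_params, learning_rate=0.1, n_steps=100):
    params = list(initial_params)
    for _ in range(n_steps):
        grad = gradient(params)
        # Move each parameter against its gradient component.
        params = [p - learning_rate * g for p, g in zip(params, grad)]
    return params


final_params = gradient_descent([0.0, 0.0])
print(final_params, cost(final_params))
```

With these settings the parameters end up very close to (1, -2), the minimum of this particular cost function. A real model would normally stop based on a convergence check rather than a fixed number of steps.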


The size of each update is controlled by the [[learning rate]], which scales the step taken at each iteration. A learning rate that is too small leads to slow [[convergence]], while one that is too large can cause the updates to [[overshoot]] the minimum or even diverge.
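
The effect of the learning rate can be seen on a deliberately simple example. The sketch below assumes the one-parameter cost function f(w) = w², whose gradient is 2w; the helper name <code>run</code>, the starting value and the three learning rates are hypothetical choices for illustration only.

```python
def run(learning_rate, w0=1.0, n_steps=20):
    # Gradient descent on the assumed cost f(w) = w**2 (gradient 2 * w).
    w = w0
    for _ in range(n_steps):
        w = w - learning_rate * 2.0 * w
    return w


w_small = run(0.01)  # tiny steps: still far from the minimum at 0
w_good = run(0.4)    # moderate steps: very close to 0
w_large = run(1.1)   # steps too large: jumps past 0 and grows in magnitude

print(w_small, w_good, w_large)
```

For this cost function each step multiplies <code>w</code> by (1 - 2 × learning rate), so any rate above 1 makes the magnitude of <code>w</code> grow instead of shrink. The thresholds are specific to this example and differ for other cost functions.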


==Types of Gradient Descent==
Gradient descent is commonly divided into three types, which differ in how much of the training data is used to compute each gradient: [[batch gradient descent]], [[stochastic gradient descent]], and [[mini-batch gradient descent]].
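
As a rough sketch, the three types can be expressed with a single training loop in which only the batch size changes: the whole dataset for batch gradient descent, one example for stochastic gradient descent, and a small subset for mini-batch gradient descent. The toy dataset (points on the line y = 2x), the one-parameter model y = w·x, the mean squared error cost and all hyperparameters are assumptions made for this example.

```python
import random


def gradient(w, batch):
    # Gradient of the mean squared error of the model y = w * x over the batch.
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def train(data, batch_size, learning_rate=0.05, n_epochs=200, seed=0):
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    data = list(data)
    w = 0.0
    for _ in range(n_epochs):
        rng.shuffle(data)
        # Walk through the shuffled data in chunks of batch_size examples.
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            w -= learning_rate * gradient(w, batch)
    return w


# Toy data generated from y = 2 * x, so the best parameter is w = 2.
data = [(x, 2.0 * x) for x in [0.5, 1.0, 1.5, 2.0]]

w_batch = train(data, batch_size=len(data))  # batch: all examples per update
w_sgd = train(data, batch_size=1)            # stochastic: one example per update
w_mini = train(data, batch_size=2)           # mini-batch: a small subset per update

print(w_batch, w_sgd, w_mini)
```

Because this toy data contains no noise, all three runs approach the same value of <code>w</code>. On real data the stochastic and mini-batch updates are noisier than full-batch updates but cheaper to compute per step.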


===Batch Gradient Descent===