Gradient descent: Difference between revisions

Revision as of 17:42, 23 February 2023

40 bytes added, 23 February 2023

no edit summary

@@ Line 4: / Line 4: @@
 ==How Gradient Descent Works==
-Gradient descent works by iteratively altering the [[parameters]] of a model in order to obtain steepest descent of the [[cost function]], which measures how well it's performing. The goal of gradient descent is to find parameters that minimize this cost function.
+Gradient descent works by iteratively adjusting the [[parameters]] of a model in the direction of steepest descent of the [[cost function]], which measures how poorly the model is performing. The goal of gradient descent is to find parameters that minimize this cost function.
-The algorithm begins with an initial set of parameters and iteratively updates them until it reaches a minimum point in the cost function. At each iteration, a gradient of the cost function is computed with respect to those same parameters; this gradient is represented as a vector pointing in the direction of steepest increase in cost function. To minimize this cost function, parameters are updated in direct opposition to that direction.
+The algorithm starts with an initial set of parameters and iteratively updates them until it reaches a minimum of the cost function (in general a local minimum) or a stopping criterion is met. At each iteration, the gradient of the cost function is computed with respect to the parameters. The gradient is a vector pointing in the direction of steepest increase of the cost function, so the parameters are updated in the opposite direction to decrease it.
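The loop described above can be sketched in Python. This is a minimal illustration rather than a reference implementation: it assumes the gradient is supplied as a function (the names `gradient_descent` and `grad_J` are hypothetical), uses the example cost J(x, y) = (x - 3)^2 + (y + 1)^2, and stops when every gradient component falls below a tolerance or after a fixed number of iterations.

```python
def gradient_descent(grad, theta0, learning_rate=0.1, max_iters=1000, tol=1e-8):
    """Minimize a cost function, given its gradient, starting from theta0.

    grad: function mapping a list of parameters to the list of partial derivatives.
    Stops when every gradient component is below tol, or after max_iters steps.
    """
    theta = list(theta0)
    for _ in range(max_iters):
        g = grad(theta)
        # Stopping criterion: the gradient is (almost) zero
        if max(abs(gi) for gi in g) < tol:
            break
        # Step in the direction opposite to the gradient, scaled by the learning rate
        theta = [t - learning_rate * gi for t, gi in zip(theta, g)]
    return theta


# Hypothetical example cost J(x, y) = (x - 3)^2 + (y + 1)^2, minimized at (3, -1)
def grad_J(theta):
    x, y = theta
    return [2 * (x - 3), 2 * (y + 1)]


theta_min = gradient_descent(grad_J, [0.0, 0.0])
print(theta_min)
```

For this convex cost the result should be very close to (3, -1); on a non-convex cost the same loop may stop at a local minimum that depends on the starting point.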
-The update rule for parameters is determined by a learning rate, which controls the step size of each iteration. A small learning rate may lead to slow convergence while an excessively high one could cause overshooting at the minimum point.
+The update rule scales each step by a [[learning rate]], which controls the step size of each iteration. A small learning rate may lead to slow [[convergence]], while an excessively high one can cause the model to [[overshoot]] the minimum point or even diverge.
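The effect of the learning rate can be seen on the one-dimensional cost J(x) = x^2, whose gradient is 2x, so each update is x <- x - η·2x = (1 - 2η)x, where η denotes the learning rate. The sketch below is illustrative only: the function name `run`, the starting point x = 1, and the three rates are arbitrary choices made for the comparison.

```python
def run(learning_rate, x0=1.0, steps=20):
    """Apply gradient descent to J(x) = x**2 (gradient 2x) and return the final x."""
    x = x0
    for _ in range(steps):
        # Update rule: move against the gradient 2x, scaled by the learning rate
        x = x - learning_rate * 2 * x
    return x


# Compare a small, a moderate, and an overly large learning rate
for lr in (0.01, 0.4, 1.1):
    print(f"learning rate {lr}: x after 20 steps = {run(lr):.4g}")
```

With η = 0.01 the iterate is still far from the minimum at 0 after 20 steps; with η = 0.4 it is essentially at 0. With η = 1.1 the factor 1 - 2η = -1.2 has magnitude above 1, so each step overshoots to the other side of the minimum and the iterate grows instead of shrinking.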
 ==Types of Gradient Descent==
-Gradient descent can be divided into three types: batch gradient descent, stochastic gradient descent, and mini-batch gradient descent.
+Gradient descent can be divided into three types: [[batch gradient descent]], [[stochastic gradient descent]], and [[mini-batch gradient descent]].
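The three types differ mainly in how many training examples are used to estimate the gradient at each step. As a rough illustration, the hypothetical `fit_line` function below fits a line y = w*x + b by minimizing mean squared error on made-up, noise-free data. Setting `batch_size` to the full dataset size gives batch gradient descent, setting it to 1 gives stochastic gradient descent, and an intermediate value gives mini-batch gradient descent.

```python
import random


def fit_line(xs, ys, batch_size, learning_rate=0.1, epochs=500, seed=0):
    """Fit y ~ w*x + b by minimizing mean squared error with gradient descent.

    batch_size == len(xs): batch gradient descent (all examples per step)
    batch_size == 1:       stochastic gradient descent (one example per step)
    otherwise:             mini-batch gradient descent
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the examples in a new random order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the mean squared error computed over this batch only
            grad_w = sum(2 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum(2 * (w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b


# Made-up, noise-free data lying exactly on the line y = 2x + 1
xs = [i / 10 for i in range(20)]
ys = [2 * x + 1 for x in xs]

for size in (len(xs), 1, 4):  # batch, stochastic, mini-batch
    print(size, fit_line(xs, ys, batch_size=size))
```

Because the data lie exactly on a line, all three settings should end close to w = 2, b = 1. On noisy data with a constant learning rate, the single-example updates typically keep fluctuating around the solution more than the full-batch updates do.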
 ===Batch Gradient Descent===