Gradient descent
Gradient descent can be divided into three types: batch gradient descent, stochastic gradient descent, and mini-batch gradient descent.


===Batch Gradient Descent===
Batch gradient descent updates the parameters after computing the gradient of the cost function over the entire training set. Although this can be computationally expensive for large datasets, it can converge to the global minimum of a convex cost function.
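
The following is a minimal sketch of batch gradient descent for an assumed linear least-squares model; the function name, learning rate, number of epochs, and synthetic data are illustrative choices, not part of the article. The key point is that each parameter update uses the gradient computed over all training examples.
<syntaxhighlight lang="python">
import numpy as np

def batch_gradient_descent(X, y, lr=0.1, epochs=100):
    """Minimize the mean-squared-error cost of a linear model y ~ X @ w,
    updating w once per pass over the *entire* training set."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    for _ in range(epochs):
        # Gradient of the MSE cost computed over all training examples.
        grad = (2.0 / n_samples) * X.T @ (X @ w - y)
        w -= lr * grad  # one parameter update per full pass
    return w

# Toy usage with synthetic data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 0.01 * rng.normal(size=200)
print(batch_gradient_descent(X, y))  # close to [1.5, -2.0, 0.5]
</syntaxhighlight>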


===Stochastic Gradient Descent===
Stochastic gradient descent updates the parameters after computing the gradient of the cost function for a single training example. Each update is far less computationally expensive than a batch update, but the gradient estimates are noisy, and on a non-convex cost function the method may converge to a local minimum.
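
A corresponding sketch of stochastic gradient descent, under the same assumed linear model and mean-squared-error cost as the batch example, performs one parameter update per training example:
<syntaxhighlight lang="python">
import numpy as np

def stochastic_gradient_descent(X, y, lr=0.01, epochs=50):
    """Update w after the gradient of the squared error
    for a single randomly chosen training example."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    rng = np.random.default_rng(1)
    for _ in range(epochs):
        for i in rng.permutation(n_samples):  # visit examples in random order
            xi, yi = X[i], y[i]
            grad = 2.0 * xi * (xi @ w - yi)   # gradient for this one example
            w -= lr * grad                    # one update per example
    return w

# Toy usage with synthetic data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 0.01 * rng.normal(size=200)
print(stochastic_gradient_descent(X, y))  # close to [1.5, -2.0, 0.5]
</syntaxhighlight>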


===Mini-Batch Gradient Descent===
Mini-batch gradient descent updates the parameters after computing the gradient of the cost function for a small subset (a mini-batch) of training examples. It offers a middle ground between batch gradient descent and stochastic gradient descent: it is less computationally intensive per update than batch gradient descent, and it may converge to either a global or a local minimum of the cost function.
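
A sketch of mini-batch gradient descent under the same assumptions as the previous examples; the batch_size parameter (an illustrative choice) controls how many examples contribute to each gradient estimate.
<syntaxhighlight lang="python">
import numpy as np

def minibatch_gradient_descent(X, y, lr=0.05, epochs=50, batch_size=16):
    """Update w after the gradient of the MSE cost computed
    over a small, randomly drawn batch of training examples."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    rng = np.random.default_rng(2)
    for _ in range(epochs):
        idx = rng.permutation(n_samples)      # reshuffle each epoch
        for start in range(0, n_samples, batch_size):
            batch = idx[start:start + batch_size]
            Xb, yb = X[batch], y[batch]
            # Gradient of the MSE cost over the mini-batch only.
            grad = (2.0 / len(batch)) * Xb.T @ (Xb @ w - yb)
            w -= lr * grad                    # one update per mini-batch
    return w

# Toy usage with synthetic data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 0.01 * rng.normal(size=200)
print(minibatch_gradient_descent(X, y))  # close to [1.5, -2.0, 0.5]
</syntaxhighlight>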