Batch size


==Batch Size and Gradient Descent==
[[Gradient descent]] relies on the batch size as a key parameter that determines how many samples are used in each iteration of the algorithm. Gradient descent works by iteratively updating the model [[parameters]] to minimize a [[cost function]], using the gradient of that function with respect to the parameters. The batch size sets how many samples contribute to each of these gradient calculations.
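A minimal sketch of this idea, assuming a linear model with a mean-squared-error cost (the article does not specify a model or cost), showing how a <code>batch_size</code> parameter controls how many samples feed each gradient computation and parameter update:

<syntaxhighlight lang="python">
# Illustrative mini-batch gradient descent; the linear model and MSE cost
# are assumptions for the example, not taken from the article.
import numpy as np

def gradient_descent(X, y, batch_size, lr=0.01, epochs=100):
    rng = np.random.default_rng(0)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)                # model parameters
    for _ in range(epochs):
        # Shuffle once per epoch, then step through consecutive batches.
        idx = rng.permutation(n_samples)
        for start in range(0, n_samples, batch_size):
            batch = idx[start:start + batch_size]
            Xb, yb = X[batch], y[batch]
            # Gradient of the MSE cost w.r.t. the parameters, estimated
            # from only `batch_size` samples.
            grad = 2.0 / len(batch) * Xb.T @ (Xb @ w - yb)
            w -= lr * grad                  # parameter update
    return w
</syntaxhighlight>

Setting <code>batch_size</code> equal to the full dataset size recovers batch gradient descent, while <code>batch_size=1</code> recovers the stochastic variant described below.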


==Batch Size and Stochastic Gradient Descent==
[[Stochastic Gradient Descent]] (SGD) is a variant of gradient descent that uses only one sample to update the model parameters in each iteration. Compared to batch gradient descent, which uses all samples for each update, SGD is computationally cheaper per iteration and therefore faster, but it is also prone to noise in its gradient estimates.
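A small illustrative comparison (hypothetical data, not from the article) of the gradient computed from the full dataset versus the gradient computed from a single random sample, as SGD does; both have the same expected value, but the single-sample estimate is much noisier:

<syntaxhighlight lang="python">
# Contrast a full-batch gradient with a single-sample (SGD) gradient on
# the same mean-squared-error cost; data and model are made up for the example.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=1000)

w = np.zeros(3)

# Full-batch gradient: uses every sample, low noise, expensive per step.
full_grad = 2.0 / len(X) * X.T @ (X @ w - y)

# SGD gradient: uses one random sample, cheap per step, high variance.
i = rng.integers(len(X))
sgd_grad = 2.0 * X[i] * (X[i] @ w - y[i])

print(full_grad)  # stable estimate of the true gradient
print(sgd_grad)   # same expectation, but varies a lot from sample to sample
</syntaxhighlight>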


==Explain Like I'm 5 (ELI5)==