Mini-batch: Difference between revisions

1,066 bytes removed ,  28 February 2023
no edit summary
Line 1:
{{see also|Machine learning terms}}
==Introduction==
In [[machine learning]], a [[mini-batch]] is one of the smaller [[batch]]es into which a [[dataset]] is randomly divided during [[training]]. In each [[iteration]], the [[model]] trains on a single mini-batch instead of the entire dataset. The number of [[example]]s ([[data points]]) in a mini-batch is called the [[batch size]]. Calculating the [[loss]] on a mini-batch is far cheaper than calculating it on the entire dataset, so each update is faster and training often reaches [[convergence]] sooner.
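As a rough illustration of the division described above, the following Python sketch shuffles a toy dataset and cuts it into mini-batches. The function name <code>make_mini_batches</code>, the seed, and the toy data are assumptions made for this example and do not come from any particular library.

```python
import random

def make_mini_batches(examples, batch_size, seed=0):
    """Shuffle the examples and split them into consecutive mini-batches."""
    shuffled = list(examples)               # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)   # random division of the dataset
    return [shuffled[i:i + batch_size]
            for i in range(0, len(shuffled), batch_size)]

dataset = list(range(10))                   # ten toy examples
batches = make_mini_batches(dataset, batch_size=4)
print([len(batch) for batch in batches])    # [4, 4, 2]
```

With ten examples and a batch size of 4, one pass over the data takes three iterations, and the last mini-batch is smaller than the others because 10 is not a multiple of 4.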


==Theoretical Background==
[[Batch]] [[gradient descent]] trains the model on all of the data in every iteration. When the dataset grows large, each iteration becomes computationally expensive, and training may take a long time to [[converge]] to a good solution.


Mini-batch gradient descent addresses this issue by splitting the dataset into smaller batches. Each update uses only one batch, so the model learns from a portion of the data at a time instead of all of it, which makes each training step faster.


Additionally, mini-batch gradient descent often converges faster than batch gradient descent. This is because it updates the model [[parameters]] after every batch, so it makes many updates in one pass over the data, whereas batch gradient descent makes only one.
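The update rule can be written compactly. The notation here is an assumption introduced for illustration: <math>\theta_t</math> denotes the parameters at step <math>t</math>, <math>\eta</math> the learning rate, <math>\ell</math> the per-example loss, and <math>B_t</math> the mini-batch of size <math>m</math> used at step <math>t</math>.

<math>\theta_{t+1} = \theta_t - \eta \cdot \frac{1}{m} \sum_{i \in B_t} \nabla_\theta \, \ell(\theta_t; x_i, y_i)</math>

Batch gradient descent corresponds to the case in which <math>B_t</math> is the entire dataset, and stochastic gradient descent to the case <math>m = 1</math>.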


==Mini-batch Steps==
The mini-batch algorithm functions as follows:
#Randomly divide the entire dataset into smaller batches
#Initialize the model parameters
#Loop over the batches:
##Compute the [[gradient]] of the [[loss function]] with respect to the model parameters using the current batch of data
##Update the model parameters using the gradient information
#Repeat the process until the model converges or a maximum number of iterations is reached
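The steps above can be sketched in plain Python for a one-variable linear model trained with mean squared error. This is a minimal sketch under stated assumptions, not a reference implementation: the function name <code>train_linear_model</code>, the learning rate, the batch size, the fixed number of passes, and the noise-free toy data are all choices made for this example.

```python
import random

def train_linear_model(xs, ys, batch_size=8, learning_rate=0.1, epochs=200, seed=0):
    """Fit y ~ w*x + b with mini-batch gradient descent on mean squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0                          # initialize the model parameters
    indices = list(range(len(xs)))
    for _ in range(epochs):                  # repeat for a maximum number of passes
        rng.shuffle(indices)                 # randomly divide the data into batches
        for start in range(0, len(indices), batch_size):   # loop over the batches
            batch = indices[start:start + batch_size]
            # Gradient of the mean squared error on the current batch only.
            grad_w = sum(2 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum(2 * (w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            # Update the parameters using the gradient information.
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b

# Toy data generated from y = 2x + 1 (an assumption for the example).
xs = [i / 50 for i in range(50)]
ys = [2.0 * x + 1.0 for x in xs]
w, b = train_linear_model(xs, ys)
print(round(w, 2), round(b, 2))
```

Because the toy data lie exactly on a line, the learned values should end up close to the slope 2 and intercept 1 used to generate them. The sketch stops after a fixed number of passes; a convergence check on the loss could be used instead.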


==Batch Size Selection==
The size of a mini-batch is an important [[hyperparameter]] that can influence the training process. A smaller batch size gives more frequent updates and may lead to faster convergence, but the gradient estimates, and therefore the parameter updates, have higher [[variance]]; conversely, larger batches give lower variance but fewer updates per pass over the data, which can slow convergence.


Typically, [[batch size]] is chosen based on a tradeoff between convergence speed and [[stability]]. A range of 10 to 1000 is commonly observed in practice.
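The variance side of this tradeoff can be observed empirically. The sketch below uses hypothetical scalar per-example gradients drawn from a normal distribution (an assumption made purely for illustration) and measures how much the mini-batch average fluctuates from batch to batch for several batch sizes.

```python
import random
import statistics

def gradient_estimate_spread(batch_size, trials=2000, seed=0):
    """Standard deviation of the mini-batch gradient average over many random batches."""
    rng = random.Random(seed)
    # Hypothetical per-example gradients: one scalar for each of 1,000 examples.
    per_example_grads = [rng.gauss(1.0, 2.0) for _ in range(1000)]
    estimates = []
    for _ in range(trials):
        batch = rng.sample(per_example_grads, batch_size)   # one random mini-batch
        estimates.append(sum(batch) / batch_size)           # its gradient estimate
    return statistics.stdev(estimates)

for size in (1, 10, 100):
    print(size, round(gradient_estimate_spread(size), 3))
```

The printed spread shrinks as the batch size grows, roughly in proportion to one over the square root of the batch size, which is why larger batches give steadier but more expensive updates.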
 
==What is mini-batch in machine learning?==
A mini-batch is a subset of a dataset that contains a limited number of input/output examples used for training a machine learning model. Mini-batches are usually selected randomly from the larger dataset, and they can contain anywhere from a few examples to several hundred or thousands.
 
Machine learning often employs mini-batch training, in which a model is trained on many mini-batches iteratively. Each iteration processes one mini-batch, and the model's weights are updated based on the gradients computed for that mini-batch; one full pass over all the mini-batches is called an epoch. This process continues until either the desired accuracy level is reached or a maximum number of epochs has passed.
 
The size of a mini-batch can have a large effect on the performance of a model. A very small batch size produces noisy gradient estimates, which can make learning unstable and slow convergence; conversely, a very large batch size requires more memory and makes each update more expensive. Therefore, selecting a good mini-batch size often depends on the available computational resources as well as the characteristics of the dataset being processed.


==Advantages of mini-batch training==
Mini-batch training has several advantages over [[batch training]], which involves running the model on all data at once. Some of these advantages include:


===Faster convergence===
Mini-batch training typically converges faster than batch training, since the model is updated more frequently. Additionally, the gradients calculated on a mini-batch are more representative of the entire dataset than those calculated from individual data points as in [[Stochastic Gradient Descent]].


===Less memory usage===
Mini-batch training uses less [[memory]] than batch training, as only a portion of the dataset is loaded into memory at once. This makes it possible to train on large datasets, and leaves more room for larger models, when computational resources are limited.
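One way to keep only a portion of the data in memory is to read mini-batches lazily from the data source. The following sketch assumes the examples arrive from any Python iterable, such as a file read line by line; the name <code>stream_batches</code> is hypothetical. For simplicity it yields batches in source order, whereas a real pipeline would typically also shuffle.

```python
from itertools import islice

def stream_batches(examples, batch_size):
    """Yield lists of at most batch_size examples, holding one batch in memory at a time."""
    iterator = iter(examples)
    while True:
        batch = list(islice(iterator, batch_size))   # pull only the next batch
        if not batch:                                # the source is exhausted
            return
        yield batch

# A generator expression stands in for a large data source such as a file on disk.
source = (x * x for x in range(10))
sizes = [len(batch) for batch in stream_batches(source, batch_size=4)]
print(sizes)   # [4, 4, 2]
```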


===Improved generalization===
Mini-batch training can improve [[generalization]] performance: each update is computed from a different random subset of examples, and this variation during training can make the model more likely to apply what it has learned to new, unseen data.


===Better optimization===
Mini-batch training can aid the [[optimization]] process, as the stochastic nature of gradients computed from mini-batches can help the model escape [[local minima]] and find more favorable solutions.


==Disadvantages of mini-batch training==
Mini-batch training has its advantages but also some drawbacks, such as:


===Hyperparameter tuning===
The [[mini-batch size]] is an important hyperparameter that must be tuned to get good performance. Tuning can be laborious and may require extensive experimentation.


===Noise===
Mini-batch training can be [[noisy]] due to the approximations made when computing gradients from a mini-batch; this may lead to [[oscillation]]s in the learning process and slower convergence speeds.


===Hardware requirements===
Mini-batch training requires more CPU/GPU memory than training on a single data point at a time, as in [[SGD]], which can make it difficult to train large models with limited hardware resources.


==Explain Like I'm 5 (ELI5)==