==Regularization==
Gradient descent can also be combined with [[regularization]] techniques, which reduce overfitting and improve a model's generalization. Techniques such as [[L1 regularization|L1]] or [[L2 regularization|L2]] regularization add a penalty term to the cost function that grows with the magnitude of the parameters, encouraging the model to keep its parameter values small.
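
For illustration, here is a minimal sketch of gradient descent with an L2 penalty on a one-parameter linear model; the data, learning rate, and regularization strength below are hypothetical values chosen for the example:

<syntaxhighlight lang="python">
import numpy as np

# Hypothetical data: fit y ≈ w * x by minimizing mean squared error plus an L2 penalty.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

w = 0.0     # initial parameter guess
lr = 0.01   # learning rate (step size)
lam = 0.1   # L2 regularization strength (lambda)

for _ in range(1000):
    # Gradient of the cost: mean((w*x - y)**2) + lam * w**2
    grad = 2 * np.mean((w * x - y) * x) + 2 * lam * w
    w -= lr * grad  # step against the gradient

print(w)  # slightly below the unregularized fit: the penalty shrinks w toward zero
</syntaxhighlight>

With <code>lam = 0</code> the loop converges to the ordinary least-squares value of <code>w</code> (about 1.99 on this data); the penalty term pulls it down toward roughly 1.96, illustrating how regularization biases parameters toward smaller values.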
 
==Explain Like I'm 5 (ELI5)==
Gradient descent is a way of teaching a computer program to learn by itself. It works like a treasure hunt: the program starts with an educated guess about where the treasure might be, then looks for clues that can guide it closer. It keeps moving in the direction the clues point until it finds the treasure. Here, the treasure is the best way of making accurate predictions, and the clues are the errors the program makes along the way, as the sketch below shows.
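
In code, the treasure hunt might look like this minimal sketch; the function and numbers are made up purely for illustration:

<syntaxhighlight lang="python">
# A toy "treasure hunt": find the x that makes (x - 3)**2 as small as possible.
# The treasure is x = 3; the clue at each step is the slope (the gradient).

def error(x):
    return (x - 3) ** 2   # how far we are from the treasure

def clue(x):
    return 2 * (x - 3)    # gradient of the error: which way is downhill

x = 0.0                    # educated first guess
for _ in range(100):
    x -= 0.1 * clue(x)     # step in the direction the clue points

print(x)  # very close to 3, the "treasure"
</syntaxhighlight>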

