Least squares regression
- See also: Machine learning terms
Introduction
In machine learning, Least Squares Regression is a well-established method for fitting a linear model to a set of data points. It seeks to minimize the sum of the squared differences between the observed values and the values predicted by the linear model. This technique is particularly useful in applications where the relationship between input features and the target variable is linear or near-linear. In this article, we will provide an overview of the least squares regression method, discuss its mathematical basis, and present its implementation in machine learning.
Mathematical Foundation
Ordinary Least Squares
The primary objective of the Least Squares method is to minimize the sum of the squared residuals, where a residual is the difference between the observed value and the value predicted by the model. Mathematically, the objective function for ordinary least squares (OLS) regression can be expressed as:
- <math>\min_{\boldsymbol{\beta}} \sum_{i=1}^{n}(y_i - \boldsymbol{x}_i \boldsymbol{\beta})^2</math>
Here, <math>n</math> is the number of data points, <math>y_i</math> represents the observed values, <math>\boldsymbol{x}_i</math> is a row vector of input features, and <math>\boldsymbol{\beta}</math> is a column vector of model parameters to be estimated.
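The objective above can be evaluated directly for any candidate parameter vector. A minimal sketch in NumPy, using small hypothetical data (the matrix, targets, and function name are illustrative, not from the source):

```python
import numpy as np

def ols_objective(beta, X, y):
    """Sum of squared residuals: sum_i (y_i - x_i beta)^2."""
    residuals = y - X @ beta
    return float(residuals @ residuals)

# Toy data: first column of ones acts as an intercept feature.
X = np.array([[1.0, 2.0], [1.0, 3.0], [1.0, 5.0]])
y = np.array([4.0, 6.0, 10.0])

# y = 2*x exactly here, so beta = (0, 2) gives zero residuals.
print(ols_objective(np.array([0.0, 2.0]), X, y))  # → 0.0
```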
To find the optimal values of the parameters that minimize the objective function, we can compute the gradient with respect to <math>\boldsymbol{\beta}</math> and set it to zero. The resulting equation, called the normal equation, provides a closed-form solution for <math>\boldsymbol{\beta}</math>:
- <math>\boldsymbol{\beta} = (\boldsymbol{X}^T \boldsymbol{X})^{-1} \boldsymbol{X}^T \boldsymbol{y}</math>
where <math>\boldsymbol{X}</math> is the matrix of input features and <math>\boldsymbol{y}</math> is the vector of observed values.
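The normal equation can be sketched in a few lines of NumPy. The data below is hypothetical; note that solving the linear system is generally preferred over forming the explicit inverse, for numerical stability:

```python
import numpy as np

# Hypothetical toy data: 5 observations, an intercept column plus 2 features.
X = np.array([[1.0, 0.0, 1.0],
              [1.0, 1.0, 2.0],
              [1.0, 2.0, 0.5],
              [1.0, 3.0, 1.5],
              [1.0, 4.0, 3.0]])
y = np.array([1.0, 3.0, 2.0, 5.0, 7.0])

# Normal equation: solve (X^T X) beta = X^T y rather than inverting X^T X.
beta = np.linalg.solve(X.T @ X, X.T @ y)

# At the minimum, the residuals are orthogonal to every column of X,
# i.e. the gradient X^T (y - X beta) vanishes.
residuals = y - X @ beta
print(beta, np.sum(residuals ** 2))
```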
Gradient Descent
In some cases, the closed-form solution of the normal equation can be computationally expensive, particularly when the number of input features is large. An alternative approach to solving the least squares problem is Gradient Descent, an iterative optimization algorithm that adjusts the model parameters by moving in the direction of the steepest decrease in the objective function. The update rule for gradient descent in the context of least squares regression is:
- <math>\boldsymbol{\beta}_{t+1} = \boldsymbol{\beta}_t - \alpha \nabla_\boldsymbol{\beta} L(\boldsymbol{\beta}_t)</math>
where <math>\boldsymbol{\beta}_t</math> is the current estimate of the parameters, <math>\alpha</math> is the learning rate, and <math>\nabla_\boldsymbol{\beta} L(\boldsymbol{\beta}_t)</math> is the gradient of the objective function with respect to <math>\boldsymbol{\beta}</math> at the current parameter estimates.
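The update rule above can be sketched as follows. The synthetic data, learning rate, and iteration count are illustrative choices; here the gradient is scaled by <math>1/n</math> (i.e. the mean rather than the sum of squared residuals is minimized, which leaves the minimizer unchanged) so that a single learning rate works across dataset sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true  # noiseless, so gradient descent can recover beta_true

alpha = 0.1            # learning rate (hypothetical choice)
beta = np.zeros(3)     # initial parameter estimate
for _ in range(5000):
    # Gradient of (1/n) * sum of squared residuals w.r.t. beta.
    grad = -(2.0 / n) * X.T @ (y - X @ beta)
    beta = beta - alpha * grad

print(beta)  # approaches beta_true
```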
Applications in Machine Learning
Least squares regression has widespread applications in machine learning due to its simplicity and effectiveness in modeling linear relationships. It is often employed as a baseline method to compare the performance of more complex models. Common applications of least squares regression in machine learning include:
- Predictive modeling: Least squares regression can be used to make predictions about the target variable based on new input data.
- Feature selection: By analyzing the significance of model parameters, least squares regression can aid in identifying important features for further modeling.
- Model evaluation: Performance metrics such as the coefficient of determination (<math>R^2</math>) can be used to assess how well a least squares model fits the data.