Least squares regression
- See also: Machine learning terms
Introduction
In machine learning, Least Squares Regression is a well-established method for fitting a linear model to a set of data points. It seeks to minimize the sum of the squared differences between the observed values and the values predicted by the linear model. This technique is particularly useful in applications where the relationship between input features and the target variable is linear or near-linear. In this article, we will provide an overview of the least squares regression method, discuss its mathematical basis, and present its implementation in machine learning.
Mathematical Foundation
Ordinary Least Squares
The primary objective of the Least Squares method is to minimize the sum of the squared residuals, where a residual is the difference between the observed value and the value predicted by the model. Mathematically, the objective function for ordinary least squares (OLS) regression can be expressed as:
- <math>\min_{\boldsymbol{\beta}} \sum_{i=1}^{n}(y_i - \boldsymbol{x}_i \boldsymbol{\beta})^2</math>
Here, <math>n</math> is the number of data points, <math>y_i</math> represents the observed values, <math>\boldsymbol{x}_i</math> is a row vector of input features, and <math>\boldsymbol{\beta}</math> is a column vector of model parameters to be estimated.
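The objective above can be evaluated directly for any candidate parameter vector. A minimal sketch in NumPy, using small hypothetical data (the matrix, targets, and function name are illustrative, not from the source):

```python
import numpy as np

def ols_objective(beta, X, y):
    """Sum of squared residuals: sum_i (y_i - x_i beta)^2."""
    residuals = y - X @ beta
    return float(residuals @ residuals)

# Toy data: first column of ones acts as an intercept feature.
X = np.array([[1.0, 2.0], [1.0, 3.0], [1.0, 5.0]])
y = np.array([4.0, 6.0, 10.0])

# y = 2*x exactly here, so beta = (0, 2) gives zero residuals.
print(ols_objective(np.array([0.0, 2.0]), X, y))  # → 0.0
```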
To find the optimal values of the parameters that minimize the objective function, we can compute the gradient with respect to <math>\boldsymbol{\beta}</math> and set it to zero. The resulting equation, called the normal equation, provides a closed-form solution for <math>\boldsymbol{\beta}</math>:
- <math>\boldsymbol{\beta} = (\boldsymbol{X}^T \boldsymbol{X})^{-1} \boldsymbol{X}^T \boldsymbol{y}</math>
where <math>\boldsymbol{X}</math> is the matrix of input features and <math>\boldsymbol{y}</math> is the vector of observed values.
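The normal equation can be sketched in a few lines of NumPy. The data below is hypothetical; note that solving the linear system is generally preferred over forming the explicit inverse, for numerical stability:

```python
import numpy as np

# Hypothetical toy data: 5 observations, an intercept column plus 2 features.
X = np.array([[1.0, 0.0, 1.0],
              [1.0, 1.0, 2.0],
              [1.0, 2.0, 0.5],
              [1.0, 3.0, 1.5],
              [1.0, 4.0, 3.0]])
y = np.array([1.0, 3.0, 2.0, 5.0, 7.0])

# Normal equation: solve (X^T X) beta = X^T y rather than inverting X^T X.
beta = np.linalg.solve(X.T @ X, X.T @ y)

# At the minimum, the residuals are orthogonal to every column of X,
# i.e. the gradient X^T (y - X beta) vanishes.
residuals = y - X @ beta
print(beta, np.sum(residuals ** 2))
```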
Gradient Descent
In some cases, the closed-form solution of the normal equation can be computationally expensive, particularly when the number of input features is large. An alternative approach to solving the least squares problem is Gradient Descent, an iterative optimization algorithm that adjusts the model parameters by moving in the direction of the steepest decrease in the objective function. The update rule for gradient descent in the context of least squares regression is:
- <math>\boldsymbol{\beta}_{t+1} = \boldsymbol{\beta}_t - \alpha \nabla_\boldsymbol{\beta} L(\boldsymbol{\beta}_t)</math>
where <math>\boldsymbol{\beta}_t</math> is the current estimate of the parameters, <math>\alpha</math> is the learning rate, and <math>\nabla_\boldsymbol{\beta} L(\boldsymbol{\beta}_t)</math> is the gradient of the objective function with respect to <math>\boldsymbol{\beta}</math> at the current parameter estimates.
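The update rule above can be sketched as follows. The synthetic data, learning rate, and iteration count are illustrative choices; here the gradient is scaled by <math>1/n</math> (i.e. the mean rather than the sum of squared residuals is minimized, which leaves the minimizer unchanged) so that a single learning rate works across dataset sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true  # noiseless, so gradient descent can recover beta_true

alpha = 0.1            # learning rate (hypothetical choice)
beta = np.zeros(3)     # initial parameter estimate
for _ in range(5000):
    # Gradient of (1/n) * sum of squared residuals w.r.t. beta.
    grad = -(2.0 / n) * X.T @ (y - X @ beta)
    beta = beta - alpha * grad

print(beta)  # approaches beta_true
```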
Applications in Machine Learning
Least squares regression has widespread applications in machine learning due to its simplicity and effectiveness in modeling linear relationships. It is often employed as a baseline method to compare the performance of more complex models. Common applications of least squares regression in machine learning include:
- Predictive modeling: Least squares regression can be used to make predictions about the target variable based on new input data.
- Feature selection: By analyzing the significance of model parameters, least squares regression can aid in identifying important features for further modeling.
- Model evaluation: Performance metrics such as the coefficient of determination (<math>R^2</math>) can be used to assess how well a least squares model fits the data.