Linear Discriminant Analysis

Linear Discriminant Analysis (LDA) is a classical statistical method for classification and dimensionality reduction that finds a linear combination of features which best separates two or more classes. It was introduced by the British statistician and geneticist Ronald A. Fisher in his 1936 paper "The Use of Multiple Measurements in Taxonomic Problems," where he applied it to the iris dataset compiled by botanist Edgar Anderson ^[1]. LDA serves two related purposes: as a classifier that assigns observations to one of several groups, and as a feature extraction technique that projects high dimensional data onto a lower dimensional subspace chosen to maximize between class separation ^[2]^[3].

Under the assumption that each class is drawn from a multivariate Gaussian distribution with a common covariance matrix, LDA coincides with the Bayes optimal classifier and produces linear decision boundaries. When that common covariance assumption is dropped, the related method Quadratic Discriminant Analysis produces quadratic boundaries. As a dimensionality reduction tool, LDA is conceptually distinct from Principal Component Analysis: PCA maximizes total variance and ignores class labels, while LDA seeks directions that separate the classes ^[4].

The abbreviation LDA is also used for Latent Dirichlet Allocation, an unrelated topic model introduced by Blei, Ng, and Jordan in 2003. The two methods share only the acronym. This article describes Fisher's Linear Discriminant Analysis.

History

Linear Discriminant Analysis originates with Ronald Fisher, who in 1936 published "The Use of Multiple Measurements in Taxonomic Problems" in the Annals of Eugenics ^[1]. Fisher discriminated between two species of iris using four floral measurements (sepal length, sepal width, petal length, petal width) collected by Edgar Anderson in the Gaspé Peninsula. His solution was a linear combination of the four measurements whose values for the two species were maximally separated relative to within species spread. The resulting linear function became Fisher's linear discriminant, and Anderson's data became the iris dataset, one of the most cited datasets in pattern recognition.

Fisher's original derivation was geometric: he maximized a ratio of between class to within class variance, with no explicit distributional assumption. The probabilistic interpretation, in which LDA arises as the Bayes optimal rule under a Gaussian model with shared covariance, came later and is now the standard textbook framing.

The extension to more than two classes was developed by C. R. Rao in 1948 in his paper "The Utilization of Multiple Measurements in Problems of Biological Classification" ^[5]. Rao introduced what is now called multiple discriminant analysis, replacing the single Fisher discriminant with a set of up to c-1 discriminant directions for c classes. This formulation gave LDA its modern matrix form involving the within class scatter matrix S_w and the between class scatter matrix S_b.

Through the 1950s and 1960s LDA became a workhorse in statistics, biology, and economics. Edward Altman's 1968 Z score for corporate bankruptcy used five financial ratios in a discriminant function and is among the best known applied LDA models ^[6]. In the 1990s LDA found a second life in computer vision through the Fisherfaces method of Belhumeur, Hespanha, and Kriegman, which combined PCA with LDA for face recognition ^[7]. More recently Probabilistic LDA became central to speaker verification systems built on i vectors and x vectors.

Mathematical formulation

LDA can be derived in two equivalent ways: as Fisher's variance ratio criterion, or as the Bayes optimal classifier under a Gaussian model with shared covariance. The two derivations yield the same projection direction and the same decision rule.

Two class case

Let the training data consist of feature vectors x ∈ R^d belonging to one of two classes. Denote the class conditional mean vectors as μ_1 and μ_2, and the class conditional covariance matrices as Σ_1 and Σ_2. Fisher assumed equal covariances: Σ_1 = Σ_2 = Σ_w, the common within class covariance.

Fisher's discriminant seeks a direction w ∈ R^d such that, when each point is projected to y = w^T x, the projected class means are as far apart as possible relative to the projected within class spread. This is Fisher's criterion:

J(w) = ( w^T (μ_1 - μ_2) )^2  /  ( w^T Σ_w w )

The numerator is the squared distance between the projected means, often called the between class variance along w. The denominator is the projected within class variance. Maximizing J(w) with respect to w gives the closed form solution

w  ∝  Σ_w^{-1} (μ_1 - μ_2)

This direction is called Fisher's linear discriminant. To turn it into a classifier, one chooses a threshold c and assigns a new point x to class 1 if w^T x > c and to class 2 otherwise. Under the Gaussian assumption with equal covariances, and with w scaled so that w = Σ_w^{-1} (μ_1 - μ_2) exactly rather than merely up to a constant, the optimal threshold derived from Bayes theorem is

c = (1/2) w^T (μ_1 + μ_2)  -  log( π_1 / π_2 )

where π_1 and π_2 are the class priors. With equal priors the threshold lies halfway between the projected means.
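
The two class rule above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function name `fisher_two_class`, the synthetic Gaussian data, and the chosen means and covariance are all assumptions made for the example.

```python
import numpy as np

def fisher_two_class(X1, X2, prior1=0.5, prior2=0.5):
    """Return (w, c): Fisher's direction and the Bayes threshold for class 1 vs class 2."""
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    n1, n2 = len(X1), len(X2)
    # Pooled within class covariance, dividing by N - 2 for two classes.
    S1 = (X1 - mu1).T @ (X1 - mu1)
    S2 = (X2 - mu2).T @ (X2 - mu2)
    sigma_w = (S1 + S2) / (n1 + n2 - 2)
    # Solve sigma_w @ w = mu1 - mu2 instead of forming the inverse explicitly.
    w = np.linalg.solve(sigma_w, mu1 - mu2)
    # Threshold from the text; valid because w is not rescaled.
    c = 0.5 * w @ (mu1 + mu2) - np.log(prior1 / prior2)
    return w, c

# Synthetic data (an assumption for illustration): two Gaussians sharing one covariance.
rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.3], [0.3, 0.5]])
X1 = rng.multivariate_normal([2.0, 0.0], cov, size=200)
X2 = rng.multivariate_normal([0.0, 1.0], cov, size=200)

w, c = fisher_two_class(X1, X2)
pred1 = X1 @ w > c          # True means "assigned to class 1"
pred2 = X2 @ w > c
accuracy = (pred1.sum() + (~pred2).sum()) / (len(X1) + len(X2))
print("w =", w, "threshold =", c, "training accuracy =", accuracy)
```

Solving the linear system rather than inverting Σ_w is a design choice for numerical stability; the resulting direction is the same.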

Multi class case

For c classes, C. R. Rao generalized Fisher's criterion using two scatter matrices ^[5]. Let N_i be the number of training samples in class i, with class mean μ_i, and let μ be the overall mean. The within class scatter matrix is

S_w  =  Σ_{i=1}^{c}  Σ_{x ∈ class i}  (x - μ_i) (x - μ_i)^T

and the between class scatter matrix is

S_b  =  Σ_{i=1}^{c}  N_i (μ_i - μ) (μ_i - μ)^T

The total scatter satisfies S_t = S_w + S_b. The multi class objective is to find a projection matrix W ∈ R^{d × k} that maximizes a ratio of determinants or a trace ratio such as

W^*  =  argmax_W  tr( (W^T S_w W)^{-1} (W^T S_b W) )

The solution is given by the eigenvectors of S_w^{-1} S_b corresponding to its largest eigenvalues. Because S_b has rank at most c - 1 (the c class means lie in an affine subspace of dimension at most c - 1), there are at most c - 1 non zero generalized eigenvalues, and LDA can therefore reduce dimensionality to at most min(d, c - 1) features regardless of how large d is. For a binary problem this means LDA always projects to one dimension; for ten classes it can project to at most nine.

In practice the eigendecomposition of S_w^{-1} S_b is replaced by a numerically stable procedure: whiten the data with respect to S_w using Cholesky or singular value decomposition, then eigendecompose the transformed S_b. This avoids forming S_w^{-1} explicitly and is the route taken by most production implementations.
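
The whitening route described above can be sketched as follows. This is an illustrative NumPy version under stated assumptions, not the code of any particular library: the function name `lda_directions` and the synthetic three class data are invented for the example, and S_w is assumed to be positive definite so that its Cholesky factor exists.

```python
import numpy as np

def lda_directions(X, y):
    """Return generalized eigenvalues and discriminant directions, largest first."""
    classes = np.unique(y)
    d = X.shape[1]
    mu = X.mean(axis=0)
    S_w = np.zeros((d, d))
    S_b = np.zeros((d, d))
    for k in classes:
        Xk = X[y == k]
        mu_k = Xk.mean(axis=0)
        S_w += (Xk - mu_k).T @ (Xk - mu_k)
        S_b += len(Xk) * np.outer(mu_k - mu, mu_k - mu)
    # Whiten with respect to S_w: with S_w = L L^T, form M = L^{-1} S_b L^{-T}.
    L = np.linalg.cholesky(S_w)
    A = np.linalg.solve(L, S_b)            # L^{-1} S_b
    M = np.linalg.solve(L, A.T).T          # L^{-1} S_b L^{-T}
    M = 0.5 * (M + M.T)                    # remove rounding asymmetry
    evals, V = np.linalg.eigh(M)
    order = np.argsort(evals)[::-1]
    evals, V = evals[order], V[:, order]
    # Map eigenvectors back to the original space: w = L^{-T} v.
    W = np.linalg.solve(L.T, V)
    return evals, W

# Synthetic data (an assumption): three classes in four dimensions.
rng = np.random.default_rng(0)
centers = np.array([[0, 0, 0, 0], [3, 0, 1, 0], [0, 3, 0, 1]], dtype=float)
X = np.vstack([rng.normal(m, 1.0, size=(50, 4)) for m in centers])
y = np.repeat([0, 1, 2], 50)

evals, W = lda_directions(X, y)
Z = X @ W[:, :2]                           # project onto the c - 1 = 2 useful directions
print("generalized eigenvalues:", evals)
```

With three classes only the first two eigenvalues are meaningfully non zero; the remaining ones vanish up to rounding error, which is the rank argument from the text in numerical form.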

Probabilistic interpretation

LDA admits a clean probabilistic derivation as a generative classifier. Assume the class conditional density is multivariate Gaussian with mean μ_k and shared covariance Σ:

p(x | y = k)  =  (2π)^{-d/2} |Σ|^{-1/2} exp( -(1/2) (x - μ_k)^T Σ^{-1} (x - μ_k) )

Let π_k = P(y = k) denote the class prior. By Bayes theorem the posterior is p(y = k | x) ∝ π_k p(x | y = k). Taking the logarithm and dropping terms that do not depend on the class label gives the discriminant function

δ_k(x)  =  x^T Σ^{-1} μ_k  -  (1/2) μ_k^T Σ^{-1} μ_k  +  log π_k

The decision rule assigns x to the class with the largest δ_k(x). Notice that the quadratic term x^T Σ^{-1} x cancels because the covariance is shared across classes, so each δ_k is linear in x. The set of points where δ_k(x) = δ_l(x) is therefore a hyperplane, which is the geometric reason LDA produces linear decision boundaries.

This derivation also shows that LDA is the Bayes optimal classifier under its assumptions: if the data really are Gaussian with shared covariance and the priors are correct, no other classifier can achieve lower expected error. In practice the assumptions are rarely exact, but LDA still performs well as a low variance estimator when training data is limited.

The parameters are estimated from the training data:

π_k    =  N_k / N
μ_k    =  (1 / N_k)  Σ_{x in class k}  x
Σ       =  (1 / (N - c))  Σ_{k=1}^{c}  Σ_{x in class k}  (x - μ_k)(x - μ_k)^T

The priors and class means are maximum likelihood estimates. The pooled covariance divides by N - c, which makes it unbiased; the maximum likelihood estimate would divide by N instead.
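
The estimation formulas and the discriminant functions δ_k(x) translate directly into code. The sketch below is one possible NumPy rendering; the helper names `fit_lda` and `discriminants` and the synthetic data are assumptions for illustration.

```python
import numpy as np

def fit_lda(X, y):
    """Estimate priors, class means, and the pooled covariance (divisor N - c)."""
    classes = np.unique(y)
    N, d = X.shape
    priors = np.array([np.mean(y == k) for k in classes])
    means = np.array([X[y == k].mean(axis=0) for k in classes])
    pooled = np.zeros((d, d))
    for k, mu_k in zip(classes, means):
        R = X[y == k] - mu_k
        pooled += R.T @ R
    pooled /= N - len(classes)
    return classes, priors, means, pooled

def discriminants(X, priors, means, pooled):
    """Return an (n, c) array whose column k holds delta_k(x) for each row x."""
    A = np.linalg.solve(pooled, means.T)   # column k is Sigma^{-1} mu_k
    const = -0.5 * np.sum(means.T * A, axis=0) + np.log(priors)
    return X @ A + const

# Synthetic data (an assumption): three Gaussian classes with a shared covariance.
rng = np.random.default_rng(1)
cov = np.array([[1.0, 0.2], [0.2, 1.0]])
centers = [[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]]
X = np.vstack([rng.multivariate_normal(m, cov, size=100) for m in centers])
y = np.repeat([0, 1, 2], 100)

classes, priors, means, pooled = fit_lda(X, y)
delta = discriminants(X, priors, means, pooled)
y_hat = classes[np.argmax(delta, axis=1)]  # assign each x to the largest delta_k
train_accuracy = np.mean(y_hat == y)
print("training accuracy:", train_accuracy)
```

Each δ_k differs from the full Gaussian log posterior only by terms that are the same for every class, so taking the argmax over δ_k gives the same labels as comparing posteriors.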

LDA vs QDA

Quadratic Discriminant Analysis (QDA) shares LDA's generative Gaussian framework but allows each class to have its own covariance matrix Σ_k. The discriminant function becomes

δ_k(x)  =  -(1/2) log |Σ_k|  -  (1/2) (x - μ_k)^T Σ_k^{-1} (x - μ_k)  +  log π_k

which is quadratic in x. The decision boundary between any two classes is a quadric surface (an ellipsoid, hyperboloid, paraboloid, or pair of hyperplanes) rather than a hyperplane.

Property	LDA	QDA
Covariance assumption	Single shared `Σ`	Per class `Σ_k`
Decision boundary	Linear (hyperplane)	Quadratic (quadric)
Number of covariance parameters	`d(d+1)/2`	`c · d(d+1)/2`
Bias	Higher when covariances really differ	Lower
Variance	Lower (fewer parameters)	Higher (more parameters)
Typical regime where it wins	Small `N`, similar covariances	Large `N`, clearly different covariances
Required minimum samples per class	Few	At least `d + 1` per class for invertible `Σ_k`
Reduces dimensionality	Yes (to `c - 1`)	No native reduction

LDA is preferred when training data is scarce or when classes have similar shapes; QDA is preferred when there is enough data per class to estimate per class covariances reliably and those covariances clearly differ. Regularized Discriminant Analysis (RDA), introduced by Friedman in 1989, interpolates smoothly between LDA, QDA, and a scaled identity covariance model via two tuning parameters ^[8].
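
The QDA discriminant function can be evaluated as in the sketch below. The function name `qda_discriminants` and the toy parameters are assumptions chosen to show a case LDA cannot handle: two classes with the same mean but different covariances, where only the quadratic term separates them.

```python
import numpy as np

def qda_discriminants(X, priors, means, covs):
    """Return an (n, c) array of QDA scores delta_k(x), one column per class."""
    scores = np.empty((len(X), len(priors)))
    for k, (pi_k, mu_k, S_k) in enumerate(zip(priors, means, covs)):
        diff = X - mu_k
        _, logdet = np.linalg.slogdet(S_k)               # log |Sigma_k|
        maha = np.sum(diff * np.linalg.solve(S_k, diff.T).T, axis=1)
        scores[:, k] = -0.5 * logdet - 0.5 * maha + np.log(pi_k)
    return scores

# Toy parameters (assumptions): identical means, a tight class 0 and a diffuse class 1.
priors = np.array([0.5, 0.5])
means = np.array([[0.0, 0.0], [0.0, 0.0]])
covs = np.array([0.25 * np.eye(2), 4.0 * np.eye(2)])

points = np.array([[0.0, 0.0],     # near the shared mean
                   [3.0, 3.0]])    # far from it
labels = np.argmax(qda_discriminants(points, priors, means, covs), axis=1)
print(labels)
```

The point at the origin is assigned to the tight class and the distant point to the diffuse class, giving a circular boundary. With equal means, LDA's discriminants would be identical for both classes and could not distinguish them.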

LDA vs PCA

Principal Component Analysis is the most common alternative dimensionality reduction technique and is sometimes confused with LDA. Both produce linear projections, both involve eigendecompositions, and both are heavily used as preprocessing steps. They differ in their objectives.

Property	PCA	LDA
Supervised	No, ignores labels	Yes, uses class labels
Objective	Maximum total variance	Maximum class separability
Eigenproblem	`Cov(X)`	`S_w^{-1} S_b`
Output dimensionality	Up to `min(N - 1, d)`	At most `min(d, c - 1)`
Useful for	Compression, denoising, visualization	Class discrimination, feature extraction for classification
Robust to label noise	Yes	No, depends directly on labels

A common pitfall: PCA can discard precisely the directions that distinguish classes if the data vary little along those directions compared with other, non discriminative directions. LDA avoids this because its objective explicitly references the class structure. Conversely PCA is valuable when class labels are unavailable or noisy, or when the goal is data exploration.

It is common to combine the two. The Fisherfaces method ^[7] runs PCA first to reduce the dimension to at most N - c, so that S_w is non singular, then applies LDA in the PCA subspace. This pipeline is a common choice in genomics and small sample image problems.
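
The pitfall of PCA discarding the discriminative direction is easy to reproduce on synthetic data. In the sketch below, which uses invented data and NumPy only, the two classes differ along the second axis while almost all of the variance lies along the first.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
# Axis 0: large shared variance, no class information.
# Axis 1: small variance, but the class means differ (-1 versus +1).
X0 = np.column_stack([rng.normal(0.0, 5.0, n), rng.normal(-1.0, 0.3, n)])
X1 = np.column_stack([rng.normal(0.0, 5.0, n), rng.normal(+1.0, 0.3, n)])
X = np.vstack([X0, X1])

# PCA: leading eigenvector of the total covariance (labels ignored).
evals, evecs = np.linalg.eigh(np.cov(X, rowvar=False))
pca_dir = evecs[:, -1]                     # eigh sorts eigenvalues in ascending order

# LDA: Fisher's direction S_w^{-1} (mu_1 - mu_0), normalized for comparison.
mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
S_w = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
lda_dir = np.linalg.solve(S_w, mu1 - mu0)
lda_dir /= np.linalg.norm(lda_dir)

print("first principal component:", pca_dir)
print("Fisher direction:        ", lda_dir)
```

The first principal component aligns with the high variance axis, which carries no class information, while Fisher's direction aligns with the low variance axis that separates the classes.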

LDA vs logistic regression

Logistic regression is the classic discriminative counterpart to LDA. Both produce linear decision boundaries in the two class case, but they differ in how those boundaries are estimated.

Under the equal covariance Gaussian model, LDA and logistic regression both have posterior log odds of the form log [ P(y = 1 | x) / P(y = 0 | x) ] = β_0 + β^T x. LDA estimates the coefficients indirectly, by fitting class means and a pooled covariance and then solving for β. Logistic regression estimates the coefficients directly by maximum likelihood on the conditional distribution P(y | x), without modeling p(x).

The practical implications are summarized in The Elements of Statistical Learning ^[2]. When the Gaussian assumption holds, LDA uses more information about the parameters and is therefore more efficient: ignoring the marginal density of x, as logistic regression does, costs in the worst case about 30 percent in asymptotic efficiency of the error rate, a result the book attributes to Efron (1975). When the Gaussian assumption is badly violated, for instance when features are categorical or heavy tailed, logistic regression tends to be more robust because it makes no assumption about the input distribution. On high dimensional problems with text or count features, regularized logistic regression usually outperforms LDA.

LDA handles multi class classification natively through the linear discriminant functions δ_k(x), whereas logistic regression must be extended via a softmax (multinomial logistic) formulation. Both methods admit regularization, though regularized LDA via shrinkage of the covariance matrix is less commonly taught than ridge or lasso logistic regression.
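
The "indirect" route by which LDA arrives at logistic style coefficients can be made concrete. Under the shared covariance Gaussian model, β = Σ^{-1} (μ_1 - μ_0) and β_0 = -(1/2) (μ_1 + μ_0)^T β + log(π_1 / π_0). The sketch below checks this identity numerically; the function names and parameter values are assumptions for illustration.

```python
import numpy as np

def lda_logit_coefficients(mu1, mu0, sigma, pi1, pi0):
    """Coefficients (beta0, beta) of the log odds implied by the Gaussian model."""
    beta = np.linalg.solve(sigma, mu1 - mu0)
    beta0 = -0.5 * (mu1 + mu0) @ beta + np.log(pi1 / pi0)
    return beta0, beta

def gaussian_logpdf(x, mu, sigma):
    """Log density of a multivariate Gaussian at a single point x."""
    diff = x - mu
    _, logdet = np.linalg.slogdet(sigma)
    quad = diff @ np.linalg.solve(sigma, diff)
    return -0.5 * (len(mu) * np.log(2 * np.pi) + logdet + quad)

# Illustrative parameters (assumptions, not estimates from real data).
mu1, mu0 = np.array([1.0, 2.0]), np.array([0.0, 0.0])
sigma = np.array([[1.0, 0.2], [0.2, 2.0]])
pi1, pi0 = 0.3, 0.7
x = np.array([0.5, -1.0])

beta0, beta = lda_logit_coefficients(mu1, mu0, sigma, pi1, pi0)
linear_form = beta0 + beta @ x
# Log odds computed directly from Bayes theorem.
direct = (np.log(pi1) + gaussian_logpdf(x, mu1, sigma)) - (
    np.log(pi0) + gaussian_logpdf(x, mu0, sigma))
print(linear_form, direct)
```

The two numbers agree to rounding error. Logistic regression would instead fit β_0 and β directly from labeled data, without ever estimating the means or the covariance.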

Assumptions and limitations

LDA's strong performance depends on several assumptions that should be checked before applying it:

Multivariate Gaussian class conditional densities. Each class is assumed to be approximately normal in the feature space. Skewed or heavy tailed data can degrade the model. Power transforms or rank based features sometimes help.
Equal covariance across classes. This is the assumption that makes the decision boundary linear. When violated, QDA or RDA is preferable.
Continuous features. Categorical or binary features violate the Gaussian assumption. Naive Bayes with appropriate likelihoods, or logistic regression, are usually better fits.
Sufficient samples per class. The within class scatter S_w has rank at most N - c, so with too few samples relative to features (N - c < d, and in particular N < d) it becomes singular and cannot be inverted. Remedies include shrinkage, regularization, or a PCA preprocessing step.
Outlier sensitivity. Class means and covariances are sensitive to outliers. Robust variants based on the minimum covariance determinant estimator help in the presence of contamination.
Linear separability in the projected space. If the optimal boundary is highly nonlinear, no linear projection will recover it; kernel LDA or nonlinear methods become necessary.

A common practical issue is a singular within class scatter matrix when N - c < d, the small sample size problem. The classical fixes are regularized LDA, replacing S_w with S_w + λI for some λ > 0; shrinkage LDA, using an analytically derived intensity such as the Ledoit Wolf or Oracle Approximating Shrinkage estimator; or PCA LDA, which projects to a subspace where S_w is non singular before applying LDA ^[7]^[8].
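
The small sample size problem and the S_w + λI remedy can be demonstrated directly. The sketch below uses synthetic data with more features than samples; the value of λ is an arbitrary illustrative choice, not a recommended setting, and in practice it would be tuned or replaced by an analytic shrinkage intensity.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_per_class = 50, 10                    # N = 20 samples, d = 50 features
shift = np.zeros(d)
shift[0] = 2.0                             # classes differ only in the first feature
X1 = rng.normal(size=(n_per_class, d)) + shift
X2 = rng.normal(size=(n_per_class, d))

mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
S_w = (X1 - mu1).T @ (X1 - mu1) + (X2 - mu2).T @ (X2 - mu2)
rank_before = np.linalg.matrix_rank(S_w)   # bounded by N - c, well below d

# Regularized LDA: add a ridge term so the matrix becomes positive definite.
lam = 0.1 * np.trace(S_w) / d              # arbitrary illustrative value
S_reg = S_w + lam * np.eye(d)
rank_after = np.linalg.matrix_rank(S_reg)

w = np.linalg.solve(S_reg, mu1 - mu2)      # well defined after regularization
print("rank of S_w:", rank_before, "rank of S_w + lam*I:", rank_after)
```

Without the ridge term the linear system has no unique solution, because S_w has a null space of dimension at least d - (N - c).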

Variants and extensions

Many generalizations of LDA have been developed to address its limitations or to extend it to new settings.

Quadratic Discriminant Analysis (QDA). Drops the equal covariance assumption; produces quadratic boundaries.
Regularized Discriminant Analysis (RDA). Friedman 1989 ^[8]. Interpolates between LDA, QDA, and a scaled identity covariance model via two regularization parameters.
Diagonal LDA. Constrains Σ to be diagonal. Useful for high dimensional small sample problems such as microarray classification.
Shrinkage LDA. Replaces the sample covariance with a shrinkage estimator. The Ledoit Wolf and Oracle Approximating Shrinkage estimators provide closed form intensities and are the default in many libraries.
Fisherfaces. Belhumeur, Hespanha, and Kriegman 1997 ^[7]. PCA followed by LDA, applied to face images. A landmark method in face recognition in the late 1990s and early 2000s.
Kernel Fisher Discriminant (KFD). Mika et al. 1999 ^[9]. Performs LDA in a feature space defined by a kernel method, enabling nonlinear class boundaries. Closely related to support vector machines.
Generalized Discriminant Analysis (GDA). Baudat and Anouar 2000 ^[10]. Another formulation of kernel LDA with multi class extensions.
Local Fisher Discriminant Analysis (LFDA). Sugiyama 2006 ^[11]. Combines Fisher's criterion with locality preserving structure, useful when classes have multimodal distributions.
Heteroscedastic LDA (HLDA). Kumar and Andreou 1998. Allows class specific covariances in the retained directions while the discarded directions share a common mean and covariance; popular in speech recognition.
Probabilistic LDA (PLDA). Prince and Elder 2007 ^[12]. A latent variable formulation widely used in face and speaker verification systems, modeling each observation as a latent identity vector plus within class noise.
Sparse LDA. Imposes an L1 penalty on the discriminant directions for interpretable variable selection.
Incremental and online LDA. Extends LDA to streaming data without recomputing the eigendecomposition from scratch.

Applications

LDA has been applied across many fields, often as a competitive baseline that is hard to beat without significantly more data or model complexity.

Bankruptcy prediction. Altman's 1968 Z score is the canonical financial application of LDA, using five accounting ratios to discriminate solvent from bankrupt firms ^[6]. It is still taught in finance and used as a screening tool. See Altman Z score.
Face recognition. Fisherfaces ^[7] use LDA on PCA reduced face images and were among the leading approaches in the late 1990s and early 2000s. Even after deep learning, LDA remains useful as a scoring layer on top of learned face embeddings.
Speaker verification. Modern speaker recognition pipelines extract i vectors or neural x vectors and score pairs with PLDA. LDA is also frequently used as a discriminative dimensionality reduction step before PLDA.
Genomics and bioinformatics. Diagonal or shrinkage LDA is a standard classifier for gene expression microarrays and other very high dimensional small sample problems.
Medical diagnosis. Discrimination between disease states using clinical biomarkers, biopsy measurements, or imaging features.
Brain computer interfaces and EEG. Shrinkage LDA is widely used in BCI research because EEG features are roughly Gaussian and training data is limited.
Marketing. Segmenting customers into known classes based on demographic and transactional features.
Chemometrics and remote sensing. Classifying products or land cover from spectroscopic or multispectral measurements.
Handwritten digit recognition. A historical benchmark; LDA was an early standard on MNIST style tasks before convolutional networks took over.

Implementations

Linear Discriminant Analysis is included in essentially every major statistical and machine learning package.

scikit-learn provides sklearn.discriminant_analysis.LinearDiscriminantAnalysis and QuadraticDiscriminantAnalysis. The LDA class offers svd, lsqr (least squares), and eigen solvers; covariance shrinkage, including automatic Ledoit Wolf shrinkage, is available with the lsqr and eigen solvers.
R ships LDA in MASS::lda and MASS::qda. The klaR package adds klaR::rda for regularized discriminant analysis, and mda adds mixture discriminant analysis.
MATLAB provides fitcdiscr in the Statistics and Machine Learning Toolbox, supporting linear, quadratic, diagonal, and pseudoquadratic discriminants.
Stata provides discrim lda and discrim qda.
SAS provides PROC DISCRIM and PROC STEPDISC for variable selection.
SPSS offers Discriminant Analysis as a menu driven procedure, and Weka ships discriminant classifiers in its functions package.
Spark MLlib has no LDA classifier (its LDA class is for Latent Dirichlet Allocation), but one can be built from its linear algebra primitives.
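
As a brief usage illustration for the first entry in the list above, the following sketch fits scikit-learn's LinearDiscriminantAnalysis on the iris data bundled with the library. It assumes scikit-learn is installed; the variable names are arbitrary, and the settings shown are examples rather than recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)          # 150 samples, 4 features, 3 classes

# Default SVD solver: classification plus projection onto c - 1 = 2 directions.
lda = LinearDiscriminantAnalysis(solver="svd")
Z = lda.fit(X, y).transform(X)
train_accuracy = lda.score(X, y)

# Least squares solver with automatic (Ledoit Wolf) covariance shrinkage.
shrunk = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto").fit(X, y)
shrunk_predictions = shrunk.predict(X)

print("projected shape:", Z.shape)
print("training accuracy:", train_accuracy)
```

The svd solver does not accept a shrinkage setting, which is why the second model switches to lsqr. Training accuracy is reported only for illustration; a held out set or cross validation would be needed to estimate generalization.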

Worked example

Consider a simplified two class version of Fisher's iris dataset with two features, sepal length and sepal width, and two species, Iris setosa (class 1) and Iris versicolor (class 2). Suppose the estimated class means are μ_1 = (5.0, 3.4) for setosa and μ_2 = (5.9, 2.8) for versicolor, and the pooled within class covariance is

Σ_w = [ 0.20   0.05 ]
      [ 0.05   0.10 ]

The difference of means is μ_1 - μ_2 = (-0.9, 0.6). To find Fisher's direction, compute Σ_w^{-1} (μ_1 - μ_2). The inverse of the pooled covariance is approximately

Σ_w^{-1} ≈ [  5.71  -2.86 ]
            [ -2.86  11.43 ]

Multiplying out gives w ≈ (-6.86, 9.43) (up to rounding). To classify a new flower with measurements x = (5.5, 3.1), the LDA discriminant compares

δ_1(x) - δ_2(x)  =  w^T (x - (μ_1 + μ_2)/2)  +  log(π_1 / π_2)

With equal priors π_1 = π_2 = 0.5 and the midpoint (μ_1 + μ_2)/2 = (5.45, 3.10), the term (x - midpoint) = (0.05, 0.0), and w^T (x - midpoint) ≈ -6.86 · 0.05 + 9.43 · 0.0 ≈ -0.34. Since this is negative, the example is closer to the versicolor side and would be classified as Iris versicolor. (In Fisher's original paper using all four features the two species are linearly separable with zero training error, which is part of why the iris example became iconic.)

This calculation captures the essence of LDA. Training reduces to estimating means, a covariance, and class priors; inference reduces to a dot product and a threshold comparison. There are no iterative optimizers, learning rates, or randomness, which is part of LDA's enduring appeal as a baseline.
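
The arithmetic of the worked example can be reproduced with NumPy, using exactly the means, covariance, priors, and query point stated above. Only the variable names are introduced here.

```python
import numpy as np

mu1 = np.array([5.0, 3.4])                 # setosa mean
mu2 = np.array([5.9, 2.8])                 # versicolor mean
sigma_w = np.array([[0.20, 0.05],
                    [0.05, 0.10]])         # pooled within class covariance
pi1 = pi2 = 0.5                            # equal priors

# Fisher's direction: solve sigma_w @ w = mu1 - mu2.
w = np.linalg.solve(sigma_w, mu1 - mu2)    # approximately (-6.86, 9.43)

x = np.array([5.5, 3.1])                   # new flower
midpoint = 0.5 * (mu1 + mu2)
score = w @ (x - midpoint) + np.log(pi1 / pi2)   # delta_1(x) - delta_2(x)

label = "Iris setosa" if score > 0 else "Iris versicolor"
print("w =", w, "score =", score, "->", label)
```

The score is about -0.34, so the flower is assigned to Iris versicolor, in agreement with the hand calculation.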

Modern relevance

LDA is no longer the leading method on the most demanding modern benchmarks. The rise of support vector machines in the late 1990s and of gradient boosting and deep learning in the 2010s pushed LDA out of the headline ML competitions, where the equal Gaussian assumption is rarely realistic.

Nevertheless, LDA continues to be used heavily and intentionally in several settings:

Speech and speaker recognition. PLDA scoring on x vector embeddings remains the dominant approach in many speaker verification systems, including call center authentication and forensic voice comparison.
Bankruptcy prediction and credit scoring. Altman's Z score and its descendants remain in active use, valued for interpretability and regulatory acceptability.
Brain computer interfaces. Shrinkage LDA is a standard classifier in motor imagery and P300 BCIs because of its stability with very few labeled trials.
High dimensional small sample problems. In genomics and chemometrics, regularized or diagonal LDA frequently matches or beats more complex models when data is scarce.
Education and benchmarking. LDA is one of the cleanest pedagogical examples of a generative classifier and the bias variance tradeoff, and it serves as a sanity check baseline.
Feature engineering. LDA's c - 1 projection is used as a low dimensional summary of class structure for visualization or as input to a downstream model.

LDA sits at the intersection of three traditions in classification: the geometric (Fisher's variance ratio), the probabilistic (Bayes optimal generative model), and the spectral (eigendecomposition of S_w^{-1} S_b). Even when LDA is not the final model, the concepts of feature extraction, discriminative projection, and class conditional generative modeling that grew out of it permeate contemporary statistical learning.

References

Fisher, R. A. (1936). "The Use of Multiple Measurements in Taxonomic Problems." *Annals of Eugenics*, 7(2), 179-188. https://onlinelibrary.wiley.com/doi/10.1111/j.1469-1809.1936.tb02137.x
Hastie, T., Tibshirani, R., & Friedman, J. (2009). *The Elements of Statistical Learning* (2nd ed.). Springer. Chapter 4. https://hastie.su.domains/ElemStatLearn/
Bishop, C. M. (2006). *Pattern Recognition and Machine Learning*. Springer. Chapter 4. https://www.microsoft.com/en-us/research/people/cmbishop/prml-book/
Duda, R. O., Hart, P. E., & Stork, D. G. (2001). *Pattern Classification* (2nd ed.). Wiley. https://www.wiley.com/en-us/Pattern+Classification%2C+2nd+Edition-p-9780471056690
Rao, C. R. (1948). "The Utilization of Multiple Measurements in Problems of Biological Classification." *Journal of the Royal Statistical Society, Series B*, 10(2), 159-203. https://www.jstor.org/stable/2983775
Altman, E. I. (1968). "Financial Ratios, Discriminant Analysis and the Prediction of Corporate Bankruptcy." *Journal of Finance*, 23(4), 589-609. https://onlinelibrary.wiley.com/doi/10.1111/j.1540-6261.1968.tb00843.x
Belhumeur, P. N., Hespanha, J. P., & Kriegman, D. J. (1997). "Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection." *IEEE Transactions on Pattern Analysis and Machine Intelligence*, 19(7), 711-720. https://ieeexplore.ieee.org/document/598228
Friedman, J. H. (1989). "Regularized Discriminant Analysis." *Journal of the American Statistical Association*, 84(405), 165-175. https://www.tandfonline.com/doi/abs/10.1080/01621459.1989.10478752
Mika, S., Rätsch, G., Weston, J., Schölkopf, B., & Müller, K. R. (1999). "Fisher Discriminant Analysis with Kernels." *IEEE Neural Networks for Signal Processing IX Workshop*, 41-48. https://ieeexplore.ieee.org/document/788121
Baudat, G., & Anouar, F. (2000). "Generalized Discriminant Analysis Using a Kernel Approach." *Neural Computation*, 12(10), 2385-2404. https://direct.mit.edu/neco/article/12/10/2385/6385
Sugiyama, M. (2006). "Local Fisher Discriminant Analysis for Supervised Dimensionality Reduction." *Proceedings of the 23rd International Conference on Machine Learning*, 905-912. https://dl.acm.org/doi/10.1145/1143844.1143958
Prince, S. J. D., & Elder, J. H. (2007). "Probabilistic Linear Discriminant Analysis for Inferences About Identity." *IEEE 11th International Conference on Computer Vision*, 1-8. https://ieeexplore.ieee.org/document/4409052
Welling, M. (2005). "Fisher Linear Discriminant Analysis." Tutorial, University of Toronto. https://www.ics.uci.edu/~welling/classnotes/papers_class/Fisher-LDA.pdf
McLachlan, G. J. (2004). *Discriminant Analysis and Statistical Pattern Recognition*. Wiley. https://onlinelibrary.wiley.com/doi/book/10.1002/0471725293
Ledoit, O., & Wolf, M. (2004). "A Well Conditioned Estimator for Large Dimensional Covariance Matrices." *Journal of Multivariate Analysis*, 88(2), 365-411. https://www.sciencedirect.com/science/article/pii/S0047259X03000964
scikit-learn developers. "Linear and Quadratic Discriminant Analysis." scikit-learn user guide. https://scikit-learn.org/stable/modules/lda_qda.html
Wikipedia contributors. "Linear discriminant analysis." https://en.wikipedia.org/wiki/Linear_discriminant_analysis
