Tabular models

Updated Jun 28, 2026

Tabular models are machine learning systems that learn from data arranged in tables, where each row is a sample and each column is a feature. The defining empirical fact of the field is that, unlike images, text, and audio, tabular data is the main domain where deep neural networks are not the default: gradient-boosted decision trees such as XGBoost (2016), LightGBM (2017), and CatBoost (2018) remain state of the art on most tabular tasks, a finding supported by the NeurIPS 2022 benchmark of medium-sized datasets by Grinsztajn, Oyallon, and Varoquaux.^[11] The features are usually heterogeneous, mixing continuous numbers (age, income, a sensor reading), categorical codes (country, product identifier), ordinal levels, and binary flags. The columns of a table have no canonical spatial or temporal ordering; they differ in scale and meaning, and missing values are common. This setting covers a large share of applied machine learning, including credit scoring, fraud detection, churn prediction, demand forecasting, medical risk modeling, and click-through rate prediction.

This article is an overview of the model families used for tabular data and how they relate. The two main supervised tasks have their own detailed articles: tabular classification models predict a discrete class label, and tabular regression models predict a continuous numeric target. Most of the methods below apply to both tasks with only a change of loss function and output head.
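As a rough illustration of what "a change of loss function and output head" means, here is a minimal sketch assuming a model that emits one raw score f per row (the function names are hypothetical, not any library's API). Regression uses an identity head with squared-error loss; binary classification uses a sigmoid head with log loss. In both cases the gradient of the loss with respect to the score reduces to prediction minus target, which is why the same fitting machinery serves both tasks.

```python
import math

def sigmoid(f: float) -> float:
    """Logistic output head: maps a raw score to a probability."""
    return 1.0 / (1.0 + math.exp(-f))

def squared_loss(f: float, y: float) -> float:
    """Regression: identity head, loss 0.5 * (f - y)^2."""
    return 0.5 * (f - y) ** 2

def squared_loss_grad(f: float, y: float) -> float:
    """Derivative of 0.5 * (f - y)^2 with respect to the score f."""
    return f - y

def log_loss(f: float, y: int) -> float:
    """Binary classification: sigmoid head, cross-entropy on a label y in {0, 1}."""
    p = sigmoid(f)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def log_loss_grad(f: float, y: int) -> float:
    """Derivative of log loss composed with the sigmoid head; simplifies to p - y."""
    return sigmoid(f) - y

def numeric_grad(loss, f: float, y, eps: float = 1e-6) -> float:
    """Central finite difference, used only to check the analytic gradients."""
    return (loss(f + eps, y) - loss(f - eps, y)) / (2 * eps)

if __name__ == "__main__":
    for f in (-2.0, 0.3, 1.5):
        print(f, squared_loss_grad(f, 1.0), numeric_grad(squared_loss, f, 1.0))
        print(f, log_loss_grad(f, 1), numeric_grad(log_loss, f, 1))
```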

What are tabular models?

Tabular models are predictors fit to structured data in rows and columns, where rows are observations and columns are named features of mixed type. They answer the most common form of applied prediction problem: given a record with several attributes, estimate a label or a number. Because the columns carry no fixed topology, a model cannot rely on the locality assumptions that make convolutional and sequence architectures effective. Useful interactions between columns tend to be low-order rather than deeply compositional, datasets are often small to medium (hundreds to a few million rows), class or target distributions are frequently skewed, and interpretability and calibration usually matter because the predictions drive consequential decisions. These properties favor models that are sample-efficient, robust to noise and uninformative features, fast to retrain, and easy to inspect. They also help explain why decision-tree ensembles, rather than neural networks, have dominated the field for most of the last two decades.

Which model families are used for tabular data?

Linear and other classical models

The earliest tabular models were linear: ordinary least squares and ridge regression for numeric targets, and logistic regression for classification. These remain the workhorses when interpretability or well-calibrated probabilities are paramount. Penalized variants such as the lasso (L1) and elastic net (L1 plus L2) add variable selection. Other classical baselines include k-nearest neighbors, naive Bayes, support vector machines, and single decision trees (the CART framework of Breiman, Friedman, Olshen, and Stone, 1984).^[1] A single tree is high-variance, which is why ensembles of trees, rather than individual trees, became the standard.
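The variable-selection behavior of the lasso comes from the L1 penalty's soft-thresholding step, which sets small coefficients to exactly zero rather than merely shrinking them. The sketch below is a minimal cyclic coordinate-descent lasso, assuming centered features and target (so no intercept) and a hypothetical two-column dataset in which the second column carries only a weak signal; it is illustrative, not a production solver.

```python
def soft_threshold(rho: float, lam: float) -> float:
    """Proximal step for the L1 penalty: shrink toward zero and clip to exactly zero."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso_cd(X, y, lam, n_sweeps=100):
    """Cyclic coordinate descent for (1/(2n)) * ||y - Xw||^2 + lam * ||w||_1.

    Assumes centered columns and target, so no intercept is fit.
    X is a list of rows; returns the list of weights w.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_sweeps):
        for j in range(d):
            rho, z = 0.0, 0.0
            for i in range(n):
                # Partial residual: the target minus every feature's contribution except j.
                pred_without_j = sum(X[i][k] * w[k] for k in range(d) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, lam) / (z / n)
    return w

if __name__ == "__main__":
    x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]   # informative column
    x2 = [1.0, -1.0, 0.0, -1.0, 1.0]   # weak column, orthogonal to x1
    X = [[a, b] for a, b in zip(x1, x2)]
    y = [3.0 * a + 0.1 * b for a, b in zip(x1, x2)]
    print(lasso_cd(X, y, lam=0.0))   # approximately [3.0, 0.1] (least squares)
    print(lasso_cd(X, y, lam=0.5))   # approximately [2.75, 0.0]: the weak column is dropped
```

With the penalty switched on, the weak column's coefficient is not just small but exactly zero, while the informative coefficient is shrunk by the same thresholding step.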

Tree ensembles

Bagging (Breiman, 1996) trains many trees on bootstrap samples and averages them. Random forest (Breiman, 2001) adds feature subsampling at each split, giving a robust general-purpose model with few hyperparameters and a built-in out-of-bag error estimate.^[2] These methods reduce the variance of single trees while keeping their ability to handle mixed feature types, missing values, and feature interactions.
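The two ingredients of a random forest, bootstrap resampling of rows and random feature subsets at each split, plus the out-of-bag estimate, can be sketched as follows. This is a deliberately simplified sketch: each "tree" is a single-split regression stump (real random forests grow deep trees), and all function names are hypothetical. Each row is scored only by the trees whose bootstrap sample left it out, which is what makes the out-of-bag error an honest held-out estimate without a separate validation split.

```python
import random
import statistics

def fit_stump(X, y, rows, feature):
    """Least-squares regression stump on one feature, fit on the given row indices.

    Duplicate indices from the bootstrap act as sample weights.
    Returns (sse, feature, threshold, left_mean, right_mean).
    """
    values = sorted(set(X[i][feature] for i in rows))
    best = None
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [y[i] for i in rows if X[i][feature] <= thr]
        right = [y[i] for i in rows if X[i][feature] > thr]
        ml, mr = statistics.fmean(left), statistics.fmean(right)
        sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, feature, thr, ml, mr)
    if best is None:  # feature is constant on these rows: fall back to the mean
        m = statistics.fmean([y[i] for i in rows])
        best = (sum((y[i] - m) ** 2 for i in rows), feature, float("inf"), m, m)
    return best

def predict_stump(stump, x):
    _, feature, thr, ml, mr = stump
    return ml if x[feature] <= thr else mr

def random_forest_of_stumps(X, y, n_trees=50, max_features=1, seed=0):
    """Bagging plus per-split feature subsampling, with depth-1 trees for brevity."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    trees, oob_sets = [], []
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]      # bootstrap: n draws with replacement
        candidates = rng.sample(range(d), max_features)  # random feature subset for this split
        stump = min((fit_stump(X, y, rows, f) for f in candidates), key=lambda s: s[0])
        trees.append(stump)
        oob_sets.append(set(range(n)) - set(rows))       # rows this tree never saw
    return trees, oob_sets

def oob_mse(X, y, trees, oob_sets):
    """Out-of-bag error: each row is predicted only by trees that did not train on it."""
    errors = []
    for i in range(len(X)):
        preds = [predict_stump(t, X[i]) for t, oob in zip(trees, oob_sets) if i in oob]
        if preds:
            errors.append((statistics.fmean(preds) - y[i]) ** 2)
    return statistics.fmean(errors)
```

On average about 1 - 1/e (roughly 37%) of the rows are left out of each bootstrap sample, so every row typically receives out-of-bag predictions from about a third of the trees.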

Gradient-boosted decision trees

Gradient boosting, formalized by Jerome Friedman (2001), builds an additive ensemble in which each new shallow tree is fit to the negative gradient of a differentiable loss with respect to the current predictions.^[3] Three open-source libraries are the most widely used implementations, both in production and in competitions such as Kaggle. XGBoost (Chen and Guestrin, KDD 2016) made the method scalable and widely adopted, adding second-order (Newton) steps, an objective regularized with L1 and L2 penalties, and a sparsity-aware split finder that learns a default direction for missing values.^[4] LightGBM (Ke et al., NeurIPS 2017) introduced gradient-based one-side sampling and exclusive feature bundling for speed and memory efficiency on large, sparse datasets, and grows trees leaf-wise rather than level-wise.^[5] CatBoost (Prokhorenkova et al., NeurIPS 2018) introduced ordered target statistics to encode categorical features without target leakage, and uses symmetric (oblivious) trees for fast inference.^[6] Gradient-boosted trees remain the default baseline for tabular machine learning.
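The core of Friedman's procedure fits in a few lines. The sketch below assumes squared loss 0.5 * (y - F)^2, a single numeric feature, and depth-1 trees (stumps); the function names are hypothetical, and none of the XGBoost, LightGBM, or CatBoost refinements described above are included. Because the negative gradient of squared loss is simply the residual y - F, each round fits a stump to the current residuals and adds a damped copy of it to the model.

```python
import statistics

def fit_stump_1d(x, r):
    """Least-squares stump on one feature x, fit to the pseudo-residuals r.

    Assumes x has at least two distinct values.
    """
    best = None
    for thr in sorted(set(x))[:-1]:
        left = [ri for xi, ri in zip(x, r) if xi <= thr]
        right = [ri for xi, ri in zip(x, r) if xi > thr]
        ml, mr = statistics.fmean(left), statistics.fmean(right)
        sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, thr, ml, mr)
    _, thr, ml, mr = best
    return lambda xi: ml if xi <= thr else mr

def gradient_boost(x, y, n_rounds=20, learning_rate=0.3):
    """Gradient boosting for squared loss with stump base learners.

    Returns a prediction function and the training loss after each round.
    """
    f0 = statistics.fmean(y)          # the constant that minimizes squared loss
    F = [f0] * len(y)
    stumps, losses = [], []
    for _ in range(n_rounds):
        # Negative gradient of 0.5 * (y - F)^2 with respect to F is the residual.
        neg_grad = [yi - Fi for yi, Fi in zip(y, F)]
        h = fit_stump_1d(x, neg_grad)
        stumps.append(h)
        F = [Fi + learning_rate * h(xi) for Fi, xi in zip(F, x)]
        losses.append(statistics.fmean(0.5 * (yi - Fi) ** 2 for yi, Fi in zip(y, F)))

    def predict(xi):
        return f0 + learning_rate * sum(s(xi) for s in stumps)

    return predict, losses
```

For another differentiable loss, only the `neg_grad` line changes; for log loss with a sigmoid head it becomes y - sigmoid(F). Libraries such as XGBoost additionally use the second derivative to set leaf values, which this sketch omits.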

Deep learning approaches

Neural networks for tabular data try to recover the inductive biases that trees obtain almost for free, namely the ability to handle features at different scales and to ignore uninformative columns. A practical line of work in recommendation and online advertising, including Wide and Deep, DeepFM, and Deep and Cross Networks, models low-order feature crosses explicitly. For general tabular tasks, research accelerated after 2017. TabNet (Arik and Pfister, AAAI 2021) uses sequential attention to select features at each decision step.^[7] FT-Transformer (Gorishniy et al., NeurIPS 2021) tokenizes both numerical and categorical features and passes them through a standard transformer encoder; the same paper, Revisiting Deep Learning Models for Tabular Data, also showed that a well-tuned residual MLP is a strong baseline and that no neural model dominated XGBoost across all datasets.^[8] Other notable architectures include NODE (differentiable oblivious tree ensembles), TabTransformer, and SAINT. A recurring empirical finding is that careful tuning of an MLP recovers most of the gap to more elaborate deep learning designs.
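The "tokenize both numerical and categorical features" step of FT-Transformer can be sketched as a feature tokenizer that turns each column into its own d-dimensional vector: a numerical value scales a per-column weight vector and adds a per-column bias, while a categorical code looks up a per-column embedding and adds a bias. The class below is a minimal, untrained sketch under those assumptions (random weights, hypothetical names, a learned [CLS] token placed first); the transformer encoder that would consume the token sequence is not shown.

```python
import random

class FeatureTokenizer:
    """Toy FT-Transformer-style tokenizer producing one d_token-dim vector per column.

    Numerical column j:   token = x_j * W_j + b_j
    Categorical column j: token = E_j[category] + b_j
    Weights are random here, standing in for parameters learned during training.
    """

    def __init__(self, n_numeric, cat_cardinalities, d_token=4, seed=0):
        rng = random.Random(seed)

        def vec():
            return [rng.gauss(0.0, 1.0) for _ in range(d_token)]

        self.num_w = [vec() for _ in range(n_numeric)]
        self.num_b = [vec() for _ in range(n_numeric)]
        self.cat_emb = [[vec() for _ in range(card)] for card in cat_cardinalities]
        self.cat_b = [vec() for _ in cat_cardinalities]
        self.cls = vec()  # extra token whose final state would feed the prediction head

    def __call__(self, x_num, x_cat):
        tokens = [list(self.cls)]
        for x, w, b in zip(x_num, self.num_w, self.num_b):
            tokens.append([x * wi + bi for wi, bi in zip(w, b)])
        for c, table, b in zip(x_cat, self.cat_emb, self.cat_b):
            tokens.append([ei + bi for ei, bi in zip(table[c], b)])
        return tokens

if __name__ == "__main__":
    tok = FeatureTokenizer(n_numeric=2, cat_cardinalities=[3, 5], d_token=4)
    seq = tok([0.5, -1.2], [2, 0])
    print(len(seq), len(seq[0]))  # 1 [CLS] + 2 numerical + 2 categorical tokens, each of width 4
```

Giving every column its own token is what lets a standard attention layer model interactions between arbitrary pairs of columns, regardless of their type or scale.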

Foundation models

A newer direction trains one large model on a vast variety of tabular tasks, then deploys it as a frozen in-context predictor that takes a labeled support set as input and outputs predictions without per-dataset training. TabPFN (Hollmann et al., ICLR 2023), a transformer trained on millions of synthetic tasks from a structural causal model prior, was the first widely cited example.^[9] The original version was deliberately narrow: it handled classification only, up to about 1,000 training points and 100 purely numerical features without missing values, and ran in less than a second.^[16] TabPFN v2 (Hollmann et al., Nature, January 2025) added native categorical and missing-value handling, extended the approach to regression, and works in a suggested regime of up to 10,000 samples, 500 features, and 10 classes, matching or surpassing tuned gradient-boosting ensembles on small to medium datasets.^[10]^[17] The scaling has continued: TabPFN-2.5 (November 2025) reaches roughly 50,000 rows and 2,000 features.^[18] Related efforts include TabDPT, trained on real OpenML datasets rather than synthetic data.

The table below compares the main families on the properties that matter for tabular work.

Family	Representative methods	Native categorical and missing handling	Typical strength	Main limitation
Linear and classical	Logistic regression, ridge, lasso, SVM, k-NN	No (needs encoding and imputation)	Interpretable, calibrated, fast	Only linear or low-flexibility boundaries
Random forests	Random forest, Extra Trees	Partial (handles mixed types, splits on codes)	Robust default, few hyperparameters	Weaker on smooth or linear signals
Gradient-boosted trees	XGBoost, LightGBM, CatBoost	Yes (LightGBM, CatBoost; XGBoost since 1.5)	State of the art on most tabular tasks	Needs tuning; less suited to very wide data
Deep learning	TabNet, FT-Transformer, NODE, SAINT, MLP	Yes (learned embeddings)	Flexible, integrates with other modalities	Rarely beats boosting at equal tuning budget
Foundation models	TabPFN, TabPFN v2, TabDPT	Yes (v2 onward)	Strong on small data, no per-dataset tuning	Limited row and feature counts so far

Why do gradient-boosted trees beat deep learning on tabular data?

Two influential 2022 studies crystallized the case that gradient-boosted trees still beat deep learning on typical tabular data when the comparison is fair. Grinsztajn, Oyallon, and Varoquaux (NeurIPS 2022) built a benchmark of 45 medium-sized datasets and, with equal tuning budgets, found that random forests and gradient boosting consistently outperformed FT-Transformer, ResNet, and MLPs on both classification and regression.^[11] Their abstract states that "tree-based models remain state-of-the-art on medium-sized data (~10K samples) even without accounting for their superior speed," and the paper attributes the gap to three structural advantages of trees: they are robust to uninformative features; they preserve the orientation of the data (each split uses one original column, so trees are not rotationally invariant and can exploit the meaning of individual features, whereas the learning procedure of an MLP is rotationally invariant); and they can easily learn the irregular, non-smooth target functions common in tabular problems.^[11] Shwartz-Ziv and Armon (Information Fusion, 2022) reproduced several published deep models and found that they did not generalize beyond their authors' chosen datasets, while tuned XGBoost won on most tasks; an ensemble of XGBoost with a neural model was often the strongest.^[12] The picture as of 2026 is roughly that tuned gradient boosting remains the default winner on medium to large tabular data, the small-data regime is increasingly contested by TabPFN-style foundation models, and ensembles of boosting with at least one neural component often win competitions.
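A small illustration of what "preserving orientation" means, constructed for this article rather than taken from the paper: on a grid where the label depends only on the first column, a single axis-aligned split classifies perfectly, but after rotating the feature space by 45 degrees no single split on either rotated coordinate can. The helper names are hypothetical, and the comparison uses depth-1 trees only; deeper trees can approximate the rotated boundary with a staircase of splits, but only at the cost of more splits and more data.

```python
import math

def best_stump_error(X, y):
    """Lowest misclassification rate of any single axis-aligned split (a depth-1 tree)."""
    n, best = len(y), 1.0
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[j] <= thr]
            right = [yi for row, yi in zip(X, y) if row[j] > thr]
            # Each side predicts its majority label; the minority labels count as errors.
            errs = (min(sum(left), len(left) - sum(left))
                    + min(sum(right), len(right) - sum(right)))
            best = min(best, errs / n)
    return best

def rotate(X, angle):
    """Rotate two-column rows by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c * a - s * b, s * a + c * b] for a, b in X]

if __name__ == "__main__":
    X = [[i / 9, j / 9] for i in range(10) for j in range(10)]
    y = [1 if a > 0.5 else 0 for a, _ in X]          # label depends on column 0 only
    print(best_stump_error(X, y))                    # 0.0: one split on column 0 suffices
    print(best_stump_error(rotate(X, math.pi / 4), y))  # > 0: the boundary is now diagonal
```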

What is TabPFN?

TabPFN (Tabular Prior-data Fitted Network) is a transformer that performs tabular prediction by in-context learning: it is pretrained once on millions of synthetic datasets drawn from a structural-causal-model prior, then takes a labeled training set and an unlabeled test set as a single input and outputs predictions in one forward pass, with no gradient updates or hyperparameter tuning per dataset.^[9] In effect, its output approximates the Bayesian posterior predictive distribution for the test points under that prior over datasets. The headline result of the Nature 2025 paper on TabPFN v2 is: "In 2.8 s, TabPFN outperforms an ensemble of the strongest baselines tuned for 4 h in a classification setting."^[10] On datasets of up to roughly 10,000 samples it matches or exceeds tuned XGBoost, LightGBM, and CatBoost in a small fraction of their tuning time. Its main current limitation is scale: the suggested regime is bounded in rows, features, and classes, although successor releases (TabPFN-2.5 in late 2025) have steadily raised those ceilings.^[18] TabPFN is the clearest evidence so far that foundation-model methods can challenge gradient boosting on small tabular data, even though boosting still leads on larger tables.
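The "training set as input, one forward pass" interface can be illustrated with a toy stand-in. The sketch below is not TabPFN: it replaces the pretrained transformer with a single softmax-attention step from each test row to the labeled training rows, using a fixed similarity (negative squared distance) instead of learned attention weights, which makes it equivalent to Gaussian-kernel (Nadaraya-Watson) smoothing. The function name and the `temperature` parameter are hypothetical. What it does share with TabPFN is that nothing is fit per dataset: the labeled rows are simply part of the input.

```python
import math

def in_context_predict(support_X, support_y, query_X, n_classes, temperature=1.0):
    """Toy in-context classifier: one attention step from query rows to support rows.

    Each query's class probabilities are an attention-weighted average of the
    one-hot support labels. Scores are fixed negative squared distances here,
    whereas TabPFN learns its attention layers during pretraining.
    """
    outputs = []
    for q in query_X:
        scores = [-sum((qi - si) ** 2 for qi, si in zip(q, s)) / temperature
                  for s in support_X]
        m = max(scores)                                # subtract the max for numerical stability
        weights = [math.exp(sc - m) for sc in scores]
        total = sum(weights)
        probs = [0.0] * n_classes
        for w, label in zip(weights, support_y):
            probs[label] += w / total
        outputs.append(probs)
    return outputs

if __name__ == "__main__":
    support_X = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
    support_y = [0, 0, 1, 1]
    query_X = [[0.0, 0.5], [5.0, 5.5]]
    for probs in in_context_predict(support_X, support_y, query_X, n_classes=2):
        print([round(p, 3) for p in probs])
```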

References

1. Breiman, L., Friedman, J. H., Olshen, R. A., and Stone, C. J. (1984). *Classification and Regression Trees*. Wadsworth.
2. Breiman, L. (2001). Random forests. *Machine Learning*, 45(1), 5-32.
3. Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. *Annals of Statistics*, 29(5), 1189-1232.
4. Chen, T., and Guestrin, C. (2016). XGBoost: A scalable tree boosting system. *KDD 2016*. https://arxiv.org/abs/1603.02754 (Accessed 2026-06-28)
5. Ke, G., et al. (2017). LightGBM: A highly efficient gradient boosting decision tree. *NeurIPS 2017*. https://papers.nips.cc/paper/2017/hash/6449f44a102fde848669bdd9eb6b76fa-Abstract.html (Accessed 2026-06-28)
6. Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A. V., and Gulin, A. (2018). CatBoost: unbiased boosting with categorical features. *NeurIPS 2018*. https://arxiv.org/abs/1706.09516 (Accessed 2026-06-28)
7. Arik, S. O., and Pfister, T. (2021). TabNet: Attentive interpretable tabular learning. *AAAI 2021*. https://arxiv.org/abs/1908.07442 (Accessed 2026-06-28)
8. Gorishniy, Y., Rubachev, I., Khrulkov, V., and Babenko, A. (2021). Revisiting deep learning models for tabular data. *NeurIPS 2021*. https://arxiv.org/abs/2106.11959 (Accessed 2026-06-28)
9. Hollmann, N., Müller, S., Eggensperger, K., and Hutter, F. (2023). TabPFN: A transformer that solves small tabular classification problems in a second. *ICLR 2023*. https://arxiv.org/abs/2207.01848 (Accessed 2026-06-28)
10. Hollmann, N., Müller, S., Purucker, L., Krishnakumar, A., Körfer, M., Hoo, S. B., Schirrmeister, R. T., and Hutter, F. (2025). Accurate predictions on small data with a tabular foundation model. *Nature*, 637, 319-326. https://www.nature.com/articles/s41586-024-08328-6 (Accessed 2026-06-28)
11. Grinsztajn, L., Oyallon, E., and Varoquaux, G. (2022). Why do tree-based models still outperform deep learning on typical tabular data? *NeurIPS 2022 Datasets and Benchmarks*. https://arxiv.org/abs/2207.08815 (Accessed 2026-06-28)
12. Shwartz-Ziv, R., and Armon, A. (2022). Tabular data: Deep learning is not all you need. *Information Fusion*, 81, 84-90. https://arxiv.org/abs/2106.03253 (Accessed 2026-06-28)
13. XGBoost documentation. https://xgboost.readthedocs.io/ (Accessed 2026-06-28)
14. LightGBM documentation. https://lightgbm.readthedocs.io/ (Accessed 2026-06-28)
15. CatBoost documentation. https://catboost.ai/docs/ (Accessed 2026-06-28)
16. Hollmann, N., Müller, S., Eggensperger, K., and Hutter, F. (2023). TabPFN abstract: classification on up to 1,000 training points and 100 purely numerical features without missing values, in less than a second. https://arxiv.org/abs/2207.01848 (Accessed 2026-06-28)
17. Ye, H.-J., et al. (2025). A closer look at TabPFN v2: suggested data regime of no more than 10,000 samples, 500 dimensions, and 10 classes. https://arxiv.org/abs/2502.17361 (Accessed 2026-06-28)
18. Prior Labs. TabPFN-2.5 (released 2025-11-06), scaling to roughly 50,000 rows and 2,000 features. https://huggingface.co/Prior-Labs (Accessed 2026-06-28)
