AutoML (Automated Machine Learning)

Updated Jun 23, 2026

AutoML (Automated Machine Learning) is the automation of the end-to-end pipeline of applying machine learning to real-world data, replacing manual trial and error with a formal search over data preprocessing, feature engineering, model selection, hyperparameter optimisation, model evaluation, and ensembling. A typical AutoML system takes a raw dataset and a time budget and returns a trained, validated model with little or no human intervention. The goal, as the field's standard reference puts it, is "effective machine learning out of the box," so that domain scientists, business analysts and software engineers can apply ML without a dedicated specialist.^[1]

The term has been in use since at least the early 2010s, when researchers began publishing systems such as Auto-WEKA (2013) and tools for Bayesian optimisation of machine learning algorithms (Snoek, Larochelle and Adams 2012).^[16] The field crystallised with the publication of the auto-sklearn paper by Feurer et al. at NeurIPS 2015^[2] and the open-access edited volume by Hutter, Kotthoff and Vanschoren in 2019, the first comprehensive book devoted to the subject.^[1] Cloud-hosted AutoML services reached a wide audience in 2018, when Google launched Cloud AutoML Vision, and AutoML has since become a standard component of major cloud machine-learning platforms.^[24] The most recent shift is the arrival of tabular foundation models such as TabPFN (Nature, 2025), which can produce strong predictions on small tables in a single forward pass and challenge the search-based paradigm AutoML was built on.^[25]

What problems does AutoML solve?

Applying machine learning to a new dataset is more craft than science. The practitioner must clean and encode data, choose a model family that suits the problem, set dozens of hyperparameters, decide on a validation scheme, and iterate. Expertise of this kind is scarce, expensive, and slow to acquire. Closing that gap is the motivation Hutter, Kotthoff and Vanschoren give in their 2019 book: machine learning that works "out of the box" for people who are not ML specialists.^[1]

There are also methodological reasons. Manual hyperparameter tuning is error-prone and biased toward configurations the practitioner has used before. Bergstra and Bengio showed in 2012 that grid search wastes most of its budget on hyperparameters that barely affect the result, and that random search is more efficient whenever only a few hyperparameters really matter.^[15] Random search, however, learns nothing from earlier runs, which leaves performance on the table when many similar datasets have been studied before. AutoML treats model selection and tuning as a formal optimisation problem.
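
The grid-versus-random argument can be illustrated with a toy objective. The sketch below is illustrative only: the objective function, its optimum at 0.37 and the hyperparameter names `lr` and `momentum` are assumptions made for this example, not values from Bergstra and Bengio's experiments. With the same nine trials, the grid tries only three distinct values of the hyperparameter that matters, while random search tries nine.

```python
import random

def objective(lr, momentum):
    """Toy validation score: depends on lr only, momentum is irrelevant."""
    return 1.0 - (lr - 0.37) ** 2

def grid_search(points_per_axis):
    """All combinations of evenly spaced values in [0, 1] on both axes."""
    axis = [i / (points_per_axis - 1) for i in range(points_per_axis)]
    return [(lr, m) for lr in axis for m in axis]

def random_search(n_trials, seed=0):
    """Independent uniform samples in [0, 1] on both axes."""
    rng = random.Random(seed)
    return [(rng.random(), rng.random()) for _ in range(n_trials)]

grid = grid_search(3)      # 9 trials
rand = random_search(9)    # 9 trials

# How many different values of the important hyperparameter were tried?
distinct_lr_grid = len({lr for lr, _ in grid})
distinct_lr_rand = len({lr for lr, _ in rand})

best_grid = max(objective(lr, m) for lr, m in grid)
best_rand = max(objective(lr, m) for lr, m in rand)
print(distinct_lr_grid, distinct_lr_rand, best_grid, best_rand)
```

The effect grows with dimensionality: a grid with k points per axis in d dimensions spends k^d trials but still probes only k values of any single hyperparameter.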

A further motivation is reproducibility. A pipeline produced by an AutoML system is specified by the search space, the optimiser, the random seed and the dataset, so two researchers running the same configuration with the same budget on comparable hardware should, in principle, obtain the same model.

Components of an AutoML pipeline

A modern AutoML system typically combines several stages. The table below summarises the main components, drawing on the taxonomy in He, Zhao and Chu's 2021 survey and Chapter 1 of the Hutter, Kotthoff and Vanschoren book.^[14]^[1]

Stage	Typical operations	Example systems
Data preprocessing	Missing-value imputation, type inference, categorical encoding, scaling, outlier handling	auto-sklearn, AutoGluon
Feature engineering	Automated feature construction, polynomial and interaction terms, target encoding, feature selection	TPOT, FeatureTools (Deep Feature Synthesis)
Model selection	Choose family from trees, linear models, kernel methods, neural networks, gradient boosting	All AutoML systems
Hyperparameter optimisation	Tune per-model hyperparameters such as learning rate, depth, regularisation strength	SMAC, BOHB, Optuna
Architecture search	For deep networks, search over layer types, connectivity, width and depth	NASNet, DARTS, Auto-Keras
Ensembling and stacking	Combine top-k candidate models with weighted averaging, stacking or bagging	auto-sklearn, AutoGluon, H2O AutoML
Calibration and post-processing	Probability calibration, threshold tuning, fairness post-processing	H2O AutoML, FLAML

Not every system covers every stage. Auto-sklearn and TPOT focus on the tabular pipeline; Auto-Keras and DARTS focus on neural architecture search; AutoGluon spans tabular, text, image and multimodal data with a unified API.

How does AutoML tune hyperparameters?

The core algorithmic problem inside AutoML is hyperparameter tuning. Many algorithms have been proposed, and most production systems use a combination of them. The table below lists the most influential.

Method	Reference	Idea
Grid search	Long-standing baseline	Exhaustive enumeration of a discretised hyperparameter grid
Random search	Bergstra and Bengio 2012 (JMLR)	Sample configurations independently from a prior; more efficient than grid search when only a few hyperparameters matter (low effective dimensionality)
Bayesian optimisation with Gaussian processes	Snoek, Larochelle and Adams 2012 (NeurIPS), Spearmint package	Fit a GP surrogate to past evaluations, choose next point via acquisition function such as Expected Improvement
Tree-structured Parzen Estimator (TPE)	Bergstra, Bardenet, Bengio and Kegl 2011 (NeurIPS)	Use density estimators on "good" and "bad" configurations rather than a single regression surrogate
SMAC	Hutter, Hoos and Leyton-Brown 2011 (LION)	Random-forest surrogate suited to mixed continuous, discrete and conditional spaces; used by auto-sklearn
Successive halving	Karnin, Koren and Somekh 2013	Allocate small budgets to many configurations, prune the worst, double the budget for survivors
Hyperband	Li, Jamieson, DeSalvo, Rostamizadeh and Talwalkar 2017	Bandit-based wrapper around successive halving with multiple bracket sizes
BOHB	Falkner, Klein and Hutter 2018 (ICML)	Combines TPE-style Bayesian optimisation with Hyperband multi-fidelity scheduling
Population-based training	Jaderberg et al. 2017 (DeepMind)	Train a population of models in parallel, periodically copy weights of the best workers and perturb their hyperparameters
CFO and BlendSearch	Wang, Wu, Weimer and Zhu 2021 (FLAML, MLSys)	Cost-aware search that trades off evaluation cost against improvement

The shift from grid search through random search to Bayesian and bandit-based approaches reflects a steady increase in sample efficiency. Multi-fidelity methods such as Hyperband and BOHB are the dominant choice when each evaluation is expensive, for example when fitting a deep network on a large dataset.^[19]^[20]

What is neural architecture search?

Neural architecture search (NAS) is the sub-area of AutoML concerned with discovering the topology of deep networks. The modern wave began with Zoph and Le's 2017 ICLR paper "Neural Architecture Search with Reinforcement Learning," in which a recurrent controller proposes architectures encoded as variable-length strings, the proposed network is trained on CIFAR-10, and the validation accuracy is fed back to the controller as a reinforcement-learning reward. The original system used about 800 GPUs for several weeks but produced architectures competitive with the best human designs of the time.^[6]

Follow-up work attacked the cost. NASNet (Zoph, Vasudevan, Shlens and Le 2018) restricted the search to a small "cell" that is then stacked, allowing search on CIFAR-10 to transfer to ImageNet. ENAS (Pham, Guan, Zoph, Le and Dean 2018, ICML) introduced parameter sharing between candidate architectures, which the authors reported as roughly 1000x cheaper than the original NAS.^[8] DARTS (Liu, Simonyan and Yang 2019, ICLR) replaced the discrete search with a continuous relaxation, in which each edge computes a softmax-weighted mixture of candidate operations, so that gradient descent can optimise the architecture parameters directly. The DARTS paper reported competitive results on CIFAR-10 and Penn Treebank at a cost on the order of a few GPU-days rather than thousands.^[7]
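
The continuous relaxation used by DARTS can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the three candidate operations are hypothetical scalar functions standing in for real layers such as convolutions or pooling, and no gradient step is shown. Each edge outputs a softmax-weighted mixture of all candidates, and after the search the edge is discretised by keeping the operation with the largest architecture parameter.

```python
import math

def softmax(alphas):
    """Numerically stable softmax over a list of architecture parameters."""
    m = max(alphas)
    exps = [math.exp(a - m) for a in alphas]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical candidate operations on one edge (toy scalar functions).
CANDIDATE_OPS = {
    "identity": lambda x: x,
    "double": lambda x: 2.0 * x,
    "zero": lambda x: 0.0,
}

def mixed_op(x, alphas):
    """Continuous relaxation: weighted sum of every candidate's output."""
    weights = softmax(alphas)
    return sum(w * op(x) for w, op in zip(weights, CANDIDATE_OPS.values()))

def discretise(alphas):
    """After the search, keep only the operation with the largest alpha."""
    names = list(CANDIDATE_OPS)
    return names[max(range(len(alphas)), key=lambda i: alphas[i])]

uniform_output = mixed_op(1.0, [0.0, 0.0, 0.0])   # equal mixture of 1, 2 and 0
chosen = discretise([0.1, 2.0, -1.0])             # picks "double"
print(uniform_output, chosen)
```

Because `mixed_op` is differentiable in the alphas, a framework with automatic differentiation can update them by gradient descent alongside the network weights.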

Latency-aware search came next. MnasNet (Tan, Chen, Pang, Vasudevan, Sandler, Howard and Le, CVPR 2019) added measured on-device latency to the reward, so that the search produced architectures suitable for mobile inference.^[9] EfficientNet (Tan and Le, ICML 2019) used a NAS-derived baseline (EfficientNet-B0) and a principled compound-scaling rule for width, depth and resolution; the resulting family of models reached state-of-the-art ImageNet accuracy with a fraction of the parameters of earlier networks.^[10]
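
Compound scaling can be read as simple arithmetic. The sketch below assumes the base coefficients usually quoted from the EfficientNet paper (1.2 for depth, 1.1 for width, 1.15 for resolution); treat these constants as assumptions to check against the paper. It also assumes that compute grows roughly with depth times width squared times resolution squared, so that each unit increase of the compound coefficient `phi` roughly doubles the cost.

```python
# Assumed base coefficients (commonly quoted for EfficientNet; verify before use).
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15

def compound_scale(phi):
    """Scale depth, width and resolution together with one coefficient phi."""
    depth = ALPHA ** phi
    width = BETA ** phi
    resolution = GAMMA ** phi
    # Approximate relative compute: linear in depth, quadratic in width
    # and in input resolution.
    relative_cost = depth * width ** 2 * resolution ** 2
    return {
        "depth": depth,
        "width": width,
        "resolution": resolution,
        "relative_cost": relative_cost,
    }

for phi in range(4):
    s = compound_scale(phi)
    print(phi, round(s["depth"], 2), round(s["width"], 2),
          round(s["resolution"], 2), round(s["relative_cost"], 2))
```

Under these assumed constants, the product 1.2 x 1.1^2 x 1.15^2 is about 1.92, which is why the cost approximately doubles per step of `phi`.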

The table below summarises the main NAS approaches.

Method	Reference	Search strategy	Notes
NAS with RL	Zoph and Le 2017	RNN controller trained by REINFORCE	First modern NAS; thousands of GPU-days
NASNet	Zoph et al. 2018	RL on transferable cells	CIFAR-10 search transfers to ImageNet
ENAS	Pham et al. 2018	RL with weight sharing	About 1000x cheaper than NAS
DARTS	Liu, Simonyan and Yang 2019	Differentiable, gradient-based	Continuous relaxation of architecture
MnasNet	Tan et al. 2019	RL with latency objective	Targets mobile inference
EfficientNet	Tan and Le 2019	NAS baseline plus compound scaling	State-of-the-art accuracy / parameter trade-off
Auto-Keras	Jin, Song and Hu 2019	Bayesian optimisation over network morphisms	Integrates with Keras API

Major AutoML systems

Research systems and commercial products have proliferated since around 2015. The table below compares the most widely used.

System	First release	Maintainer	Approach	Modalities
Auto-WEKA	2013	University of British Columbia	Bayesian optimisation (SMAC) over WEKA classifiers	Tabular
auto-sklearn	2015	University of Freiburg	Meta-learning warm-start, SMAC, ensemble selection	Tabular
TPOT	2016	University of Pennsylvania	Genetic programming over scikit-learn pipelines	Tabular
H2O AutoML	2017	H2O.ai	Random search and grid search plus stacked ensembles	Tabular
Google Cloud AutoML	2018	Google Cloud	Transfer learning and NAS, later folded into Vertex AI	Vision, language, tables, video
Auto-Keras	2018 (paper 2019)	Texas A&M	Bayesian optimisation over network morphisms	Vision, text
Azure Automated ML	2018	Microsoft	Probabilistic matrix factorisation for warm-start, plus SMAC and ensembles	Tabular, vision, NLP
Amazon SageMaker Autopilot	2019	AWS	White-box AutoML producing notebooks	Tabular
AutoGluon	2020	Amazon	Multi-layer stacking, no model search per se	Tabular, text, image, multimodal, time series
FLAML	2021	Microsoft	Cost-aware search (CFO, BlendSearch)	Tabular
LightAutoML	2021	Sber AI Lab	Modular pipeline tuned for financial data	Tabular
DataRobot	2014	DataRobot Inc.	Commercial closed-source platform	Tabular, text, time series
MLJAR, PyCaret	2019	Independent	Open-source wrappers around scikit-learn and gradient boosting	Tabular

A few of these warrant a closer look.

Auto-sklearn (Feurer, Klein, Eggensperger, Springenberg, Blum and Hutter, NeurIPS 2015) defines a structured search space of 15 classifiers, 14 feature preprocessing methods and 4 data preprocessing methods, giving 110 hyperparameters in total. The system uses two ideas not present in earlier work. First, it warm-starts SMAC by retrieving promising configurations from datasets that are similar in meta-features to the current one, drawing on a database of past runs. Second, it builds an ensemble out of all configurations evaluated during search, rather than returning only the single best, which markedly reduces variance. Auto-sklearn won the first phase of the ChaLearn AutoML challenge.^[2] A redesigned successor, Auto-sklearn 2.0 (Feurer, Eggensperger, Falkner, Lindauer and Hutter, JMLR 2022), dropped hand-crafted meta-features in favour of a pre-computed portfolio of complementary pipelines plus a bandit-based budget allocation, which the authors report improves performance under tight time limits.^[26]
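
Building an ensemble from models already evaluated during search can be sketched as greedy forward selection with replacement. This is a simplified illustration rather than auto-sklearn's code: the model names, validation predictions and the choice of squared error as the loss are all made up for the example.

```python
def brier(pred, y):
    """Mean squared error between predicted probabilities and 0/1 labels."""
    return sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y)

def average(preds):
    """Element-wise mean of several prediction vectors."""
    return [sum(col) / len(preds) for col in zip(*preds)]

def ensemble_selection(val_preds, y_val, n_rounds=4):
    """Greedily add (with replacement) the model that most lowers ensemble loss."""
    chosen = []
    for _ in range(n_rounds):
        best = min(
            val_preds,
            key=lambda name: brier(
                average([val_preds[c] for c in chosen] + [val_preds[name]]), y_val
            ),
        )
        chosen.append(best)
    # A model picked k times out of n_rounds gets weight k / n_rounds.
    return {name: chosen.count(name) / len(chosen) for name in set(chosen)}

# Hypothetical validation predictions from three models found during search.
y_val = [1, 0, 1, 0]
val_preds = {
    "a": [0.9, 0.4, 0.6, 0.1],
    "b": [0.6, 0.1, 0.9, 0.4],
    "c": [0.5, 0.5, 0.5, 0.5],
}

weights = ensemble_selection(val_preds, y_val)
ensemble_pred = [
    sum(weights[m] * val_preds[m][i] for m in weights) for i in range(len(y_val))
]
ensemble_loss = brier(ensemble_pred, y_val)
best_single_loss = min(brier(p, y_val) for p in val_preds.values())
print(weights, ensemble_loss, best_single_loss)
```

In this toy case models `a` and `b` make complementary errors, so the ensemble weights them equally and ends up with a lower loss than either alone, while the uninformative model `c` is never selected.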

TPOT (Olson and Moore, ICML 2016 AutoML workshop) treats a pipeline as an expression tree and uses genetic programming to evolve it. Operators include preprocessing transformations and scikit-learn estimators. TPOT exposes the discovered pipeline as Python code, which makes it easy to inspect and edit. On 150 supervised classification tasks the original benchmark reported significant improvements over a default scikit-learn baseline on 22 of them.^[3]

H2O AutoML (LeDell and Poirier, ICML 2020 AutoML workshop) takes a more pragmatic line. It trains a fixed grid of GLMs, random forests, gradient-boosting machines (including XGBoost), and deep neural networks within a user-specified time budget, then builds two stacked ensembles, one over all models and one restricted to the best of each family. The H2O AutoML algorithm was first released in H2O 3.12.0.1 in June 2017. It has APIs in R, Python, Java and Scala.^[5]

AutoGluon (Erickson, Mueller, Shirkov, Zhang, Larroy, Li and Smola 2020) made an explicit choice not to search hyperparameters or architectures. Instead, it trains a fixed set of strong models with sensible defaults and combines them via multi-layer stacking. The 2020 paper reported that AutoGluon-Tabular "beats 99% of the participating data scientists" after four hours on raw data, placing it 42nd of 3,505 teams on the Otto Group competition and 39th of 2,920 teams on the BNP Paribas competition, and outperformed auto-sklearn, H2O and TPOT on the OpenML AutoML Benchmark.^[12]

FLAML (Wang, Wu, Weimer and Zhu, MLSys 2021) emphasises cost. Its CFO and BlendSearch algorithms model the cost of an evaluation as a function of hyperparameters and choose configurations that maximise expected improvement per unit cost. The library is intentionally lightweight: it ships with a few hundred lines of core search code rather than a heavyweight framework.^[11]

Google Cloud AutoML launched on 17 January 2018 with Cloud AutoML Vision, a no-code service that fine-tunes a pretrained image classifier on user-uploaded data using transfer learning and architecture search. Google later added Natural Language, Translation, Tables and Video, and folded all of them into Vertex AI in 2021.^[24]

Tabular foundation models: the TabPFN shift

The newest development in AutoML is the tabular foundation model, which sidesteps per-dataset search entirely. TabPFN ("Tabular Prior-data Fitted Network"), introduced by Hollmann, Mueller, Purucker and colleagues at the University of Freiburg and later commercialised by Prior Labs (founded 2024), is a transformer pre-trained once on millions of synthetic tabular datasets. At inference it performs in-context learning: the training rows are passed in as context and predictions for new rows come out of a single forward pass, with no gradient descent or hyperparameter tuning on the target dataset.^[25]

The 2025 Nature paper, "Accurate predictions on small data with a tabular foundation model," reports that TabPFN v2 "outperforms all previous methods on datasets with up to 10,000 samples by a wide margin, using substantially less training time."^[25] The headline efficiency claim is striking: "In 2.8 s, TabPFN outperforms an ensemble of the strongest baselines tuned for 4 h in a classification setting."^[25] The model targets datasets with up to roughly 10,000 samples and 500 features, and reports speedups of about 5,140x for classification and 3,000x for regression relative to tuned gradient-boosted decision trees.^[25] The paper appeared in Nature volume 637, issue 8045, pages 319-326.^[25] A later release, TabPFN v2.5, extends the practical envelope to datasets with up to 50,000 rows and 2,000 features.^[27]

TabPFN does not replace AutoML so much as reframe it. For small clean tables it offers a strong, near-instant default that needs no search, which is exactly the regime where classical AutoML systems spent most of their compute. For larger datasets and for time-series or multimodal problems, search- and ensemble-based systems such as H2O AutoML and AutoGluon remain the standard, and several AutoML frameworks now include TabPFN as one more base learner in their model portfolio.

Key concepts

A handful of ideas appear repeatedly across AutoML systems.

Meta-learning and warm-starting. When a new dataset arrives, an AutoML system can use prior knowledge from previous runs on similar datasets to bias the initial configurations of the optimiser. Auto-sklearn does this by computing meta-features (number of samples, number of classes, skewness, kurtosis and so on) and retrieving the 25 most similar OpenML datasets, then seeding SMAC with their best-known configurations.^[2] Vanschoren's chapter in the 2019 book gives a thorough treatment.^[1]
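
A meta-feature warm start can be sketched as a nearest-neighbour lookup. The example below is a simplified illustration, not auto-sklearn's implementation: the meta-feature names, the past-run records, the configurations and the use of an unnormalised L1 distance are all assumptions. In practice meta-features would be normalised so that no single feature dominates the distance.

```python
def l1_distance(a, b):
    """Sum of absolute differences over the shared meta-feature keys."""
    return sum(abs(a[k] - b[k]) for k in a)

def warm_start_configs(new_meta, past_runs, k=2):
    """Return the best-known configs of the k most similar past datasets."""
    ranked = sorted(past_runs, key=lambda run: l1_distance(new_meta, run["meta"]))
    return [run["best_config"] for run in ranked[:k]]

# Hypothetical database of earlier runs.
past_runs = [
    {"name": "A", "meta": {"log_n_samples": 3.1, "n_classes": 2, "skewness": 0.2},
     "best_config": {"model": "random_forest", "max_depth": 8}},
    {"name": "B", "meta": {"log_n_samples": 5.0, "n_classes": 10, "skewness": 1.5},
     "best_config": {"model": "gradient_boosting", "learning_rate": 0.05}},
    {"name": "C", "meta": {"log_n_samples": 2.5, "n_classes": 2, "skewness": 0.0},
     "best_config": {"model": "logistic_regression", "C": 1.0}},
]

new_meta = {"log_n_samples": 3.0, "n_classes": 2, "skewness": 0.1}
initial_configs = warm_start_configs(new_meta, past_runs, k=2)
print(initial_configs)
```

The returned configurations would be evaluated first, before the optimiser starts proposing its own candidates.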

Multi-fidelity optimisation. Evaluating a configuration on the full dataset for the full number of epochs is expensive. Multi-fidelity methods cheat by evaluating on a subset of the data, for fewer epochs, or with a smaller model, and use these cheap proxies to filter unpromising configurations before committing real budget. Hyperband and BOHB are the canonical examples.^[19]^[20]

Surrogate models. A surrogate is a cheap-to-evaluate proxy for the expensive objective (validation accuracy after full training). Gaussian processes (Spearmint), random forests (SMAC), TPE density estimators, and neural-network surrogates have all been used. The acquisition function on top of the surrogate decides where to evaluate next.
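
How an acquisition function uses the surrogate can be shown with Expected Improvement in closed form. The sketch assumes the surrogate returns a Gaussian prediction (mean and standard deviation) for each candidate and that the objective is to be maximised; the two candidates and their numbers are hypothetical.

```python
import math

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_improvement(mu, sigma, best_so_far):
    """Expected gain over best_so_far for a Gaussian prediction N(mu, sigma^2)."""
    if sigma <= 0.0:
        # No uncertainty: the improvement is known exactly.
        return max(mu - best_so_far, 0.0)
    z = (mu - best_so_far) / sigma
    return (mu - best_so_far) * normal_cdf(z) + sigma * normal_pdf(z)

# Hypothetical surrogate predictions (mean, std) of validation accuracy.
candidates = {
    "safe": (0.71, 0.001),       # slightly better than the incumbent, near certain
    "uncertain": (0.69, 0.10),   # slightly worse on average, but highly uncertain
}
best_so_far = 0.70

scores = {
    name: expected_improvement(mu, sigma, best_so_far)
    for name, (mu, sigma) in candidates.items()
}
next_to_evaluate = max(scores, key=scores.get)
print(scores, next_to_evaluate)
```

Here the uncertain candidate wins even though its predicted mean is lower, which is the exploration behaviour that distinguishes model-based search from simply picking the highest predicted score.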

Bandits and successive halving. Successive halving treats configurations as arms of a multi-armed bandit. Run all of them for a small budget, keep the top half, double the budget, repeat. Hyperband runs successive halving with several bracket sizes to hedge against picking a wrong starting budget.^[19]
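
The halving loop described above can be sketched as follows. This is a minimal illustration under stated assumptions: the configurations, their `quality` values and the noise-free `evaluate` function are invented for the example, and budget is counted in abstract units such as epochs.

```python
def successive_halving(configs, evaluate, min_budget=1, eta=2):
    """Return (winner, total_budget_spent) after repeated pruning."""
    survivors = list(configs)
    budget = min_budget
    total_spent = 0
    while len(survivors) > 1:
        # Score every surviving configuration at the current budget.
        scored = [(evaluate(cfg, budget), cfg) for cfg in survivors]
        total_spent += budget * len(survivors)
        # Keep the top 1/eta fraction, then give survivors eta times more budget.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        survivors = [cfg for _, cfg in scored[: max(1, len(scored) // eta)]]
        budget *= eta
    return survivors[0], total_spent

# Hypothetical configurations with a hidden final quality.
qualities = [0.55, 0.80, 0.62, 0.91, 0.70, 0.58, 0.85, 0.66]
configs = [{"id": i, "quality": q} for i, q in enumerate(qualities)]

def evaluate(cfg, budget):
    """Toy learning curve: the score approaches the hidden quality as budget grows."""
    return cfg["quality"] * (1.0 - 0.5 ** budget)

winner, total_spent = successive_halving(configs, evaluate)
print(winner, total_spent)
```

With eight configurations the loop spends 8 x 1 + 4 x 2 + 2 x 4 = 24 budget units, compared with 8 x 4 = 32 if every configuration were run at the final budget of 4. Real learning curves are noisy and can cross, which is the risk Hyperband hedges against by varying the starting budget.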

Time and compute budgeting. Real users care about wall-clock time. AutoML systems usually expose a budget parameter and try to make the best use of it. FLAML's headline feature is that it can do useful work in seconds rather than hours.^[11]

What are the strengths of AutoML?

For tabular data with reasonably clean inputs, modern AutoML systems frequently match or beat hand-tuned pipelines built by experienced data scientists, especially within fixed time budgets. The OpenML AutoML Benchmark by Gijsbers, LeDell, Thomas, Poirier, Bischl and Vanschoren (2019, updated as a JMLR paper in 2024) shows tight competition between auto-sklearn, H2O AutoML, TPOT, AutoGluon and FLAML on tabular tasks, with AutoGluon and H2O often near the top.^[22]

The key advantages are a lower barrier to entry, since a non-expert can produce a reasonable model with a few lines of code; strong baselines that set a high bar for any hand-crafted alternative; reproducibility, because pipelines and seeds are recorded; and broader coverage of the search space, since optimisers explore configurations a human would not think to try.

What are the limitations of AutoML?

AutoML is not a silver bullet, and several limitations are well documented.

It is compute-intensive. A long auto-sklearn run can consume tens of CPU-hours; a NAS run can consume thousands of GPU-hours. The original Zoph and Le NAS search is commonly cited as costing around 22,400 GPU-days, that is, about 800 GPUs running for four weeks.^[6] Even after the cost reductions delivered by ENAS and DARTS, neural architecture search remains more expensive than training a single off-the-shelf network.^[8]^[7]

The resulting pipeline is mostly opaque. A stacked ensemble of 30 gradient-boosting machines and random forests is not easy to explain to a regulator or domain expert. AutoGluon and similar systems trade interpretability for accuracy, which is acceptable in many applications and unacceptable in others.

Results can be brittle on novel domains. AutoML systems are tuned on benchmark distributions and may underperform on data whose structure was not anticipated, such as sparse high-cardinality categorical data, time series with irregular sampling, or scientific data with strong physical constraints.

Long searches risk overfitting to the validation set. If 10,000 configurations are evaluated against the same validation split, the best one is partly selected for noise. Cross-validation and ensemble averaging mitigate this but do not eliminate the multiple-comparisons problem. On high-stakes problems where human expertise is plentiful, manual tuning by a skilled team often still wins. AutoML is most valuable where expertise is the bottleneck.
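
The selection effect is easy to reproduce in simulation. In the sketch below every configuration has the same true accuracy, so any difference in validation score is pure noise; the true accuracy of 0.70, the validation-set size of 200 and the seed are arbitrary assumptions for illustration.

```python
import random

def validation_accuracy(true_acc, n_val, rng):
    """Accuracy measured on n_val examples, each correct with prob. true_acc."""
    return sum(rng.random() < true_acc for _ in range(n_val)) / n_val

def best_of(n_configs, true_acc=0.70, n_val=200, seed=0):
    """Best validation score among n_configs equally good configurations."""
    rng = random.Random(seed)
    scores = [validation_accuracy(true_acc, n_val, rng) for _ in range(n_configs)]
    return max(scores)

best_of_10 = best_of(10)
best_of_1000 = best_of(1000)
print(best_of_10, best_of_1000)
```

The winning score rises with the number of configurations tried even though none of them is actually better, which is why the selected model should be re-scored on data that played no part in the search.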

How is AutoML benchmarked?

The OpenML AutoML Benchmark, introduced by Gijsbers et al. at the ICML 2019 AutoML workshop, is the standard evaluation suite for tabular AutoML. It defines a curated list of binary and multiclass classification tasks (and later regression tasks) drawn from OpenML, fixes time budgets (typically one hour and four hours), and runs each system in a Docker container with controlled resources. The 2024 JMLR update by Gijsbers et al. compares 9 frameworks across 71 classification and 33 regression tasks. Results are publicly available and updated regularly.^[22]

For neural architecture search the canonical benchmarks are CIFAR-10 and ImageNet image classification, with NAS-discovered architectures (NASNet, MnasNet, EfficientNet) achieving state-of-the-art accuracy in their respective compute classes.^[9]^[10] NAS-Bench-101 (Ying et al. 2019) and NAS-Bench-201 (Dong and Yang 2020) provide tabulated training results for many architectures, allowing fast and reproducible comparisons of NAS algorithms without re-running expensive training.

Real-world applications

AutoML has been deployed in many industries. Documented use cases include drug discovery and molecular property prediction, genomics and clinical risk prediction, time-series forecasting in retail and supply chain (AutoGluon-TimeSeries and similar tools produce probabilistic forecasts at scale), recommendation systems with cold-start models on new product categories, manufacturing quality control using vision AutoML services to detect defects, and standard business analytics such as sales forecasting, customer churn modelling and marketing-mix attribution. At Sber, LightAutoML was used in production for credit-risk and customer-analytics tasks.^[13]

Open-source ecosystem

A practitioner today can build an AutoML stack entirely from open-source components. Common building blocks include auto-sklearn for tabular classification and regression with meta-learning warm-start; TPOT for genetic programming over pipelines; AutoGluon for tabular, multimodal and time-series tasks; FLAML for fast and lightweight HPO; Optuna (Akiba, Sano, Yanase, Ohta and Koyama, KDD 2019), a define-by-run hyperparameter optimisation framework used as a backend by many other systems;^[23] Ray Tune for distributed hyperparameter search with Hyperband, BOHB and PBT; Hyperopt, the original TPE implementation by James Bergstra; SMAC3 (Lindauer et al. 2022), the modern Python SMAC implementation; the TabPFN package from Prior Labs for tabular foundation-model inference; and NNI (Microsoft Neural Network Intelligence), a toolkit covering HPO, NAS, model compression and feature engineering.

Commercial offerings

Alongside the open-source projects, every major cloud has its own AutoML product. The table below summarises them.

Vendor	Product	Modality	Launched	Notes
Google Cloud	Cloud AutoML, then Vertex AI AutoML	Vision, NLP, Tables, Video	2018	First major no-code cloud AutoML
Microsoft Azure	Azure Automated ML	Tabular, vision, NLP	2018	Integrated with Azure ML Studio
AWS	SageMaker Autopilot	Tabular	2019	White-box approach: returns generated notebooks
AWS	Amazon SageMaker Canvas	Tabular	2021	Business-analyst-oriented no-code interface
DataRobot	DataRobot AI Cloud	Tabular, text, time series	2014	Pioneering enterprise AutoML vendor
H2O.ai	Driverless AI	Tabular, time series, NLP	2017	Commercial counterpart to H2O AutoML
dotData	dotData Enterprise	Tabular with auto-feature-engineering	2018	Spin-off from NEC research
Prior Labs	TabPFN (API and enterprise)	Tabular	2024	Tabular foundation model, in-context prediction

Connection to broader AI

AutoML overlaps with MLOps. MLOps is concerned with the full lifecycle of model deployment, monitoring and retraining; AutoML focuses on the model-building stage. Many MLOps platforms include an AutoML component to bootstrap models that are then handed to deployment pipelines.

The rise of foundation models has shifted the centre of gravity. Where AutoML once asked which small model is best for this dataset, the question for foundation-model users is closer to which adapter, learning rate and data mixture should I use to fine-tune this pretrained model. Research on AutoML for foundation models, including AutoLoRA (Zhang et al. 2023) and automated parameter-efficient fine-tuning, is active, and tools such as FLAML and Optuna are increasingly used to tune LoRA ranks, prompt templates and retrieval parameters. NAS has continued to evolve in parallel, with active work on transformer-based language models, graph neural networks (Graph NAS, Gao et al. 2020) and multimodal architectures.

Future directions

Several open problems are likely to shape AutoML in the coming years. Automated data engineering and labelling is one: many real-world failures are caused by data issues rather than modelling issues, and tools for automated data quality checks, labelling and weak supervision are an active research area. AutoML for foundation models is another, with the combinatorial space of base model, fine-tuning regime, retrieval component and prompt much larger than classical model selection. Tabular foundation models such as TabPFN raise the question of how much per-dataset search is still needed when a pre-trained model can predict in one forward pass. AutoML for graph neural networks and other non-tabular structures is less mature than its tabular and image counterparts. Multimodal AutoML, building on systems like AutoGluon-Multimodal, has room for better fusion architectures and automated cross-modal preprocessing. Compute-efficient NAS, using once-for-all networks, supernet training and zero-cost proxies, aims to bring architecture search closer to the cost of a single training run. Finally, better evaluation methodology, beyond OpenML AutoML Benchmark and NAS-Bench-x, is needed because real-world deployment performance is still hard to predict from benchmark scores.

References

[1] Frank Hutter, Lars Kotthoff, and Joaquin Vanschoren (eds.). *Automated Machine Learning: Methods, Systems, Challenges*. Springer Series on Challenges in Machine Learning, 2019. Open access: https://www.automl.org/book/
[2] Matthias Feurer, Aaron Klein, Katharina Eggensperger, Jost Tobias Springenberg, Manuel Blum, Frank Hutter. "Efficient and Robust Automated Machine Learning." In *Advances in Neural Information Processing Systems 28* (NeurIPS 2015). https://papers.neurips.cc/paper/2015/hash/11d0e6287202fced83f79975ec59a3a6-Abstract.html
[3] Randal S. Olson and Jason H. Moore. "TPOT: A Tree-based Pipeline Optimization Tool for Automating Machine Learning." In *ICML 2016 AutoML Workshop*. http://proceedings.mlr.press/v64/olson_tpot_2016.html
[4] Haifeng Jin, Qingquan Song, Xia Hu. "Auto-Keras: An Efficient Neural Architecture Search System." In *Proceedings of KDD 2019*. arXiv:1806.10282. https://arxiv.org/abs/1806.10282
[5] Erin LeDell and Sebastien Poirier. "H2O AutoML: Scalable Automatic Machine Learning." *7th ICML Workshop on Automated Machine Learning*, 2020. https://www.automl.org/wp-content/uploads/2020/07/AutoML_2020_paper_61.pdf
[6] Barret Zoph and Quoc V. Le. "Neural Architecture Search with Reinforcement Learning." *ICLR 2017*. arXiv:1611.01578. https://arxiv.org/abs/1611.01578
[7] Hanxiao Liu, Karen Simonyan, Yiming Yang. "DARTS: Differentiable Architecture Search." *ICLR 2019*. arXiv:1806.09055. https://arxiv.org/abs/1806.09055
[8] Hieu Pham, Melody Y. Guan, Barret Zoph, Quoc V. Le, Jeff Dean. "Efficient Neural Architecture Search via Parameter Sharing." *ICML 2018*. https://proceedings.mlr.press/v80/pham18a.html
[9] Mingxing Tan, Bo Chen, Ruoming Pang, Vijay Vasudevan, Mark Sandler, Andrew Howard, Quoc V. Le. "MnasNet: Platform-Aware Neural Architecture Search for Mobile." *CVPR 2019*. arXiv:1807.11626.
[10] Mingxing Tan and Quoc V. Le. "EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks." *ICML 2019*. https://proceedings.mlr.press/v97/tan19a/tan19a.pdf
[11] Chi Wang, Qingyun Wu, Markus Weimer, Erkang Zhu. "FLAML: A Fast and Lightweight AutoML Library." *MLSys 2021*. arXiv:1911.04706. https://www.microsoft.com/en-us/research/wp-content/uploads/2021/03/MLSys21FLAML.pdf
[12] Nick Erickson, Jonas Mueller, Alexander Shirkov, Hang Zhang, Pedro Larroy, Mu Li, Alexander Smola. "AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data." arXiv:2003.06505, 2020. https://arxiv.org/abs/2003.06505
[13] Anton Vakhrushev, Alexander Ryzhkov, Dmitry Simakov, Rinchin Damdinov, Maxim Savchenko, Alexander Tuzhilin. "LightAutoML: AutoML Solution for a Large Financial Services Ecosystem." arXiv:2109.01528, 2021.
[14] Xin He, Kaiyong Zhao, Xiaowen Chu. "AutoML: A Survey of the State-of-the-Art." *Knowledge-Based Systems*, vol. 212, 2021. arXiv:1908.00709.
[15] James Bergstra and Yoshua Bengio. "Random Search for Hyper-Parameter Optimization." *Journal of Machine Learning Research* 13, pp. 281-305, 2012. https://jmlr.org/papers/v13/bergstra12a.html
[16] Jasper Snoek, Hugo Larochelle, Ryan P. Adams. "Practical Bayesian Optimization of Machine Learning Algorithms." *NeurIPS 2012*. arXiv:1206.2944.
[17] James Bergstra, Remi Bardenet, Yoshua Bengio, Balazs Kegl. "Algorithms for Hyper-Parameter Optimization." *NeurIPS 2011*.
[18] Frank Hutter, Holger H. Hoos, Kevin Leyton-Brown. "Sequential Model-Based Optimization for General Algorithm Configuration." *LION 2011*.
[19] Lisha Li, Kevin Jamieson, Giulio DeSalvo, Afshin Rostamizadeh, Ameet Talwalkar. "Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization." *JMLR* 18, 2018. arXiv:1603.06560.
[20] Stefan Falkner, Aaron Klein, Frank Hutter. "BOHB: Robust and Efficient Hyperparameter Optimization at Scale." *ICML 2018*.
[21] Max Jaderberg, Valentin Dalibard, Simon Osindero, Wojciech M. Czarnecki, Jeff Donahue, Ali Razavi, Oriol Vinyals, Tim Green, Iain Dunning, Karen Simonyan, Chrisantha Fernando, Koray Kavukcuoglu. "Population Based Training of Neural Networks." arXiv:1711.09846, 2017.
[22] Pieter Gijsbers, Erin LeDell, Janek Thomas, Sebastien Poirier, Bernd Bischl, Joaquin Vanschoren. "An Open Source AutoML Benchmark." *ICML 2019 AutoML Workshop*. arXiv:1907.00909. Updated journal version: *JMLR* 25, 2024.
[23] Takuya Akiba, Shotaro Sano, Toshihiko Yanase, Takeru Ohta, Masanori Koyama. "Optuna: A Next-generation Hyperparameter Optimization Framework." *KDD 2019*. arXiv:1907.10902.
[24] Google Cloud. "Cloud AutoML: Making AI accessible to every business." Announcement, 17 January 2018.
[25] Noah Hollmann, Samuel Mueller, Lennart Purucker, Arjun Krishnakumar, Max Koerfer, Shi Bin Hoo, Robin Tibor Schirrmeister, Frank Hutter. "Accurate predictions on small data with a tabular foundation model." *Nature* 637, no. 8045, pp. 319-326, 2025. https://www.nature.com/articles/s41586-024-08328-6
[26] Matthias Feurer, Katharina Eggensperger, Stefan Falkner, Marius Lindauer, Frank Hutter. "Auto-Sklearn 2.0: Hands-free AutoML via Meta-Learning." *Journal of Machine Learning Research* 23, 2022. arXiv:2007.04074. https://jmlr.org/papers/v23/21-0992.html
[27] Prior Labs. "TabPFN-2.5 model card." Hugging Face, 2025. https://huggingface.co/Prior-Labs/tabpfn_2_5
