# AutoML (Automated Machine Learning)

> Source: https://aiwiki.ai/wiki/automl
> Updated: 2026-06-23
> Categories: Developer Tools, MLOps, Model Architecture, Training & Optimization
> License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
> From AI Wiki (https://aiwiki.ai), the free encyclopedia of artificial intelligence. Reuse freely with attribution to "AI Wiki (aiwiki.ai)".

**AutoML (Automated Machine Learning)** is the automation of the end-to-end pipeline of applying machine learning to real-world data, replacing manual trial and error with a formal search over data preprocessing, [feature engineering](/wiki/feature_engineering), model selection, [hyperparameter](/wiki/hyperparameter) optimisation, model evaluation, and ensembling. A typical AutoML system takes a raw dataset and a time budget and returns a trained, validated model with little or no human intervention. The goal, as the field's standard reference puts it, is "effective machine learning out of the box," so that domain scientists, business analysts and software engineers can apply ML without a dedicated specialist.[1]

The term has been in use since at least the early 2010s, when researchers began publishing systems such as Auto-WEKA (2013) and tools for Bayesian optimisation of machine learning algorithms (Snoek, Larochelle and Adams 2012).[16] The field crystallised with the publication of the auto-sklearn paper by Feurer et al. at NeurIPS 2015[2] and the open-access edited volume by Hutter, Kotthoff and Vanschoren in 2019, the first comprehensive book devoted to the subject.[1] Commercial AutoML vendors such as DataRobot were active by the mid-2010s, and the major cloud providers followed from 2018, when Google launched Cloud AutoML Vision. AutoML has since become a standard component of cloud machine-learning platforms.[24] The most recent shift is the arrival of tabular foundation models such as TabPFN (Nature, 2025), which can produce strong predictions on small tables in a single forward pass and challenge the search-based paradigm on which AutoML was built.[25]

## What problems does AutoML solve?

Applying machine learning to a new dataset is more craft than science. The practitioner must clean and encode data, choose a model family that suits the problem, set dozens of hyperparameters, decide on a validation scheme, and iterate. Expertise of this kind is scarce, expensive, and slow to acquire, and AutoML aims to encode it in software so that people without that expertise can still obtain good models.[1]

There are also methodological reasons. Manual hyperparameter tuning is error-prone and biased toward configurations the practitioner has used before. Bergstra and Bengio showed in 2012 that when only a few hyperparameters strongly affect performance, as is common in practice, grid search wastes most of its budget on the unimportant axes, and random search finds models as good or better within a small fraction of the computation.[15] Random search, however, ignores the results of earlier evaluations, both within the current run and on similar datasets studied before, which leaves performance on the table. AutoML treats model selection and tuning as a formal optimisation problem in which past evaluations can guide the choice of the next one.
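Bergstra and Bengio's argument can be seen in a few lines of Python. The sketch below assumes two hypothetical hyperparameters, `lr` and `momentum`, both scaled to [0, 1], of which only `lr` is assumed to matter. With the same budget of nine trials, a 3 x 3 grid tries only three distinct values of `lr`, while random search tries nine.

```python
import itertools
import random

BUDGET = 9  # evaluations available to each method

def distinct_values(configs, name):
    """How many different values of one hyperparameter a set of trials covers."""
    return len({cfg[name] for cfg in configs})

# Grid search: a 3 x 3 grid over two hyperparameters scaled to [0, 1].
levels = [0.0, 0.5, 1.0]
grid_trials = [{"lr": a, "momentum": b} for a, b in itertools.product(levels, levels)]

# Random search: the same number of trials, each drawn independently.
rng = random.Random(0)
random_trials = [{"lr": rng.random(), "momentum": rng.random()} for _ in range(BUDGET)]

# If only "lr" really matters (low effective dimensionality), what counts is
# how many distinct values of "lr" each method has actually tried.
print("grid search  :", distinct_values(grid_trials, "lr"))    # 3
print("random search:", distinct_values(random_trials, "lr"))  # 9
```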

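A further motivation is reproducibility. A pipeline produced by an AutoML system is specified by the search space, the optimiser, the random seed and the dataset, so two researchers running the same configuration should in principle obtain the same model. In practice, budgets expressed in wall-clock time and parallel evaluation can still make results differ across machines.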

## Components of an AutoML pipeline

A modern AutoML system typically combines several stages. The table below summarises the main components, drawing on the taxonomy in He, Zhao and Chu's 2021 survey and Chapter 1 of the Hutter, Kotthoff and Vanschoren book.[14][1]

| Stage | Typical operations | Example systems |
|---|---|---|
| Data preprocessing | Missing-value imputation, type inference, categorical encoding, scaling, outlier handling | auto-sklearn, AutoGluon |
| Feature engineering | Automated feature construction, polynomial and interaction terms, target encoding, feature selection | TPOT, Featuretools (Deep Feature Synthesis) |
| Model selection | Choose family from trees, linear models, kernel methods, neural networks, gradient boosting | All AutoML systems |
| Hyperparameter optimisation | Tune per-model hyperparameters such as learning rate, depth, regularisation strength | SMAC, BOHB, Optuna |
| Architecture search | For deep networks, search over layer types, connectivity, width and depth | NASNet, DARTS, Auto-Keras |
| Ensembling and stacking | Combine top-k candidate models with weighted averaging, stacking or bagging | auto-sklearn, AutoGluon, H2O AutoML |
| Calibration and post-processing | Probability calibration, threshold tuning, fairness post-processing | H2O AutoML, FLAML |

Not every system covers every stage. Auto-sklearn and TPOT focus on the tabular pipeline; Auto-Keras and DARTS focus on neural architecture search; AutoGluon spans tabular, text, image and multimodal data with a unified API.
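Model selection and hyperparameter optimisation are usually solved jointly as one search over a conditional space (Auto-WEKA called this the combined algorithm selection and hyperparameter optimisation, or CASH, problem): which hyperparameters exist depends on which model family is chosen. The sketch below encodes a small, made-up space of this kind in plain Python and draws random pipelines from it. The step names, model families and ranges are illustrative assumptions, not the search space of any particular system.

```python
import math
import random

# Illustrative, heavily simplified search space. The hyperparameters that exist
# depend on the model family chosen (conditional hyperparameters).
SEARCH_SPACE = {
    "scaler": ["none", "standard", "minmax"],
    "model": {
        "random_forest": {"n_estimators": ("int", 50, 500), "max_depth": ("int", 2, 20)},
        "logistic_regression": {"C": ("log_float", 1e-3, 1e3)},
        "gradient_boosting": {"learning_rate": ("log_float", 1e-3, 0.3), "max_depth": ("int", 2, 8)},
    },
}

def sample_param(spec, rng):
    kind, low, high = spec
    if kind == "int":
        return rng.randint(low, high)  # inclusive on both ends
    if kind == "log_float":
        # Uniform in log space, a common choice for scale-like parameters.
        return math.exp(rng.uniform(math.log(low), math.log(high)))
    raise ValueError(f"unknown parameter kind: {kind}")

def sample_pipeline(space, rng):
    """Draw one pipeline: preprocessing step, model family, then that family's hyperparameters."""
    family = rng.choice(sorted(space["model"]))
    params = {name: sample_param(spec, rng) for name, spec in space["model"][family].items()}
    return {"scaler": rng.choice(space["scaler"]), "model": family, "params": params}

rng = random.Random(42)
for _ in range(3):
    print(sample_pipeline(SEARCH_SPACE, rng))
```

Random sampling from such a space is the simplest baseline; optimisers such as SMAC and TPE work on the same kind of structure but choose each new pipeline using the results of earlier ones.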

## How does AutoML tune hyperparameters?

The core algorithmic problem inside AutoML is [hyperparameter tuning](/wiki/hyperparameter_tuning). Many algorithms have been proposed, and most production systems use a combination of them. The table below lists the most influential.

| Method | Reference | Idea |
|---|---|---|
| Grid search | Long-standing baseline | Exhaustive enumeration of a discretised hyperparameter grid |
| Random search | Bergstra and Bengio 2012 (JMLR) | Sample configurations independently from a prior; more efficient than grid search when only a few hyperparameters matter (low effective dimensionality) |
| Bayesian optimisation with Gaussian processes | Snoek, Larochelle and Adams 2012 (NeurIPS), Spearmint package | Fit a GP surrogate to past evaluations, choose next point via acquisition function such as Expected Improvement |
| Tree-structured Parzen Estimator (TPE) | Bergstra, Bardenet, Bengio and Kegl 2011 (NeurIPS) | Use density estimators on "good" and "bad" configurations rather than a single regression surrogate |
| SMAC | Hutter, Hoos and Leyton-Brown 2011 (LION) | Random-forest surrogate suited to mixed continuous, discrete and conditional spaces; used by auto-sklearn |
| Successive halving | Karnin, Koren and Somekh 2013 | Allocate small budgets to many configurations, prune the worst, double the budget for survivors |
| Hyperband | Li, Jamieson, DeSalvo, Rostamizadeh and Talwalkar 2017 | Bandit-based wrapper around successive halving with multiple bracket sizes |
| BOHB | Falkner, Klein and Hutter 2018 (ICML) | Combines TPE-style Bayesian optimisation with Hyperband multi-fidelity scheduling |
| Population-based training | Jaderberg et al. 2017 (DeepMind) | Train a population of models in parallel, periodically copy weights of the best workers and perturb their hyperparameters |
| CFO and BlendSearch | Wang, Wu, Weimer and Zhu 2021 (FLAML, MLSys) | Cost-aware search that trades off evaluation cost against improvement |

The shift from grid search through random search to Bayesian and bandit-based approaches reflects a steady increase in sample efficiency. Multi-fidelity methods such as Hyperband and BOHB are the dominant choice when each evaluation is expensive, for example when fitting a deep network on a large dataset.[19][20]
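As an illustration of the Gaussian-process row of the table, the sketch below computes the Expected Improvement acquisition for a minimisation problem from a surrogate's predictive mean `mu` and standard deviation `sigma` at a candidate point. Fitting the surrogate itself is omitted, and the two candidate points are hypothetical.

```python
import math

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_improvement(mu, sigma, best):
    """EI for minimisation, given the surrogate's predictive mean and standard
    deviation at a point and the best (lowest) loss observed so far."""
    if sigma <= 0.0:
        # No uncertainty: the improvement is deterministic.
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    return (best - mu) * normal_cdf(z) + sigma * normal_pdf(z)

best_loss = 1.0
# Candidate A: predicted slightly better than the incumbent, almost no uncertainty.
ei_a = expected_improvement(mu=0.9, sigma=0.01, best=best_loss)
# Candidate B: predicted worse on average, but very uncertain.
ei_b = expected_improvement(mu=1.1, sigma=0.5, best=best_loss)
print(f"EI(A) = {ei_a:.4f}, EI(B) = {ei_b:.4f}")  # EI(A) = 0.1000, EI(B) = 0.1534
```

Candidate B wins even though its predicted loss is worse than the incumbent's, because its large uncertainty leaves room for a big improvement; this is how EI trades exploration against exploitation.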

## What is neural architecture search?

[Neural architecture search](/wiki/neural_architecture_search) (NAS) is the sub-area of AutoML concerned with discovering the topology of deep networks. The modern wave began with Zoph and Le's 2017 ICLR paper "Neural Architecture Search with Reinforcement Learning," in which a recurrent controller proposes architectures encoded as variable-length strings, the proposed network is trained on CIFAR-10, and the validation accuracy is fed back to the controller as a reinforcement-learning reward. The original system used about 800 GPUs for several weeks but produced architectures competitive with the best human designs of the time.[6]

Follow-up work attacked the cost. NASNet (Zoph, Vasudevan, Shlens and Le 2018) restricted the search to a small "cell" that is then stacked, allowing search on CIFAR-10 to transfer to ImageNet. ENAS (Pham, Guan, Zoph, Le and Dean 2018, ICML) introduced parameter sharing between candidate architectures, which the authors reported as roughly 1000x cheaper than the original NAS.[8] DARTS (Liu, Simonyan and Yang 2019, ICLR) replaced the discrete search with a continuous relaxation, allowing gradient descent to optimise architecture weights directly. The DARTS paper reported competitive results on CIFAR-10 and Penn Treebank with a search cost of a few GPU-days rather than thousands.[7]
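The continuous relaxation at the heart of DARTS can be sketched on a single edge of a cell: every candidate operation is applied, and the outputs are mixed with weights given by a softmax over learnable architecture parameters `alpha`; after search, only the highest-weighted operation is kept. In the sketch below the operations are toy stand-ins acting on lists of floats rather than convolutions on feature maps, and the gradient-based (bilevel) updates of `alpha` and of the network weights are omitted.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy stand-ins for the candidate operations on one edge of a cell. DARTS uses
# convolutions, pooling, skip connections and a "zero" op on feature maps;
# here each op maps a list of floats to a list of floats.
def op_zero(x):
    return [0.0 for _ in x]

def op_identity(x):
    return list(x)

def op_smooth(x):
    # 3-tap moving average with edge replication, standing in for a convolution.
    n = len(x)
    return [(x[max(i - 1, 0)] + x[i] + x[min(i + 1, n - 1)]) / 3.0 for i in range(n)]

CANDIDATE_OPS = {"zero": op_zero, "skip_connect": op_identity, "smooth": op_smooth}

def mixed_op(x, alphas):
    """Continuous relaxation: softmax(alpha)-weighted sum of every candidate op."""
    names = list(CANDIDATE_OPS)
    weights = softmax([alphas[name] for name in names])
    outputs = [CANDIDATE_OPS[name](x) for name in names]
    return [sum(w * out[i] for w, out in zip(weights, outputs)) for i in range(len(x))]

def discretise(alphas):
    """After search, keep only the op with the largest architecture weight."""
    return max(alphas, key=alphas.get)

alphas = {"zero": -1.0, "skip_connect": 0.5, "smooth": 1.5}
x = [1.0, 2.0, 3.0, 4.0]
print(mixed_op(x, alphas))
print(discretise(alphas))  # smooth
```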

Latency-aware search came next. MnasNet (Tan, Chen, Pang, Vasudevan, Sandler, Howard and Le, CVPR 2019) added measured on-device latency to the reward, so that the search produced architectures suitable for mobile inference.[9] EfficientNet (Tan and Le, ICML 2019) used a NAS-derived baseline (EfficientNet-B0) and a principled compound-scaling rule for width, depth and resolution; the resulting family of models reached state-of-the-art ImageNet accuracy with a fraction of the parameters of earlier networks.[10]
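The compound-scaling rule scales depth, width and input resolution together by alpha^phi, beta^phi and gamma^phi under the constraint that alpha * beta^2 * gamma^2 is approximately 2, so that each increment of phi roughly doubles FLOPs. The sketch uses the coefficients reported in the EfficientNet paper (alpha = 1.2, beta = 1.1, gamma = 1.15) and a 224-pixel base resolution for B0. The released B1 to B7 models use hand-adjusted settings, so this will not reproduce their exact sizes.

```python
# Compound scaling coefficients reported in the EfficientNet paper.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # depth, width, resolution

def compound_scale(phi, base_resolution=224):
    """Multipliers for one value of the compound coefficient phi."""
    depth_mult = ALPHA ** phi
    width_mult = BETA ** phi
    resolution = base_resolution * GAMMA ** phi
    # FLOPs of a conv net grow roughly linearly in depth and quadratically in
    # width and resolution, so total FLOPs scale by (alpha * beta^2 * gamma^2)^phi.
    flops_mult = (ALPHA * BETA ** 2 * GAMMA ** 2) ** phi
    return depth_mult, width_mult, resolution, flops_mult

for phi in range(4):
    d, w, r, f = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution {r:.0f}px, FLOPs x{f:.2f}")
```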

The table below summarises the main NAS approaches.

| Method | Authors and year | Search strategy | Notes |
|---|---|---|---|
| NAS with RL | Zoph and Le 2017 | RNN controller trained by REINFORCE | First modern NAS; thousands of GPU-days |
| NASNet | Zoph et al. 2018 | RL on transferable cells | CIFAR-10 search transfers to ImageNet |
| ENAS | Pham et al. 2018 | RL with weight sharing | About 1000x cheaper than NAS |
| DARTS | Liu, Simonyan and Yang 2019 | Differentiable, gradient-based | Continuous relaxation of architecture |
| MnasNet | Tan et al. 2019 | RL with latency objective | Targets mobile inference |
| EfficientNet | Tan and Le 2019 | NAS baseline plus compound scaling | State-of-the-art accuracy / parameter trade-off |
| Auto-Keras | Jin, Song and Hu 2019 | Bayesian optimisation over network morphisms | Integrates with Keras API |

## Major AutoML systems

Research systems and commercial products have proliferated since around 2015. The table below compares the most widely used.

| System | First release | Maintainer | Approach | Modalities |
|---|---|---|---|---|
| Auto-WEKA | 2013 | University of British Columbia | Bayesian optimisation (SMAC) over WEKA classifiers | Tabular |
| DataRobot | 2014 | DataRobot Inc. | Commercial closed-source platform | Tabular, text, time series |
| auto-sklearn | 2015 | University of Freiburg | Meta-learning warm-start, SMAC, ensemble selection | Tabular |
| TPOT | 2016 | University of Pennsylvania | Genetic programming over scikit-learn pipelines | Tabular |
| H2O AutoML | 2017 | H2O.ai | Random search and grid search plus stacked ensembles | Tabular |
| Google Cloud AutoML | 2018 | Google Cloud | Transfer learning and NAS, later folded into Vertex AI | Vision, language, tables, video |
| Auto-Keras | 2018 (paper 2019) | Texas A&M | Bayesian optimisation over network morphisms | Vision, text |
| Azure Automated ML | 2018 | Microsoft | Probabilistic matrix factorisation for warm-start, plus SMAC and ensembles | Tabular, vision, NLP |
| Amazon SageMaker Autopilot | 2019 | AWS | White-box AutoML producing notebooks | Tabular |
| MLJAR, PyCaret | 2019 | Independent | Open-source wrappers around scikit-learn and gradient boosting | Tabular |
| AutoGluon | 2020 | Amazon | Multi-layer stacking, no model search per se | Tabular, text, image, multimodal, time series |
| FLAML | 2021 | Microsoft | Cost-aware search (CFO, BlendSearch) | Tabular |
| LightAutoML | 2021 | Sber AI Lab | Modular pipeline tuned for financial data | Tabular |

A few of these warrant a closer look.

**Auto-sklearn** (Feurer, Klein, Eggensperger, Springenberg, Blum and Hutter, NeurIPS 2015) defines a structured search space of 15 classifiers, 14 feature preprocessing methods and 4 data preprocessing methods, giving 110 hyperparameters in total. The system uses two ideas not present in earlier work. First, it warm-starts SMAC by retrieving promising configurations from datasets that are similar in meta-features to the current one, drawing on a database of past runs. Second, it builds an ensemble out of all configurations evaluated during search, rather than returning only the single best, which markedly reduces variance. Auto-sklearn won the first phase of the ChaLearn AutoML challenge.[2] A redesigned successor, Auto-sklearn 2.0 (Feurer, Eggensperger, Falkner, Lindauer and Hutter, JMLR 2022), dropped hand-crafted meta-features in favour of a pre-computed portfolio of complementary pipelines plus a bandit-based budget allocation, which the authors report improves performance under tight time limits.[26]
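The post-hoc ensemble in auto-sklearn is built with greedy ensemble selection in the style of Caruana et al. (2004): starting from an empty ensemble, it repeatedly adds, with replacement, whichever evaluated model most improves the validation loss of the averaged prediction. The sketch below is a simplified version on plain Python lists; the three models, their validation predictions and the squared-error loss are hypothetical.

```python
def ensemble_selection(val_preds, y_val, ensemble_size, loss):
    """Greedy forward ensemble selection with replacement.

    val_preds maps a model name to its predictions on the validation set.
    Returns a dict mapping each selected model name to its ensemble weight.
    """
    n = len(y_val)
    running_sum = [0.0] * n
    chosen = []
    for step in range(1, ensemble_size + 1):
        best_name, best_loss = None, float("inf")
        for name, preds in val_preds.items():
            # Validation loss of the averaged prediction if `name` were added (again).
            candidate = [(running_sum[i] + preds[i]) / step for i in range(n)]
            cand_loss = loss(candidate, y_val)
            if cand_loss < best_loss:
                best_name, best_loss = name, cand_loss
        chosen.append(best_name)
        running_sum = [s + p for s, p in zip(running_sum, val_preds[best_name])]
    # A model picked k times gets weight k / ensemble_size.
    return {name: chosen.count(name) / ensemble_size for name in sorted(set(chosen))}

def mean_squared_error(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)

# Hypothetical validation predictions (probability of class 1) from three models.
y_val = [1, 0, 1, 0]
val_preds = {
    "model_a": [0.9, 0.4, 0.6, 0.1],
    "model_b": [0.6, 0.1, 0.9, 0.4],
    "model_c": [0.5, 0.5, 0.5, 0.5],
}
weights = ensemble_selection(val_preds, y_val, ensemble_size=4, loss=mean_squared_error)
print(weights)  # {'model_a': 0.5, 'model_b': 0.5}
```

Neither model is best on every validation example, so averaging them beats either one alone (squared error 0.0625 against 0.085), which is the variance-reduction effect described above; the constant predictor `model_c` is never selected.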

**TPOT** (Olson and Moore, ICML 2016 AutoML workshop) treats a pipeline as an expression tree and uses genetic programming to evolve it. Operators include preprocessing transformations and scikit-learn estimators. TPOT exposes the discovered pipeline as Python code, which makes it easy to inspect and edit. On 150 supervised classification tasks the original benchmark reported significant improvements over a default scikit-learn baseline on 22 of them.[3]

**H2O AutoML** (LeDell and Poirier, ICML 2020 AutoML workshop) takes a more pragmatic line. Within a user-specified time budget it trains a predefined set of GLMs, random forests, gradient-boosting machines (including XGBoost) and deep neural networks, together with random grid searches over their hyperparameters, then builds two stacked ensembles, one over all models and one restricted to the best model of each family. The H2O AutoML algorithm was first released in H2O 3.12.0.1 in June 2017, and it has APIs in R, Python, Java and Scala.[5]

**AutoGluon** (Erickson, Mueller, Shirkov, Zhang, Larroy, Li and Smola 2020) made an explicit choice not to search hyperparameters or architectures. Instead, it trains a fixed set of strong models with sensible defaults and combines them via multi-layer stacking. The 2020 paper reported that, after four hours of training on raw data, AutoGluon-Tabular "beats 99% of the participating data scientists" in two Kaggle competitions, placing 42nd of 3,505 teams on the Otto Group competition and 39th of 2,920 teams on the BNP Paribas competition. The same paper reported that it outperformed auto-sklearn, H2O and TPOT on the OpenML AutoML Benchmark.[12]

**FLAML** (Wang, Wu, Weimer and Zhu, MLSys 2021) emphasises cost. Its search, implemented in the CFO and BlendSearch algorithms, starts from low-cost configurations and moves to more expensive ones only when the expected improvement justifies the extra evaluation cost. The library is intentionally lightweight and is designed to produce useful models under small compute budgets rather than to be a heavyweight framework.[11]

**Google Cloud AutoML** launched on 17 January 2018 with Cloud AutoML Vision, a no-code service that fine-tunes a pretrained image classifier on user-uploaded data using transfer learning and architecture search. Google later added Natural Language, Translation, Tables and Video, and folded all of them into Vertex AI in 2021.[24]

## Tabular foundation models: the TabPFN shift

The newest development in AutoML is the tabular foundation model, which sidesteps per-dataset search entirely. TabPFN ("Tabular Prior-data Fitted Network"), introduced by Hollmann, Mueller, Purucker and colleagues at the University of Freiburg and later commercialised by Prior Labs (founded 2024), is a transformer pre-trained once on millions of synthetic tabular datasets. At inference it performs in-context learning: the training rows are passed in as context and predictions for new rows come out of a single forward pass, with no gradient descent or hyperparameter tuning on the target dataset.[25]

The 2025 Nature paper, "Accurate predictions on small data with a tabular foundation model" (*Nature* volume 637, issue 8045, pages 319-326), reports that TabPFN v2 "outperforms all previous methods on datasets with up to 10,000 samples by a wide margin, using substantially less training time."[25] The headline efficiency claim is striking: "In 2.8 s, TabPFN outperforms an ensemble of the strongest baselines tuned for 4 h in a classification setting."[25] The model targets datasets with up to roughly 10,000 samples and 500 features, and the paper reports speedups of about 5,140x for classification and 3,000x for regression relative to tuned baselines such as gradient-boosted decision trees.[25] A later release, TabPFN v2.5, extends the practical envelope to datasets with up to 50,000 rows and 2,000 features.[27]
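The classification speedup appears to be the ratio of the two times in the quoted sentence, four hours against 2.8 seconds. A quick check under that assumption; the second computation only shows what a 3,000x speedup against the same four-hour budget would imply, and is not a figure reported in this article.

```python
FOUR_HOURS_S = 4 * 60 * 60  # the tuning budget in the quoted claim, in seconds

classification_speedup = FOUR_HOURS_S / 2.8
implied_regression_time_s = FOUR_HOURS_S / 3000  # run time a 3,000x speedup would imply

print(round(classification_speedup))  # 5143
print(implied_regression_time_s)      # 4.8
```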

TabPFN does not replace AutoML so much as reframe it. For small clean tables it offers a strong, near-instant default that needs no search, which is exactly the regime where classical AutoML systems spent most of their compute. For larger datasets, mixed modalities, and time-series or multimodal problems, search-based systems such as AutoGluon and H2O AutoML remain the standard, and several AutoML frameworks now include TabPFN as one more base learner in their model portfolio.

## Key concepts

A handful of ideas appear repeatedly across AutoML systems.

**Meta-learning and warm-starting.** When a new dataset arrives, an AutoML system can use prior knowledge from previous runs on similar datasets to bias the initial configurations of the optimiser. Auto-sklearn does this by computing meta-features (number of samples, number of classes, skewness, kurtosis and so on) and retrieving the 25 most similar OpenML datasets, then seeding SMAC with their best-known configurations.[2] Vanschoren's chapter in the 2019 book gives a thorough treatment.[1]
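A minimal sketch of this kind of warm-start, assuming a tiny hypothetical meta-database of four past datasets, four meta-features, z-score standardisation and Euclidean distance (auto-sklearn's actual meta-features and distance measure differ): find the k most similar past datasets and return their best configurations as the optimiser's first candidates.

```python
import math

# Hypothetical meta-database: meta-features of datasets seen before and the best
# configuration found on each. Feature choice and values are illustrative only.
# Meta-features: [log(#samples), log(#features), #classes, minority-class fraction]
META_DB = [
    {"name": "d1", "meta": [9.2, 3.0, 2, 0.10], "best": {"model": "gradient_boosting", "learning_rate": 0.05}},
    {"name": "d2", "meta": [6.1, 2.3, 10, 0.02], "best": {"model": "random_forest", "max_depth": 12}},
    {"name": "d3", "meta": [9.0, 3.2, 2, 0.15], "best": {"model": "gradient_boosting", "learning_rate": 0.1}},
    {"name": "d4", "meta": [5.5, 5.0, 3, 0.30], "best": {"model": "logistic_regression", "C": 1.0}},
]

def zscore_params(rows):
    """Per-meta-feature mean and standard deviation across the database."""
    columns = list(zip(*rows))
    means = [sum(col) / len(col) for col in columns]
    stds = [math.sqrt(sum((v - m) ** 2 for v in col) / len(col)) or 1.0
            for col, m in zip(columns, means)]
    return means, stds

def warm_start_configs(new_meta, db, k):
    """Best configurations of the k past datasets closest in standardised meta-feature space."""
    means, stds = zscore_params([entry["meta"] for entry in db])

    def standardise(vector):
        return [(v - m) / s for v, m, s in zip(vector, means, stds)]

    target = standardise(new_meta)
    ranked = sorted(db, key=lambda entry: math.dist(target, standardise(entry["meta"])))
    return [entry["best"] for entry in ranked[:k]]

# A new dataset that resembles d1 and d3: seed the optimiser with their best configs.
initial_configs = warm_start_configs([9.1, 3.1, 2, 0.12], META_DB, k=2)
print(initial_configs)
```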

**Multi-fidelity optimisation.** Evaluating a configuration on the full dataset for the full number of epochs is expensive. Multi-fidelity methods approximate that evaluation by training on a subset of the data, for fewer epochs, or with a smaller model, and use these cheap proxies to filter out unpromising configurations before committing the full budget. Hyperband and BOHB are the canonical examples.[19][20]
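Hyperband decides how many configurations start at which fidelity. The sketch below computes its brackets from the formulas in Li et al., for a maximum per-configuration budget `max_resource` and downsampling rate `eta` (81 and 3 are illustrative values); sampling and training the configurations is left out. Each bracket is one run of successive halving with a different trade-off between the number of configurations and the budget each one starts with.

```python
def hyperband_brackets(max_resource, eta=3):
    """Return Hyperband's brackets as lists of (n_configs, resource_per_config) rungs.

    max_resource is the largest budget a single configuration may receive
    (e.g. epochs); eta is the downsampling rate.
    """
    # s_max = floor(log_eta(max_resource)), computed with integers to avoid float error.
    s_max = 0
    while eta ** (s_max + 1) <= max_resource:
        s_max += 1
    brackets = []
    for s in range(s_max, -1, -1):
        # Initial number of configurations: ceil((s_max + 1) * eta^s / (s + 1)).
        n = ((s_max + 1) * eta ** s + s) // (s + 1)
        r = max_resource / eta ** s  # smallest budget used in this bracket
        # Successive halving inside the bracket: keep 1/eta, give them eta times more.
        brackets.append([(n // eta ** i, r * eta ** i) for i in range(s + 1)])
    return brackets

for bracket in hyperband_brackets(max_resource=81, eta=3):
    print(bracket)
```

The first bracket starts 81 configurations at a budget of 1 and is the most aggressive; the last trains 5 configurations at the full budget of 81, which amounts to plain random search. Running all brackets hedges against a bad guess about how early poor configurations can be recognised.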

**Surrogate models.** A surrogate is a cheap-to-evaluate proxy for the expensive objective (validation accuracy after full training). Gaussian processes (Spearmint), random forests (SMAC), TPE density estimators, and neural-network surrogates have all been used. The acquisition function on top of the surrogate decides where to evaluate next.
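A deliberately minimal one-dimensional sketch of the TPE idea: split past evaluations into the best fraction `gamma` and the rest, fit a Gaussian-kernel (Parzen) density to each, l(x) and g(x), draw candidates from l(x) and evaluate the one that maximises l(x)/g(x), which Bergstra et al. showed is equivalent to maximising Expected Improvement under the TPE model. The [0, 1] domain, fixed bandwidth, candidate count and toy objective are assumptions; Hyperopt's implementation is considerably more elaborate, for example in how it handles tree-structured spaces.

```python
import math
import random

def parzen_density(x, points, bandwidth):
    """Gaussian kernel density estimate at x built from 1-D sample points."""
    if not points:
        return 0.0
    norm = 1.0 / (len(points) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)

def tpe_suggest(history, rng, gamma=0.25, n_candidates=24, bandwidth=0.1):
    """Suggest the next x in [0, 1] from a history of (x, loss) pairs."""
    ordered = sorted(history, key=lambda pair: pair[1])
    n_good = max(1, math.ceil(gamma * len(ordered)))
    good = [x for x, _ in ordered[:n_good]]  # samples defining l(x)
    bad = [x for x, _ in ordered[n_good:]]   # samples defining g(x)
    # Sample from l(x): pick a good point, add Gaussian noise, clip to [0, 1].
    candidates = [min(1.0, max(0.0, rng.gauss(rng.choice(good), bandwidth)))
                  for _ in range(n_candidates)]

    def ratio(x):
        # l(x) / g(x); the small constant avoids division by zero.
        return parzen_density(x, good, bandwidth) / (parzen_density(x, bad, bandwidth) + 1e-12)

    return max(candidates, key=ratio)

def toy_loss(x):
    # Hypothetical objective with its minimum at x = 0.7.
    return (x - 0.7) ** 2

rng = random.Random(0)
history = [(x, toy_loss(x)) for x in (rng.random() for _ in range(10))]  # random warm-up
for _ in range(30):
    x = tpe_suggest(history, rng)
    history.append((x, toy_loss(x)))

best_x, best_loss = min(history, key=lambda pair: pair[1])
print(f"best x = {best_x:.3f}, loss = {best_loss:.6f}")
```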

**Bandits and successive halving.** Successive halving treats configurations as arms of a multi-armed bandit. Run all of them for a small budget, keep the top half, double the budget, repeat. Hyperband runs successive halving with several bracket sizes to hedge against picking a wrong starting budget.[19]
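Successive halving fits in a short function. In the sketch below, `evaluate(config, budget)` is a stand-in for training a configuration with a given budget and returning its validation loss; the eight configurations and their learning curves are hypothetical, and `eta = 2` gives the halve-and-double rule described above.

```python
def successive_halving(configs, evaluate, min_budget=1, eta=2):
    """Return (best_config, total_budget_spent).

    evaluate(config, budget) returns a validation loss (lower is better) after
    training `config` with `budget` units of resource, e.g. epochs.
    """
    survivors = list(configs)
    budget = min_budget
    spent = 0
    while len(survivors) > 1:
        scored = []
        for cfg in survivors:
            scored.append((evaluate(cfg, budget), cfg))
            spent += budget
        scored.sort(key=lambda pair: pair[0])  # best (lowest loss) first
        keep = max(1, len(survivors) // eta)   # keep the top 1/eta
        survivors = [cfg for _, cfg in scored[:keep]]
        budget *= eta                          # survivors get eta times more budget
    return survivors[0], spent

# Hypothetical configurations whose loss after `budget` epochs follows the same
# learning-curve shape: final_loss + 0.5 / budget.
configs = [{"name": f"c{i}", "final_loss": loss}
           for i, loss in enumerate([0.30, 0.22, 0.35, 0.18, 0.40, 0.25, 0.28, 0.33])]

def toy_evaluate(cfg, budget):
    return cfg["final_loss"] + 0.5 / budget

best, spent = successive_halving(configs, toy_evaluate)
print(best["name"], spent)  # c3 24
```

Because every toy learning curve has the same shape, the ranking at a small budget already identifies the best configuration. Real learning curves can cross, so a configuration that starts slowly may be discarded too early; that failure mode is what Hyperband's multiple brackets hedge against.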

**Time and compute budgeting.** Real users care about wall-clock time. AutoML systems usually expose a budget parameter and try to make the best use of it. FLAML's headline feature is that it can do useful work in seconds rather than hours.[11]

## What are the strengths of AutoML?

For tabular data with reasonably clean inputs, modern AutoML systems frequently match or beat hand-tuned pipelines built by experienced data scientists, especially within fixed time budgets. The OpenML AutoML Benchmark by Gijsbers, LeDell, Thomas, Poirier, Bischl and Vanschoren (2019, updated as a JMLR paper in 2024) shows tight competition between auto-sklearn, H2O AutoML, TPOT, AutoGluon and FLAML on tabular tasks, with AutoGluon and H2O often near the top.[22]

The key advantages are a lower barrier to entry, since a non-expert can produce a reasonable model with a few lines of code; strong baselines that set a high bar for any hand-crafted alternative; reproducibility, because pipelines and seeds are recorded; and broader coverage of the search space, since optimisers explore configurations a human would not think to try.

## What are the limitations of AutoML?

AutoML is not a silver bullet, and several limitations are well documented.

It is compute-intensive. A long auto-sklearn run can consume tens of CPU-hours, and early reinforcement-learning NAS runs consumed thousands of GPU-days. The original Zoph and Le NAS search used 800 GPUs for 28 days, or roughly 22,400 GPU-days.[6] Even after the cost reductions delivered by ENAS and DARTS, neural architecture search remains more expensive than training a single off-the-shelf network.[8][7]

The resulting pipeline is mostly opaque. A stacked ensemble of 30 gradient-boosting machines and random forests is not easy to explain to a regulator or domain expert. AutoGluon and similar systems trade interpretability for accuracy, which is acceptable in many applications and unacceptable in others.

Results can be brittle on novel domains. AutoML systems are tuned on benchmark distributions and may underperform on data whose structure was not anticipated, such as sparse high-cardinality categorical data, time series with irregular sampling, or scientific data with strong physical constraints.

Long searches risk overfitting to the validation set. If 10,000 configurations are evaluated against the same validation split, the best one is partly selected for noise, so its validation score overstates its true performance. Cross-validation and ensemble averaging mitigate this but do not eliminate the multiple-comparisons problem. On high-stakes problems where human expertise is plentiful, manual tuning by a skilled team can still outperform AutoML; AutoML is most valuable where expertise is the bottleneck.
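The selection effect behind validation-set overfitting can be simulated under idealised assumptions: every configuration has the same true accuracy (0.80 here), and each validation score adds independent Gaussian noise (standard deviation 0.02). Neither assumption holds exactly in practice; the sketch only shows that the best observed score overstates the truth by more as more configurations are compared.

```python
import random
import statistics

def winners_curse(n_configs, n_trials, true_acc=0.80, noise_sd=0.02, seed=0):
    """Average optimism of the best validation score when every configuration
    has the same true accuracy and validation scores are noisy."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_trials):
        val_scores = [true_acc + rng.gauss(0.0, noise_sd) for _ in range(n_configs)]
        gaps.append(max(val_scores) - true_acc)
    return statistics.mean(gaps)

for n in (1, 10, 100, 1000):
    print(f"{n:>5} configs: best validation score overstates truth by {winners_curse(n, 200):.3f}")
```

For standard normal noise the expected maximum of 100 draws is about 2.5 standard deviations, so with these settings the winner among 100 equally good configurations looks roughly five accuracy points better than it really is.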

## How is AutoML benchmarked?

The OpenML AutoML Benchmark, introduced by Gijsbers et al. at the ICML 2019 AutoML workshop, is the standard evaluation suite for tabular AutoML. It defines a curated list of binary and multiclass classification tasks (and later regression tasks) drawn from OpenML, fixes time budgets (typically one hour and four hours), and runs each system in a Docker container with controlled resources. The 2024 JMLR version by Gijsbers et al. compares 9 frameworks across 71 classification and 33 regression tasks. Results are publicly available and updated regularly.[22]

For neural architecture search the canonical benchmarks are CIFAR-10 and ImageNet image classification, with NAS-discovered architectures (NASNet, MnasNet, EfficientNet) achieving state-of-the-art accuracy in their respective compute classes.[9][10] NAS-Bench-101 (Ying et al. 2019) and NAS-Bench-201 (Dong and Yang 2020) provide tabulated training results for many architectures, allowing fast and reproducible comparisons of NAS algorithms without re-running expensive training.

## Real-world applications

AutoML has been deployed in many industries. Documented use cases include drug discovery and molecular property prediction, genomics and clinical risk prediction, time-series forecasting in retail and supply chain (AutoGluon-TimeSeries and similar tools produce probabilistic forecasts at scale), recommendation systems with cold-start models on new product categories, manufacturing quality control using vision AutoML services to detect defects, and standard business analytics such as sales forecasting, customer churn modelling and marketing-mix attribution. At Sber, LightAutoML was used in production for credit-risk and customer-analytics tasks.[13]

## Open-source ecosystem

A practitioner today can build an AutoML stack entirely from open-source components. Common building blocks include auto-sklearn for tabular classification and regression with meta-learning warm-start; TPOT for genetic programming over pipelines; AutoGluon for tabular, multimodal and time-series tasks; FLAML for fast and lightweight HPO; Optuna (Akiba, Sano, Yanase, Ohta and Koyama, KDD 2019), a define-by-run hyperparameter optimisation framework used as a backend by many other systems;[23] Ray Tune for distributed hyperparameter search with Hyperband, BOHB and PBT; Hyperopt, the original TPE implementation by James Bergstra; SMAC3 (Lindauer et al. 2022), the modern Python SMAC implementation; the TabPFN package from Prior Labs for tabular foundation-model inference; and NNI (Microsoft Neural Network Intelligence), a toolkit covering HPO, NAS, model compression and feature engineering.

## Commercial offerings

Alongside the open-source projects, every major cloud provider and several specialist vendors offer commercial AutoML products. The table below summarises them.

| Vendor | Product | Modality | Launched | Notes |
|---|---|---|---|---|
| Google Cloud | Cloud AutoML, then Vertex AI AutoML | Vision, NLP, Tables, Video | 2018 | First major no-code cloud AutoML |
| Microsoft Azure | Azure Automated ML | Tabular, vision, NLP | 2018 | Integrated with Azure ML Studio |
| AWS | SageMaker Autopilot | Tabular | 2019 | White-box approach: returns generated notebooks |
| AWS | Amazon SageMaker Canvas | Tabular | 2021 | Business-analyst-oriented no-code interface |
| DataRobot | DataRobot AI Cloud | Tabular, text, time series | 2014 | Pioneering enterprise AutoML vendor |
| H2O.ai | Driverless AI | Tabular, time series, NLP | 2017 | Commercial counterpart to H2O AutoML |
| dotData | dotData Enterprise | Tabular with auto-feature-engineering | 2018 | Spin-off from NEC research |
| Prior Labs | TabPFN (API and enterprise) | Tabular | 2024 | Tabular foundation model, in-context prediction |

## Connection to broader AI

AutoML overlaps with [MLOps](/wiki/mlops). MLOps is concerned with the full lifecycle of model deployment, monitoring and retraining; AutoML focuses on the model-building stage. Many MLOps platforms include an AutoML component to bootstrap models that are then handed to deployment pipelines.

The rise of foundation models has shifted the centre of gravity. Where AutoML once asked which small model is best for a given dataset, foundation-model users ask which adapter, learning rate and data mixture to use when fine-tuning a pretrained model. Research on AutoML for foundation models, including AutoLoRA (Zhang et al. 2023) and automated parameter-efficient fine-tuning, is active, and tools such as FLAML and Optuna are increasingly used to tune LoRA ranks, prompt templates and retrieval parameters. NAS has continued to evolve in parallel, with active work on transformer-based language models, graph neural networks (Graph NAS, Gao et al. 2020) and multimodal architectures.

## Future directions

Several open problems are likely to shape AutoML in the coming years. Automated data engineering and labelling is one: many real-world failures are caused by data issues rather than modelling issues, and tools for automated data quality checks, labelling and weak supervision are an active research area. AutoML for foundation models is another, with the combinatorial space of base model, fine-tuning regime, retrieval component and prompt much larger than classical model selection. Tabular foundation models such as TabPFN raise the question of how much per-dataset search is still needed when a pre-trained model can predict in one forward pass. AutoML for graph neural networks and other non-tabular structures is less mature than its tabular and image counterparts. Multimodal AutoML, building on systems like AutoGluon-Multimodal, has room for better fusion architectures and automated cross-modal preprocessing. Compute-efficient NAS, using once-for-all networks, supernet training and zero-cost proxies, aims to bring architecture search closer to the cost of a single training run. Finally, better evaluation methodology, beyond the OpenML AutoML Benchmark and the NAS-Bench family, is needed because real-world deployment performance is still hard to predict from benchmark scores.

## References

1. Frank Hutter, Lars Kotthoff, and Joaquin Vanschoren (eds.). *Automated Machine Learning: Methods, Systems, Challenges*. Springer Series on Challenges in Machine Learning, 2019. Open access: https://www.automl.org/book/
2. Matthias Feurer, Aaron Klein, Katharina Eggensperger, Jost Tobias Springenberg, Manuel Blum, Frank Hutter. "Efficient and Robust Automated Machine Learning." In *Advances in Neural Information Processing Systems 28* (NeurIPS 2015). https://papers.neurips.cc/paper/2015/hash/11d0e6287202fced83f79975ec59a3a6-Abstract.html
3. Randal S. Olson and Jason H. Moore. "TPOT: A Tree-based Pipeline Optimization Tool for Automating Machine Learning." In *ICML 2016 AutoML Workshop*. http://proceedings.mlr.press/v64/olson_tpot_2016.html
4. Haifeng Jin, Qingquan Song, Xia Hu. "Auto-Keras: An Efficient Neural Architecture Search System." In *Proceedings of KDD 2019*. arXiv:1806.10282. https://arxiv.org/abs/1806.10282
5. Erin LeDell and Sebastien Poirier. "H2O AutoML: Scalable Automatic Machine Learning." *7th ICML Workshop on Automated Machine Learning*, 2020. https://www.automl.org/wp-content/uploads/2020/07/AutoML_2020_paper_61.pdf
6. Barret Zoph and Quoc V. Le. "Neural Architecture Search with Reinforcement Learning." *ICLR 2017*. arXiv:1611.01578. https://arxiv.org/abs/1611.01578
7. Hanxiao Liu, Karen Simonyan, Yiming Yang. "DARTS: Differentiable Architecture Search." *ICLR 2019*. arXiv:1806.09055. https://arxiv.org/abs/1806.09055
8. Hieu Pham, Melody Y. Guan, Barret Zoph, Quoc V. Le, Jeff Dean. "Efficient Neural Architecture Search via Parameter Sharing." *ICML 2018*. https://proceedings.mlr.press/v80/pham18a.html
9. Mingxing Tan, Bo Chen, Ruoming Pang, Vijay Vasudevan, Mark Sandler, Andrew Howard, Quoc V. Le. "MnasNet: Platform-Aware Neural Architecture Search for Mobile." *CVPR 2019*. arXiv:1807.11626.
10. Mingxing Tan and Quoc V. Le. "EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks." *ICML 2019*. https://proceedings.mlr.press/v97/tan19a/tan19a.pdf
11. Chi Wang, Qingyun Wu, Markus Weimer, Erkang Zhu. "FLAML: A Fast and Lightweight AutoML Library." *MLSys 2021*. arXiv:1911.04706. https://www.microsoft.com/en-us/research/wp-content/uploads/2021/03/MLSys21FLAML.pdf
12. Nick Erickson, Jonas Mueller, Alexander Shirkov, Hang Zhang, Pedro Larroy, Mu Li, Alexander Smola. "AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data." arXiv:2003.06505, 2020. https://arxiv.org/abs/2003.06505
13. Anton Vakhrushev, Alexander Ryzhkov, Dmitry Simakov, Rinchin Damdinov, Maxim Savchenko, Alexander Tuzhilin. "LightAutoML: AutoML Solution for a Large Financial Services Ecosystem." arXiv:2109.01528, 2021.
14. Xin He, Kaiyong Zhao, Xiaowen Chu. "AutoML: A Survey of the State-of-the-Art." *Knowledge-Based Systems*, vol. 212, 2021. arXiv:1908.00709.
15. James Bergstra and Yoshua Bengio. "Random Search for Hyper-Parameter Optimization." *Journal of Machine Learning Research* 13, pp. 281-305, 2012. https://jmlr.org/papers/v13/bergstra12a.html
16. Jasper Snoek, Hugo Larochelle, Ryan P. Adams. "Practical Bayesian Optimization of Machine Learning Algorithms." *NeurIPS 2012*. arXiv:1206.2944.
17. James Bergstra, Remi Bardenet, Yoshua Bengio, Balazs Kegl. "Algorithms for Hyper-Parameter Optimization." *NeurIPS 2011*.
18. Frank Hutter, Holger H. Hoos, Kevin Leyton-Brown. "Sequential Model-Based Optimization for General Algorithm Configuration." *LION 2011*.
19. Lisha Li, Kevin Jamieson, Giulio DeSalvo, Afshin Rostamizadeh, Ameet Talwalkar. "Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization." *JMLR* 18, 2018. arXiv:1603.06560.
20. Stefan Falkner, Aaron Klein, Frank Hutter. "BOHB: Robust and Efficient Hyperparameter Optimization at Scale." *ICML 2018*.
21. Max Jaderberg, Valentin Dalibard, Simon Osindero, Wojciech M. Czarnecki, Jeff Donahue, Ali Razavi, Oriol Vinyals, Tim Green, Iain Dunning, Karen Simonyan, Chrisantha Fernando, Koray Kavukcuoglu. "Population Based Training of Neural Networks." arXiv:1711.09846, 2017.
22. Pieter Gijsbers, Erin LeDell, Janek Thomas, Sebastien Poirier, Bernd Bischl, Joaquin Vanschoren. "An Open Source AutoML Benchmark." *ICML 2019 AutoML Workshop*. arXiv:1907.00909. Updated journal version: *JMLR* 25, 2024.
23. Takuya Akiba, Shotaro Sano, Toshihiko Yanase, Takeru Ohta, Masanori Koyama. "Optuna: A Next-generation Hyperparameter Optimization Framework." *KDD 2019*. arXiv:1907.10902.
24. Google Cloud. "Cloud AutoML: Making AI accessible to every business." Announcement, 17 January 2018.
25. Noah Hollmann, Samuel Mueller, Lennart Purucker, Arjun Krishnakumar, Max Koerfer, Shi Bin Hoo, Robin Tibor Schirrmeister, Frank Hutter. "Accurate predictions on small data with a tabular foundation model." *Nature* 637, no. 8045, pp. 319-326, 2025. https://www.nature.com/articles/s41586-024-08328-6
26. Matthias Feurer, Katharina Eggensperger, Stefan Falkner, Marius Lindauer, Frank Hutter. "Auto-Sklearn 2.0: Hands-free AutoML via Meta-Learning." *Journal of Machine Learning Research* 23, 2022. arXiv:2007.04074. https://jmlr.org/papers/v23/21-0992.html
27. Prior Labs. "TabPFN-2.5 model card." Hugging Face, 2025. https://huggingface.co/Prior-Labs/tabpfn_2_5