MLflow is an open-source platform for managing the end-to-end machine learning lifecycle. Created by Databricks and first released in June 2018, MLflow provides tools for experiment tracking, model packaging, a model registry, and deployment. The project was co-created by Matei Zaharia, who also created Apache Spark, alongside other engineers at Databricks. MLflow is licensed under the Apache 2.0 license and joined the Linux Foundation in 2020 as a vendor-neutral open-source project.
With over 24,000 GitHub stars, more than 60 million monthly downloads, and 900+ contributors, MLflow has become one of the most widely adopted MLOps platforms in the industry. It supports Python, Java, R, and REST APIs, and integrates with a wide range of ML and deep learning frameworks.
MLflow was announced on June 5, 2018, at the Spark + AI Summit in San Francisco. Databricks released it as an open-source alpha, aiming to address a persistent problem in machine learning: the difficulty of tracking experiments, reproducing results, and deploying models in a consistent manner. At the time of its introduction, the ML ecosystem lacked standardized tools for managing the full model lifecycle, and teams often relied on ad hoc scripts and manual processes.
The initial alpha included three core components: MLflow Tracking, MLflow Projects, and MLflow Models. The Model Registry was added later to provide centralized model management with versioning and stage transitions.
MLflow 1.0 was released on June 4, 2019, marking the project's first stable release with guaranteed API stability across Python, Java, R, and REST interfaces. By this time, MLflow had already accumulated a growing user base with over 1 million downloads.
In June 2020, Databricks contributed MLflow to the Linux Foundation, establishing it as a vendor-neutral project governed by an independent community. This move was intended to ensure long-term open-source stewardship and encourage broader industry participation.
MLflow 2.0 arrived on November 15, 2022, with a focus on simplifying data science workflows and introducing new evaluation and deployment capabilities. The 2.x release series also brought initial support for large language models (LLMs), including the MLflow AI Gateway and tracing features.
MLflow 3.0 was released on June 9, 2025, representing a substantial architectural shift. This version introduced first-class support for generative AI applications and agents, a unified evaluation framework, and the LoggedModel entity as a new core abstraction. MLflow 3 removed several deprecated components, including MLflow Recipes and the fastai flavor.
As of March 2026, the latest stable release is MLflow 3.10.x, with active development continuing on the 3.x series.
| Version | Release date | Highlights |
|---|---|---|
| 0.1 (alpha) | June 2018 | Initial release with Tracking, Projects, and Models |
| 1.0 | June 2019 | Stable API; Python, Java, R, and REST interface guarantees |
| 2.0 | November 2022 | Revamped Tracking UI, MLflow Recipes, Keras/TensorFlow unification |
| 2.7 | 2023 | Experimental AI Gateway for LLM providers |
| 2.8 | Late 2023 | LLM-as-a-Judge evaluation metrics for RAG applications |
| 3.0 | June 2025 | LoggedModel entity, GenAI-first architecture, removal of Recipes |
| 3.10.x | February 2026 | Latest stable release |
MLflow is organized around several distinct components that can be used independently or together. Each component addresses a specific stage of the machine learning lifecycle.
MLflow Tracking is an API and UI for logging parameters, code versions, metrics, and artifacts during ML experiments. It is organized around the concept of "runs," where each run represents a single execution of training code (for example, a single invocation of a Python training script). For each run, MLflow records:
| Data type | Description | Examples |
|---|---|---|
| Parameters | Input configuration values | Learning rate, batch size, number of epochs |
| Metrics | Output measurements logged over time | Accuracy, loss, F1 score |
| Artifacts | Output files from the run | Model weights, images, serialized pipelines |
| Tags and metadata | Custom labels and run information | Source code version, start and end times |
Runs are grouped into "experiments," which allow users to organize related training sessions. The Tracking UI provides visualization tools for comparing runs side by side, plotting metric curves, and filtering results. MLflow Tracking supports multiple backend storage options, including local file storage, SQLite, PostgreSQL, MySQL, and cloud-based solutions such as Amazon S3, Azure Blob Storage, and Google Cloud Storage.
Autologging is a feature that automatically captures parameters and metrics from supported frameworks without requiring manual instrumentation. Frameworks with autologging support include scikit-learn, TensorFlow, PyTorch, Keras, XGBoost, LightGBM, Spark MLlib, and Statsmodels.
MLflow Models provides a standard format for packaging ML models so they can be used across different serving environments. The packaging format uses "flavors" to describe how a model can be interpreted by different tools. A single model can have multiple flavors; for example, a scikit-learn model might be saved with both an sklearn flavor (for native scikit-learn loading) and a python_function flavor (for generic Python-based inference).
Every MLflow Model includes an MLmodel YAML file that lists the flavors the model supports. The python_function flavor is the most universal, providing a generic Python interface for inference regardless of the original training framework. This allows any MLflow Model to be deployed to any platform that supports Python.
The built-in flavors supported by MLflow include:
| Flavor | Module | Description |
|---|---|---|
| Python Function | mlflow.pyfunc | Generic Python callable; all other flavors can be loaded as pyfunc |
| Scikit-learn | mlflow.sklearn | Classification, regression, and clustering models |
| TensorFlow | mlflow.tensorflow | TensorFlow SavedModel format with autologging |
| Keras | mlflow.keras | Keras 3.0 with multi-backend support (TensorFlow, JAX, PyTorch) |
| PyTorch | mlflow.pytorch | PyTorch models with custom training loop tracking |
| Spark MLlib | mlflow.spark | Apache Spark ML pipeline models |
| XGBoost | mlflow.xgboost | Gradient boosting models |
| LightGBM | mlflow.lightgbm | Microsoft LightGBM models |
| CatBoost | mlflow.catboost | Yandex CatBoost models |
| ONNX | mlflow.onnx | Open Neural Network Exchange format for cross-platform deployment |
| Transformers | mlflow.transformers | Hugging Face Transformers models for NLP and LLMs |
| Sentence Transformers | mlflow.sentence_transformers | Embedding and similarity models |
| spaCy | mlflow.spacy | NLP pipeline models |
| Statsmodels | mlflow.statsmodels | Statistical models |
| Prophet | mlflow.prophet | Facebook Prophet time series forecasting |
| Pmdarima | mlflow.pmdarima | Auto-ARIMA time series models |
| H2O | mlflow.h2o | H2O.ai models |
| John Snow Labs | mlflow.johnsnowlabs | Healthcare and biomedical NLP |
Beyond built-in flavors, the community has developed additional flavors for frameworks such as sktime, orbit, and other specialized libraries through the MLflavors package.
Models can be deployed with `mlflow models serve` (which creates a local REST API endpoint), packaged into Docker containers with `mlflow models build-docker`, or deployed to cloud platforms such as Amazon SageMaker, Azure ML, and Databricks Model Serving.
The Model Registry is a centralized store for managing the full lifecycle of MLflow Models. It provides model versioning, stage transitions, and annotations. Teams can use the registry to:

- Register models under a shared name and track every version over time
- Transition versions between lifecycle stages (or assign aliases, which recent releases favor over stages)
- Annotate models and versions with descriptions and tags for discovery and review
The registry exposes both a UI and a set of APIs for programmatic access. On Databricks, the Model Registry integrates with Unity Catalog for access control and governance.
MLflow Projects is a format for packaging reusable and reproducible data science code. A project is simply a directory or Git repository containing code, along with an MLproject file that specifies dependencies and entry points. The MLproject file can reference a Conda environment, a Docker container, or a system environment to define the execution context.
Projects allow users to run the same code on different platforms (local machine, cloud, or Kubernetes) with consistent behavior. They also support parameterized execution, so users can pass different hyperparameters or data paths at runtime. For example:
```shell
mlflow run git@github.com:mlflow/mlflow-example.git -P alpha=0.5
```
This command fetches the project from GitHub and executes it with the specified parameter.
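The command assumes the repository contains an MLproject file declaring the `alpha` parameter. A minimal sketch of such a file (the project name, environment file, and script are hypothetical):

```yaml
name: example-project
conda_env: conda.yaml   # or docker_env / python_env

entry_points:
  main:
    parameters:
      alpha: {type: float, default: 0.1}
    command: "python train.py --alpha {alpha}"
```

Parameters passed on the command line with `-P` override the defaults declared under `entry_points`.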
MLflow Recipes, previously known as MLflow Pipelines, was a framework that provided predefined templates for common ML tasks such as regression and classification. Recipes automated many steps of the ML workflow, including data ingestion, feature engineering, model training, and evaluation. The framework included an intelligent execution engine that cached intermediate results and re-ran only the steps affected by code changes.
MLflow Recipes was deprecated in MLflow 2.x and removed entirely in MLflow 3.0. Users who relied on Recipes are encouraged to use standard MLflow Tracking and Model Registry functionality directly, or to adopt MLflow Projects for reproducible workflows.
The MLflow 2.x release series (2022 to 2025) expanded the platform beyond traditional ML to support large language models, generative AI applications, and AI agents. This evolution reflects the broader industry shift toward LLM-powered applications.
MLflow 2.x added native support for logging and evaluating LLM outputs. The mlflow.evaluate() API allows users to run evaluation suites against model outputs, using built-in or custom metrics. Evaluation metrics for LLMs include answer relevance, faithfulness, toxicity, and other quality dimensions. Users can evaluate both live model endpoints and pre-computed output datasets.
The evaluation framework supports two categories of metrics:
| Metric type | How it works | Examples |
|---|---|---|
| Heuristic-based | Deterministic scoring functions | ROUGE, BLEU, Flesch-Kincaid readability, latency |
| LLM-as-a-Judge | Uses a language model to assess quality | Faithfulness, answer correctness, toxicity, custom criteria |
LLM-as-a-Judge metrics, introduced in MLflow 2.8, use language models to score output quality. They address the limitations of heuristic metrics for nuanced language tasks and can reduce evaluation time from weeks (with human evaluators) to under an hour while maintaining useful quality approximations. MLflow supports multiple LLM providers as judges, including OpenAI, Anthropic, Amazon Bedrock, and Mistral AI.
The MLflow AI Gateway (introduced experimentally in MLflow 2.7) is a centralized proxy that sits between applications and LLM providers. It provides:

- A single, unified API for querying models from multiple LLM providers
- Centralized management of provider API keys, so credentials are not embedded in application code
- Rate limiting and access controls for LLM endpoints
- The ability to swap the backing provider or model without changing application code
The AI Gateway integrates natively with MLflow Tracing, so every request routed through the gateway automatically becomes a trace. This provides a complete audit trail of LLM interactions across the organization without requiring additional instrumentation in application code.
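In the 2.x releases, a gateway instance is typically started from a YAML configuration file (for example, `mlflow gateway start --config-path config.yaml`). The following sketch uses the endpoints schema with hypothetical names and assumes an OpenAI API key is available in the environment:

```yaml
endpoints:
  - name: chat            # hypothetical endpoint name
    endpoint_type: llm/v1/chat
    model:
      provider: openai
      name: gpt-4o-mini   # hypothetical model choice
      config:
        openai_api_key: $OPENAI_API_KEY
```

Applications then call the gateway's `chat` endpoint; switching providers is a config change, not a code change.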
MLflow Tracing captures the complete execution flow of LLM applications and AI agents. Built on OpenTelemetry, it records inputs, outputs, and metadata for each step of a request, including LLM calls, retrieval operations, tool invocations, and error details.
Key tracing capabilities include:

- Automatic instrumentation for supported libraries and frameworks, enabled with one-line autologging calls
- Manual instrumentation of custom application code through decorators and span APIs
- A trace viewer in the Tracking UI for inspecting the inputs, outputs, latency, and errors of each step
Since MLflow Tracing is built on OpenTelemetry, it is compatible with any language or framework that supports the OTLP standard, including Java, Go, and Rust. The MLflow tracking server exposes an OTLP endpoint at /v1/traces for direct ingestion. MLflow 3.6.0 added formal support for ingesting OpenTelemetry traces directly through this endpoint, enabling teams to combine MLflow SDK instrumentation with OpenTelemetry auto-instrumentation from third-party libraries.
Tracing supports automatic instrumentation for over 20 frameworks and libraries, including LangChain, LlamaIndex, OpenAI, Anthropic, Amazon Bedrock, Google ADK, PydanticAI, and smolagents.
MLflow 3.0 (released June 9, 2025) introduced architectural changes to support generative AI workloads as first-class citizens alongside traditional ML. The release was built around three major pillars: observability, systematic quality evaluation, and application lifecycle management.
MLflow 3 introduced the LoggedModel as a new first-class entity, moving beyond the run-centric model that characterized earlier versions. A LoggedModel tracks the complete identity of a model or agent, including its lineage, evaluation results, and deployment status. This allows users to compare model variants and GenAI agents within and across experiments more effectively.
The evaluation framework in MLflow 3 supports customizable scorers that can assess multiple quality dimensions simultaneously. Users can define custom evaluation judges or use pre-built judges for tasks like relevance scoring, hallucination detection, and safety assessment. The framework works for both GenAI applications and traditional ML models through a consistent API.
MLflow 3 treats GenAI applications as versioned artifacts. A complete application, including model weights, prompts, retrieval logic, and dependencies, can be packaged and versioned as a single unit. This enables atomic deployments and rollbacks, bringing the same rigor to GenAI application management that the Model Registry brought to traditional ML models.
The Prompt Registry, introduced as a standalone component, enables versioning, tracking, and reuse of prompts across an organization. Each prompt can be versioned independently, tagged with metadata, and referenced by downstream applications.
MLflow 3 removed several deprecated components to simplify the framework:

- MLflow Recipes (formerly MLflow Pipelines)
- The fastai model flavor
- Other long-deprecated flavors and legacy APIs
These removals were part of an effort to focus on core functionality and the GenAI capabilities that are central to the 3.x series.
Databricks offers a fully managed version of MLflow as part of the Databricks Data Intelligence Platform. Managed MLflow extends the open-source version with enterprise features designed for production workloads at scale.
| Feature | Open-source MLflow | Managed MLflow (Databricks) |
|---|---|---|
| Experiment tracking | Yes | Yes, with managed storage and automatic scaling |
| Model Registry | Yes | Integrated with Unity Catalog |
| AI Gateway | Yes | Managed endpoints with enterprise governance |
| Tracing | Yes | Production-scale with managed infrastructure |
| Access control | Manual configuration | Unity Catalog role-based access control |
| Data lineage | Basic run-level | End-to-end with lakehouse integration |
| Hosting | Self-managed | Fully managed by Databricks |
| Multi-cloud support | Manual deployment | AWS, Azure, GCP via Databricks |
| Model serving | CLI/Docker-based | One-click REST API deployment with auto-scaling |
| Feature store | Not included | Integrated feature store with automated lookups |
Unity Catalog integration is a distinguishing feature of managed MLflow. It allows organizations to enforce access controls, track lineage across models and data, and maintain compliance policies from a central governance layer. Models registered in managed MLflow can be discovered and shared across teams using the Unity Catalog interface.
Managed MLflow is available on all three major cloud providers through the Databricks platform: Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform. Azure Databricks includes native MLflow integration documented through the Microsoft Learn platform.
MLflow competes with several other platforms for experiment tracking and ML lifecycle management. The following table compares MLflow with three widely used alternatives.
| Feature | MLflow | Weights & Biases (W&B) | Neptune | ClearML |
|---|---|---|---|---|
| License | Apache 2.0 (open source) | Proprietary (free tier available) | Proprietary (free tier available) | SSPL (open source) + managed offering |
| Pricing | Free (self-hosted); paid via Databricks | Free for individuals; team plans ~$50/user/month | Usage-based; team plans from ~$49/month | Free (self-hosted); managed plans negotiable |
| Experiment tracking | Yes | Yes (advanced interactive dashboards) | Yes (high-scale querying and comparison) | Yes |
| Model registry | Yes | Yes (Artifacts + Registry) | Yes | Yes |
| Hyperparameter tuning | Via integrations (Optuna, Ray Tune) | Sweeps (built-in) | Via integrations | HyperParameter Optimizer (built-in) |
| LLM/GenAI support | AI Gateway, Tracing, LLM evaluation | Weave (tracing and evaluation) | Limited | Limited |
| Deployment tools | Built-in serving, Docker, cloud | No built-in deployment | No built-in deployment | Built-in serving and orchestration |
| Visualization UI | Functional, improving | Best-in-class interactive dashboards | Advanced querying and filtering | Comprehensive dashboard |
| Collaboration | Basic (shared tracking server) | Strong (teams, reports, annotations) | Strong (workspaces, sharing) | Moderate |
| Self-hosting | Full support | Enterprise plan only | No | Full support |
| Framework integrations | 19+ built-in model flavors | Broad framework support | Broad framework support | Broad framework support |
| Community size | ~24,800 GitHub stars | ~20,000 GitHub stars | Smaller community | ~6,000 GitHub stars |
MLflow's primary advantage over proprietary alternatives is its open-source nature and the absence of licensing costs for self-hosted deployments. It is also the only platform in this comparison with a built-in AI Gateway and native LLM tracing based on OpenTelemetry. Weights & Biases is often preferred for its visualization capabilities and collaboration features, while Neptune is known for its ability to handle high-volume experiment metadata efficiently. ClearML offers a modular, all-in-one approach with built-in pipeline orchestration but has a steeper initial setup process compared to MLflow.
MLflow integrates with a broad range of ML and AI frameworks. Beyond the built-in model flavors listed above, MLflow supports automatic logging (autologging) for several popular libraries. When autologging is enabled, MLflow automatically captures parameters, metrics, and model artifacts without requiring manual instrumentation code.
| Framework | Autologging support | What gets captured |
|---|---|---|
| Scikit-learn | Yes | Parameters, metrics, and model for classifiers and regressors |
| TensorFlow/Keras | Yes | Training parameters, epoch metrics, and model checkpoints |
| PyTorch Lightning | Yes | Lightning-specific parameters, metrics, and checkpoints |
| XGBoost | Yes | Booster parameters, evaluation metrics, and feature importance |
| LightGBM | Yes | Training parameters and evaluation metrics |
| Spark MLlib | Yes | Pipeline parameters and model artifacts |
| Statsmodels | Yes | Model summary statistics |
| Hugging Face Transformers | Yes | Training arguments, metrics, and model artifacts |
| OpenAI | Yes | API calls, token usage, prompts, and completions |
| LangChain | Yes | Chain traces, model signatures, and input/output examples |
MLflow also integrates with orchestration and deployment tools, including Kubernetes, Docker, Amazon SageMaker, Azure ML, and Databricks Model Serving. The ONNX flavor allows models trained in one framework to be exported and deployed in another, supporting cross-platform inference scenarios.
The MLflow Tracking Server consists of two storage components:

- A backend store, which records experiment and run metadata (parameters, metrics, and tags) in either a file store or a SQL database such as SQLite, PostgreSQL, or MySQL
- An artifact store, which holds larger output files such as models, images, and serialized pipelines, typically on local disk or in object storage such as Amazon S3, Azure Blob Storage, or Google Cloud Storage
MLflow supports several deployment configurations:
| Topology | Description | Suitable for |
|---|---|---|
| Local | Tracking server and storage on a single machine | Individual development and prototyping |
| Remote tracking server | Centralized server with database backend and cloud artifact store | Team collaboration |
| Databricks managed | Fully hosted on the Databricks platform | Enterprise production workloads |
The tracking server exposes REST APIs that clients use to log and query experiment data, and multiple team members can connect to a shared tracking server to collaborate on experiments. In the 3.x series, the server also exposes an OTLP endpoint for ingesting OpenTelemetry traces from applications written in any language.
MLflow has experienced steady growth since its initial release in 2018. Key adoption milestones include:
| Year | Milestone |
|---|---|
| 2018 | MLflow released as open-source alpha at Spark + AI Summit |
| 2019 | MLflow 1.0 released; surpassed 1 million total downloads |
| 2020 | MLflow joins the Linux Foundation |
| 2021 | Surpassed 10 million monthly downloads |
| 2022 | MLflow 2.0 released; surpassed 100 million total downloads |
| 2024 | Surpassed 200 million total downloads |
| 2025 | MLflow 3.0 released; reached 20,000 GitHub stars |
As of early 2026, the MLflow GitHub repository reports over 24,000 stars, more than 5,500 forks, and contributions from over 900 developers. The project receives more than 60 million downloads per month from PyPI.
MLflow is used by thousands of organizations across industries including technology, finance, healthcare, and retail. Major cloud providers have built integrations with MLflow: Amazon SageMaker supports MLflow tracking, Microsoft Azure Machine Learning has native MLflow integration, and Google Cloud Vertex AI provides MLflow compatibility.
The project maintains active communication channels including a GitHub Discussions forum, a Slack workspace, and regular community meetups. Contributions are accepted through the standard GitHub pull request process, and the project follows a regular release cadence.