Weights & Biases (commonly abbreviated as W&B or wandb) is a machine learning experiment tracking and observability platform used by ML engineers, data scientists, and researchers to log, visualize, and reproduce their model training workflows. Founded in 2017 by Lukas Biewald, Chris Van Pelt, and Shawn Lewis, the company grew from a small developer tool into one of the most widely adopted MLOps platforms in the industry, serving over 200,000 practitioners and more than 1,400 enterprise organizations as of 2025 [1].
W&B's core value proposition is straightforward: it acts as a system of record for machine learning experiments. Every hyperparameter configuration, training metric, model artifact, and dataset version is logged automatically, making it possible to compare runs, debug regressions, and share findings across teams. The platform has been used by leading AI research organizations including OpenAI, Meta, NVIDIA, and others during the training of large-scale models [2].
In March 2025, cloud infrastructure provider CoreWeave announced its acquisition of Weights & Biases for a reported $1.7 billion, a transaction that closed on May 5, 2025 [3].
Weights & Biases was co-founded in 2017 in San Francisco by three experienced technologists. Lukas Biewald, the CEO, had previously founded CrowdFlower (later renamed Figure Eight), a data labeling platform that was acquired by Appen in 2019. Chris Van Pelt, the CTO, brought deep expertise in developer tooling. Shawn Lewis, the Chief Scientist, contributed research experience in machine learning infrastructure [4].
The founding insight was born from frustration. Machine learning practitioners were tracking experiments in spreadsheets, naming model checkpoints with cryptic suffixes, and losing track of which hyperparameters produced which results. Version control systems like Git handled code well, but they were not designed to track the evolving state of model weights, training metrics, and large datasets. Biewald and his co-founders set out to build a purpose-built tool for this problem.
The initial product focused on experiment tracking: a lightweight Python SDK that could be integrated into any training script with just a few lines of code. Users called wandb.init() to start a run, wandb.log() to record metrics, and wandb.config to store hyperparameters. The simplicity of this interface was a deliberate choice, and it became a defining characteristic of the platform.
W&B's adoption accelerated dramatically during the deep learning boom of the early 2020s. As organizations scaled up their model training, the need for systematic experiment management became acute. The platform gained particular traction in the research community, where reproducibility and collaboration are paramount.
OpenAI became one of W&B's most prominent users, relying on the platform to track experiments during the development of its GPT series of large language models. Meta's AI research division (FAIR) and various teams at NVIDIA also adopted the platform [2]. This association with cutting-edge AI research lent W&B significant credibility and drove adoption among smaller organizations that aspired to follow similar practices.
During this period, the company expanded its product suite well beyond basic experiment tracking, adding hyperparameter tuning (Sweeps), data versioning (Artifacts), collaborative reporting (Reports), interactive data exploration (Tables), compute management (Launch), and a model registry.
Weights & Biases raised a total of approximately $250 million across multiple funding rounds [5].
| Round | Date | Amount | Key Investors |
|---|---|---|---|
| Seed | 2018 | Undisclosed | Trinity Ventures |
| Series A | 2020 | $15M | Insight Partners |
| Series B | 2021 | $135M | Insight Partners, Felicis Ventures |
| Series C | August 2023 | $50M | Various investors |
| Total | - | ~$250M | - |
The Series B round valued the company at approximately $1.25 billion, granting it unicorn status. The continued investment reflected strong market confidence in the MLOps category and W&B's position within it [5].
On March 4, 2025, CoreWeave (Nasdaq: CRWV), a cloud infrastructure company specializing in GPU computing, announced an agreement to acquire Weights & Biases. The deal closed on May 5, 2025, at a reported price of $1.7 billion [3].
The acquisition was strategic for CoreWeave, which sought to complement its infrastructure offerings with a software platform for AI model development. By combining CoreWeave's GPU cloud with W&B's developer tools, the merged entity aimed to offer a vertically integrated platform spanning compute, training, evaluation, and monitoring. CoreWeave committed to maintaining W&B's interoperability, ensuring customers could continue to use any infrastructure provider, foundation model, or framework [3].
Weights & Biases offers a suite of interconnected products that cover the ML development lifecycle from experimentation through production monitoring.
The flagship product, W&B Experiments, automatically captures everything needed to reproduce and analyze a training run: hyperparameters, training and validation metrics, system resource utilization (CPU, GPU, and memory), console output, and the git state of the training code.
The web dashboard allows side-by-side comparison of runs, grouping by tags or configuration values, and filtering by any logged attribute. Practitioners can identify patterns across hundreds or thousands of experiments that would be invisible in manual tracking [6].
Integration requires minimal code changes. A typical PyTorch training loop needs only the addition of wandb.init(), wandb.config, and wandb.log():
```python
import wandb

# Start a run and record hyperparameters
wandb.init(project="my-project")
wandb.config.learning_rate = 0.001
wandb.config.epochs = 10

for epoch in range(wandb.config.epochs):
    loss = train_one_epoch()  # user-defined training step
    wandb.log({"loss": loss, "epoch": epoch})
```
W&B Sweeps automates hyperparameter optimization by coordinating distributed search across multiple agents. Supported search strategies include:
| Strategy | Description |
|---|---|
| Grid Search | Exhaustive search over all parameter combinations |
| Random Search | Random sampling from specified distributions |
| Bayesian Optimization | Probabilistic model-guided search using Gaussian processes |
| Early Termination | Stops underperforming runs using the Hyperband algorithm |
Sweeps are defined via a YAML configuration file that specifies the parameter space, search method, and optimization objective. The sweep controller distributes work across available compute resources, and results appear in the shared dashboard in real time [7].
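As an illustration, the same sweep definition can be expressed as a Python dictionary with the structure that the YAML file encodes; the parameter names, ranges, and project name below are invented for the example:

```python
# Bayesian-optimization sweep over two hypothetical hyperparameters.
# The same structure can be written in YAML for the sweep controller.
sweep_config = {
    "method": "bayes",  # "grid", "random", or "bayes"
    "metric": {"name": "val_loss", "goal": "minimize"},
    "parameters": {
        "learning_rate": {"min": 1e-5, "max": 1e-2},
        "batch_size": {"values": [16, 32, 64]},
    },
    # Early termination via Hyperband, as described in the table above
    "early_terminate": {"type": "hyperband", "min_iter": 3},
}

# Typical usage requires a W&B account, so it is shown as comments:
# sweep_id = wandb.sweep(sweep_config, project="my-project")
# wandb.agent(sweep_id, function=train_fn, count=20)
```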
W&B Artifacts provides version control for datasets, models, and any other files produced or consumed during ML workflows. Each artifact is content-addressed (using checksums), meaning that duplicate data is stored only once even across many versions. Artifacts track lineage automatically: every artifact records which run produced it and which runs consumed it, creating a full provenance graph [8].
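To make the content-addressing idea concrete, the following toy sketch (a simplified illustration, not W&B's actual implementation) shows how keying files by checksum deduplicates identical content across versions:

```python
import hashlib

class ContentStore:
    """Toy content-addressed store: blobs are keyed by their SHA-256
    checksum, so identical file contents are stored exactly once even
    when they appear in many artifact versions."""

    def __init__(self):
        self.blobs = {}     # checksum -> raw bytes (stored once)
        self.versions = {}  # version name -> {filename: checksum}

    def add_version(self, name, files):
        manifest = {}
        for fname, data in files.items():
            digest = hashlib.sha256(data).hexdigest()
            self.blobs.setdefault(digest, data)  # deduplication happens here
            manifest[fname] = digest
        self.versions[name] = manifest

store = ContentStore()
# v1 changes only test.csv; the unchanged train.csv is not stored again.
store.add_version("v0", {"train.csv": b"a,b\n1,2\n", "test.csv": b"a,b\n3,4\n"})
store.add_version("v1", {"train.csv": b"a,b\n1,2\n", "test.csv": b"a,b\n5,6\n"})
print(len(store.blobs))  # 3 unique blobs across 4 logical files
```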
Common use cases include versioning training datasets as they are cleaned and augmented, checkpointing model weights during long training runs, and recording evaluation outputs so that any model can be traced back to the exact data and code that produced it.
W&B Tables allow users to log, query, and visualize structured data interactively. A table can contain almost any data type: numbers, text, images, audio clips, bounding boxes, segmentation masks, and more. Users can filter rows, sort by any column, group data, and create custom visualizations directly in the web interface [9].
Tables are particularly useful for model debugging. For instance, a computer vision practitioner can log a table of misclassified images, sort by confidence score, and visually inspect the failure modes without writing any additional code.
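The triage workflow described above can be sketched in plain Python (the filtering and sorting that the Tables UI performs interactively); the prediction records here are invented for illustration:

```python
# Hypothetical prediction records: (image_id, true_label, predicted_label, confidence)
predictions = [
    ("img_01", "cat", "cat", 0.98),
    ("img_02", "dog", "cat", 0.91),
    ("img_03", "cat", "dog", 0.55),
    ("img_04", "dog", "dog", 0.87),
    ("img_05", "cat", "dog", 0.74),
]

# Keep only misclassified examples, then sort by confidence (highest first)
# to surface the most confidently wrong predictions for inspection.
misclassified = [p for p in predictions if p[1] != p[2]]
misclassified.sort(key=lambda p: p[3], reverse=True)
print([p[0] for p in misclassified])  # ['img_02', 'img_05', 'img_03']
```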
W&B Reports combine interactive visualizations with narrative text in a WYSIWYG editor that supports Markdown, LaTeX, and code snippets. Reports pull live data from experiments and artifacts, so charts update automatically as new runs complete. Teams use Reports to document experiment findings, share progress updates, and create internal technical publications [10].
W&B Launch provides a job queuing and execution system that lets teams submit training jobs to various compute backends (Kubernetes clusters, cloud VMs, or local machines) from a central interface. Launch jobs are configured with specific resource requirements and can reference W&B Artifacts for data and code, ensuring reproducibility across different execution environments.
The W&B Model Registry serves as a centralized catalog for trained models. Teams can promote models through stages (staging, production, archived), attach documentation and evaluation metrics, and trigger downstream workflows (such as deployment or further evaluation) when a model transitions between stages.
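The stage-promotion model can be sketched as a small state machine; this is a toy illustration, and the allowed transitions below are assumptions for the example rather than the registry's exact rules:

```python
# Assumed transition rules: unregistered -> staging -> production -> archived.
ALLOWED = {
    None: {"staging"},
    "staging": {"production", "archived"},
    "production": {"archived"},
    "archived": set(),
}

class Registry:
    """Toy model registry that enforces guarded stage transitions."""

    def __init__(self):
        self.models = {}  # model name -> current stage

    def promote(self, name, stage):
        current = self.models.get(name)
        if stage not in ALLOWED[current]:
            raise ValueError(f"cannot move {name!r} from {current} to {stage}")
        self.models[name] = stage  # a real system would trigger hooks here

reg = Registry()
reg.promote("classifier-v2", "staging")
reg.promote("classifier-v2", "production")
print(reg.models["classifier-v2"])  # production
```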
Weave is W&B's newest product line, purpose-built for the large language model era. As organizations shifted from training custom models to building applications on top of foundation models, a new set of observability challenges emerged. Weave addresses these challenges with tools for tracing, evaluation, and monitoring of LLM-powered applications [11].
Tracing: Weave automatically instruments LLM API calls using a lightweight decorator (@weave.op). Every call to OpenAI, Anthropic, or any of 20+ supported providers is logged with full request/response payloads, token counts, latency measurements, and cost calculations. Nested calls (such as a retrieval-augmented generation pipeline that first searches a database and then prompts an LLM) are captured as structured traces [11].
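The mechanism can be illustrated with a toy decorator, a simplified stand-in for @weave.op rather than Weave's actual implementation, that records the name, inputs, output, and latency of every call:

```python
import functools
import time

TRACES = []  # in a real tracer this would be uploaded, not kept in memory

def op(fn):
    """Toy tracing decorator: wraps a function so each call is logged
    with its name, inputs, output, and wall-clock latency."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        TRACES.append({
            "name": fn.__name__,
            "inputs": {"args": args, "kwargs": kwargs},
            "output": result,
            "latency_s": time.perf_counter() - start,
        })
        return result
    return wrapper

@op
def answer(prompt):
    # Stand-in for an LLM API call
    return f"echo: {prompt}"

answer("hello")
print(TRACES[0]["name"])  # answer
```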
Evaluation: Weave provides a framework for building evaluation pipelines using "scorers," which are functions that assess LLM outputs against expected results. Scorers can be rule-based (checking for specific keywords or format compliance), model-based (using an LLM judge to rate quality), or metric-based (computing BLEU, ROUGE, or custom scores). Evaluations can be run on datasets stored as W&B Artifacts, and results are tracked over time to detect regressions.
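A minimal rule-based scorer might look like the following sketch; the function name and the keyword-recall rule are invented for illustration:

```python
def keyword_scorer(output: str, expected_keywords: list) -> dict:
    """Rule-based scorer: fraction of expected keywords found in the
    model output (case-insensitive). Returns a metric dict, the shape
    a scoring function typically produces."""
    hits = sum(1 for kw in expected_keywords if kw.lower() in output.lower())
    return {"keyword_recall": hits / len(expected_keywords)}

score = keyword_scorer("Paris is the capital of France.", ["Paris", "France"])
print(score)  # {'keyword_recall': 1.0}
```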
Monitoring: In production settings, Weave provides dashboards for monitoring token usage, response latency, error rates, and cost trends. This allows teams to catch slow queries, identify prompt regressions, and manage API spend.
W&B maintains official integrations with a broad range of ML frameworks and tools.
| Category | Integrations |
|---|---|
| Deep Learning Frameworks | PyTorch, TensorFlow / Keras, JAX, PyTorch Lightning, Fastai |
| NLP and LLMs | Hugging Face Transformers, LangChain, LlamaIndex, OpenAI API, Anthropic API |
| Specialized Libraries | scikit-learn, XGBoost, LightGBM, CatBoost, Ultralytics (YOLO) |
| Compute Platforms | AWS SageMaker, Google Vertex AI, Azure ML, Kubernetes |
| Data Tools | pandas, Spark, DVC |
Most integrations work through callbacks or simple configuration flags. For example, Hugging Face Transformers users can enable W&B logging by setting report_to="wandb" in the TrainingArguments, with no other code changes required [12].
The most frequent comparison in the MLOps space is between Weights & Biases and MLflow, the open-source platform originally developed by Databricks.
| Feature | Weights & Biases | MLflow |
|---|---|---|
| Hosting model | Managed cloud (SaaS), dedicated cloud, or self-managed | Self-hosted (open source) or Databricks managed |
| Setup complexity | Minimal; sign up and install the SDK | Requires server setup and maintenance for team use |
| UI and visualization | Polished interactive dashboards with rich media support | Functional but more limited visualization capabilities |
| Collaboration | Built-in team workspaces, Reports, and shared dashboards | More limited; primarily individual experiment tracking |
| Hyperparameter tuning | Native Sweeps with Bayesian optimization | Requires third-party integration (e.g., Optuna, Hyperopt) |
| LLM observability | Weave product for tracing, evaluation, and monitoring | MLflow Tracing (added more recently) |
| Language support | Python-focused | Language-agnostic (Python, R, Java, etc.) |
| Cost | Free tier with usage limits; paid plans for teams | Free (open source); Databricks managed version is paid |
| Auto-logging | Automatic capture via framework integrations | Built-in auto-logging for many frameworks |
| Model registry | Built-in with stage promotion and lineage | Built-in with model versioning and staging |
W&B's strengths lie in its user experience, visualization quality, and managed hosting, which reduce the operational burden on ML teams. MLflow's advantages include its open-source nature, language flexibility, and tight integration with the Databricks ecosystem. Many organizations evaluate both and choose based on their specific needs: teams that want a turnkey solution with minimal setup tend to prefer W&B, while those committed to open-source tooling or the Databricks platform often lean toward MLflow [13].
W&B offers three deployment configurations to accommodate different security and compliance requirements:
| Option | Description |
|---|---|
| Multi-tenant Cloud | Shared infrastructure managed by W&B; fastest to set up |
| Dedicated Cloud | Isolated cloud instance managed by W&B; meets stricter compliance needs |
| Self-Managed | Customer hosts the platform on their own infrastructure (on-premises or private cloud) |
All deployment options provide the same feature set, and data logged to any deployment can be exported or migrated. The self-managed option is popular with organizations in regulated industries (finance, healthcare, defense) that cannot send experiment data to external servers.
Weights & Biases has been adopted across a wide range of industries and use cases.
AI Research Labs: OpenAI, NVIDIA, and other leading labs have used W&B as their system of record for tracking model training experiments. The platform's ability to handle thousands of concurrent runs and petabytes of logged data makes it suitable for large-scale research [2].
Autonomous Vehicles: Companies like Toyota use W&B to track experiments related to perception models, sensor fusion, and planning algorithms, where reproducibility and rigorous evaluation are safety-critical.
Healthcare and Pharmaceuticals: AstraZeneca and other pharmaceutical companies use W&B for drug discovery workflows, tracking experiments across molecular property prediction, protein structure modeling, and clinical trial optimization.
Technology Companies: Samsung, Canva, Square, and Snowflake are among the enterprise customers that rely on W&B for their internal ML workflows [3].
Academic Research: Universities and research institutions use the free academic tier to track experiments for published papers, ensuring reproducibility and enabling collaboration between lab members.
The W&B platform consists of several components:
Python SDK (wandb): The client library, installable via pip install wandb, handles local logging, batching, and asynchronous upload of data to the W&B backend. The SDK is designed to add minimal overhead to training loops, with logging operations running in background threads.
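The queue-and-background-thread pattern behind this design can be sketched as follows; this is a toy stand-in for illustration, not the actual SDK internals:

```python
import queue
import threading

class AsyncLogger:
    """Toy non-blocking logger: the training loop enqueues records and
    returns immediately, while a background thread batches them for
    'upload', keeping logging overhead off the hot path."""

    def __init__(self, batch_size=3):
        self.q = queue.Queue()
        self.batch_size = batch_size
        self.uploaded = []  # stands in for batches sent to a backend
        self.worker = threading.Thread(target=self._drain, daemon=True)
        self.worker.start()

    def log(self, record):
        self.q.put(record)  # non-blocking from the caller's perspective

    def _drain(self):
        batch = []
        while True:
            item = self.q.get()
            if item is None:  # sentinel: flush and stop
                break
            batch.append(item)
            if len(batch) >= self.batch_size:
                self.uploaded.append(list(batch))
                batch.clear()
        if batch:
            self.uploaded.append(list(batch))

    def finish(self):
        self.q.put(None)
        self.worker.join()

logger = AsyncLogger()
for step in range(7):
    logger.log({"step": step, "loss": 1.0 / (step + 1)})
logger.finish()
print([len(b) for b in logger.uploaded])  # [3, 3, 1]
```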
Backend Service: The server component stores experiment metadata in a relational database, binary artifacts in object storage (S3-compatible), and serves the web application. In the self-managed deployment, customers run this infrastructure on their own Kubernetes clusters.
Web Application: The React-based frontend provides dashboards, run comparison, artifact browsing, report editing, and team management. The application supports real-time updates as new data arrives from active runs.
API: A GraphQL API and REST endpoints enable programmatic access to all platform data, supporting custom integrations, automated pipelines, and third-party tools.
W&B has built an active community around its platform. The company hosts educational content through W&B Courses (free online classes covering topics from basic experiment tracking to advanced LLM evaluation), a blog with technical articles and research summaries, and community events including the annual Fully Connected conference.
The wandb Python package is one of the most downloaded ML tools on PyPI, and the company maintains an extensive examples repository on GitHub with integration guides for dozens of frameworks and use cases [14].
W&B also invests in the open-source ecosystem. Weave is available as an open-source project on GitHub, allowing developers to use its tracing and evaluation capabilities without a W&B account [11].
Following the CoreWeave acquisition in May 2025, Weights & Biases operates as a subsidiary of CoreWeave while maintaining its product identity and development roadmap. The acquisition has brought deeper integration with CoreWeave's GPU cloud infrastructure, enabling features like seamless job submission from W&B Launch to CoreWeave compute clusters.
The platform continues to evolve along two main axes. First, the traditional ML experiment tracking and model management products receive ongoing improvements in scalability, performance, and user experience. Second, Weave has become an increasingly important part of the product portfolio as the industry shifts toward LLM application development. Weave's tracing and evaluation capabilities are being expanded to support agentic AI workflows, multi-step reasoning chains, and complex orchestration patterns that characterize modern LLM applications [11].
W&B's position in the market remains strong. The platform's early adoption by high-profile AI labs, combined with its breadth of features and managed hosting model, has established it as one of the leading MLOps solutions. The CoreWeave acquisition adds infrastructure depth, positioning the combined entity to offer a more complete stack for AI development, from GPU compute to experiment tracking to production monitoring [3].