Weights & Biases (commonly abbreviated as W&B or wandb) is a machine learning experiment tracking and observability platform used by ML engineers, data scientists, and researchers to log, visualize, and reproduce their model training workflows. Founded in 2017 by Lukas Biewald, Chris Van Pelt, and Shawn Lewis, the company grew from a small developer tool into one of the most widely adopted MLOps platforms in the industry, serving over 200,000 practitioners and more than 1,400 enterprise organizations as of 2025 [1].
W&B's core value proposition is straightforward: it acts as a system of record for machine learning experiments. Every hyperparameter configuration, training metric, model artifact, and dataset version is logged automatically, making it possible to compare runs, debug regressions, and share findings across teams. The platform has been used by leading AI research organizations including OpenAI, Meta, NVIDIA, and others during the training of large-scale models [2].
In March 2025, cloud infrastructure provider CoreWeave announced its acquisition of Weights & Biases for a reported $1.7 billion, a transaction that closed on May 5, 2025 [3].
Weights & Biases was co-founded in 2017 in San Francisco by three experienced technologists. Lukas Biewald, the CEO, had previously founded CrowdFlower (later renamed Figure Eight), a data labeling platform that was acquired by Appen in 2019. Chris Van Pelt, the CTO, brought deep expertise in developer tooling. Shawn Lewis, the Chief Scientist, contributed research experience in machine learning infrastructure [4].
The founding insight was born from frustration. Machine learning practitioners were tracking experiments in spreadsheets, naming model checkpoints with cryptic suffixes, and losing track of which hyperparameters produced which results. Version control systems like Git handled code well, but they were not designed to track the evolving state of model weights, training metrics, and large datasets. Biewald and his co-founders set out to build a purpose-built tool for this problem.
The initial product focused on experiment tracking: a lightweight Python SDK that could be integrated into any training script with just a few lines of code. Users called wandb.init() to start a run, wandb.log() to record metrics, and wandb.config to store hyperparameters. The simplicity of this interface was a deliberate choice, and it became a defining characteristic of the platform.
W&B's adoption accelerated dramatically during the deep learning boom of the early 2020s. As organizations scaled up their model training, the need for systematic experiment management became acute. The platform gained particular traction in the research community, where reproducibility and collaboration are paramount.
OpenAI became one of W&B's most prominent users, relying on the platform to track experiments during the development of its GPT series of large language models. Meta's AI research division (FAIR) and various teams at NVIDIA also adopted the platform [2]. This association with cutting-edge AI research lent W&B significant credibility and drove adoption among smaller organizations that aspired to follow similar practices.
During this period, the company expanded its product suite well beyond basic experiment tracking, adding hyperparameter tuning (Sweeps), data versioning (Artifacts), collaborative reporting (Reports), interactive data exploration (Tables), compute management (Launch), and a model registry.
Weights & Biases raised a total of approximately $250 million across multiple funding rounds [5].
| Round | Date | Amount | Key Investors |
|---|---|---|---|
| Seed | 2018 | Undisclosed | Trinity Ventures |
| Series A | 2020 | $15M | Insight Partners |
| Series B | 2021 | $135M | Insight Partners, Felicis Ventures |
| Series C | August 2023 | $50M | Various investors |
| Total | - | ~$250M | - |
The Series B round valued the company at approximately $1.25 billion, granting it unicorn status. The continued investment reflected strong market confidence in the MLOps category and W&B's position within it [5].
On March 4, 2025, CoreWeave (Nasdaq: CRWV), a cloud infrastructure company specializing in GPU computing, announced an agreement to acquire Weights & Biases. The deal closed on May 5, 2025, at a reported price of $1.7 billion [3].
The acquisition was strategic for CoreWeave, which sought to complement its infrastructure offerings with a software platform for AI model development. By combining CoreWeave's GPU cloud with W&B's developer tools, the merged entity aimed to offer a vertically integrated platform spanning compute, training, evaluation, and monitoring. CoreWeave committed to maintaining W&B's interoperability, ensuring customers could continue to use any infrastructure provider, foundation model, or framework [3].
Weights & Biases offers a suite of interconnected products that cover the ML development lifecycle from experimentation through production monitoring.
The flagship product, W&B Experiments, automatically captures everything needed to reproduce and analyze a training run: hyperparameters, training and validation metrics, system resource utilization (CPU, GPU, and memory), console output, and the git state of the training code.
The web dashboard allows side-by-side comparison of runs, grouping by tags or configuration values, and filtering by any logged attribute. Practitioners can identify patterns across hundreds or thousands of experiments that would be invisible in manual tracking [6].
Integration requires minimal code changes. A typical PyTorch training loop needs only the addition of wandb.init(), wandb.config, and wandb.log():
```python
import wandb

# Start a run and record hyperparameters
wandb.init(project="my-project")
wandb.config.learning_rate = 0.001
wandb.config.epochs = 10

for epoch in range(wandb.config.epochs):
    loss = train_one_epoch()  # user-defined training step
    wandb.log({"loss": loss, "epoch": epoch})
```
W&B Sweeps automates hyperparameter optimization by coordinating distributed search across multiple agents. Supported search strategies include:
| Strategy | Description |
|---|---|
| Grid Search | Exhaustive search over all parameter combinations |
| Random Search | Random sampling from specified distributions |
| Bayesian Optimization | Probabilistic model-guided search using Gaussian processes |
| Early Termination | Stops underperforming runs using the Hyperband algorithm |
Sweeps are defined via a YAML configuration file that specifies the parameter space, search method, and optimization objective. The sweep controller distributes work across available compute resources, and results appear in the shared dashboard in real time [7].
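As an illustration, the same sweep definition can be expressed as a Python dictionary with the structure that the YAML file encodes; the parameter names, ranges, and project name below are invented for the example:

```python
# Bayesian-optimization sweep over two hypothetical hyperparameters.
# The same structure can be written in YAML for the sweep controller.
sweep_config = {
    "method": "bayes",  # "grid", "random", or "bayes"
    "metric": {"name": "val_loss", "goal": "minimize"},
    "parameters": {
        "learning_rate": {"min": 1e-5, "max": 1e-2},
        "batch_size": {"values": [16, 32, 64]},
    },
    # Early termination via Hyperband, as described in the table above
    "early_terminate": {"type": "hyperband", "min_iter": 3},
}

# Typical usage requires a W&B account, so it is shown as comments:
# sweep_id = wandb.sweep(sweep_config, project="my-project")
# wandb.agent(sweep_id, function=train_fn, count=20)
```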
W&B Artifacts provides version control for datasets, models, and any other files produced or consumed during ML workflows. Each artifact is content-addressed (using checksums), meaning that duplicate data is stored only once even across many versions. Artifacts track lineage automatically: every artifact records which run produced it and which runs consumed it, creating a full provenance graph [8].
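To make the content-addressing idea concrete, the following toy sketch (a simplified illustration, not W&B's actual implementation) shows how keying files by checksum deduplicates identical content across versions:

```python
import hashlib

class ContentStore:
    """Toy content-addressed store: blobs are keyed by their SHA-256
    checksum, so identical file contents are stored exactly once even
    when they appear in many artifact versions."""

    def __init__(self):
        self.blobs = {}     # checksum -> raw bytes (stored once)
        self.versions = {}  # version name -> {filename: checksum}

    def add_version(self, name, files):
        manifest = {}
        for fname, data in files.items():
            digest = hashlib.sha256(data).hexdigest()
            self.blobs.setdefault(digest, data)  # deduplication happens here
            manifest[fname] = digest
        self.versions[name] = manifest

store = ContentStore()
# v1 changes only test.csv; the unchanged train.csv is not stored again.
store.add_version("v0", {"train.csv": b"a,b\n1,2\n", "test.csv": b"a,b\n3,4\n"})
store.add_version("v1", {"train.csv": b"a,b\n1,2\n", "test.csv": b"a,b\n5,6\n"})
print(len(store.blobs))  # 3 unique blobs across 4 logical files
```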
Common use cases include versioning training datasets as they are cleaned and augmented, checkpointing model weights during long training runs, and recording evaluation outputs so that any model can be traced back to the exact data and code that produced it.
W&B Tables allow users to log, query, and visualize structured data interactively. A table can contain almost any data type: numbers, text, images, audio clips, bounding boxes, segmentation masks, and more. Users can filter rows, sort by any column, group data, and create custom visualizations directly in the web interface [9].
Tables are particularly useful for model debugging. For instance, a computer vision practitioner can log a table of misclassified images, sort by confidence score, and visually inspect the failure modes without writing any additional code.
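The triage workflow described above can be sketched in plain Python (the filtering and sorting that the Tables UI performs interactively); the prediction records here are invented for illustration:

```python
# Hypothetical prediction records: (image_id, true_label, predicted_label, confidence)
predictions = [
    ("img_01", "cat", "cat", 0.98),
    ("img_02", "dog", "cat", 0.91),
    ("img_03", "cat", "dog", 0.55),
    ("img_04", "dog", "dog", 0.87),
    ("img_05", "cat", "dog", 0.74),
]

# Keep only misclassified examples, then sort by confidence (highest first)
# to surface the most confidently wrong predictions for inspection.
misclassified = [p for p in predictions if p[1] != p[2]]
misclassified.sort(key=lambda p: p[3], reverse=True)
print([p[0] for p in misclassified])  # ['img_02', 'img_05', 'img_03']
```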
W&B Reports combine interactive visualizations with narrative text in a WYSIWYG editor that supports Markdown, LaTeX, and code snippets. Reports pull live data from experiments and artifacts, so charts update automatically as new runs complete. Teams use Reports to document experiment findings, share progress updates, and create internal technical publications [10].
W&B Launch provides a job queuing and execution system that lets teams submit training jobs to various compute backends (Kubernetes clusters, cloud VMs, or local machines) from a central interface. Launch jobs are configured with specific resource requirements and can reference W&B Artifacts for data and code, ensuring reproducibility across different execution environments.
The W&B Model Registry serves as a centralized catalog for trained models. Teams can promote models through stages (staging, production, archived), attach documentation and evaluation metrics, and trigger downstream workflows (such as deployment or further evaluation) when a model transitions between stages.
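The stage-promotion model can be sketched as a small state machine; this is a toy illustration, and the allowed transitions below are assumptions for the example rather than the registry's exact rules:

```python
# Assumed transition rules: unregistered -> staging -> production -> archived.
ALLOWED = {
    None: {"staging"},
    "staging": {"production", "archived"},
    "production": {"archived"},
    "archived": set(),
}

class Registry:
    """Toy model registry that enforces guarded stage transitions."""

    def __init__(self):
        self.models = {}  # model name -> current stage

    def promote(self, name, stage):
        current = self.models.get(name)
        if stage not in ALLOWED[current]:
            raise ValueError(f"cannot move {name!r} from {current} to {stage}")
        self.models[name] = stage  # a real system would trigger hooks here

reg = Registry()
reg.promote("classifier-v2", "staging")
reg.promote("classifier-v2", "production")
print(reg.models["classifier-v2"])  # production
```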
Weave is W&B's newest product line, purpose-built for the large language model era. As organizations shifted from training custom models to building applications on top of foundation models, a new set of observability challenges emerged. Weave addresses these challenges with tools for tracing, evaluation, and monitoring of LLM-powered applications [11].
Tracing: Weave automatically instruments LLM API calls using a lightweight decorator (@weave.op). Every call to OpenAI, Anthropic, or any of 20+ supported providers is logged with full request/response payloads, token counts, latency measurements, and cost calculations. Nested calls (such as a retrieval-augmented generation pipeline that first searches a database and then prompts an LLM) are captured as structured traces [11].
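The mechanism can be illustrated with a toy decorator, a simplified stand-in for @weave.op rather than Weave's actual implementation, that records the name, inputs, output, and latency of every call:

```python
import functools
import time

TRACES = []  # in a real tracer this would be uploaded, not kept in memory

def op(fn):
    """Toy tracing decorator: wraps a function so each call is logged
    with its name, inputs, output, and wall-clock latency."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        TRACES.append({
            "name": fn.__name__,
            "inputs": {"args": args, "kwargs": kwargs},
            "output": result,
            "latency_s": time.perf_counter() - start,
        })
        return result
    return wrapper

@op
def answer(prompt):
    # Stand-in for an LLM API call
    return f"echo: {prompt}"

answer("hello")
print(TRACES[0]["name"])  # answer
```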
Evaluation: Weave provides a framework for building evaluation pipelines using "scorers," which are functions that assess LLM outputs against expected results. Scorers can be rule-based (checking for specific keywords or format compliance), model-based (using an LLM judge to rate quality), or metric-based (computing BLEU, ROUGE, or custom scores). Evaluations can be run on datasets stored as W&B Artifacts, and results are tracked over time to detect regressions.
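A minimal rule-based scorer might look like the following sketch; the function name and the keyword-recall rule are invented for illustration:

```python
def keyword_scorer(output: str, expected_keywords: list) -> dict:
    """Rule-based scorer: fraction of expected keywords found in the
    model output (case-insensitive). Returns a metric dict, the shape
    a scoring function typically produces."""
    hits = sum(1 for kw in expected_keywords if kw.lower() in output.lower())
    return {"keyword_recall": hits / len(expected_keywords)}

score = keyword_scorer("Paris is the capital of France.", ["Paris", "France"])
print(score)  # {'keyword_recall': 1.0}
```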
Monitoring: In production settings, Weave provides dashboards for monitoring token usage, response latency, error rates, and cost trends. This allows teams to catch slow queries, identify prompt regressions, and manage API spend.
W&B maintains official integrations with a broad range of ML frameworks and tools.
| Category | Integrations |
|---|---|
| Deep Learning Frameworks | PyTorch, TensorFlow / Keras, JAX, PyTorch Lightning, Fastai |
| NLP and LLMs | Hugging Face Transformers, LangChain, LlamaIndex, OpenAI API, Anthropic API |
| Specialized Libraries | scikit-learn, XGBoost, LightGBM, CatBoost, Ultralytics (YOLO) |
| Compute Platforms | AWS SageMaker, Google Vertex AI, Azure ML, Kubernetes |
| Data Tools | pandas, Spark, DVC |
Most integrations work through callbacks or simple configuration flags. For example, Hugging Face Transformers users can enable W&B logging by setting report_to="wandb" in the TrainingArguments, with no other code changes required [12].
The most frequent comparison in the MLOps space is between Weights & Biases and MLflow, the open-source platform originally developed by Databricks.
| Feature | Weights & Biases | MLflow |
|---|---|---|
| Hosting model | Managed cloud (SaaS), dedicated cloud, or self-managed | Self-hosted (open source) or Databricks managed |
| Setup complexity | Minimal; sign up and install the SDK | Requires server setup and maintenance for team use |
| UI and visualization | Polished interactive dashboards with rich media support | Functional but more limited visualization capabilities |
| Collaboration | Built-in team workspaces, Reports, and shared dashboards | More limited; primarily individual experiment tracking |
| Hyperparameter tuning | Native Sweeps with Bayesian optimization | Requires third-party integration (e.g., Optuna, Hyperopt) |
| LLM observability | Weave product for tracing, evaluation, and monitoring | MLflow Tracing (added more recently) |
| Language support | Python-focused | Language-agnostic (Python, R, Java, etc.) |
| Cost | Free tier with usage limits; paid plans for teams | Free (open source); Databricks managed version is paid |
| Auto-logging | Automatic capture via framework integrations | Built-in auto-logging for many frameworks |
| Model registry | Built-in with stage promotion and lineage | Built-in with model versioning and staging |
W&B's strengths lie in its user experience, visualization quality, and managed hosting, which reduce the operational burden on ML teams. MLflow's advantages include its open-source nature, language flexibility, and tight integration with the Databricks ecosystem. Many organizations evaluate both and choose based on their specific needs: teams that want a turnkey solution with minimal setup tend to prefer W&B, while those committed to open-source tooling or the Databricks platform often lean toward MLflow [13].
W&B offers three deployment configurations to accommodate different security and compliance requirements:
| Option | Description |
|---|---|
| Multi-tenant Cloud | Shared infrastructure managed by W&B; fastest to set up |
| Dedicated Cloud | Isolated cloud instance managed by W&B; meets stricter compliance needs |
| Self-Managed | Customer hosts the platform on their own infrastructure (on-premises or private cloud) |
All deployment options provide the same feature set, and data logged to any deployment can be exported or migrated. The self-managed option is popular with organizations in regulated industries (finance, healthcare, defense) that cannot send experiment data to external servers.
Weights & Biases has been adopted across a wide range of industries and use cases.
AI Research Labs: OpenAI, NVIDIA, and other leading labs have used W&B as their system of record for tracking model training experiments. The platform's ability to handle thousands of concurrent runs and petabytes of logged data makes it suitable for large-scale research [2].
Autonomous Vehicles: Companies like Toyota use W&B to track experiments related to perception models, sensor fusion, and planning algorithms, where reproducibility and rigorous evaluation are safety-critical.
Healthcare and Pharmaceuticals: AstraZeneca and other pharmaceutical companies use W&B for drug discovery workflows, tracking experiments across molecular property prediction, protein structure modeling, and clinical trial optimization.
Technology Companies: Samsung, Canva, Square, and Snowflake are among the enterprise customers that rely on W&B for their internal ML workflows [3].
Academic Research: Universities and research institutions use the free academic tier to track experiments for published papers, ensuring reproducibility and enabling collaboration between lab members.
The W&B platform consists of several components:
Python SDK (wandb): The client library, installable via pip install wandb, handles local logging, batching, and asynchronous upload of data to the W&B backend. The SDK is designed to add minimal overhead to training loops, with logging operations running in background threads.
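The queue-and-background-thread pattern behind this design can be sketched as follows; this is a toy stand-in for illustration, not the actual SDK internals:

```python
import queue
import threading

class AsyncLogger:
    """Toy non-blocking logger: the training loop enqueues records and
    returns immediately, while a background thread batches them for
    'upload', keeping logging overhead off the hot path."""

    def __init__(self, batch_size=3):
        self.q = queue.Queue()
        self.batch_size = batch_size
        self.uploaded = []  # stands in for batches sent to a backend
        self.worker = threading.Thread(target=self._drain, daemon=True)
        self.worker.start()

    def log(self, record):
        self.q.put(record)  # non-blocking from the caller's perspective

    def _drain(self):
        batch = []
        while True:
            item = self.q.get()
            if item is None:  # sentinel: flush and stop
                break
            batch.append(item)
            if len(batch) >= self.batch_size:
                self.uploaded.append(list(batch))
                batch.clear()
        if batch:
            self.uploaded.append(list(batch))

    def finish(self):
        self.q.put(None)
        self.worker.join()

logger = AsyncLogger()
for step in range(7):
    logger.log({"step": step, "loss": 1.0 / (step + 1)})
logger.finish()
print([len(b) for b in logger.uploaded])  # [3, 3, 1]
```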
Backend Service: The server component stores experiment metadata in a relational database, binary artifacts in object storage (S3-compatible), and serves the web application. In the self-managed deployment, customers run this infrastructure on their own Kubernetes clusters.
Web Application: The React-based frontend provides dashboards, run comparison, artifact browsing, report editing, and team management. The application supports real-time updates as new data arrives from active runs.
API: A GraphQL API and REST endpoints enable programmatic access to all platform data, supporting custom integrations, automated pipelines, and third-party tools.
W&B has built an active community around its platform. The company hosts educational content through W&B Courses (free online classes covering topics from basic experiment tracking to advanced LLM evaluation), a blog with technical articles and research summaries, and community events including the annual Fully Connected conference.
The wandb Python package is one of the most downloaded ML tools on PyPI, and the company maintains an extensive examples repository on GitHub with integration guides for dozens of frameworks and use cases [14].
W&B also invests in the open-source ecosystem. Weave is available as an open-source project on GitHub, allowing developers to use its tracing and evaluation capabilities without a W&B account [11].
Following the CoreWeave acquisition in May 2025, Weights & Biases operates as a subsidiary of CoreWeave while maintaining its product identity and development roadmap. The acquisition has brought deeper integration with CoreWeave's GPU cloud infrastructure, enabling features like seamless job submission from W&B Launch to CoreWeave compute clusters.
The platform continues to evolve along two main axes. First, the traditional ML experiment tracking and model management products receive ongoing improvements in scalability, performance, and user experience. Second, Weave has become an increasingly important part of the product portfolio as the industry shifts toward LLM application development. Weave's tracing and evaluation capabilities are being expanded to support agentic AI workflows, multi-step reasoning chains, and complex orchestration patterns that characterize modern LLM applications [11].
W&B's position in the market remains strong. The platform's early adoption by high-profile AI labs, combined with its breadth of features and managed hosting model, has established it as one of the leading MLOps solutions. The CoreWeave acquisition adds infrastructure depth, positioning the combined entity to offer a more complete stack for AI development, from GPU compute to experiment tracking to production monitoring [3].