Scale AI is an American artificial intelligence data infrastructure company that provides data labeling, model evaluation, and AI platform services. Founded in 2016 by Alexandr Wang and Lucy Guo, Scale AI began as a data annotation service for autonomous vehicles and has since evolved into one of the most important behind-the-scenes players in the AI industry, providing the training data that powers large language models from companies including OpenAI, Meta, and Microsoft. Headquartered in San Francisco, California, Scale AI reached a valuation of $29 billion in 2025 following a landmark investment from Meta.
Alexandr Wang was born in January 1997 in Los Alamos, New Mexico, to Chinese immigrant parents who both worked as physicists at Los Alamos National Laboratory. Wang showed exceptional aptitude for mathematics and computer science from a young age, winning national programming competitions in high school. He was admitted to the Massachusetts Institute of Technology (MIT) to study mathematics and computer science, but before starting, he took a gap year and moved to Silicon Valley, where he landed a job as a software engineer at Quora.
During his freshman year at MIT, Wang co-founded Scale AI in 2016 with Lucy Guo. The core idea was to build a platform that could produce high-quality labeled data for machine learning models at scale. At the time, the primary customer base was the autonomous vehicle industry, which needed enormous volumes of labeled images, point clouds, and sensor data to train self-driving car systems. Wang dropped out of MIT to focus on the company full-time.
In 2021, at age 24, Wang became the world's youngest self-made billionaire according to Forbes; as of April 2025, Forbes estimated his net worth at approximately $3.6 billion. He was named to the Forbes 30 Under 30 list in the Enterprise Technology category in both 2018 and 2021.
Scale AI's initial business focused heavily on data labeling for autonomous vehicles, with early customers including companies in the self-driving car space. The company built a workforce of human annotators (organized through its subsidiary Remotasks) who labeled images, drew bounding boxes around objects, and annotated sensor data from LiDAR and camera systems.
As the AI landscape shifted with the rise of large language models from 2020 onward, Scale AI recognized that the same fundamental capability, organizing human intelligence to produce high-quality training data, was needed even more urgently for language models. The company pivoted its business increasingly toward LLM training data, including supervised fine-tuning examples, human preference rankings for RLHF, and model evaluation data.
This pivot proved extremely lucrative. As companies like OpenAI, Anthropic, and Meta raced to train increasingly powerful language models, the demand for high-quality human feedback data surged. Scale AI's existing infrastructure for managing large distributed workforces of annotators positioned it perfectly to meet this demand.
As of 2025, Scale AI organizes its product portfolio into three main categories: Build AI, Apply AI, and Evaluate AI [9].
The Scale Data Engine is the company's core product for AI training data. It encompasses the full lifecycle of data preparation for model development:
| Component | Function |
|---|---|
| Data collection | Gathering raw data from diverse sources |
| Data curation | Selecting and organizing training examples |
| Data annotation | Human labeling across modalities (text, image, video, audio) |
| RLHF services | Human evaluators ranking model outputs for reward model training |
| Model evaluation | Testing model performance against benchmarks and edge cases |
| Safety and alignment | Red teaming and safety testing for deployed models |
The Data Engine supports a wide range of AI modalities, from traditional computer vision tasks (bounding boxes, segmentation, keypoint detection) to complex language tasks (instruction following, code generation, creative writing evaluation).
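To make a computer vision task like bounding-box annotation concrete, an annotation is essentially a set of labeled coordinate boxes, and one common way to check label quality is overlap (intersection-over-union) against a trusted reference box. The sketch below is purely illustrative; the box format and field names are generic conventions, not Scale's actual schema.

```python
# Minimal sketch of bounding-box annotation data and an overlap check.
# The [x_min, y_min, x_max, y_max] format is a generic convention,
# not Scale's actual schema.

def area(box):
    """Area of an axis-aligned box [x_min, y_min, x_max, y_max]."""
    return (box[2] - box[0]) * (box[3] - box[1])

def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)
    union = area(box_a) + area(box_b) - inter
    return inter / union

annotator_box = [10, 10, 110, 110]   # worker's label for "car"
reference_box = [20, 20, 120, 120]   # trusted gold-standard label
print(round(iou(annotator_box, reference_box), 3))
```

A high IoU against the reference indicates an accurate label; thresholding on this kind of metric is one generic way annotation pipelines distinguish acceptable from unacceptable boxes.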
The Scale GenAI Platform (also called the Generative AI Data Engine) is specifically designed for companies building and deploying generative AI applications. It provides tools for fine-tuning models on custom data, evaluating model performance, and managing the data pipelines that feed into generative AI systems. The platform targets enterprise customers who want to build custom AI applications on top of foundation models.
Scale Donovan is a decision-support platform that applies generative AI to unstructured data and geospatial feeds to surface recommendations at "mission speed." Originally developed for defense and intelligence applications, Donovan integrates LLM-based tools including geospatial chat, text-to-API, and retrieval-augmented generation to enable public sector teams to access verified intelligence quickly [9].
Donovan's capabilities include:
| Capability | Description |
|---|---|
| Geospatial analysis | AI-powered analysis of satellite imagery and geographic data |
| Document summarization | Automated summarization of intelligence reports and lengthy documents |
| Report generation | Templated report creation using AI synthesis |
| Document translation | Multi-language translation of uploaded documents |
| Data integration | Aggregation of disparate data sources into unified intelligence picture |
The platform runs on secure government servers and can quickly assemble relevant operations or intelligence data for decision-making in contexts ranging from battlefield planning to intelligence gathering. Donovan has been deployed with the U.S. Department of Defense and forms a central part of Scale AI's public sector business.
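Retrieval-augmented generation, one of the LLM techniques Donovan integrates, can be sketched generically: retrieve the documents most relevant to a query, then build a prompt that grounds the model's answer in them. This toy example illustrates the pattern only, not Donovan's implementation; the keyword-overlap scoring is a deliberately simple stand-in for real retrieval.

```python
# Toy sketch of retrieval-augmented generation (RAG): retrieve relevant
# documents, then ground the model's answer in them. Illustrates the
# general pattern only, not Donovan's implementation.

def score(query, doc):
    """Naive relevance score: number of shared lowercase words."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query, corpus, k=2):
    """Return the top-k documents by keyword overlap with the query."""
    return sorted(corpus, key=lambda d: score(query, d), reverse=True)[:k]

def build_prompt(query, corpus):
    """Prepend retrieved context so the answer stays grounded in sources."""
    context = "\n".join(retrieve(query, corpus))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer using only the context."

corpus = [
    "Report A: convoy movements observed near the northern border.",
    "Report B: satellite imagery shows new construction at the port.",
    "Report C: weather forecast predicts heavy rain this week.",
]
print(build_prompt("what do satellite images show at the port?", corpus))
```

Production systems replace the keyword scorer with vector similarity search, but the grounding structure, retrieved context prepended to the query, is the same.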
SEAL is Scale AI's research division focused on evaluating and aligning large language models. Launched in 2023, SEAL produces benchmarks, leaderboards, and evaluation frameworks that help the AI community assess model capabilities and safety properties. The SEAL Leaderboards provide independent rankings of LLM performance across various tasks, offering a neutral third-party evaluation that has become an important reference point for the industry.
In March 2026, the company expanded SEAL into Scale Labs, a broader research division that builds on SEAL's evaluation work while expanding into new areas of AI safety and alignment research.
SEAL has produced several notable evaluation frameworks and benchmark suites [10]:
| Benchmark/Product | Focus Area |
|---|---|
| SEAL Leaderboards | Public rankings of LLM performance across diverse tasks |
| SEAL Showdown | Head-to-head model comparisons based on human evaluation |
| SWE-Atlas | Evaluation framework for software engineering capabilities |
| Voice Showdown | Benchmark for voice and speech AI systems |
| PropensityBench | Evaluation of model tendencies and behavioral patterns |
| 2025 Model of the Year Awards | Annual recognition of top-performing AI models |
SEAL's evaluation work serves a dual purpose. Publicly, it provides the AI community with independent model assessments. Commercially, it positions Scale AI as the authoritative evaluator of model quality, which reinforces its value to customers who need to select and validate models for production deployment.
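Leaderboards built from head-to-head human preference votes are commonly aggregated with Elo-style ratings; the source does not state which scheme SEAL Showdown uses, but the general mechanism can be sketched as follows.

```python
# Sketch of Elo-style rating updates from pairwise human preference votes,
# a common way to turn head-to-head comparisons into a leaderboard.
# (Illustrative only; the source does not specify SEAL Showdown's method.)

def expected_score(r_a, r_b):
    """Probability that A beats B under the Elo model."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

def update(ratings, winner, loser, k=32):
    """Shift both ratings toward the observed outcome of one comparison."""
    surprise = 1 - expected_score(ratings[winner], ratings[loser])
    ratings[winner] += k * surprise
    ratings[loser] -= k * surprise

ratings = {"model_a": 1000.0, "model_b": 1000.0}
# Suppose human evaluators prefer model_a in 3 of 4 comparisons.
for winner, loser in [("model_a", "model_b")] * 3 + [("model_b", "model_a")]:
    update(ratings, winner, loser)
print(ratings)  # model_a ends with the higher rating
```

The key property for a leaderboard is that upsets against a higher-rated model move ratings more than expected wins, so rankings converge as votes accumulate.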
One of Scale AI's most strategically important offerings is its reinforcement learning from human feedback (RLHF) data services. RLHF has become the standard technique for aligning large language models with human preferences and instructions. The process requires large volumes of human-generated comparison data, where evaluators read model outputs and rank them by quality, helpfulness, accuracy, and safety.
Scale AI provides this data at scale, employing trained evaluators who specialize in different domains (coding, mathematics, creative writing, factual knowledge) to produce the nuanced judgments that RLHF requires. The company's RLHF data has been used by several of the leading AI labs to train their flagship models.
The quality of RLHF data directly impacts the quality of the resulting model, making Scale AI's role in the AI supply chain critical. Poor or inconsistent human feedback can lead to models that are unreliable, biased, or misaligned, which is why Scale AI invests heavily in annotator training, quality assurance, and evaluation methodology.
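The ranking data this process produces is typically consumed as pairwise preferences for reward model training. A minimal sketch, assuming a simple list-of-responses-ranked-best-first input (an illustrative format, not any specific lab's schema):

```python
from itertools import combinations

# Sketch: expand a ranked list of model responses (best first) into the
# (chosen, rejected) pairs typically used to train a reward model.
# The data format here is illustrative, not any specific lab's schema.

def ranked_to_pairs(prompt, responses_best_first):
    """Every higher-ranked response is 'chosen' over every lower-ranked one."""
    return [
        {"prompt": prompt, "chosen": chosen, "rejected": rejected}
        for chosen, rejected in combinations(responses_best_first, 2)
    ]

pairs = ranked_to_pairs(
    "Explain recursion.",
    ["clear answer", "partially correct answer", "off-topic answer"],
)
print(len(pairs))  # 3 ranked responses yield 3 pairwise comparisons
```

Because `combinations` preserves input order, each emitted pair keeps the higher-ranked response in the `chosen` slot; a ranking of n responses yields n·(n−1)/2 such pairs, which is why a single evaluator session can produce many training examples.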
Scale AI's approach to RLHF data production follows a structured methodology that prioritizes quality at volume [11]:
| Phase | Description |
|---|---|
| Task design | Define evaluation criteria, rubrics, and scoring guidelines for specific model behaviors |
| Annotator selection | Match tasks to evaluators with relevant domain expertise (coding, math, science, creative writing) |
| AI pre-labeling | Use AI models to generate initial labels, reducing human effort on straightforward cases |
| Human annotation | Trained evaluators rank, compare, or score model outputs based on defined criteria |
| Quality control | Multi-layer QC including inter-annotator agreement checks, gold standard tasks, and statistical auditing |
| Grading and leveling | Workers are scored on accuracy; high performers advance to more complex tasks, low performers are removed |
| Delivery | Processed data is delivered in formats ready for reward model training |
This methodology, which Scale describes as "crowd + AI + rigorous QC," combines the scale of crowdsourced labor with the quality controls of a managed service. Workers on the Remotasks and Outlier platforms are graded continuously, with top performers receiving access to higher-paying, more complex tasks and underperformers being phased out [11].
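One quality-control check named above, inter-annotator agreement, is commonly measured with chance-corrected statistics such as Cohen's kappa for two annotators. The source does not specify which metric Scale uses; this is a generic illustration.

```python
from collections import Counter

# Cohen's kappa: agreement between two annotators, corrected for the
# agreement expected by chance alone. A standard inter-annotator agreement
# statistic (illustrative; the source does not name Scale's exact metric).

def cohens_kappa(labels_a, labels_b):
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(
        (counts_a[c] / n) * (counts_b[c] / n)
        for c in set(labels_a) | set(labels_b)
    )
    return (observed - expected) / (1 - expected)

a = ["good", "good", "bad", "good", "bad", "good"]
b = ["good", "bad", "bad", "good", "bad", "good"]
print(round(cohens_kappa(a, b), 3))
```

A kappa of 1.0 means perfect agreement and 0 means no better than chance; pipelines like the one described above can flag tasks or annotators whose agreement falls below a chosen threshold for review.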
Scale AI has invested in building specialized annotator pools for domains that require technical expertise:
| Domain | Annotator Requirements | Typical Tasks |
|---|---|---|
| Software engineering | Professional developers, CS degree holders | Code review, debugging evaluation, algorithm comparison |
| Mathematics | Advanced math education or research background | Proof verification, solution ranking, error detection |
| Scientific reasoning | Graduate-level science education | Factual accuracy assessment, methodology evaluation |
| Creative writing | Professional writers, editors | Style evaluation, coherence scoring, originality assessment |
| Multilingual | Native speakers with professional fluency | Translation quality, cultural appropriateness, idiomatic accuracy |
| Legal and compliance | Legal professionals, compliance officers | Regulatory accuracy, policy adherence evaluation |
Scale AI manages its annotation workforce through two primary subsidiaries:
| Subsidiary | Focus | Workforce Profile |
|---|---|---|
| Remotasks | Computer vision and autonomous vehicle data annotation | Large global workforce; task-based compensation |
| Outlier | Data annotation for LLMs and generative AI | Professionals with advanced degrees, industry expertise, native fluency in target languages |
These subsidiaries coordinate a global network of over 200,000 trained annotators who perform the detailed labeling work that underpins Scale AI's products [11]. The workforce model has drawn both praise for creating economic opportunities in developing countries and criticism regarding working conditions and pay rates.
Outlier, which focuses on the higher-complexity LLM training data, recruits contributors with more specialized qualifications. Typical Outlier tasks involve content evaluation, RLHF ranking, and domain-specific assessment work that requires subject matter expertise rather than just general labeling skills. The distinction between Remotasks (volume-oriented, task-based) and Outlier (quality-oriented, expertise-based) reflects the different requirements of computer vision versus language model training data.
Scale AI has built a significant government business, particularly with the U.S. Department of Defense:
| Date | Contract/Partnership |
|---|---|
| 2020 | Initial contract with the U.S. Department of Defense |
| May 2021 | Michael Kratsios (former U.S. CTO under Trump) joins as Managing Director and Head of Strategy |
| February 2025 | Five-year partnership with the Qatari government for AI-powered government services |
| February 2025 | Selected as third-party evaluator of AI models for the U.S. AI Safety Institute |
| March 2025 | Deal with the Department of Defense for the Thunderforge project |
| 2025 | Defense Information Systems Agency (DISA) selects Scale for Project ASCEND |
| 2025 | Contract with Defense Logistics Agency (DLA) for supply chain optimization |
The Thunderforge project aims to use AI to plan and help execute movements of ships, planes, and other military assets, representing a significant expansion of AI applications in defense logistics and operations. The hiring of Michael Kratsios, who had served as Chief Technology Officer of the United States during the Trump administration, signaled Scale AI's serious commitment to government work.
Project ASCEND, awarded by the Defense Information Systems Agency, focuses on deploying generative AI solutions for Defensive Cyber Operations (DCO). The DLA contract uses Scale Donovan to streamline operations and create real-time data insights for the U.S. military's supply chain [10].
Scale AI's defense business has positioned the company at the intersection of AI and national security, a space that carries both lucrative contract opportunities and complex ethical considerations.
Scale AI has raised substantial capital across multiple funding rounds:
| Round | Date | Amount | Valuation | Lead Investors |
|---|---|---|---|---|
| Seed | 2016 | $4.5M | - | Y Combinator, Founders Fund |
| Series A | 2017 | $7.4M | - | Accel Partners |
| Series B | 2019 | $22.5M | - | Index Ventures |
| Series C | 2020 | $100M | ~$3.5B | Founders Fund, Tiger Global |
| Series D | 2021 | $325M | $7.3B | Tiger Global |
| Series E | 2023 | $250M | $7.3B | Various |
| Series F | 2024 | $1B+ | $14B | Various |
| Meta Investment | June 2025 | $14.3B | $29B | Meta Platforms |
The Meta investment in June 2025 was transformative. Meta acquired a 49% stake in Scale AI for $14.3 billion, more than doubling the company's valuation from $14 billion to $29 billion. The deal reflected Meta's strategy of securing access to high-quality AI training data infrastructure.
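As a quick arithmetic check, a 49% stake purchased for $14.3 billion implies a post-money valuation of

$$\frac{\$14.3\ \text{billion}}{0.49} \approx \$29.2\ \text{billion},$$

consistent with the $29 billion figure reported for the deal.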
The investment established a "privileged access" framework for Meta, including priority scheduling for Scale's global workforce and exclusive rights to Scale's SEAL evaluation frameworks for high-stakes reasoning models [12].
Scale AI generated approximately $870 million in revenue in 2024 and projected over $2 billion in revenue for 2025. The company's revenue growth has been driven by the explosive demand for LLM training data, with customers willing to pay premium prices for high-quality, domain-specific human feedback data.
In June 2025, alongside Meta's $14.3 billion investment, Alexandr Wang departed Scale AI to join Meta as Chief AI Officer, leading Meta Superintelligence Labs. Jason Droege, Scale AI's Chief Strategy Officer, assumed the role of Interim CEO. The transition marked a significant moment for both companies: Scale AI's founder moved to lead superintelligence research at one of the world's largest technology companies, while Scale AI continued to operate independently, with Meta as a major shareholder.
Scale AI competes with several companies in the data labeling and AI services space. The global data labeling market is projected to expand from approximately $4.87 billion in 2025 to more than $29 billion by 2032, creating intense competition among providers [13].
| Competitor | Primary Focus | Key Differentiator vs. Scale AI |
|---|---|---|
| Labelbox | Data labeling platform | Platform-first approach; customers bring their own labelers; usage-based billing tied to Labelbox Units (LBUs) |
| Appen | Data annotation services | Pivoting toward "Sovereign AI" solutions for national governments; ~$231M trailing revenue (late 2025) |
| Surge AI | AI training data | Smaller scale, focused on high-quality RLHF data |
| Amazon SageMaker Ground Truth | AWS data labeling | Integrated with AWS ecosystem; programmatic labeling |
| Snorkel AI | Programmatic data labeling | Software-defined labeling; reduces need for human annotators |
| Sama | Data annotation services | Ethical sourcing focus; impact-driven workforce model |
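The market projection cited above implies a steep compound annual growth rate, which helps explain the intensity of the competition. A quick calculation under the stated figures:

```python
# Implied compound annual growth rate (CAGR) for the data-labeling market
# figures cited above: ~$4.87B in 2025 to ~$29B by 2032 (7 years).

def cagr(start, end, years):
    """Constant annual growth rate that takes `start` to `end` in `years`."""
    return (end / start) ** (1 / years) - 1

rate = cagr(4.87, 29.0, 2032 - 2025)
print(f"{rate:.1%}")  # roughly 29% per year
```

Sustained growth near 29% per year would mean the market roughly doubles every two and a half years, leaving room for multiple providers to grow simultaneously.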
Scale AI differentiates through its combination of scale, quality, and breadth of services. While competitors may excel in specific areas (such as programmatic labeling or specific data types), Scale AI offers the broadest end-to-end platform, from initial data collection through RLHF and model evaluation. Its relationships with the leading AI labs and the U.S. government provide significant competitive moats.
The Meta investment reshaped the competitive landscape: several frontier labs reportedly reduced their reliance on Scale over concerns about sharing sensitive training data with a Meta-affiliated provider, creating openings for rival data vendors [13].
Despite these dynamics, Scale AI's scale of operations, quality infrastructure, and government relationships continue to provide significant advantages that competitors have not replicated.
As of early 2026, Scale AI is one of the most valuable private AI companies in the world at a $29 billion valuation. Under interim CEO Jason Droege, the company continues to expand its product offerings, with the launch of Scale Labs broadening its research capabilities. The government business continues to grow, with the Thunderforge defense project, Project ASCEND, and AI Safety Institute evaluation role representing high-profile contracts. With Meta as a major shareholder and customer, and continued demand for high-quality training data from the broader AI industry, Scale AI remains a critical piece of the AI infrastructure ecosystem.