Modal is a serverless cloud computing platform designed for running compute-intensive applications in artificial intelligence, machine learning, and data processing. The platform allows developers to write standard Python code and execute it in the cloud with automatic containerization, scaling, and GPU provisioning, eliminating the need to manage servers, Kubernetes, or Docker configurations. Modal is developed by Modal Labs, Inc., headquartered in New York City with offices in San Francisco and Stockholm.
Founded in January 2021 by Erik Bernhardsson, who was joined by co-founder Akshat Bubna later that year, Modal has raised over $110 million in venture capital funding and reached a $1.1 billion valuation (unicorn status) with its September 2025 Series B round. As of February 2026, the company was reported to be in talks for a new funding round at a valuation of approximately $2.5 billion. The platform serves thousands of customers, including Meta, Scale, Substack, Ramp, and Lovable, and generates approximately $50 million in annualized revenue.
Modal was founded in January 2021 by Erik Bernhardsson, who serves as CEO. Akshat Bubna joined as co-founder and CTO in August 2021. The company is legally incorporated as Modal Labs, Inc.
Bernhardsson is a Swedish software engineer who spent seven years at Spotify (2008 to 2015), where he built the core of the music recommendation system and led a team of 20 engineers. During his time at Spotify, he created two widely used open-source projects: Luigi, a Python workflow scheduler for building complex pipelines of batch jobs (over 17,000 stars on GitHub), and Annoy (Approximate Nearest Neighbors Oh Yeah), a C++/Python library for fast approximate nearest neighbor search in high-dimensional spaces. Annoy remains in use at Spotify to power billions of music recommendations.
After Spotify, Bernhardsson served as CTO of Better.com from 2015 to 2020, where he scaled the engineering team from 1 to 300 people and built AI/ML systems for mortgage processing. He began developing Modal during the COVID-19 pandemic, motivated by his repeated frustration with the gap between local development and cloud execution in data and ML workflows.
Bubna won a gold medal at the International Olympiad in Informatics (IOI) in 2014, becoming the first Indian student to achieve this distinction. Before joining Modal, he worked as an early engineer at Scale AI. Bubna leads Modal's engineering team, which includes multiple International Olympiad in Informatics gold medalists.
Bernhardsson's core insight was that data and ML teams have fundamentally different infrastructure needs from traditional backend engineering teams. They require frequent infrastructure changes, bursty compute demands, and access to specialized hardware like GPUs and high-memory systems. Existing tools forced a slow cycle of building containers, pushing them to a registry, triggering jobs, and downloading logs, which destroyed the fast feedback loop that productive development requires.
Rather than building on top of existing systems like Docker and Kubernetes, Bernhardsson decided to build Modal's infrastructure from scratch. The first two years of development focused on foundational components: a custom container runtime, a proprietary filesystem, a scheduler, and an image builder, all written in Rust for performance. The company operated in a closed beta during this period.
Modal's first significant traction came when Stable Diffusion launched in August 2022, as developers needed a fast way to run GPU-intensive image generation workloads without managing infrastructure. This validated the platform's serverless GPU model and attracted its initial wave of users.
Modal has raised over $110 million in total venture capital funding across three rounds, reaching unicorn status in September 2025.
| Round | Date | Amount | Lead Investor | Total Raised | Valuation |
|---|---|---|---|---|---|
| Seed | Early 2022 | $7 million | Amplify Partners | $7 million | Not disclosed |
| Series A | October 2023 | $16 million | Redpoint Ventures | $23 million | ~$154 million (post-money) |
| Series B | September 2025 | $87 million | Lux Capital | ~$111 million | $1.1 billion (post-money) |
Other investors across these rounds include Definition Capital, Creandum, Essence VC, and angel investors Elad Gil and Neha Narkhede. The Series A coincided with Modal's general availability launch in October 2023.
In February 2026, TechCrunch reported that Modal was in talks with General Catalyst to lead a new funding round at a valuation of approximately $2.5 billion, which would more than double the company's valuation from less than five months prior. CEO Erik Bernhardsson characterized the discussions as general conversations rather than active fundraising. At the time, Modal's annualized revenue run rate was reported to be approximately $50 million.
The company had approximately 14 employees at the time of its Series A in late 2023, growing to around 79 employees by 2025.
Modal's platform is built on custom infrastructure written primarily in Rust, designed from the ground up to minimize container startup times and maximize GPU utilization. The architecture differs substantially from standard container orchestration platforms in several ways.
Custom Container Runtime: Modal uses Google's gVisor as its container runtime instead of standard Docker containers. gVisor is a sandboxed container runtime that intercepts application system calls and acts as a guest kernel, providing stronger isolation than traditional containers that share the host system's kernel. Modal was the first company to run gVisor with GPUs at scale; at the time, GPU support in gVisor was highly experimental, and Modal's engineering team has contributed upstream improvements to make gVisor GPU support production-ready.
FUSE-Based Filesystem: Modal built a custom filesystem using FUSE (Filesystem in Userspace) with content-addressed storage. Files are hashed and stored by their hash value, which eliminates duplication across container images. When a container starts, it does not need to pull an entire image; instead, it lazily fetches only the files that are actually accessed. Modal's data shows that applications typically access only a small fraction of their container image contents. An in-memory index tracks file hashes, and an aggressive local SSD cache (with approximately 100 microsecond latency) stores frequently used files, compared to approximately 2 millisecond latency for network storage. The filesystem was initially prototyped in Python, then rewritten in Rust for production performance.
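The content-addressing idea behind this deduplication can be illustrated with a minimal sketch. This is a hypothetical toy, not Modal's implementation (which is written in Rust and operates at the filesystem level); it only shows why hashing file contents makes identical files across different images collapse into a single stored blob:

```python
import hashlib

class ContentAddressedStore:
    """Toy content-addressed blob store: blobs are keyed by the hash of
    their contents, so identical files from different container images
    are stored only once."""

    def __init__(self):
        self.blobs = {}  # hex digest -> bytes

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)  # dedupe: no-op if already stored
        return digest

    def get(self, digest: str) -> bytes:
        return self.blobs[digest]

# Two "images" that ship the same library file store it only once.
store = ContentAddressedStore()
h1 = store.put(b"libfoo contents")
h2 = store.put(b"libfoo contents")  # same bytes -> same hash, no new blob
print(h1 == h2, len(store.blobs))   # True 1
```

Because lookups are by content hash, a container that requests a file already present in the local SSD cache never touches the network, which is what makes lazy per-file fetching fast in the common case.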
VolumeFS: A second custom filesystem called VolumeFS provides persistent storage for Modal functions across any region. It uses eventual consistency rather than strict immediate consistency, which enables scalable distributed training and large dataset operations without sacrificing throughput.
Automated Resource Solver: Modal implements an automated GPU resource allocation system using Google's OR-Tools Linear Optimization Package (GLOP). This solver runs every few seconds, analyzing all active workloads and available GPU capacity across AWS, Google Cloud Platform, and Oracle Cloud Infrastructure. It factors in regional constraints, pricing, and capacity to perform both comprehensive and incremental optimization passes, allowing the platform to elastically scale to hundreds of GPUs without users managing any cluster infrastructure.
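The shape of the allocation problem can be sketched as follows. This toy greedy allocator is an illustration only: Modal's actual solver formulates the problem as a linear program with GLOP, and the pool names, capacities, and prices below are hypothetical:

```python
# Illustrative sketch of cost-aware GPU placement. Modal's production
# solver uses linear programming (Google's GLOP); this toy version just
# greedily assigns each workload to the cheapest pool that still has room.
def allocate(workloads, pools):
    """workloads: list of (name, gpus_needed) tuples.
    pools: dict of pool name -> {"capacity": free GPUs, "price": $/GPU-hour}."""
    placement = {}
    for name, needed in sorted(workloads, key=lambda w: -w[1]):
        candidates = [p for p, v in pools.items() if v["capacity"] >= needed]
        if not candidates:
            placement[name] = None  # no pool has enough free capacity
            continue
        best = min(candidates, key=lambda p: pools[p]["price"])
        pools[best]["capacity"] -= needed
        placement[name] = best
    return placement

pools = {
    "oci-us": {"capacity": 8, "price": 2.50},
    "aws-eu": {"capacity": 4, "price": 3.10},
}
placement = allocate([("train", 8), ("infer", 2)], pools)
print(placement)  # {'train': 'oci-us', 'infer': 'aws-eu'}
```

A real linear-programming formulation additionally handles fractional relaxations, regional affinity constraints, and incremental re-solves as capacity and demand shift, which is why a solver rather than a greedy pass is used in production.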
Modal's stated goal is sub-second container startup. In practice, cold start times for GPU-enabled containers typically range from 1 to 4 seconds, significantly faster than the minutes-long startup times common with Docker-based deployments. The platform achieves this through the techniques described above: lazy per-file fetching instead of full image pulls, content-addressed deduplication, and aggressive local SSD caching.
Modal reports that its infrastructure can build a 100 GB container image and then boot up 100 instances of that container within seconds.
Modal's primary interface is a Python SDK that uses decorators to define cloud infrastructure. Developers annotate their Python functions with Modal decorators, and the platform handles containerization, environment setup, and remote execution automatically. There are no YAML files, Dockerfiles, or Kubernetes manifests required.
A basic Modal function looks like this:
```python
import modal

app = modal.App("example")

@app.function()
def square(x):
    return x ** 2
```
Environment configuration is expressed in Python rather than configuration files:
```python
@app.function(
    image=modal.Image.debian_slim().pip_install(["numpy", "pandas"]),
    gpu="A100",
    timeout=300,
)
def train_model(data):
    # This runs on an A100 GPU in the cloud
    ...
```
Key SDK features include:
| Feature | Description |
|---|---|
| @app.function() | Converts a Python function to a cloud-executed function with configurable resources |
| image= | Defines the container environment using Python code (pip packages, apt packages, custom Docker images) |
| gpu= | Specifies GPU type and count (e.g., "A100", "H100:2", modal.gpu.A100(count=4)) |
| schedule= | Sets up cron-like scheduled execution (e.g., modal.Period(hours=1)) |
| .map() | Distributes work across multiple containers in parallel |
| @app.cls() | Defines a class with lifecycle hooks and shared state across function calls |
| @modal.web_endpoint() | Deploys a function as an HTTPS endpoint |
| modal.Volume() | Attaches persistent storage to functions |
| modal.Secret() | Securely manages and injects environment variables and credentials |
As of late 2025, Modal has also released alpha SDKs for JavaScript/TypeScript and Go, signaling plans for broader language support beyond Python.
Modal enables developers to deploy and scale LLM inference, audio models, image generation models, and other AI models as serverless endpoints. The platform provides instant autoscaling from zero to hundreds of GPUs and back down to zero, with per-second billing. Users bring their own models and frameworks (such as vLLM, Hugging Face Transformers, or custom serving code), and Modal handles the underlying infrastructure.
The platform supports model training and fine-tuning on single-node or multi-node GPU clusters. Users can provision A100, H100, or H200 GPUs instantly without reservation or quota management. Modal's per-second billing means users pay only for the time their training job is actively running.
Modal Sandboxes provide isolated execution environments for running untrusted code from LLMs or users. Built on gVisor's security isolation, sandboxes allow developers to programmatically spin up ephemeral containers with controlled resource limits and filesystem access. Meta has used Modal Sandboxes to run thousands of concurrent sandboxed environments for reinforcement learning workflows.
Modal supports large-scale batch processing workloads that can scale to thousands of containers on demand. The .map() and .starmap() operations distribute work across containers in parallel, with automatic retry logic and error handling. Common use cases include data preprocessing, web scraping at scale, and ETL pipelines.
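The semantics of parallel fan-out with per-item retries can be approximated locally with Python's standard library. This is an analogy only, not Modal's implementation: Modal runs each item in a remote container rather than a local thread:

```python
from concurrent.futures import ThreadPoolExecutor

def map_with_retries(fn, items, max_retries=3, workers=8):
    """Run fn over items in parallel, retrying each failed item up to
    max_retries times -- a local stand-in for the fan-out/retry pattern
    that Modal's .map() applies across remote containers."""
    def run_one(item):
        for attempt in range(max_retries):
            try:
                return fn(item)
            except Exception:
                if attempt == max_retries - 1:
                    raise  # out of retries: surface the failure
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order in its results
        return list(pool.map(run_one, items))

results = map_with_retries(lambda x: x * x, range(5))
print(results)  # [0, 1, 4, 9, 16]
```

The key property mirrored here is that failures are retried per item rather than per batch, so one flaky input does not force the whole job to rerun.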
Any Modal function can be deployed as a serverless HTTPS endpoint with a single decorator. These endpoints support standard HTTP methods and can serve as REST APIs, webhook handlers, or frontend-facing services. Modal handles TLS termination, load balancing, and autoscaling for these endpoints.
Modal supports cron-like scheduling through the schedule= parameter in function decorators. Scheduled functions can run at fixed intervals or on custom cron expressions, enabling periodic data processing, model retraining, or monitoring tasks.
Modal offers real-time collaborative notebooks that run on cloud GPUs. These notebooks support sharing and provide an interactive development environment for data exploration, prototyping, and experimentation.
Modal's storage primitives include modal.Volume(), which attaches persistent storage to functions across containers and regions.
Modal provides access to a range of NVIDIA GPUs through its multi-cloud infrastructure spanning AWS, Google Cloud Platform, and Oracle Cloud Infrastructure.
| GPU | Memory | Approximate Hourly Cost |
|---|---|---|
| NVIDIA T4 | 16 GB | $0.59 |
| NVIDIA L4 | 24 GB | $0.80 |
| NVIDIA A10 | 24 GB | $1.10 |
| NVIDIA L40S | 48 GB | $1.95 |
| NVIDIA A100 40 GB | 40 GB | $2.10 |
| NVIDIA A100 80 GB | 80 GB | $2.50 |
| NVIDIA RTX PRO 6000 | 48 GB | $3.03 |
| NVIDIA H100 | 80 GB | $3.95 |
| NVIDIA H200 | 141 GB | $4.54 |
| NVIDIA B200 | 192 GB | $6.25 |
All GPU pricing is billed per second with no minimum commitment. Users can request multiple GPUs per function (e.g., 4x H100 or 8x A100) for multi-GPU training or inference workloads.
Modal uses pure usage-based pricing with per-second billing. There are no reserved instances or long-term commitments.
| Resource | Price |
|---|---|
| CPU | $0.0000131 per physical core per second |
| Memory | $0.00000222 per GiB per second |
| Sandbox/Notebook CPU | $0.00003942 per core per second |
| Sandbox/Notebook Memory | $0.00000672 per GiB per second |
GPU pricing varies by GPU type (see GPU Availability table above).
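Under per-second billing, a job's cost is simply rate × duration summed over the resources it uses. A quick arithmetic sketch using the published rates above (the job shape is chosen purely for illustration):

```python
# Cost of a 10-minute job using 2 physical CPU cores, 8 GiB of memory,
# and one A100 80 GB GPU, at the per-second rates listed above.
CPU_RATE = 0.0000131          # $ per physical core per second
MEM_RATE = 0.00000222         # $ per GiB per second
A100_80GB_RATE = 2.50 / 3600  # $2.50/hour expressed per second

seconds = 10 * 60
cost = seconds * (2 * CPU_RATE + 8 * MEM_RATE + A100_80GB_RATE)
print(f"${cost:.2f}")  # roughly $0.44
```

Note that the GPU dominates the bill: the CPU and memory components together contribute only a few cents here.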
| Feature | Starter (Free) | Team ($250/month) | Enterprise (Custom) |
|---|---|---|---|
| Monthly compute credits | $30 | $100 | Custom |
| Workspace seats | 3 | Unlimited | Unlimited |
| Maximum containers | 100 | 1,000 | Custom |
| GPU concurrency | 10 | 50 | Custom |
| Scheduled functions (crons) | 5 | Unlimited | Unlimited |
| Web endpoints | 8 | Unlimited | Unlimited |
| Log retention | 1 day | 30 days | Custom |
Modal offers compute credit programs for qualifying organizations:
| Program | Credits Available |
|---|---|
| Startup credits | Up to $25,000 |
| Academic credits | Up to $10,000 |
Modal operates across multiple cloud providers, including AWS, Google Cloud Platform, and Oracle Cloud Infrastructure. In September 2024, Modal announced that it had selected Oracle Cloud Infrastructure (OCI) as its primary cloud infrastructure provider. By leveraging OCI's bare metal compute instances, Modal reported improvements in performance for inference, fine-tuning, and batch processing workloads.
The platform's automated resource solver handles multi-cloud scheduling, selecting the most cost-effective GPU capacity across providers based on availability, regional constraints, and pricing. Customers interact only with Modal's unified interface and do not need to manage cloud provider accounts or configurations.
Modal is also available through the AWS Marketplace, allowing enterprise customers to apply existing AWS committed spend toward Modal usage.
Modal has achieved SOC 2 Type II certification and supports HIPAA-compliant deployments for healthcare and regulated industry use cases. The platform's use of gVisor as a container runtime provides an additional layer of security isolation compared to standard containers, since gVisor intercepts system calls rather than sharing the host kernel directly.
Modal serves thousands of customers across a range of industries and use cases. Approximately 90% of workloads on the platform are related to AI and ML applications.
| Customer | Use Case |
|---|---|
| Meta | Thousands of concurrent sandboxed environments for reinforcement learning |
| Scale AI | Evaluations, RL environments, and MCP servers with massive volume spikes |
| Substack | Deploying new ML models in hours instead of weeks |
| Ramp | AI-powered financial automation |
| Lovable | AI development platform |
| Quora | AI applications |
| Suno | Music generation |
| OpenPipe | LLM fine-tuning |
| Cognition AI | AI coding agent (Devin) |
| Mistral AI | AI inference |
| Harvey | Legal AI |
| Cartesia AI | Speech AI |
| Allen Institute for AI | Research computing |
Common use case categories include AI model inference, model training and fine-tuning, sandboxed code execution for AI agents, large-scale batch processing, web endpoints and APIs, scheduled jobs, and interactive notebooks.
Modal competes in the AI infrastructure and serverless compute market alongside several categories of providers.
| Competitor | Category | Primary Differentiator |
|---|---|---|
| AWS Lambda / SageMaker | Major cloud provider | Comprehensive MLOps platform; deep AWS ecosystem integration; suited for large enterprises |
| Replicate | Serverless inference | One-click model deployment and sharing; strong for demos and prototyping; weaker cold starts (16 to 60+ seconds) |
| Baseten | Serverless inference | Open-source Truss framework for model packaging; good for early-stage teams; longer cold starts than Modal |
| Fireworks AI | Inference optimization | Custom inference engine with FireAttention; focused on production throughput and latency optimization |
| Together AI | Full-stack AI cloud | Broad model catalog (200+ models); training and inference; strong fine-tuning support |
| RunPod | GPU cloud | Raw GPU access with serverless and pod-based options; competitive pricing |
| Google Cloud Vertex AI | Major cloud provider | Integrated ML platform within Google Cloud ecosystem |
| CoreWeave | GPU cloud | Large-scale GPU clusters; focused on raw infrastructure rather than developer experience |
Modal differentiates itself through three primary advantages: fast cold starts (typically 1 to 4 seconds for GPU-enabled containers), a pure-Python developer experience that requires no Dockerfiles, YAML, or Kubernetes manifests, and elastic per-second-billed autoscaling from zero to hundreds of GPUs.
While Modal itself is a proprietary platform, the company and its founders have contributed to the open-source ecosystem. Erik Bernhardsson created and maintains several open-source projects:
| Project | Description | GitHub Stars |
|---|---|---|
| Luigi | Python workflow scheduler for batch job pipelines (created at Spotify) | 17,000+ |
| Annoy | C++/Python library for approximate nearest neighbor search (created at Spotify) | 13,000+ |
| ANN-Benchmarks | Benchmarking framework for approximate nearest neighbor algorithms | 4,000+ |
Modal's engineering team has also made upstream contributions to gVisor, specifically improving GPU support in the sandboxed container runtime.
Modal operates from three offices: Soho in New York City, Norrmalm in Stockholm, and the Mission District in San Francisco. The engineering team is notable for its concentration of competitive programming talent, including multiple International Olympiad in Informatics gold medalists. The team also includes creators of popular open-source projects (Seaborn, Luigi) and experienced engineering leaders from companies like Spotify, Scale AI, and Better.com.
As of 2025, the company had approximately 79 employees and was actively hiring across engineering, go-to-market, finance, and operations roles.