Lepton AI
Last reviewed
May 20, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v1 · 4,403 words
| Lepton AI | |
|---|---|
| Type | Subsidiary of NVIDIA (since 2025); formerly private company |
| Industry | Artificial intelligence, Cloud computing, AI inference |
| Founded | 2023 |
| Founders | Yangqing Jia (CEO), Junjie Bai |
| Headquarters | Cupertino / San Francisco Bay Area, California, United States |
| Key people | Yangqing Jia (Co-founder; Vice President of DGX Cloud at NVIDIA after acquisition), Junjie Bai (Co-founder) |
| Products | Lepton Cloud, Photon (Python SDK), Tuna LLM engine; (post-acquisition) NVIDIA DGX Cloud Lepton |
| Number of employees | Approximately 20 (at time of acquisition) |
| Parent | NVIDIA (April 2025 onward) |
| Website | lepton.ai |
Lepton AI was an American cloud computing startup that built a cloud-native inference platform for serving large language models, generative image models, and other AI workloads on NVIDIA graphics processing units (GPUs).[^1] The company was founded in 2023 by Yangqing Jia, the creator of the Caffe deep learning framework and a former vice president of Alibaba Cloud, and Junjie Bai, a former technical lead at Facebook and Alibaba who was a primary contributor to the Open Neural Network Exchange (ONNX) project.[^2][^3]
Lepton AI raised an $11 million seed round in May 2023, co-led by CRV and Fusion Fund.[^4] Over the next two years the company built a managed serving platform around its Python-centric "Photon" abstraction and an in-house inference engine called Tuna, growing to roughly twenty employees and a customer base that consisted largely of venture-capital-backed AI startups.[^1][^5]
In late March 2025, reports emerged that NVIDIA was in advanced talks to acquire Lepton AI for a sum estimated in the high hundreds of millions of dollars.[^6][^7] The deal closed in April 2025, with both co-founders and the engineering team moving to NVIDIA.[^8] On May 18, 2025, at the Computex trade show in Taipei, NVIDIA chief executive Jensen Huang announced NVIDIA DGX Cloud Lepton, a multi-cloud GPU marketplace platform built on the acquired Lepton team's technology.[^9][^10] The launch marked one of the most visible product debuts to emerge from an NVIDIA acquisition and confirmed that the Lepton team had become a central part of NVIDIA's cloud services strategy. Yangqing Jia took on the role of Vice President of DGX Cloud at NVIDIA following the close of the acquisition.[^11]
Lepton AI was incorporated in 2023 in the San Francisco Bay Area, with its initial business operations registered at addresses in Cupertino and Palo Alto, California.[^12] The company was founded shortly after Yangqing Jia departed Alibaba in March 2023, ending a four-year tenure during which he had risen to vice president of the company and president of its cloud computing platform group.[^13] Jia was joined by Junjie Bai, a former colleague from his Facebook AI Infrastructure team who had also worked on the AI platform group at Alibaba.[^2]
The pair set out to build what they described as a "cloud-native AI" platform, focused specifically on production inference rather than model training. By 2023, the boom in generative AI had created enormous demand for inexpensive, low-latency serving of transformer-based models. Existing cloud platforms offered raw virtual machines and container orchestration, but developers building AI products had to assemble their own stack of model servers, schedulers, autoscalers, observability, and GPU management. Lepton AI's pitch was that this entire stack should be abstracted away behind a simple Python interface that turned any model into a service with a few lines of code.[^14]
The company name "Lepton" is borrowed from particle physics, where leptons are a family of elementary particles that includes the electron. The choice signaled the founders' goal of building lightweight, fundamental serving primitives for AI workloads. The branding extended to the company's open-source SDK, which used "Photon" as the name for its core abstraction, and to its inference engine "Tuna," which loosely followed the pattern of lightweight, fast objects.[^14]
Yangqing Jia (Chinese name Jia Yangqing) earned a Bachelor's degree in automation (summa cum laude) and a Master's degree in control science and engineering from Tsinghua University, then completed a Ph.D. in computer science at the University of California, Berkeley under the supervision of Trevor Darrell at the Berkeley Artificial Intelligence Research (BAIR) lab.[^15]
During his Berkeley doctorate, in late 2013, Jia released the open-source Caffe deep learning framework. Caffe (an acronym for Convolutional Architecture for Fast Feature Embedding) was one of the first widely adopted open-source deep learning libraries and became a standard tool for computer vision researchers in the mid-2010s.[^16] Written in C++ with a Python interface and released under a BSD license, Caffe predated TensorFlow and PyTorch and influenced their designs in significant ways.[^16]
After graduating from Berkeley in 2014, Jia joined Google as a research scientist at Google Brain, where he worked on TensorFlow's internal development and co-authored the paper introducing the Inception architecture and the GoogLeNet model that won the ImageNet Large Scale Visual Recognition Challenge in 2014.[^15] In February 2016 he moved to Facebook (now Meta) as a research scientist, eventually becoming Director of AI Architecture. At Facebook, Jia led the development of Caffe2, served as one of the co-leads on PyTorch 1.0, and was a co-creator of ONNX.[^17]
Jia left Facebook in 2019 to join Alibaba Group as Vice President of Technology. At Alibaba, he led the company's platform-of-AI (PAI) team and eventually served as president of the Computing Platform group within Alibaba Cloud, where he oversaw a broad portfolio of data analytics, AI, and serverless products.[^13] He departed Alibaba in March 2023 and announced via social media that he would pursue a new venture in the AI infrastructure space. That venture became Lepton AI.[^13]
Junjie Bai holds a Bachelor of Science degree from the Technical University of Munich.[^18] Prior to co-founding Lepton AI he held engineering and platform leadership roles at the high-frequency trading firm KCG Holdings, then at Facebook, and later at Alibaba Group, where he served as Director of the AI Platform team.[^18]
Bai is best known for his work on ONNX, the Open Neural Network Exchange. ONNX was announced jointly by Facebook and Microsoft on September 7, 2017, as an open file format for representing neural networks that allowed models to move between frameworks such as PyTorch, Caffe2, and CNTK without rewriting code.[^19] Bai is listed as the first author on the foundational ONNX technical report and was one of the principal Facebook engineers responsible for the project's design and rollout.[^20] ONNX v1.0 was released in December 2017 with broader industry support, and the project later became a Linux Foundation AI graduated project supported by AWS, IBM, Intel, AMD, Arm, Qualcomm, and Huawei, among others.[^19]
Together, the founding pair brought a rare combination of credentials that spanned the entire modern deep learning toolchain: framework creation (Caffe), framework interoperability (ONNX), large-scale production AI infrastructure at Facebook and Alibaba, and direct involvement in flagship efforts such as PyTorch and TensorFlow. This pedigree made Lepton AI attractive to investors and customers despite the company's small size and early stage.
Lepton AI's core product was a managed AI Platform as a Service (PaaS) that wrapped raw GPU infrastructure inside a simpler developer abstraction. The platform was built on top of Kubernetes and offered features that competed more directly with hosted inference services than with traditional infrastructure-as-a-service offerings.[^14]
At the center of the platform was a Python-first abstraction called Photon. A Photon was a Python class describing a model and its dependencies; with a few lines of code, a developer could convert research and modeling code into a deployable service complete with HTTP endpoints, request batching, autoscaling, and observability hooks. The SDK was published on GitHub under the leptonai/leptonai repository as open source, lowering the barrier to trial use.[^21]
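The class-to-service idea described above can be illustrated with a small, self-contained sketch. It uses only the Python standard library and does not reproduce the real leptonai SDK: `MiniPhoton`, `handler`, `dependencies`, `routes`, and `dispatch` are hypothetical names chosen for illustration, and the in-process `dispatch` call stands in for the HTTP layer that a hosted platform would provide.

```python
import json
from typing import Any, Callable, Dict, List


def handler(func: Callable[..., Any]) -> Callable[..., Any]:
    """Mark a method as a servable endpoint (hypothetical decorator)."""
    func._is_handler = True
    return func


class MiniPhoton:
    """Toy stand-in for a Photon-style class; not the real leptonai API."""

    # Packages a real platform would install before serving (illustrative only).
    dependencies: List[str] = []

    def __init__(self) -> None:
        self.init()

    def init(self) -> None:
        """Hook for loading a model once, before any request is served."""

    def routes(self) -> Dict[str, Callable[..., Any]]:
        """Map "/name" paths to every method marked with @handler."""
        table: Dict[str, Callable[..., Any]] = {}
        for name in dir(self):
            member = getattr(self, name)
            if callable(member) and getattr(member, "_is_handler", False):
                table["/" + name] = member
        return table

    def dispatch(self, path: str, body: str) -> str:
        """Decode a JSON body, call the matching handler, encode the result."""
        kwargs = json.loads(body)
        result = self.routes()[path](**kwargs)
        return json.dumps(result)


class Echo(MiniPhoton):
    dependencies = []  # a real model might list e.g. "torch" here

    def init(self) -> None:
        self.prefix = "echo: "  # stands in for loading model weights

    @handler
    def run(self, text: str) -> str:
        return self.prefix + text


service = Echo()
response = service.dispatch("/run", json.dumps({"text": "hello"}))
print(response)  # prints "echo: hello" (a JSON-encoded string, quotes included)
```

In this sketch the developer writes only the `Echo` class; routing and serialization come from the base class, which is the division of labor the Photon abstraction was described as offering.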
Photons could be deployed as long-running endpoints, as ephemeral batch jobs, or as scheduled background workers. Lepton Cloud provided prebuilt Photons for common open-source models including LLaMA, Stable Diffusion XL (SDXL), and OpenAI's Whisper speech model, so that customers could spin up a hosted endpoint for these models in seconds.[^14]
For large language model serving specifically, Lepton AI built a proprietary inference engine called Tuna, which combined techniques such as dynamic batching, quantization, and custom CUDA kernels. Lepton publicly claimed that Tuna sustained more than 600 output tokens per second on common benchmark configurations and that a single Lepton deployment could serve in excess of 20 billion tokens and 1 million images per day at peak utilization.[^5] These numbers placed the engine in the same performance class as competing engines such as vLLM, TensorRT-LLM, and the proprietary engines used by Together AI and Fireworks AI.
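Tuna itself was proprietary, so its internals are not shown here. As a generic illustration of dynamic batching, one of the techniques named above, the following sketch groups incoming requests so that a batch closes either when it is full or when its oldest request has waited longer than a deadline. The names `Request` and `form_batches`, the parameter values, and the arrival times are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    request_id: int
    arrival_ms: float  # time the request reached the server


def form_batches(
    requests: List[Request], max_batch_size: int, max_wait_ms: float
) -> List[List[Request]]:
    """Group requests into batches for a single model invocation each.

    A batch is closed when it reaches max_batch_size, or when a new request
    arrives more than max_wait_ms after the oldest request in the batch.
    """
    batches: List[List[Request]] = []
    current: List[Request] = []
    for req in sorted(requests, key=lambda r: r.arrival_ms):
        # Deadline check: do not make the oldest request wait any longer.
        if current and req.arrival_ms - current[0].arrival_ms > max_wait_ms:
            batches.append(current)
            current = []
        current.append(req)
        # Size check: a full batch is dispatched immediately.
        if len(current) == max_batch_size:
            batches.append(current)
            current = []
    if current:
        batches.append(current)
    return batches


# Five requests arrive in a burst, then two more after a quiet period.
arrivals = [0.0, 1.0, 2.0, 3.0, 4.0, 30.0, 31.0]
requests = [Request(i, t) for i, t in enumerate(arrivals)]
batches = form_batches(requests, max_batch_size=4, max_wait_ms=10.0)
print([[r.request_id for r in batch] for batch in batches])
# prints [[0, 1, 2, 3], [4], [5, 6]]
```

The two limits trade throughput against latency: a larger batch size uses the GPU more efficiently during bursts, while the wait deadline bounds how long a lone request can be held back.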
Lepton AI marketed itself heavily to enterprise customers, attaining SOC 2 Type II and HIPAA compliance early in its life. The platform offered features such as private networking, customer-managed encryption keys, VPC peering, and dedicated GPU pools. For image generation workloads, the team integrated their own implementation of DistriFusion, a research technique that distributes diffusion sampling across multiple GPUs and was reported to deliver roughly six times faster high-resolution image generation than a single-GPU baseline.[^5]
In January 2024 the company released search_with_lepton, an open-source demonstration that combined a hosted LLM on Lepton Cloud with a web search API to create a conversational search experience similar to Perplexity AI.[^22] The repository quickly drew thousands of stars on GitHub and served as a marketing vehicle for the underlying Lepton Cloud, showcasing how a small team could deploy a production-grade AI application end to end.
Lepton AI's fundraising history was lean: the company completed only a single publicly disclosed equity round before the NVIDIA acquisition.
| Date | Round | Amount | Lead investors / acquirer | Other participants |
|---|---|---|---|---|
| May 2023 | Seed | $11 million | CRV, Fusion Fund | HSG (formerly Sequoia China), individual angels |
| April 2025 | Acquisition | Estimated several hundred million USD | NVIDIA | N/A |
The May 2023 seed round was announced shortly after the company was founded and was led by CRV and Fusion Fund, with additional participation from HSG and a handful of angel investors.[^4][^23] The round closed at a relatively small size compared with peers raising in the same period, reflecting both the small team size and a deliberate strategy by the founders to keep equity dilution low while the platform matured.[^4]
Although early press accounts of the NVIDIA acquisition occasionally suggested that Andreessen Horowitz (a16z) had invested in Lepton, public funding records show that the seed round was led by CRV and Fusion Fund rather than a16z. The misattribution likely stemmed from confusion with other Yangqing Jia-affiliated efforts and with the broader infrastructure-startup funding landscape of 2023.[^4][^23]
Lepton AI did not raise a publicly disclosed Series A. Instead, its next material financing event was the April 2025 acquisition by NVIDIA, which gave investors an unusually fast return on their capital. While neither party disclosed an exact purchase price, multiple outlets including The Information, TechCrunch, and Bloomberg reported the deal to be in the high hundreds of millions of US dollars.[^6][^7][^8]
Reports of advanced acquisition talks between NVIDIA and Lepton AI surfaced on March 26, 2025, when TechCrunch cited people familiar with the discussions to say that the chip maker was nearing a deal worth several hundred million US dollars.[^7] Coverage by SiliconANGLE, Reuters, and Bloomberg followed in the days afterward.[^6][^24] On April 8, 2025, The Information reported that the transaction had closed and that the entire Lepton AI team, including both co-founders, had moved to NVIDIA.[^8] Industry coverage placed the headline value of the deal at "hundreds of millions of dollars," though neither company filed a precise purchase price publicly.[^8]
The Lepton acquisition fit into a broader pattern of acquisitions and tuck-ins that NVIDIA had been pursuing in early 2025 as it expanded from a chip vendor into a vertically integrated AI computing company. Just weeks before the Lepton talks were reported, NVIDIA had agreed to acquire the synthetic data startup Gretel.[^7] In the years leading up to 2025 the company had also acquired OctoAI (an inference startup founded by University of Washington faculty), Run:ai (a Kubernetes GPU scheduler), and Brev.dev (a developer-experience tool for GPU computing).[^25]
For NVIDIA, Lepton brought three things at once: a credible team of systems engineers with deep experience operating production AI infrastructure at hyperscale (Facebook, Alibaba, Google); a working multi-tenant control plane for GPU workloads that could plausibly form the substrate of a multi-cloud GPU service; and the personal credibility of Yangqing Jia, whose long history in the deep learning community made him an attractive face for NVIDIA's emerging cloud strategy. Jensen Huang would later cite Lepton specifically when describing how NVIDIA planned to position itself in the "AI factory" market against pure cloud providers.[^9]
Not all observers were sanguine about the deal. Short-seller Jim Chanos publicly described NVIDIA's reported Lepton acquisition as a "huge red flag" and argued that NVIDIA was buying its way into the demand side of its own market to obscure potential inventory risk. He framed the move alongside other channel-stuffing concerns that had been raised about the broader AI infrastructure boom.[^26] NVIDIA's leadership did not respond directly to those criticisms, instead emphasizing the strategic and product implications of the deal at Computex two months later.
On May 18, 2025, NVIDIA chief executive Jensen Huang delivered the keynote at Computex 2025 in Taipei. Among a long list of product announcements, Huang introduced NVIDIA DGX Cloud Lepton, describing it as an AI platform with a compute marketplace that connects developers building agentic and physical AI applications with tens of thousands of GPUs available from a global network of cloud providers.[^9][^10]
DGX Cloud Lepton was positioned as a unified entry point to NVIDIA's broader compute ecosystem. Developers would interact with a single web console and API surface that abstracted away the underlying provider, even though the actual GPU capacity could be drawn from any of a growing list of NVIDIA Cloud Partners (NCPs). The platform aimed to ease problems that had become acute during 2023 to 2025, including geographic GPU shortages, lack of price transparency across providers, and the difficulty of building data-sovereignty-aware applications that need to keep workloads in specific countries or regions.[^9]
At launch, NVIDIA announced a roster of participating clouds that drew from the global NVIDIA Cloud Partner network. The initial set spanned established U.S. neoclouds, emerging providers in Asia and Europe, and several large industrial conglomerates that had recently entered the cloud GPU business.
| Cloud provider | Headquarters region | Notes |
|---|---|---|
| CoreWeave | United States | Largest dedicated U.S. AI cloud at the time |
| Lambda (Lambda Labs) | United States | Long-standing GPU cloud and workstation vendor |
| Crusoe | United States | Built early infrastructure on stranded natural-gas power |
| Firmus | Australia | Sustainable AI factory operator |
| Foxconn | Taiwan | Industrial manufacturing partner of NVIDIA |
| GMI Cloud | United States | AI cloud focused on global GPU distribution |
| Nebius | Netherlands | Spun out of Yandex's post-divestiture entity |
| Nscale | United Kingdom | European AI compute provider |
| SoftBank Corp. | Japan | Telecom operator deploying NVIDIA Blackwell |
| Yotta Data Services | India | Largest GPU cloud in India at launch |
Additional providers were added in the months after launch, including AWS, Hugging Face (via its Training Cluster as a Service), Together AI, Mistral AI Compute, Scaleway, Fluidstack, and Firebird, expanding the platform's reach into Europe and into the broader open-source AI ecosystem.[^27][^28] A European-focused expansion of the marketplace was announced in late 2025 under the title "NVIDIA DGX Cloud Lepton Connects Europe's Developers," with additional regional partners and a focus on sovereign AI workloads.[^29]
DGX Cloud Lepton integrates closely with NVIDIA's existing AI software products. The platform exposes NVIDIA NIM microservices (containerized inference packages for popular foundation models), the NeMo framework for model customization, Cloud Functions for serverless agent workloads, and NVIDIA Blueprints for reference applications.[^27]
Core user-facing features include:
| Feature | Description |
|---|---|
| Dev Pods | Interactive environments with Jupyter notebooks, SSH, and Visual Studio Code for exploratory work |
| Batch Jobs | Multi-node training, fine-tuning, and data-preprocessing pipelines |
| Inference Endpoints | Auto-scaling deployment surfaces for production AI services |
| Marketplace | Discovery and purchase of GPU capacity in specific regions across providers |
| Health Monitoring | GPU diagnostics via NCCL tests, GPUd metrics, and burn-in validation |
| Auto Recovery | Provider-side root-cause analysis and customer-configurable workload restart logic |
| Sovereignty Controls | Region-pinned compute placement to comply with national data regulations |
| Bring Your Own Capacity | Customers can import their own GPU clusters into the unified control plane |
A core engineering thesis of DGX Cloud Lepton, drawn directly from the original Lepton AI platform, is that the user experience of running AI workloads should not depend on the underlying GPU cloud. Customers should be able to discover available capacity, run a workload, monitor its health, and roll it over to a different provider without re-architecting their stack.[^27]
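The provider-agnostic workflow described above, combined with region pinning, can be sketched as a simple placement function. This is only an illustration under stated assumptions: the `Offer` record, the `place` function, the provider names, and the prices are all hypothetical and do not reflect the actual DGX Cloud Lepton API or any real provider's pricing.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Offer:
    provider: str            # hypothetical provider name
    region: str              # country or region code where the GPUs sit
    free_gpus: int
    usd_per_gpu_hour: float  # made-up price for illustration


def place(
    offers: Iterable[Offer],
    gpus_needed: int,
    allowed_regions: FrozenSet[str],
    exclude: FrozenSet[str] = frozenset(),
) -> Optional[Offer]:
    """Pick the cheapest offer that satisfies capacity and region pinning.

    Providers listed in `exclude` are skipped, which models rolling a
    workload over to a different provider after a failure.
    """
    eligible = [
        o for o in offers
        if o.region in allowed_regions
        and o.free_gpus >= gpus_needed
        and o.provider not in exclude
    ]
    return min(eligible, key=lambda o: o.usd_per_gpu_hour, default=None)


offers = [
    Offer("provider-a", "us", 16, 2.10),
    Offer("provider-b", "de", 8, 2.60),
    Offer("provider-c", "de", 64, 2.90),
    Offer("provider-d", "jp", 32, 1.95),
]

germany_only = frozenset({"de"})
first = place(offers, gpus_needed=8, allowed_regions=germany_only)
fallback = place(
    offers, gpus_needed=8, allowed_regions=germany_only,
    exclude=frozenset({first.provider}),
)
print(first.provider, fallback.provider)  # prints provider-b provider-c
```

The workload description stays the same in both calls; only the set of excluded providers changes, which is the property the paragraph above attributes to the platform.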
The platform entered early access in June 2025 and progressively expanded through the second half of 2025 and into 2026, integrating new cloud partners and new geographic regions.[^27]
DGX Cloud Lepton represented a noticeable evolution of NVIDIA's cloud strategy. NVIDIA's previous DGX Cloud offering had been a "managed AI training service" that ran on a small number of hyperscaler partners, including Microsoft Azure, Oracle Cloud, Google Cloud, and AWS. The new Lepton-based product was much broader: it took the same brand and extended it to a marketplace structure that included many smaller "neoclouds," industrial cloud players, and regional providers.[^9][^30] In effect, NVIDIA was using Lepton's software to coordinate a federation of independent GPU clouds, each of which was already buying NVIDIA hardware in volume, and to offer that federation to developers as if it were a single product.
Commentators including The Wall Street Journal and several industry analysts observed that this strategy could put NVIDIA in a more direct competitive position with hyperscale clouds such as Microsoft Azure, AWS, and Google Cloud, while also strengthening NVIDIA's downstream relationships with the neoclouds that consume the bulk of its high-end Blackwell systems.[^30][^31]
Following the close of the acquisition, both co-founders moved to NVIDIA along with the rest of the Lepton AI team.[^8] Yangqing Jia was named Vice President of DGX Cloud at NVIDIA, where he leads the team responsible for the DGX Cloud Lepton product line and the broader cloud-services portfolio that wraps NVIDIA hardware.[^11] In public appearances and conference talks during 2025 and 2026, Jia continued to position himself as a builder of AI infrastructure for developers, often referencing his prior work on Caffe and PyTorch as motivation for the design choices in DGX Cloud Lepton.[^11]
Junjie Bai joined NVIDIA in a senior engineering role on the same team, where his ONNX background remained relevant to NVIDIA's broader interoperability strategy across PyTorch, JAX, TensorRT, and the company's expanding portfolio of inference software.[^8]
Lepton AI competed in an unusually crowded subset of the AI infrastructure market: cloud-native inference platforms that abstract GPU operations behind serverless APIs. The category had attracted significant venture funding through 2023 and 2024, producing a long list of differentiated entrants. Even after the NVIDIA acquisition, many of these companies continued to operate independently, often alongside DGX Cloud Lepton as participating clouds or as direct alternatives.
| Competitor | Founders / Year | Primary positioning |
|---|---|---|
| Together AI | Vipul Ved Prakash et al., 2022 | Open-source model serving, fine-tuning, and RedPajama datasets |
| Fireworks AI | Lin Qiao et al., 2022 | High-performance LLM inference platform spun out of Meta PyTorch |
| Anyscale | Ion Stoica et al., 2019 | Commercial backer of the Ray distributed framework |
| Baseten | Tuhin Srivastava et al., 2019 | Model serving with Truss SDK and dedicated deployments |
| Replicate | Ben Firshman, Andreas Jansson, 2019 | Open model marketplace; Cog packaging tool |
| Modal | Erik Bernhardsson, 2021 | Serverless Python and GPU compute platform |
| DeepInfra | Nikola Borisov et al., 2022 | Low-cost open-source model API hosting |
| OctoAI | Luis Ceze et al., 2019 | Inference platform; acquired by NVIDIA in 2024 |
| SambaNova Systems | Rodrigo Liang et al., 2017 | Custom RDU chips and on-prem AI appliances |
| Cerebras Systems | Andrew Feldman et al., 2016 | Wafer-scale CS-3 systems and inference cloud |
OctoAI deserves particular comparison, as it was a similarly positioned inference platform also acquired by NVIDIA (in 2024). The acquisitions of OctoAI and then Lepton AI within roughly one year of each other underscored NVIDIA's commitment to building a serving stack of its own rather than relying on independent partners alone. Within NVIDIA, elements of both teams contributed to NVIDIA NIM, DGX Cloud Lepton, and adjacent inference products.[^25]
Several of Lepton's listed competitors went on to become participating clouds in DGX Cloud Lepton. Together AI, for example, joined the platform after Lepton's acquisition, meaning that NVIDIA's new product effectively aggregated and routed customer demand to one of Lepton's former rivals. This dynamic was characteristic of the AI infrastructure market in the mid-2020s: rather than a winner-take-all outcome, the layer of "GPU clouds" remained fragmented and complementary, while orchestration and serving layers consolidated under a smaller number of vendors, of which NVIDIA was now a major one.[^28]
Lepton AI's independent life lasted under two years, but its imprint on the AI infrastructure landscape proved disproportionately large. The company's Photon SDK influenced subsequent designs for Python-native serving libraries. Its founders' work on Caffe, PyTorch, and ONNX continued to underpin much of the AI inference ecosystem, including platforms built by Lepton's direct competitors. And through DGX Cloud Lepton, the platform's design ideas reached a far larger user base than the original Lepton Cloud ever did.
For NVIDIA, the acquisition served as a template for how a hardware company could climb the software stack: identify an opinionated team with a working product, acquire it for a fraction of the cost of building from scratch, and rebrand the resulting platform under the parent company's marketing umbrella. The DGX Cloud Lepton launch at Computex 2025 was, by several reporters' accounts, one of the most successful debuts of an acquired AI startup in recent memory.[^9][^30]