Baseten is an inference platform for deploying, serving, and scaling machine learning models in production. The company provides infrastructure that converts ML models into production-ready APIs with GPU-accelerated hardware, traffic-based autoscaling, and multi-cloud reliability. Baseten is headquartered in San Francisco, California, with a small office in Manhattan and remote employees across the United States.
Founded in 2019 by Tuhin Srivastava, Amir Haghighat, Philip Howes, and Pankaj Gupta, Baseten has raised $585 million in venture capital funding across five rounds and reached a $5 billion valuation with its January 2026 Series E round. The platform serves customers including Cursor, Notion, Quora, Patreon, and Clay, and experienced 100x growth in inference volume during 2025. As of February 2026, the company employed approximately 200 people, and all four original co-founders remained with the company.
Baseten was founded in September 2019 by four co-founders who had worked together at Gumroad, an online marketplace for creators. Tuhin Srivastava (CEO) and Philip Howes (Chief Scientist) grew up together in Australia. They met Amir Haghighat (CTO) around 2012, when all three were among the first employees at Gumroad. At Gumroad, Haghighat served as head of engineering while Srivastava and Howes worked as data scientists who needed to build full-stack applications to use machine learning for fraud detection and content moderation. That experience forced them to learn web development, Kubernetes, and Docker on top of their ML work, a process they found unnecessarily difficult.
Before starting Baseten, Srivastava and Howes co-founded Shape, an HR analytics platform that was acquired by Reflektive in 2018. Haghighat worked as an engineering manager at Clover Health. The fourth co-founder, Pankaj Gupta, came from a software engineering role at Uber.
The founding insight behind Baseten was that while model training had become increasingly accessible through frameworks like PyTorch and TensorFlow, the bottleneck had shifted to deploying trained models for inference. Moving a trained model into production often required months of engineering work to build APIs, manage containers, handle GPU orchestration, and create monitoring infrastructure. Baseten was created so that data scientists would not have to become full-stack engineers to ship ML-powered applications.
Baseten initially focused on building a low-code platform that let data scientists create internal web applications on top of their ML models, similar to Retool but for ML teams. The company raised a $2.5 million seed round led by First Round Capital in 2021 under this original vision.
As the generative AI wave accelerated in 2022 and 2023, the company recognized that the larger opportunity lay in model inference infrastructure rather than application building. Baseten pivoted to focus on serverless model serving, developing Truss, an open-source model packaging framework, and building GPU-optimized infrastructure for running models at scale.
Baseten has raised $585 million in venture capital across five priced rounds (Series A through Series E), in addition to a $2.5 million seed round. The company's valuation grew from roughly $200 million in March 2024 to $5 billion by January 2026.
| Round | Date | Amount | Lead Investor(s) | Post-Money Valuation |
|---|---|---|---|---|
| Seed | 2021 | $2.5 million | First Round Capital | Not disclosed |
| Series A | April 2022 | $20 million | Greylock Partners | Not disclosed |
| Series B | March 2024 | $40 million | IVP, Spark Capital | ~$200 million |
| Series C | February 2025 | $75 million | Conviction | $825 million |
| Series D | September 2025 | $150 million | BOND | $2.15 billion |
| Series E | January 2026 | $300 million | IVP, CapitalG | $5 billion |
Other investors across these rounds include NVIDIA, Greylock, Altimeter, Battery Ventures, BoxGroup, Blackbird Ventures, 01 Advisors, Premji Invest, Scribble Ventures, South Park Commons, and Lachy Groom. NVIDIA contributed $150 million to the Series E round.
Baseten's revenue grew from near zero in 2022 to roughly $16 million by 2024, with significant acceleration as demand for AI inference infrastructure surged.
| Year | Estimated Revenue | Employees | Notes |
|---|---|---|---|
| 2022 | Near zero | ~20 | Pre-pivot phase |
| 2023 | $2.7 million | ~49 | Inference platform gaining traction |
| 2024 | ~$16 million | ~100 | 6x year-over-year growth |
| 2025 | $15.8 million (reported) | ~147 | 100x growth in inference volume |
| 2026 | Not disclosed | ~200 | $5 billion valuation |
By March 2024, the platform served approximately 20 large enterprises and tens of thousands of developers. By February 2025, the customer base had grown to over 100 large organizations and hundreds of smaller businesses, and the company reported near-zero customer churn.
The Baseten Inference Stack is the core technology powering the platform. It combines optimized inference runtimes with production infrastructure for autoscaling, request routing, and multi-cloud capacity management. The stack is designed to achieve 99.99% uptime through active-active deployments across multiple cloud providers and regions.
The inference stack consists of several layers:
Runtime Layer: Supports the largest large language models with low latency and high throughput. Features include KV cache reuse and request prioritization for fast time-to-first-token (TTFT), windowed attention for long-context models, quantization support, and a speculation engine for low inter-token latency.
Infrastructure Layer: Provides cold start optimizations, intelligent request routing for KV cache and LoRA cache hit rates, multi-node inference support, and disaggregated serving. The system holds requests in a queue while new GPUs spin up, then routes those requests across the expanded compute capacity.
Baseten Delivery Network (BDN): A multi-tier caching system for model weights. When a new replica starts, the BDN agent fetches a manifest, downloads weights through an in-cluster cache, and stores them in a node-level cache. The system uses parallelized byte-range downloads and specialized pods to accelerate loading times. As a result, most models cold start in seconds, and even the largest models start in a few minutes.
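The lookup order the BDN description implies can be illustrated with a toy two-tier cache. This is an explanatory sketch only, not Baseten's implementation; the function and cache names are invented for illustration:

```python
def fetch_weights(name, node_cache, cluster_cache, origin):
    """Illustrative multi-tier weight lookup: node cache first, then
    in-cluster cache, then origin storage, warming caches on the way back."""
    if name in node_cache:          # fastest: weights already on this node
        return node_cache[name]
    if name in cluster_cache:       # next: in-cluster cache
        blob = cluster_cache[name]
    else:
        blob = origin[name]         # slowest path: pull from object storage
        cluster_cache[name] = blob  # warm the in-cluster cache
    node_cache[name] = blob         # warm the node-level cache for new replicas
    return blob
```

In the real system, the origin fetch is further accelerated by parallelized byte-range downloads, which this sketch omits.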
Baseten offers three deployment configurations:
| Option | Description | Best For |
|---|---|---|
| Baseten Cloud | Fully managed multi-cloud infrastructure across 10+ providers | Teams that want zero infrastructure management |
| Self-Hosted | Deploy within customer VPCs for compliance and data control | Regulated industries, strict data residency requirements |
| Hybrid | Primary on self-hosted infrastructure with overflow to Baseten Cloud during demand spikes | Enterprises needing both control and elastic scaling |
The self-hosted option supports SOC 2 Type II and HIPAA compliance requirements, gives customers control over data residency, and aligns with standards such as GDPR.
Baseten abstracts differences between cloud providers to ensure its Inference Stack runs identically on any underlying infrastructure. The multi-cloud approach powers high availability through active-active deployments across different clouds. If a region or provider faces a capacity crunch or outage, the system can rapidly reroute and reprovision workloads to maintain service continuity.
In December 2025, Baseten signed a Strategic Collaboration Agreement (SCA) with Amazon Web Services, expanding the availability of Baseten's inference services to customers deploying AI applications on AWS. This partnership gives enterprises a way to use Baseten's inference technology on their own AWS infrastructure while keeping full control of their data.
Baseten Model APIs provide instant access to popular open-source models through OpenAI-compatible endpoints. Developers can point their existing OpenAI SDK at Baseten's inference endpoint and start making calls without any model deployment. Pricing is calculated per million input and output tokens.
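Because the endpoints follow the OpenAI wire format, a request can be sketched with nothing but the standard library. The endpoint URL and model slug below are illustrative assumptions, not documented values:

```python
import json
import os
import urllib.request

# Assumed OpenAI-compatible base URL and model slug for illustration.
BASE_URL = "https://inference.baseten.co/v1"
payload = {
    "model": "deepseek-ai/DeepSeek-V3",
    "messages": [{"role": "user", "content": "Say hello in one word."}],
}
req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": f"Bearer {os.environ.get('BASETEN_API_KEY', '')}",
        "Content-Type": "application/json",
    },
)

# Only hit the network when a real API key is configured.
if os.environ.get("BASETEN_API_KEY"):
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read())
        print(body["choices"][0]["message"]["content"])
```

With the official OpenAI Python SDK, the equivalent is constructing the client with `OpenAI(base_url=..., api_key=...)` pointed at the Baseten endpoint, leaving the rest of the application code unchanged.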
Available models include DeepSeek V3, DeepSeek R1, Llama 4 Maverick, Qwen, GLM-5, and GPT-OSS-120B, among others. Model APIs support structured outputs and tool calling. By leveraging Google Cloud A4 virtual machines based on NVIDIA Blackwell GPUs and the NVIDIA Dynamo inference framework, Baseten serves these models with over 225% better cost-performance for high-throughput inference and 25% better cost-performance for latency-sensitive inference compared to previous-generation hardware.
Dedicated deployments let teams serve custom, fine-tuned, and open-source models on specific GPU hardware. Users package their model using Truss, deploy it to Baseten, and receive a production API endpoint with autoscaling, monitoring, and request routing. Billing is per minute for the specific GPU hardware running the model.
Dedicated deployments support models from any framework: Hugging Face Transformers, diffusers, PyTorch, TensorFlow, vLLM, SGLang, TensorRT-LLM, and custom serving code.
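A Truss package centers on a `model/model.py` file exposing a `Model` class with `load()` and `predict()` methods. The sketch below follows that interface, with a stand-in in place of a real framework model so it stays self-contained:

```python
# model/model.py — the interface Truss expects: a Model class with
# load() (runs once per replica at startup) and predict() (per request).
class Model:
    def __init__(self, **kwargs):
        # Truss passes configuration via kwargs; heavy initialization
        # belongs in load(), not the constructor.
        self._pipeline = None

    def load(self):
        # In a real Truss this would load weights, e.g.:
        #   from transformers import pipeline
        #   self._pipeline = pipeline("sentiment-analysis")
        # A stand-in keeps this sketch runnable without dependencies:
        self._pipeline = lambda text: [{"label": "POSITIVE", "score": 0.99}]

    def predict(self, model_input):
        # model_input is the parsed JSON body of the request.
        return self._pipeline(model_input["text"])
```

Running `truss push` from the package directory deploys the model to Baseten and returns a production API endpoint.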
Launched in late 2025, Baseten Training is an infrastructure platform for fine-tuning open-source AI models. The platform handles GPU cluster management, multi-node orchestration, and cloud capacity planning. A key differentiator is model weight ownership: all production-critical artifacts, including model weights, evaluations, and training scripts, belong entirely to the customer. This stands in contrast to some competing fine-tuning platforms whose terms of service restrict customers from exporting their fine-tuned model weights.
The Training platform supports multiple weight formats including full model fine-tunes and LoRA adapter weights, with seamless promotion from training jobs to inference endpoints on Baseten's serving infrastructure. Baseten also credits 20% of training costs toward inference usage.
Baseten Chains is an SDK for building and deploying compound AI systems, which are multi-model workflows that combine several AI models or processing steps. Built on top of Truss, Chains reached general availability in 2025.
The architecture uses two core concepts: Chains, which define the overall multi-step workflow, and Chainlets, the individual processing steps that each run as independently scalable services.
Chainlets call each other directly without a centralized orchestration executor, which reduces latency by eliminating intermediary result retrieval and transmission between steps. Each Chainlet runs on customized hardware with independent autoscaling. For example, in a transcription pipeline, chunking operations can scale horizontally on CPUs while the transcription model runs on GPUs.
Key features of Chains include output streaming, binary IO with NumPy array support, subclassing for Chainlet reuse, "Chains Watch" for live-patching deployed code, and built-in linting and logging.
Common use cases for Chains include multi-step audio transcription pipelines and voice AI workflows such as text-to-speech, where different steps of a compound system need different hardware.
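A pipeline along these lines might be sketched as follows. The SDK names used (`truss_chains`, `ChainletBase`, `depends`, `mark_entrypoint`, `run_remote`) reflect the open-source truss-chains package; local stand-ins are defined so the sketch runs without it installed:

```python
try:
    import truss_chains as chains
except ImportError:
    # Minimal stand-ins mirroring the assumed SDK surface, so this
    # sketch runs locally without the truss-chains package.
    class chains:
        class ChainletBase:
            pass
        @staticmethod
        def depends(chainlet_cls, **kwargs):
            return chainlet_cls()
        @staticmethod
        def mark_entrypoint(cls):
            return cls


class Chunker(chains.ChainletBase):
    # CPU-bound step: splits text into fixed-size chunks. As its own
    # Chainlet it can scale horizontally, independent of GPU steps.
    def run_remote(self, text: str) -> list:
        return [text[i:i + 20] for i in range(0, len(text), 20)]


@chains.mark_entrypoint
class Pipeline(chains.ChainletBase):
    # Entrypoint Chainlet; calls the Chunker directly rather than
    # going through a central orchestrator.
    def __init__(self, chunker=chains.depends(Chunker)):
        self._chunker = chunker

    def run_remote(self, text: str) -> int:
        return len(self._chunker.run_remote(text))
```

In a real deployment, each Chainlet would declare its own hardware (for example, CPUs for chunking and a GPU for a transcription model) and the Chain would be deployed with the Truss CLI.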
Baseten Embeddings Inference (BEI) is a purpose-built system for high-throughput embedding, reranker, and classifier inference. On NVIDIA B200 GPUs, BEI achieved 3.3x higher throughput than vLLM and 3.6x higher throughput than TEI (Text Embeddings Inference) running on H100s.
Truss is Baseten's open-source framework for packaging and serving ML models. Written in Python, Truss handles containerization, dependency management, and GPU configuration, allowing developers to create containerized model servers without learning Docker or Kubernetes.
Key features of Truss include:
| Feature | Description |
|---|---|
| Framework-agnostic | Supports models from any framework: Transformers, diffusers, PyTorch, TensorFlow, vLLM, SGLang, TensorRT-LLM |
| No Docker required | Creates containerized model servers without writing Dockerfiles |
| Live reload | Iterate on model serving code in a remote development environment that mirrors production |
| GPU support | Built-in configuration for GPU types and counts |
| Secrets management | Secure handling of API keys and credentials |
| Caching | Model weight and dependency caching for fast cold starts |
| Local and remote parity | Equally straightforward to serve a model on localhost and in production |
Truss is maintained by Baseten and has accumulated over 6,000 stars on GitHub. While Truss deploys natively to the Baseten Inference Stack, it can also be used to deploy models to other infrastructure.
Baseten offers a range of NVIDIA GPUs with per-minute billing and no idle charges. Users pay only for the time their model is actively using compute.
| GPU | VRAM | Per Minute | Per Hour (approx.) |
|---|---|---|---|
| NVIDIA T4 | 16 GiB | $0.01052 | $0.63 |
| NVIDIA L4 | 24 GiB | $0.01414 | $0.85 |
| NVIDIA A10G | 24 GiB | $0.02012 | $1.21 |
| NVIDIA H100 MIG | 40 GiB | $0.0625 | $3.75 |
| NVIDIA A100 | 80 GiB | $0.06667 | $4.00 |
| NVIDIA H100 | 80 GiB | $0.10833 | $6.50 |
| NVIDIA B200 | 180 GiB | $0.16633 | $9.98 |
Baseten has also cut prices by 40% across all instance types, a reduction it attributed to falling GPU costs and improved infrastructure efficiency.
Baseten supports fractional H100 GPUs through NVIDIA's Multi-Instance GPU (MIG) technology. An H100 MIG instance provides 40 GiB of VRAM at $3.75 per hour, compared to $6.50 per hour for a full 80 GiB H100. This allows smaller models to run on high-performance hardware at lower cost.
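Using the per-minute rates from the pricing table above, the saving from a fractional GPU is straightforward to work out. The eight-hours-per-day duty cycle here is an arbitrary example, not a Baseten figure:

```python
# Per-minute rates from the pricing table above.
H100_MIG_RATE = 0.0625    # $/min for a 40 GiB H100 MIG slice
H100_FULL_RATE = 0.10833  # $/min for a full 80 GiB H100

# Example workload: model actively serving 8 hours per day.
active_minutes = 8 * 60
daily_mig = H100_MIG_RATE * active_minutes
daily_full = H100_FULL_RATE * active_minutes

print(f"MIG: ${daily_mig:.2f}/day vs full H100: ${daily_full:.2f}/day")
# With per-minute billing and no idle charges, the MIG slice costs
# $30.00/day versus $52.00/day for the full GPU in this scenario.
```

For models that fit in 40 GiB of VRAM, the fractional instance thus cuts the compute bill roughly in half while keeping Hopper-class performance.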
| Feature | Basic | Pro | Enterprise |
|---|---|---|---|
| Monthly cost | $0 (pay as you go) | Volume discounts available | Custom pricing |
| Dedicated deployments | Yes | Yes | Yes |
| Model APIs | Yes | Yes | Yes |
| Training | Yes | Yes | Yes |
| Fast cold starts | Yes | Yes | Yes |
| SOC 2 / HIPAA | Yes | Yes | Yes |
| Priority GPU access | No | Yes | Yes |
| Dedicated compute | No | Yes | Yes |
| Custom SLAs | No | No | Yes |
| Self-hosting | No | No | Yes |
| Custom global regions | No | No | Yes |
Baseten has published several performance benchmarks highlighting its inference optimization work:
TensorRT on H100: Serving Stable Diffusion XL (SDXL) with NVIDIA TensorRT on an H100 GPU improved latency by 40% and throughput by 70% compared to standard serving. For language models, the H100 provides 2x to 3x better inference performance than the A100 while costing only 62% more per hour.
Mistral 7B on H100 MIG: Running Mistral 7B in FP16 on an H100 MIG instance demonstrated 20% lower latency and 6% higher total throughput compared to a full A100 GPU, at a lower hourly cost.
Blackwell GPUs: Using Google Cloud A4 virtual machines with NVIDIA Blackwell architecture, Baseten achieved 225% better cost-performance for high-throughput inference and 25% better cost-performance for latency-sensitive inference.
BEI on B200: Baseten Embeddings Inference on B200 GPUs achieved 3.3x higher throughput than vLLM and 3.6x higher throughput than TEI on H100s.
Voice AI: For text-to-speech applications using Chains, processing times were halved and GPU utilization improved 6x.
NVIDIA made a $150 million investment in Baseten as part of the company's $300 million Series E round in January 2026. The investment reflects NVIDIA's strategic interest in the AI inference software layer. Baseten uses NVIDIA's hardware stack extensively, from T4 and A100 GPUs through the latest Blackwell B200 architecture.
Baseten adopted NVIDIA's Blackwell GPUs on Google Cloud alongside the NVIDIA Dynamo inference framework and TensorRT-LLM. NVIDIA has also published a case study on Baseten's inference infrastructure, highlighting the company as an example of optimized GPU utilization for AI workloads.
In December 2025, Baseten acquired Parsed, a reinforcement learning startup specializing in post-training and continual learning for large models. Parsed was co-founded by CEO Mudith Jayasekara and Chief Scientist Charles O'Neill. Before the acquisition, Parsed had already been working closely with Baseten's ecosystem, running more than 500 training jobs on Baseten's infrastructure.
The acquisition brought production data, fine-tuning, and inference under one roof, enabling companies to shape learning signals from production usage through reinforcement learning that rewards strong outputs and penalizes weak ones. Financial terms were not disclosed.
Baseten serves a wide range of AI companies and enterprises. By January 2026, the customer base included over 100 large organizations and hundreds of smaller businesses.
| Customer | Industry / Use Case |
|---|---|
| Cursor | AI-powered code editor |
| Notion | Productivity and AI features |
| Quora | AI-powered Q&A platform |
| Patreon | Creator platform; uses Baseten for OpenAI Whisper deployment |
| Clay | Sales intelligence and data enrichment |
| Writer | Enterprise AI writing platform |
| Abridge | Medical transcription |
| OpenEvidence | Medical AI |
| HeyGen | AI video generation |
| Mercor | AI recruiting |
| Superhuman | AI-powered email |
| World Labs | Spatial intelligence |
| Hex | Data analytics |
| Decagon | Customer support AI |
| Retool | Internal tool building |
| Wispr | Voice AI |
| Lovable | AI development platform |
| Scaled Cognition | AI agent platform |
Patreon reported saving 440 engineering hours annually, roughly $600,000 in costs, and 70% on GPU spend by deploying OpenAI Whisper on Baseten.
Scaled Cognition, an AI agent platform, deployed Baseten's inference stack on their own AWS GPUs within their VPC and achieved time-to-first-token under 120 milliseconds while reducing overall latency by 40%.
Baseten competes in the AI inference infrastructure market against both specialized startups and cloud provider offerings.
| Competitor | Category | Differentiator |
|---|---|---|
| Replicate | Serverless inference | One-click model deployment; strong for demos and prototyping; simpler developer experience |
| Modal | Serverless compute | Python-native infrastructure; faster cold starts (sub-second to 4 seconds); broader use cases beyond inference |
| Together AI | Full-stack AI cloud | Broad model catalog (200+ models); combined training and inference |
| Fireworks AI | Inference optimization | Custom FireAttention engine; focused on throughput and latency |
| AWS SageMaker | Cloud ML platform | Deep AWS ecosystem integration; comprehensive MLOps tooling |
| Google Vertex AI | Cloud ML platform | Integrated within Google Cloud; managed ML pipelines |
| Azure ML | Cloud ML platform | Microsoft ecosystem integration; enterprise features |
| RunPod | GPU cloud | Raw GPU access; competitive pricing; pod-based and serverless options |
| CoreWeave | GPU cloud | Large-scale GPU clusters; focused on raw infrastructure |
| Lambda | GPU cloud | On-demand GPU instances; research-focused |
Baseten differentiates itself from API-first platforms like Replicate through its support for custom models and enterprise deployment options (self-hosted, hybrid). Compared to general-purpose compute platforms like Modal, Baseten focuses specifically on model inference optimization with features like the Baseten Delivery Network for model weight caching and TensorRT-LLM integration. Against cloud provider offerings like SageMaker, Baseten positions itself as faster to deploy, with better GPU utilization and 40% lower costs compared to in-house infrastructure solutions.
The AI inference market was valued at $106.2 billion in 2025 and is projected to reach $255 billion by 2030, growing at a 19.2% compound annual growth rate.
Baseten maintains several open-source projects on GitHub under the basetenlabs organization:
| Project | Description |
|---|---|
| Truss | Model packaging and serving framework (6,000+ GitHub stars) |
| ML Cookbook | Ready-to-use ML training recipes for building and deploying models on Baseten |
| Model Trusses | Pre-built Truss configurations for popular models (MPT-7B, Stable Diffusion, Whisper, and others) |
Truss is the primary open-source contribution and serves as the on-ramp for developers adopting the Baseten platform. It is also usable independently of Baseten for local model serving and deployment to other infrastructure.
| Name | Role | Background |
|---|---|---|
| Tuhin Srivastava | CEO and Co-Founder | Former data scientist at Gumroad; co-founded Shape (acquired by Reflektive, 2018) |
| Amir Haghighat | CTO and Co-Founder | Former head of engineering at Gumroad; engineering manager at Clover Health |
| Philip Howes | Chief Scientist and Co-Founder | Former data scientist at Gumroad; co-founded Shape |
| Pankaj Gupta | Co-Founder | Former software engineer at Uber |
CEO Tuhin Srivastava has stated: "We think AI applications are just the last great market. We want to be the index for that economic growth."