# LiteLLM

> Source: https://aiwiki.ai/wiki/litellm
> Updated: 2026-06-25
> Categories: Developer Tools, Large Language Models, Open Source AI
> License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
> From AI Wiki (https://aiwiki.ai), the free encyclopedia of artificial intelligence. Reuse freely with attribution to "AI Wiki (aiwiki.ai)".

**LiteLLM** is an open-source AI gateway from BerriAI that lets developers call more than 100 large language model providers (including [OpenAI](/wiki/openai), [Anthropic](/wiki/anthropic), Google [Gemini](/wiki/gemini), [Amazon Bedrock](/wiki/amazon_bedrock) and [Azure OpenAI](/wiki/azure_openai)) through a single, unified interface that uses the [OpenAI](/wiki/openai_api) Chat Completions request and response format.[^1][^6] It ships in two parts: a Python SDK whose `litellm.completion()` function standardises inputs, outputs and errors across every provider, and the LiteLLM Proxy (also marketed as the "LiteLLM Gateway" or "AI Gateway"), a self-hostable FastAPI server that adds routing, automatic fallbacks, virtual API keys, spend tracking and observability for an entire engineering organisation.[^1][^4] Released in August 2023 and developed by BerriAI, a Y Combinator Winter 2023 startup founded by Ishaan Jaffer and Krrish Dholakia, LiteLLM has become one of the most widely deployed components of the LLM-operations ("LLMOps") layer, with roughly 51,500 GitHub stars and production users including Netflix, Stripe, Lemonade, Rocket Money and Adobe.[^1][^2][^6]

| Attribute | Value |
|---|---|
| Developer | BerriAI (YC W23) |
| Founders | Ishaan Jaffer, Krrish Dholakia |
| Initial public release | August 2023 |
| License | MIT (open source core); commercial enterprise license |
| Language | Python |
| Repository | github.com/BerriAI/litellm |
| Latest release (as of writing) | v1.89.4 (25 June 2026) |
| Components | Python SDK, Proxy Server ("LiteLLM Gateway") |
| Supported providers | 100+ |
| GitHub stars | ~51,500 |
| Headquarters | San Francisco, California |

## What is LiteLLM?

LiteLLM solves a single, concrete problem: every LLM vendor exposes a different request schema, error model and streaming convention, so an application that wants to use more than one model normally has to maintain a bespoke integration for each. LiteLLM removes that burden by translating one common format, OpenAI's `/v1/chat/completions` schema, to and from every supported provider's native API.[^3][^14] The result is that switching from, say, GPT-4o to [Claude](/wiki/claude) or Gemini becomes a one-line change of the `model` string rather than a rewrite of client code. As Y Combinator described the project at launch, it is "an open source package that allows you to call 100+ LLM APIs (like Llama2, Anthropic, and Huggingface) using the OpenAI format."[^10]

The project is distributed in two forms that share the same provider catalogue: the in-process Python SDK for embedding directly in an application, and the standalone Proxy Server for serving an organisation. Both expose the same families of calls (chat completions, embeddings, image generation, audio transcription, reranking, batches and the [OpenAI Responses API](/wiki/openai_responses_api)) and both normalise streaming so that downstream code always sees OpenAI-style `delta` chunks.[^3][^14]

## History

### Who created LiteLLM, and why?

LiteLLM was built inside BerriAI, a company founded in 2023 by Krrish Dholakia and Ishaan Jaffer.[^2][^8] The founders were attempting to build a "chat-with-your-data" SaaS product and discovered that supporting multiple LLM back-ends, each with a different request schema, error model and streaming convention, was a substantial engineering burden; LiteLLM was extracted from that internal need.[^8] BerriAI was accepted into Y Combinator's Winter 2023 batch and remains a small, venture-backed team of around ten people based in San Francisco.[^9]

### When was LiteLLM launched?

Y Combinator publicly launched LiteLLM via its Launch YC channel on 24 August 2023.[^10] Y Combinator's announcement described the project as "an open source package that allows you to call 100+ LLM APIs (like Llama2, Anthropic, and Huggingface) using the OpenAI format."[^10] At launch the library standardised inputs, outputs and exceptions across providers, shipped with more than 50 test cases, and already included logging integrations to Sentry, PostHog and [Helicone](/wiki/helicone).[^10] The first tagged GitHub release of the SDK preceded the YC launch by approximately two weeks.[^9]

### How was LiteLLM funded, and how fast did it grow?

BerriAI raised a $1.6 million seed round in 2023 co-led by Y Combinator with participation from Gravity Fund and Pioneer Fund.[^11] The project grew rapidly: by mid-2025 InfoWorld reported "over 20,000 GitHub stars and 2,600 forks,"[^7] and by mid-2026 the official BerriAI/litellm repository reported approximately 51,500 stars, 9,200 forks and 1,372 tagged releases.[^1] The LiteLLM homepage cites more than one billion requests proxied and over 240 million Docker pulls for the official container image, with more than 1,005 individual contributors to the project.[^6][^12]

The release cadence is unusually fast for an open-source infrastructure project: the BerriAI/litellm GitHub release archive listed more than 1,370 tagged releases by June 2026, an average of well over one tagged build per day across the project's roughly thirty-four-month public history, and the maintainers have at times shipped multiple patch releases within a single day to accommodate API changes from upstream providers.[^1][^30] The team explicitly markets "day-zero" coverage of new flagship models; Netflix Staff Software Engineer David Leen is quoted on the homepage saying that "LiteLLM has let my team provide the latest LLM models to our users usually within a day" of their release.[^6]

## How does LiteLLM work?

### A single format for many providers

The central design choice of LiteLLM is to use [OpenAI's](/wiki/openai_api) `/v1/chat/completions` request and response schema as a lingua franca for all supported providers.[^3][^13] Each provider integration is a translation layer that maps incoming OpenAI-shaped messages, tool definitions, streaming chunks and error codes onto the provider's native API, then converts the response back into the OpenAI shape that the caller expects.[^3] As the docs put it, LiteLLM "maps every provider's errors to the OpenAI exception types," so existing client code written against the OpenAI SDK works against any LiteLLM-supported back-end with only a change of the `model` string.[^14]

LiteLLM's provider catalogue is organised across several endpoint families. The Python SDK exposes `litellm.completion`, `litellm.acompletion` (the async variant), `litellm.embedding`, `litellm.image_generation`, `litellm.transcription` (audio transcription), `litellm.responses` (a translator for the [OpenAI Responses API](/wiki/openai_responses_api)), `litellm.batches` and `litellm.assistants` calls, all routed through the unified mapper.[^3][^13] Streaming is normalised so that downstream code sees OpenAI-style `delta` chunks regardless of provider.[^14]
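To make "OpenAI-style `delta` chunks" concrete, the following minimal sketch reassembles a streamed reply from hand-written chunk dictionaries. The chunk data is illustrative and `collect_text` is a hypothetical helper, not part of LiteLLM; real SDK streams yield objects rather than plain dictionaries, so attribute access would replace the dictionary lookups.

```python
from typing import Iterable


def collect_text(chunks: Iterable[dict]) -> str:
    """Concatenate the `delta.content` pieces of OpenAI-style stream chunks."""
    parts = []
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            # A delta may carry only a role, or be empty on the final chunk.
            piece = choice.get("delta", {}).get("content")
            if piece:
                parts.append(piece)
    return "".join(parts)


# Hand-written chunks in the OpenAI streaming shape (illustrative data only).
example_chunks = [
    {"choices": [{"index": 0, "delta": {"role": "assistant"}}]},
    {"choices": [{"index": 0, "delta": {"content": "Hel"}}]},
    {"choices": [{"index": 0, "delta": {"content": "lo"}}]},
    {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]},
]

print(collect_text(example_chunks))
```

Because every provider's stream is normalised to this shape, a helper like this would not need provider-specific branches.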

This design choice has several engineering consequences. Because OpenAI's API is widely understood and well-tooled, applications can use existing OpenAI client libraries, retry middleware, and inspection utilities directly. Provider differences that lack an OpenAI analogue (for example, Anthropic's earlier system-prompt convention or the per-provider format for tool definitions and tool-result messages) are normalised in the translator, with provider-specific extensions accessible through dedicated parameters where needed. The translation layer also performs token accounting using each provider's tokenizer when one is available, so that cost and rate-limit calculations reflect the provider's actual billable units rather than a uniform but inaccurate approximation.[^20]
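The kind of normalisation described above can be sketched in a few lines. The function below is a hypothetical illustration, not LiteLLM's actual translator: it assumes a target API that expects the system prompt as a separate top-level `system` field and requires `max_tokens`, and it ignores tools, images and streaming.

```python
def to_top_level_system(openai_request: dict, default_max_tokens: int = 1024) -> dict:
    """Move system messages out of `messages` into a top-level `system` field."""
    system_parts = []
    other_messages = []
    for message in openai_request["messages"]:
        if message["role"] == "system":
            system_parts.append(message["content"])
        else:
            other_messages.append(
                {"role": message["role"], "content": message["content"]}
            )

    translated = {
        "model": openai_request["model"],
        "messages": other_messages,
        # Assumption: the target API requires an explicit output-token limit.
        "max_tokens": openai_request.get("max_tokens", default_max_tokens),
    }
    if system_parts:
        translated["system"] = "\n\n".join(system_parts)
    return translated


openai_shaped = {
    "model": "example-model",  # hypothetical model name
    "messages": [
        {"role": "system", "content": "Answer briefly."},
        {"role": "user", "content": "Hello"},
    ],
}
print(to_top_level_system(openai_shaped))
```

A real translation layer would also map the response back into the OpenAI shape, which this sketch leaves out.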

### How do you use the Python SDK?

A minimal example mirrors the OpenAI Python SDK exactly except for the provider prefix in the model string:[^14]

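```python
import os

from litellm import completion

# The SDK reads provider credentials from environment variables.
os.environ["OPENAI_API_KEY"] = "..."

# The "openai/" prefix selects the provider; the rest names the model.
response = completion(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
```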

Swapping providers is a one-line change:[^14]

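```python
from litellm import completion  # assumes ANTHROPIC_API_KEY is set in the environment

# Only the model string changes; the call shape stays the same.
response = completion(
    model="anthropic/claude-3-5-sonnet-20241022",
    messages=[{"role": "user", "content": "Hello"}],
)
```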

The SDK ships an in-process `Router` class that can be configured with multiple deployments per model name, routing strategy, retries, cooldowns and fallback chains, making it usable as a library-level load balancer without standing up the proxy server.[^15] Router behaviour is configured via a model list of dictionaries, each containing a public `model_name` (the logical alias clients will request), a `litellm_params` block (provider, deployment-specific API base, key and tuning parameters), plus optional metadata such as `rpm`, `tpm` and `weight`. Multiple entries can share the same `model_name`, in which case the Router treats them as deployments of a single virtual model and selects between them at request time according to the configured strategy.[^15]
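As a sketch of the model-list shape described above, the snippet below builds three entries and groups them by `model_name`. The aliases, the Azure deployment name and the placeholder keys are hypothetical, and the placement of `rpm`, `tpm` and `weight` beside `litellm_params` follows the description in this section rather than a verified schema. The grouping loop only illustrates how shared aliases form one virtual model; it is not the Router's own code.

```python
from collections import defaultdict

# Two deployments share the alias "chat-default"; one serves "chat-long".
model_list = [
    {
        "model_name": "chat-default",  # logical alias that clients request
        "litellm_params": {"model": "openai/gpt-4o", "api_key": "sk-placeholder"},
        "rpm": 600,
        "weight": 2,
    },
    {
        "model_name": "chat-default",
        "litellm_params": {
            "model": "azure/my-gpt-4o-deployment",  # hypothetical deployment
            "api_base": "https://example.invalid",  # placeholder endpoint
            "api_key": "placeholder",
        },
        "rpm": 300,
        "weight": 1,
    },
    {
        "model_name": "chat-long",
        "litellm_params": {
            "model": "anthropic/claude-3-5-sonnet-20241022",
            "api_key": "placeholder",
        },
        "tpm": 100000,
    },
]

# Group deployments by alias, as a router would when picking among them.
deployments_by_alias = defaultdict(list)
for entry in model_list:
    deployments_by_alias[entry["model_name"]].append(entry["litellm_params"]["model"])

for alias, deployments in deployments_by_alias.items():
    print(alias, "->", deployments)
```

A list of this shape is what the text describes handing to the `Router`; the exact constructor arguments should be checked against the routing documentation.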

## What is the LiteLLM Proxy (LiteLLM Gateway)?

The LiteLLM Proxy, also marketed as the "LiteLLM Gateway" and "AI Gateway," is a FastAPI-based service that exposes OpenAI-compatible HTTP endpoints (`/chat/completions`, `/embeddings`, `/images/generations`, `/audio/transcriptions`, `/batches` and others) and proxies them to any combination of configured back-end providers.[^4][^16] Because the surface is OpenAI-compatible, "any client that works with OpenAI works with the proxy, no code changes needed,"[^4] including the official OpenAI SDKs, [LangChain](/wiki/langchain), [LlamaIndex](/wiki/llamaindex) and Instructor.[^16]

The gateway is configured via a YAML file (commonly `config.yaml`) that defines model groups, deployment ordering, routing strategy, fallbacks, budgets and observability sinks; deployment is typically via Docker, Helm or the BerriAI-managed "LiteLLM Cloud" SaaS, with PostgreSQL as the backing store for keys and spend logs and Redis used to share rate-limit and load-balancing state across replicas.[^4][^17] The published reference architecture, described in the official Docker quick-start tutorial, places one or more proxy replicas behind an HTTP load balancer, with PostgreSQL handling persistent state and Redis handling ephemeral, per-second counters such as token budgets and concurrency limits.[^4]

The proxy advertises its OpenAI compatibility as a "drop-in" replacement: a developer can change the `OPENAI_BASE_URL` environment variable that the OpenAI SDK reads and the rest of the application becomes vendor-agnostic without further changes.[^4][^16] This makes the proxy attractive as a migration tool for organisations that already have substantial code targeting the OpenAI SDK and want to move some or all of their traffic to other providers without a code rewrite. The same pattern works for agentic frameworks such as the [OpenAI Agents SDK](/wiki/openai_agents_sdk), [LangChain](/wiki/langchain) agents, and [LlamaIndex](/wiki/llamaindex) query engines, all of which speak the OpenAI HTTP API natively.[^16]
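The "drop-in" idea comes down to pointing an OpenAI-shaped HTTP request at a different base URL. The sketch below builds such a request with the standard library and deliberately does not send it. The address `http://localhost:4000` and the key `sk-example` are assumptions for illustration, and `build_chat_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_key: str, model: str, user_text: str):
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = {"model": model, "messages": [{"role": "user", "content": user_text}]}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Assumed values: a proxy on localhost and a key issued by that proxy.
request = build_chat_request("http://localhost:4000", "sk-example", "gpt-4o", "Hello")
print(request.get_method(), request.full_url)

# With a proxy actually running, the request could be sent like this:
# with urllib.request.urlopen(request) as resp:
#     print(json.load(resp))
```

Only the base URL and the key differ from a direct OpenAI call, which is why existing OpenAI clients can be redirected without code changes.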

### How does routing, load balancing and fallback work?

The Router supports several strategies, including a default `simple-shuffle` for low overhead, latency-based routing, usage-based routing (tokens per minute or requests per minute, shared across replicas via Redis), least-busy, cost-based, weighted random and pluggable custom strategies.[^15] Deployments can be tagged with an integer `order` so the router tries deployments with a lower `order` value first and only escalates to higher values on failure.[^15] When a deployment fails, the router applies a per-deployment cooldown (default three failures per minute, five-second cooldown) so that a single misbehaving back-end is isolated rather than the whole model group being shut down.[^15]
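The interaction between `order` and cooldowns can be illustrated with a small stand-alone model. This is a sketch under stated assumptions, not LiteLLM's Router: the class and field names are hypothetical, the defaults mirror the figures quoted above, and the choice to trigger the cooldown on the third failure (rather than after it) is an assumption.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Deployment:
    name: str
    order: int  # lower values are tried first
    failure_times: list = field(default_factory=list)
    cooldown_until: float = 0.0


class CooldownRouter:
    def __init__(self, deployments, allowed_fails=3, window_s=60.0,
                 cooldown_s=5.0, clock=time.monotonic):
        self.deployments = list(deployments)
        self.allowed_fails = allowed_fails
        self.window_s = window_s
        self.cooldown_s = cooldown_s
        self.clock = clock

    def pick(self) -> Deployment:
        """Return the lowest-order deployment that is not cooling down."""
        now = self.clock()
        available = [d for d in self.deployments if d.cooldown_until <= now]
        if not available:
            raise RuntimeError("all deployments are cooling down")
        return min(available, key=lambda d: d.order)

    def record_failure(self, deployment: Deployment) -> None:
        """Count a failure; start a cooldown once the window threshold is hit."""
        now = self.clock()
        recent = [t for t in deployment.failure_times if now - t < self.window_s]
        recent.append(now)
        if len(recent) >= self.allowed_fails:
            deployment.cooldown_until = now + self.cooldown_s
            recent = []
        deployment.failure_times = recent


# A controllable clock makes the behaviour easy to follow.
fake_now = [0.0]
primary, backup = Deployment("primary", order=1), Deployment("backup", order=2)
router = CooldownRouter([primary, backup], clock=lambda: fake_now[0])

for _ in range(3):
    router.record_failure(primary)
print(router.pick().name)  # primary is cooling down, so the backup is chosen
fake_now[0] = 6.0
print(router.pick().name)  # cooldown has elapsed, so the primary is chosen again
```

Only the failing deployment is set aside, so the rest of the model group keeps serving traffic.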

LiteLLM distinguishes three fallback families: standard fallbacks for general errors (rate limits, timeouts, 5xx), content-policy fallbacks that fire on provider content-policy violations, and context-window fallbacks that route to a larger-context model when the request exceeds a deployment's window.[^18] Fallbacks are configured as ordered lists per model name, can be overridden per request via `"fallbacks": [...]` in the request body, and can be disabled entirely with `"disable_fallbacks": true`.[^18] An `enable_pre_call_checks` mode lets the router filter out deployments that cannot satisfy a request's context length or region requirements before the call is made.[^18]
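The three fallback families can be sketched as three lookup tables keyed by model name, with the error type deciding which table applies. Everything here is hypothetical illustration: the exception classes, the `call_with_fallbacks` helper and the model names are not LiteLLM's, and retries and cooldowns are left out.

```python
class ContextWindowError(Exception):
    """Raised when a request exceeds the model's context window."""


class ContentPolicyError(Exception):
    """Raised when a provider rejects a request on content-policy grounds."""


def call_with_fallbacks(model, call, fallbacks=None, context_window_fallbacks=None,
                        content_policy_fallbacks=None, disable_fallbacks=False):
    """Call `call(model)`; on failure, try the fallback list matching the error."""
    try:
        return call(model)
    except Exception as exc:
        if disable_fallbacks:
            raise
        if isinstance(exc, ContextWindowError):
            table = context_window_fallbacks
        elif isinstance(exc, ContentPolicyError):
            table = content_policy_fallbacks
        else:
            table = fallbacks  # rate limits, timeouts, 5xx and similar
        candidates = (table or {}).get(model, [])
        last_error = exc

    # Try candidates in their configured order; keep the most recent error.
    for candidate in candidates:
        try:
            return call(candidate)
        except Exception as exc:
            last_error = exc
    raise last_error


def fake_call(model):
    # Stand-in for a provider call: the small model cannot fit the prompt.
    if model == "small-model":
        raise ContextWindowError("prompt too long")
    return f"answered by {model}"


result = call_with_fallbacks(
    "small-model",
    fake_call,
    fallbacks={"small-model": ["other-model"]},
    context_window_fallbacks={"small-model": ["large-model"]},
)
print(result)
```

Keeping the tables separate lets an operator send oversized prompts to a larger-context model without also rerouting ordinary transient failures there.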

### What are virtual API keys and budgets?

Once the proxy is connected to a PostgreSQL database and a "master key" is set, administrators can mint virtual API keys via the `/key/generate` endpoint.[^19] Each virtual key can be scoped to a model allowlist, a team, a user and a budget, and the proxy enforces tokens-per-minute and requests-per-minute rate limits at the key, team and user levels.[^19][^20] Spend is computed using LiteLLM's `completion_cost()` function against a per-provider pricing table and is automatically attributed to the requesting key, team, user and organisation; the `/global/spend/report` endpoint aggregates spend by team, customer or API key, and per-call cost is also returned to the caller in the `x-litellm-response-cost` response header.[^20] Enterprise-tier features layered on top include tag-based budgets, model-specific budgets per virtual key, temporary budget increases, soft-budget email alerts, and richer spend-logging metadata.[^21]
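The spend-and-budget logic described above reduces to simple arithmetic: price the tokens, add the cost to the key's running total, and reject requests once the total reaches the budget. The sketch below uses made-up prices and hypothetical names (`PRICES`, `request_cost`, `VirtualKey`, `BudgetExceeded`); it is not LiteLLM's `completion_cost()` or its key store.

```python
from dataclasses import dataclass

# Hypothetical prices in USD per token, for illustration only.
PRICES = {"example-model": {"input": 2.5e-06, "output": 1.0e-05}}


class BudgetExceeded(Exception):
    """Raised when a key has already used up its budget."""


def request_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Price one request from its token counts and the per-token price table."""
    price = PRICES[model]
    return prompt_tokens * price["input"] + completion_tokens * price["output"]


@dataclass
class VirtualKey:
    alias: str
    max_budget: float
    spend: float = 0.0

    def check_budget(self) -> None:
        """Called before a request: refuse if the budget is already used up."""
        if self.spend >= self.max_budget:
            raise BudgetExceeded(f"{self.alias} has spent {self.spend:.4f} USD")

    def record_spend(self, cost: float) -> None:
        """Called after a request: attribute its cost to this key."""
        self.spend += cost


key = VirtualKey(alias="team-a-key", max_budget=0.01)
cost = request_cost("example-model", prompt_tokens=1000, completion_tokens=500)

key.check_budget()
key.record_spend(cost)
print(f"cost per request: {cost:.4f} USD, spend so far: {key.spend:.4f} USD")
```

Checking before the call and recording after it means a key can overshoot its budget by at most one request, which is a design trade-off of this sketch rather than a documented LiteLLM behaviour.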

Virtual keys also serve as the unit of model-aliasing: an administrator can configure a key so that requests for `gpt-4` are rewritten to `gpt-4o-mini` (downgrade) or to a fine-tuned [Claude](/wiki/claude) model on Bedrock (cross-provider substitution), without the calling application needing to know about the change. Combined with model groups in the YAML configuration, this allows the platform team to perform centralised model deprecation, A/B testing of model upgrades, and cost-driven traffic shaping without coordinating code changes in every downstream application.[^19] The proxy also exposes endpoints to list, update, regenerate and revoke virtual keys, supporting key-rotation workflows that include grace periods during which both old and new keys remain valid.[^19]
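Model aliasing of this kind amounts to rewriting the `model` field before the request is routed. The helper below is a hypothetical sketch of that rewrite, using the `gpt-4` to `gpt-4o-mini` downgrade from the text as example data; how LiteLLM stores and applies aliases internally is not shown here.

```python
def apply_key_aliases(request_body: dict, aliases: dict) -> dict:
    """Return a copy of the request with `model` rewritten per the key's alias map."""
    rewritten = dict(request_body)
    requested = request_body["model"]
    # Unlisted models pass through unchanged.
    rewritten["model"] = aliases.get(requested, requested)
    return rewritten


key_aliases = {"gpt-4": "gpt-4o-mini"}  # configured by an administrator
incoming = {"model": "gpt-4", "messages": [{"role": "user", "content": "Hello"}]}

outgoing = apply_key_aliases(incoming, key_aliases)
print(incoming["model"], "->", outgoing["model"])
```

The calling application keeps requesting `gpt-4`, so the substitution can be changed or reverted centrally.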

### What observability does LiteLLM provide?

LiteLLM exposes three callback hooks (input, success and failure), set through the `litellm.input_callback`, `litellm.success_callback` and `litellm.failure_callback` attributes. Each can be set to a list of named integrations; for example, `litellm.success_callback = ["posthog", "helicone", "langfuse", "lunary"]` and `litellm.failure_callback = ["sentry", "lunary", "langfuse"]`.[^22] Built-in callback targets include [Langfuse](/wiki/langfuse), Lunary, [Helicone](/wiki/helicone), LangSmith, Traceloop, Athina, Sentry, PostHog, Slack, Arize, PromptLayer, [MLflow](/wiki/mlflow), DeepEval, Braintrust, DataDog and OpenTelemetry, alongside others.[^22] The proxy also exposes Prometheus metrics and supports per-team logging so that, for example, a team's traffic can be sent to its own Langfuse project.[^21]

Because the callback API is uniform across providers, observability traces produced by LiteLLM carry the same fields (token counts, latency, cost, model, key, team, user) regardless of which back-end ultimately served the request. This makes the gateway a natural point for cross-provider analytics dashboards and for cross-provider evaluations, where the same prompt is sent to several models in parallel and the resulting traces are compared after the fact. Langfuse's own integration documentation describes LiteLLM as a recommended source for ingest, and observability vendors such as Helicone and Lunary publish reciprocal walkthroughs that pair their platforms with LiteLLM-managed traffic.[^22]
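A uniform trace record is what makes cross-provider aggregation straightforward. The sketch below defines a hypothetical `Trace` record with a subset of the fields listed above and totals cost per team over made-up data; it does not reflect the actual payload LiteLLM passes to its callbacks.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Trace:
    model: str
    team: str
    prompt_tokens: int
    completion_tokens: int
    latency_s: float
    cost_usd: float


def spend_by_team(traces) -> dict:
    """Sum cost per team, regardless of which provider served each request."""
    totals = defaultdict(float)
    for trace in traces:
        totals[trace.team] += trace.cost_usd
    return dict(totals)


# Illustrative records from two different providers with identical fields.
traces = [
    Trace("openai/gpt-4o", "search", 1200, 300, 0.8, 0.25),
    Trace("anthropic/claude-3-5-sonnet-20241022", "search", 900, 400, 1.1, 0.5),
    Trace("openai/gpt-4o", "support", 400, 100, 0.4, 0.125),
]

print(spend_by_team(traces))
```

Because the fields are the same for every provider, the aggregation needs no per-provider logic.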

### Does LiteLLM support the Model Context Protocol (MCP)?

Yes. The LiteLLM Proxy includes an MCP Gateway that lets administrators register [Model Context Protocol](/wiki/model_context_protocol) tool servers behind a single fixed endpoint and govern access to them by virtual key, team or organisation.[^31] The gateway can list, call and manage MCP tools over multiple transports (HTTP, SSE and stdio), convert OpenAPI specifications into MCP servers, and apply OAuth 2.0 authentication, so any model reachable through LiteLLM (OpenAI, Azure, Anthropic, Bedrock and others) can call the same set of MCP tools with centralised permissions.[^31] This positions the LiteLLM Proxy not only as a model gateway but also as a tool gateway for agentic workflows, complementing its README-cited integrations with LangGraph and the Google Vertex AI Agent Engine.[^1]

## Is LiteLLM open source? The enterprise tier

LiteLLM's open-source distribution under the MIT-licensed core already includes 100+ provider integrations, virtual keys, budgets, teams, load balancing and guardrail hooks at no cost.[^23] BerriAI sells a separate enterprise tier, recommended for organisations running the gateway "at scale, 100+ users or 10+ production AI use-cases," that adds single sign-on for the admin UI (Okta, Azure AD, Google Workspace, OIDC and SAML), JWT-based authentication, audit logs with configurable retention, role-based access control, IP allowlisting, automated key rotations and integrations with secret managers including AWS KMS, Azure Key Vault and HashiCorp Vault.[^17][^21] Governance features include a four-tier multi-tenant hierarchy (organisations, teams, projects, keys), tag-based spend tracking, programmatic spend reports and per-key or per-team guardrails for secret redaction and content moderation.[^21]

BerriAI also operates a managed "LiteLLM Cloud" deployment, which the company describes as SOC 2 Type 2 and ISO 27001 certified and load-tested at 1,000 requests per second.[^17] Enterprise support includes dedicated Slack or Teams channels with service-level agreements ranging from a one-hour response for production-blocking issues to twenty-four hours for non-urgent matters.[^17] Pricing is custom and quoted on request rather than published.[^23]

The split between open-source core and paid enterprise tier follows a common LLMOps "open-core" pattern. Critical features such as the unified API, the Router, virtual keys, the admin UI and basic spend tracking are available to anyone using the public Docker image or the PyPI package, while features tied to compliance, governance and large-team operations are gated behind a license key that the enterprise distribution requires at startup.[^17][^23] BerriAI's published guidance is that small teams and individual developers should stay on the open-source core, while organisations exceeding roughly 100 users or 10 production AI use-cases will typically need the enterprise feature set to operate the gateway safely at scale.[^17]

## What providers does LiteLLM support?

LiteLLM's provider matrix spans hosted frontier APIs, hyperscaler model marketplaces, regional clouds, open-source inference servers, image-generation services and audio APIs. A non-exhaustive list, drawn from the official provider index, includes:[^24]

| Category | Representative providers |
|---|---|
| Hosted frontier LLMs | [OpenAI](/wiki/openai), [Anthropic](/wiki/anthropic), [xAI](/wiki/xai), [Cohere](/wiki/cohere), [AI21](/wiki/ai21_labs), [Mistral AI](/wiki/mistral_ai) |
| Hyperscaler marketplaces | [AWS Bedrock](/wiki/amazon_bedrock), AWS SageMaker, [Azure OpenAI](/wiki/azure_openai), Azure AI, [Google Vertex AI](/wiki/google_vertex_ai), Google AI Studio / [Gemini](/wiki/gemini) |
| Multi-tenant inference clouds | [Together AI](/wiki/together_ai), [Replicate](/wiki/replicate), [Fireworks AI](/wiki/fireworks_ai), DeepInfra, Groq, [Databricks](/wiki/databricks) |
| Open-source / self-hosted runtimes | [Ollama](/wiki/ollama), [vLLM](/wiki/vllm), LM Studio, Llamafile, Xinference, NVIDIA NIM |
| Image and audio APIs | Stability AI, Black Forest Labs (FLUX), Recraft, ElevenLabs, Deepgram |
| Aggregator and developer tools | [Hugging Face](/wiki/hugging_face), OVHCloud, Volcano Engine, DataRobot |

Any service that exposes an OpenAI-compatible HTTP interface can additionally be invoked simply by passing `openai/<model-name>` along with a custom `api_base`, which is the mechanism LiteLLM uses for community-run inference servers and OpenAI-compatible gateways.[^24]
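The `provider/model` naming convention can be illustrated with a small parser. This is a simplified sketch: `split_model_string` is a hypothetical helper, splitting on the first slash is an assumption about the convention, and unlike this sketch a real resolver may also accept bare model names. The model name and `api_base` address below are placeholders.

```python
def split_model_string(model: str) -> tuple[str, str]:
    """Split "provider/model-name" on the first slash only."""
    provider, separator, name = model.partition("/")
    if not separator or not provider or not name:
        raise ValueError(f"expected 'provider/model-name', got {model!r}")
    return provider, name


# Keyword arguments one might pass for an OpenAI-compatible server.
call_kwargs = {
    "model": "openai/my-local-model",        # hypothetical model name
    "api_base": "http://localhost:8000/v1",  # placeholder server address
    "messages": [{"role": "user", "content": "Hello"}],
}

provider, name = split_model_string(call_kwargs["model"])
print(provider, name)
```

Splitting only once keeps model names that themselves contain slashes intact, for example an organisation-qualified name after the provider prefix.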

## How does LiteLLM compare to OpenRouter, Portkey and Helicone?

LiteLLM is most often compared with three other components of the "AI gateway" layer: OpenRouter, Portkey and [Helicone](/wiki/helicone).[^25][^26]

[OpenRouter](/wiki/openrouter) is an API aggregator that fronts roughly 300 models from more than 60 providers and bills callers directly for usage (typically adding a 5% markup), making it appealing for prototypes and consumer apps that prefer not to manage individual provider accounts.[^25] LiteLLM, Portkey and Helicone instead sit between an organisation's application and the providers, using the organisation's own API keys; payment flows to each provider directly and the gateway is paid for separately (or self-hosted).[^25]

Portkey is a closed-core commercial gateway whose main differentiation is "production safety" features such as built-in guardrails, PII redaction, jailbreak detection and audit trails, starting at $49/month for managed use.[^25] Helicone is primarily an observability platform that also functions as a lightweight proxy; it is open source, written largely in Rust, and emphasises load-balancing performance and analytics.[^25][^26] LiteLLM occupies the "maximum customisation, self-hostable" position: the open-source core is free with unlimited self-hosted use (callers pay only their upstream providers and any infrastructure costs), and the proxy is more configurable than Portkey or OpenRouter at the cost of more setup time.[^25][^26]

| Tool | Type | Hosting | Pricing model |
|---|---|---|---|
| LiteLLM | Open source proxy + SDK | Self-hosted or LiteLLM Cloud | Free OSS; enterprise quoted |
| OpenRouter | Aggregator, single billing | Hosted | ~5% markup on traffic |
| Portkey | Commercial gateway | Hosted (and self-host option) | From $49/month |
| Helicone | Observability + lightweight proxy | Self-hosted or hosted | Open source; SaaS tiers |

LiteLLM is also adjacent to higher-level orchestration libraries like [LangChain](/wiki/langchain) and [LlamaIndex](/wiki/llamaindex), which can call out to model providers through LiteLLM rather than competing with it, and to inference-server projects such as [vLLM](/wiki/vllm) and Hugging Face Text Generation Inference, which sit one layer below LiteLLM and supply the actual model serving.[^14][^24]

## Who uses LiteLLM?

LiteLLM markets itself primarily to two audiences: individual developers and small teams that want to keep a single code path while experimenting with multiple models, and platform teams that need to expose models to an entire engineering organisation with central governance.[^4][^23] On the corporate side, the LiteLLM homepage and Y Combinator profile cite Rocket Money, Samsara, Lemonade, Adobe and Netflix as production users, and the project's open-source adopter list also names Stripe, Greptile, OpenHands, the Google Agent Development Kit (ADK) and the OpenAI Agents SDK.[^6][^9] Netflix Staff Software Engineer David Leen is quoted saying that "LiteLLM has let my team provide the latest LLM models to our users usually within a day" of their release, and Lemonade Principal Architect (GenAI Platform) Mark Koltnuk has said that "our experience with LiteLLM and Langfuse at Lemonade has been outstanding."[^6][^9] InfoWorld additionally describes the project as offering "day-zero access to new models with minimal overhead" for organisations including Netflix, Lemonade and Rocket Money.[^7]

Common deployment patterns include using the SDK directly in a single application for portable provider selection, running the proxy as an internal AI gateway behind which all teams' applications make OpenAI-shaped calls (a pattern that yields organisation-wide cost tracking, central guardrails and central key management), and using the proxy as the back-end for tools that already speak the OpenAI protocol such as IDE assistants, RAG frameworks and agentic systems built with the [Model Context Protocol](/wiki/model_context_protocol).[^4][^7][^16]

Another common deployment pattern uses LiteLLM as a translation layer in front of self-hosted inference: a fleet of open-source models served by [vLLM](/wiki/vllm), Hugging Face Text Generation Inference, [Ollama](/wiki/ollama) or LM Studio is registered as deployments behind a LiteLLM Proxy, which then presents an OpenAI-compatible surface to applications. This decouples the choice of inference runtime from the application layer and allows organisations to mix hosted frontier models with self-hosted open-weight models behind the same set of virtual keys and budgets.[^24]

LiteLLM has been adopted as the model-routing layer in several agentic and developer-tooling stacks. The project's own README cites integrations with LangGraph and the Google Vertex AI Agent Engine, and notes Model Context Protocol support so that any LLM accessible through LiteLLM can call MCP tool servers.[^1] The combination of unified provider access, automatic fallbacks and centralised budgets makes the gateway particularly useful in long-running agent workflows, where transient provider outages or capacity exhaustion would otherwise cascade into broken agent traces.[^1][^15]

## What are the limitations and criticisms of LiteLLM?

Several limitations and risks have been documented in independent coverage and in the project's own changelog.

Comparative reviews note that LiteLLM's flexibility comes at a configuration cost; one 2026 comparison estimates 15 to 30 minutes of YAML configuration to stand up a production proxy versus under five minutes for OpenRouter or Portkey.[^25] Because LiteLLM normalises every provider to the OpenAI schema, provider-specific features that have no OpenAI analogue are exposed through ad-hoc pass-through fields, and behaviour can drift when providers add new response fields faster than mappings are updated.[^14]

### What was the March 2026 LiteLLM supply-chain attack?

A high-impact incident occurred in March 2026, when malicious versions 1.82.7 and 1.82.8 of the `litellm` package were published to PyPI through a compromised upstream maintainer account (the broader incident also affected the unrelated `trivy` CI/CD package).[^27][^28] The malicious release contained a `litellm_init.pth` file that executed a base64-encoded payload on Python import, performed credential exfiltration to an attacker-controlled endpoint, encrypted captured material with RSA, and attempted Kubernetes lateral movement and installation of a persistent systemd service.[^28][^29] An unintended fork bomb in the payload aided detection. PyPI quarantined the package roughly thirty minutes after the first vulnerability report, by which point the package had been live for about forty-six minutes; researcher Simon Willison subsequently relayed an estimate that approximately 46,996 downloads occurred during the exposure window across the two tainted versions, and noted that 88% of the 2,337 downstream packages that depend on LiteLLM did not pin a safe version range.[^27][^28] The maintainers published a post-mortem on Hacker News and rotated credentials and signing keys.[^28][^29] The incident is frequently cited as evidence of the importance of dependency pinning, SBOM hygiene and supply-chain controls for the LLMOps tool ecosystem rather than as a flaw in LiteLLM's runtime architecture per se.[^27][^29]
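The pinning advice can be turned into a simple check over requirements-file lines. The sketch below flags the two versions named above and distinguishes exact pins from open ranges. It is a deliberately simplified illustration: `audit_requirement` is a hypothetical helper, it recognises only `==` pins, and it ignores hashes, environment markers and lock files.

```python
import re

# The two malicious versions named in the incident reports cited above.
COMPROMISED_VERSIONS = {"1.82.7", "1.82.8"}

_IS_LITELLM = re.compile(r"^litellm(?![\w.-])", re.IGNORECASE)
_EXACT_PIN = re.compile(
    r"^litellm(?:\[[^\]]*\])?\s*==\s*([0-9][0-9A-Za-z.]*)\s*$", re.IGNORECASE
)


def audit_requirement(line: str) -> str:
    """Classify one requirements line: not-litellm, unpinned, pinned or compromised."""
    text = line.split("#", 1)[0].strip()  # drop trailing comments
    if not _IS_LITELLM.match(text):
        return "not-litellm"
    match = _EXACT_PIN.match(text)
    if match is None:
        return "unpinned"
    return "compromised" if match.group(1) in COMPROMISED_VERSIONS else "pinned"


for requirement in ["litellm==1.82.7", "litellm>=1.80", "litellm[proxy]==1.82.6"]:
    print(requirement, "->", audit_requirement(requirement))
```

An exact pin alone does not prove a release is safe; it only prevents an installer from silently picking up a newly published version.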

Beyond supply-chain risk, the gateway pattern itself introduces operational considerations. Every request to a LiteLLM-managed model adds one extra network hop; while the LiteLLM homepage claims an 8 millisecond P95 overhead at 1,000 requests per second, that figure depends on co-location with the proxy, careful tuning of Python workers and the use of Redis for shared state.[^1][^6] Self-hosted deployments are also responsible for the availability of the proxy itself: an outage of the LiteLLM Proxy will block traffic to every underlying provider, so operators typically run multiple replicas behind a load balancer and treat the Redis and PostgreSQL dependencies as production-critical components.[^4]

## ELI5: LiteLLM explained simply

Imagine every AI company (OpenAI, Google, Anthropic and dozens more) speaks a slightly different language, and you have to learn all of them to talk to all of them. LiteLLM is a universal translator: you speak just one language (OpenAI's), and LiteLLM quietly translates your message into whichever AI you want to use and translates the answer back. If you want to switch from one AI to another, you change a single word instead of rewriting your whole program. The bigger version, the LiteLLM Proxy, is like a front desk for a company: every team sends its AI requests to the same desk, which checks who is allowed in, keeps track of how much each team spends, and reroutes the request to a backup AI if the first one is down.

## Related work and comparison

- [OpenRouter](/wiki/openrouter): pay-per-token API aggregator that fronts 60+ providers with a single key and billing relationship.
- [Helicone](/wiki/helicone): Rust-based observability proxy that overlaps with LiteLLM's logging features.
- [LangChain](/wiki/langchain) and [LlamaIndex](/wiki/llamaindex): higher-level orchestration frameworks that frequently use LiteLLM as their model adapter.
- [vLLM](/wiki/vllm) and [Ollama](/wiki/ollama): inference servers and runtimes that LiteLLM proxies to.
- [Model Context Protocol](/wiki/model_context_protocol): Anthropic-originated protocol for connecting models to tools, supported by the LiteLLM Proxy through its MCP Gateway.
- [Langfuse](/wiki/langfuse), [MLflow](/wiki/mlflow): observability platforms commonly wired to LiteLLM via callbacks.

## See also

- [OpenAI API](/wiki/openai_api)
- [OpenAI Responses API](/wiki/openai_responses_api)
- [Azure OpenAI Service](/wiki/azure_openai)
- [Amazon Bedrock](/wiki/amazon_bedrock)
- [Google Vertex AI](/wiki/google_vertex_ai)
- [Anthropic](/wiki/anthropic)
- [Cohere](/wiki/cohere)
- [Mistral AI](/wiki/mistral_ai)
- [Together AI](/wiki/together_ai)
- [Replicate](/wiki/replicate)
- [Fireworks AI](/wiki/fireworks_ai)
- [Hugging Face](/wiki/hugging_face)
- [Ollama](/wiki/ollama)
- [vLLM](/wiki/vllm)
- [LangChain](/wiki/langchain)
- [LlamaIndex](/wiki/llamaindex)
- [Helicone](/wiki/helicone)
- [Langfuse](/wiki/langfuse)
- [MLflow](/wiki/mlflow)
- [OpenRouter](/wiki/openrouter)
- [Model Context Protocol](/wiki/model_context_protocol)
- [MLOps](/wiki/mlops)
- [AI21 Labs](/wiki/ai21_labs)
- [Databricks](/wiki/databricks)
- [xAI](/wiki/xai)

## References

[^1]: BerriAI, "BerriAI/litellm: Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format", GitHub, 2026. https://github.com/BerriAI/litellm. Accessed 2026-06-25.
[^2]: Y Combinator, "LiteLLM: Call every LLM API like it's OpenAI [100+ LLMs]", Y Combinator company directory, 2026. https://www.ycombinator.com/companies/litellm. Accessed 2026-06-25.
[^3]: LiteLLM documentation, "Getting Started", docs.litellm.ai, 2026. https://docs.litellm.ai/. Accessed 2026-06-25.
[^4]: LiteLLM documentation, "LiteLLM Proxy (LLM Gateway)", docs.litellm.ai, 2026. https://docs.litellm.ai/docs/simple_proxy. Accessed 2026-06-25.
[^5]: LiteLLM documentation, "Callbacks", docs.litellm.ai, 2026. https://docs.litellm.ai/docs/observability/callbacks. Accessed 2026-06-25.
[^6]: BerriAI, "LiteLLM AI Gateway", litellm.ai, 2026. https://www.litellm.ai/. Accessed 2026-06-25.
[^7]: Janakiram MSV, "LiteLLM: An open-source gateway for unified LLM access", InfoWorld, 2025-05-15. https://www.infoworld.com/article/3975290/litellm-an-open-source-gateway-for-unified-llm-access.html. Accessed 2026-06-25.
[^8]: Digger Insights, "Berri AI's LiteLLM Simplifies LLM API Integration", Medium, 2024. https://medium.com/@DiggerInsights/berri-ais-litellm-simplifies-llm-api-integration-8e1d368a0403. Accessed 2026-06-25.
[^9]: Y Combinator, "LiteLLM (YC W23) company profile", Y Combinator company directory, 2026. https://www.ycombinator.com/companies/litellm. Accessed 2026-06-25.
[^10]: Y Combinator (official), "Launch YC: LiteLLM, Call every LLM API like it's OpenAI", Twitter/X, 2023-08-24. https://x.com/ycombinator/status/1694726256200151040. Accessed 2026-06-25.
[^11]: CO/AI, "Job alert: Open-source LiteLLM raises $1.6M, now hiring founding engineer", getcoai.com, 2024. https://getcoai.com/news/job-alert-open-source-litellm-raises-1-6m-now-hiring-founding-engineer/. Accessed 2026-06-25.
[^12]: BerriAI, "litellm/litellm Docker image", Docker Hub, 2026. https://hub.docker.com/r/litellm/litellm. Accessed 2026-06-25.
[^13]: LiteLLM documentation, "SDK Quickstart", docs.litellm.ai, 2026. https://docs.litellm.ai/docs/learn/sdk_quickstart. Accessed 2026-06-25.
[^14]: DataCamp, "LiteLLM: A Guide With Practical Examples", DataCamp tutorials, 2024. https://www.datacamp.com/tutorial/litellm. Accessed 2026-06-25.
[^15]: LiteLLM documentation, "Router - Load Balancing", docs.litellm.ai, 2026. https://docs.litellm.ai/docs/routing. Accessed 2026-06-25.
[^16]: LiteLLM documentation, "Langchain, OpenAI SDK, LlamaIndex, Instructor, Curl examples", docs.litellm.ai, 2026. https://docs.litellm.ai/docs/proxy/user_keys. Accessed 2026-06-25.
[^17]: LiteLLM documentation, "Enterprise", docs.litellm.ai, 2026. https://docs.litellm.ai/docs/enterprise. Accessed 2026-06-25.
[^18]: LiteLLM documentation, "Fallbacks", docs.litellm.ai, 2026. https://docs.litellm.ai/docs/proxy/reliability. Accessed 2026-06-25.
[^19]: LiteLLM documentation, "Virtual Keys", docs.litellm.ai, 2026. https://docs.litellm.ai/docs/proxy/virtual_keys. Accessed 2026-06-25.
[^20]: LiteLLM documentation, "Spend Tracking", docs.litellm.ai, 2026. https://docs.litellm.ai/docs/proxy/cost_tracking. Accessed 2026-06-25.
[^21]: LiteLLM documentation, "Enterprise Features", docs.litellm.ai, 2026. https://docs.litellm.ai/docs/proxy/enterprise. Accessed 2026-06-25.
[^22]: LiteLLM documentation, "Observability Callbacks", docs.litellm.ai, 2026. https://docs.litellm.ai/docs/observability/callbacks. Accessed 2026-06-25.
[^23]: TrueFoundry, "Understanding LiteLLM Pricing: Cost of Open Source Gateways", truefoundry.com, 2025. https://www.truefoundry.com/blog/litellm-pricing-guide. Accessed 2026-06-25.
[^24]: LiteLLM documentation, "Providers", docs.litellm.ai, 2026. https://docs.litellm.ai/docs/providers. Accessed 2026-06-25.
[^25]: PkgPulse, "Portkey vs LiteLLM vs OpenRouter: LLM Gateway 2026", pkgpulse.com, 2026. https://www.pkgpulse.com/guides/portkey-vs-litellm-vs-openrouter-llm-gateway-2026. Accessed 2026-06-25.
[^26]: Helicone, "Top LLM Gateways: The Complete Guide to Choosing the Best AI Gateway", helicone.ai blog, 2025. https://www.helicone.ai/blog/top-llm-gateways-comparison-2025. Accessed 2026-06-25.
[^27]: Simon Willison, "LiteLLM Hack: Were You One of the 47,000?", simonwillison.net, 2026-03-25. https://simonwillison.net/2026/Mar/25/litellm-hack/. Accessed 2026-06-25.
[^28]: Hacker News, "My minute-by-minute response to the LiteLLM malware attack" (discussion thread), news.ycombinator.com, 2026. https://news.ycombinator.com/item?id=47531967. Accessed 2026-06-25.
[^29]: Hacker News, "Malicious litellm_init.pth in litellm 1.82.8 PyPI package, credential stealer", news.ycombinator.com, 2026. https://news.ycombinator.com/item?id=47501729. Accessed 2026-06-25.
[^30]: BerriAI, "Releases - BerriAI/litellm", GitHub, 2026. https://github.com/BerriAI/litellm/releases. Accessed 2026-06-25.
[^31]: LiteLLM documentation, "MCP Gateway", docs.litellm.ai, 2026. https://docs.litellm.ai/docs/mcp. Accessed 2026-06-25.