Helicone is an open-source LLM observability platform and AI gateway founded in 2023 by Justin Torre and Cole Gottdank. The company graduated from Y Combinator's Winter 2023 (W23) batch and was headquartered in San Francisco, California. By the time of its acquisition in early 2026, Helicone had grown to serve more than 16,000 organizations and had processed over 14.2 trillion tokens across its three years of operation. The platform is best known for its proxy-based integration model, which allows developers to add observability to existing LLM applications by changing a single line of code, and for the Helicone AI Gateway, a unified routing layer providing access to more than 100 AI models through a single OpenAI-compatible endpoint.
In March 2026, Mintlify announced the acquisition of Helicone. Justin Torre and Cole Gottdank joined Mintlify in San Francisco as part of the deal. The Helicone product entered maintenance mode following the acquisition, with security updates and bug fixes continuing while existing customers were assisted in migrating to alternative platforms.
Helicone was founded in 2023 by Justin Torre and Cole Gottdank. The two founders applied to Y Combinator during a period when large language model applications were proliferating rapidly. As Torre and Gottdank later described, every company in their YC batch appeared to be building a "GPT wrapper" -- an application layered on top of OpenAI's API -- yet almost no tooling existed to observe, debug, or monitor what those applications were actually doing in production.
Justin Torre had previously worked as a developer evangelist and educator at Apple. Cole Gottdank brought a background in software engineering and product development. Together, they identified a structural gap: developers building LLM-powered products had no reliable way to inspect the requests and responses flowing through their applications, track token usage and costs per user, identify latency bottlenecks, or catch prompt regressions before they reached production. Existing application performance monitoring tools were not designed for the probabilistic, high-volume, multi-provider nature of LLM inference.
Helicone's initial solution was deliberately minimal: developers changed their OpenAI base URL from api.openai.com to oai.helicone.ai, added their Helicone API key as a header, and every LLM request was then automatically logged, metered, and displayed on a dashboard. This proxy-first approach required no SDK installation, no instrumentation changes, and no refactoring of existing code.
The company was accepted into Y Combinator's Winter 2023 cohort and graduated from the program in March 2023. YC provided $500,000 in pre-seed investment as part of its standard deal structure.
Following YC Demo Day, Helicone raised additional seed funding. According to multiple financial data sources, the company raised a seed round that brought total funding to approximately $5 million, at a reported valuation of $25 million. Investors included Y Combinator, Village Global, FundersClub, and Coughdrop Capital, among others. Some sources report a pre-seed round of approximately $1.5 million in April 2023 as a distinct event before the larger seed close.
The company hit $1 million in annualized revenue in June 2024, achieving this milestone with a team of approximately five people -- a ratio that attracted attention in the developer tools community as an example of a highly capital-efficient SaaS business.
Through 2023 and 2024, Helicone expanded well beyond its original request-logging use case. The platform added support for sessions and multi-step trace visualization, prompt management with versioning and rollback, cost analytics broken down by model and user segment, custom evaluations, and an experiments feature for structured A/B testing of prompt variations. The company also introduced native support for providers beyond OpenAI, including Anthropic, Google (Gemini), Azure OpenAI, Mistral, Together AI, and others.
In June 2025, Helicone launched the Helicone AI Gateway in public beta. Built in Rust for performance, the gateway extended the platform's reach from observability into intelligent routing, providing access to more than 100 LLM providers through a single endpoint, with features including automatic failover, response caching, load balancing, and rate limiting.
On March 3, 2026, Mintlify announced that it was acquiring Helicone. Mintlify, a documentation platform founded around the same time as Helicone and also YC-backed, had been using Helicone's gateway and observability infrastructure internally: as the founders noted in their public announcement, Helicone's "gateway was routing their requests" and "observability was keeping things fast and accurate" before any acquisition discussions began.
The strategic rationale centered on Mintlify's ambition to build what it described as AI knowledge infrastructure -- systems where AI agents autonomously maintain, update, and evolve the documentation and knowledge bases that companies depend on. Those agentic workflows made intelligent routing, real-time observability, and multi-provider coordination foundational requirements. Helicone's three years of production data and infrastructure across 16,000 organizations gave Mintlify an immediate foundation.
The financial terms of the acquisition were not disclosed. Following the close, Helicone's product entered maintenance mode. The open-source repository remained available under its Apache 2.0 license, and security patches and bug fixes continued to ship. Existing customers were supported in migrating to alternative platforms.
Helicone is released under the Apache License, Version 2.0. This permissive license allows users to use, modify, and distribute the software, including for commercial purposes, without requiring derivative works to be open-sourced. The license grants a patent license from contributors to users and requires preservation of copyright and license notices.
The primary repository, located at github.com/Helicone/helicone, accumulated approximately 5,600 GitHub stars by mid-2025. A separate repository, github.com/Helicone/ai-gateway, hosts the standalone AI Gateway component. The Apache 2.0 license was a deliberate choice: as the founding team stated when introducing the AI Gateway, the license was chosen to provide "business model transparency" and to encourage enterprise adoption without concerns about license compatibility.
The platform can be fully self-hosted. Community contributions are accepted through standard pull request workflows, and the platform maintains an active Discord community. The codebase is primarily written in TypeScript (approximately 91% of the repository), with smaller portions in MDX for documentation, Python for integrations, and PL/pgSQL for database functions.
Helicone's core integration model places a proxy between the application and the LLM provider. When a developer changes their API base URL to point at Helicone's proxy endpoint, all requests pass through Helicone's infrastructure before being forwarded to the underlying provider. The proxy captures the full request payload, timing information, response, token counts, and any custom metadata headers before logging this data asynchronously.
For OpenAI, the integration requires changing the base URL from https://api.openai.com/v1 to https://oai.helicone.ai/v1 and adding a Helicone-Auth header containing the developer's Helicone API key. Analogous endpoints exist for Anthropic, Azure OpenAI, and other supported providers. The change is backward compatible: if the Helicone service is unavailable, requests can be routed directly to the provider without code changes.
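A minimal sketch of this integration using the official openai Node SDK, following the base URL and Helicone-Auth header described above; the environment variable names and the model identifier are illustrative, not prescribed values.

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,        // provider key, unchanged
  baseURL: "https://oai.helicone.ai/v1",     // Helicone proxy instead of api.openai.com/v1
  defaultHeaders: {
    "Helicone-Auth": `Bearer ${process.env.HELICONE_API_KEY}`, // Helicone API key
  },
});

// Requests are forwarded to OpenAI as usual; the proxy logs payloads,
// token counts, latency, and status on the way through.
const completion = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Hello, Helicone!" }],
});

console.log(completion.choices[0].message.content);
```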
The proxy layer is deployed on Cloudflare Workers, Cloudflare's edge computing platform, which provides global points of presence to minimize geographic latency. Helicone reports an average proxy overhead of 50 to 80 milliseconds, which is the primary performance trade-off of the proxy architecture. For latency-sensitive applications such as real-time voice interfaces or sub-second interactive experiences, this overhead may be unacceptable, and Helicone offers its asynchronous logging path for such use cases.
For use cases where adding Helicone to the request path is not feasible, the platform provides an asynchronous logging integration through OpenLLMetry, an open-source library that extends OpenTelemetry to cover LLM provider calls. With async logging, the application sends requests directly to the LLM provider and posts log events to Helicone's ingest endpoint separately, outside the critical path.
This approach guarantees that any Helicone infrastructure issue cannot cause application outages. The trade-off is that logs may arrive with slight delays and that streaming responses can be harder to capture in full. The async path also integrates with LiteLLM, allowing teams using LiteLLM's provider abstraction layer to export traces to Helicone without changing their request routing.
Helicone's data pipeline is built around Cloudflare Workers for edge request interception, Apache Kafka for high-throughput event streaming, and ClickHouse as the primary analytics database. ClickHouse, a column-oriented database optimized for online analytical processing, allows Helicone to run fast aggregation queries over billions of logged requests -- computing per-user token costs, latency percentiles, error rates, and other derived metrics in real time.
Object storage (Amazon S3, or MinIO in self-hosted deployments) holds full request and response payloads. Supabase provides the application database and authentication layer. The web frontend is built with Next.js. The Jawn service (an Express/Tsoa server) acts as the primary log collection and API backend.
Helicone supports self-hosting through Docker Compose for development and small production deployments, and through Helm charts for Kubernetes environments requiring enterprise-scale horizontal scaling. In a self-hosted deployment, the Cloudflare Workers component is replaced by a local proxy service, and developers connect their own ClickHouse and Kafka instances. Helm chart support allows operators to connect to managed cloud databases such as AWS Aurora instead of running these services in-cluster.
Self-hosting provides complete data residency: no request payloads or metadata leave the organization's infrastructure. This is valuable for companies operating under strict data governance requirements, including those in healthcare (HIPAA) and finance.
Helicone logs every LLM request and response passing through the proxy, capturing the full prompt, completion, model identifier, token counts (prompt tokens, completion tokens, total), latency, provider, HTTP status, and any custom metadata headers. The dashboard aggregates this data into time-series charts of request volume, token usage, cost, and latency, with filtering by model, user, time range, and custom properties.
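For illustration, the fields listed above could be modeled roughly as the following TypeScript interface. This is a hypothetical shape for a logged request record, not Helicone's actual storage schema; field names are assumptions.

```typescript
// Illustrative only -- not Helicone's actual schema.
interface LoggedRequest {
  requestId: string;                    // unique id assigned at logging time
  provider: string;                     // e.g. "openai", "anthropic"
  model: string;                        // model identifier from the request body
  prompt: unknown;                      // full request payload (messages, parameters)
  completion: unknown;                  // full response payload
  promptTokens: number;
  completionTokens: number;
  totalTokens: number;
  latencyMs: number;                    // end-to-end provider latency
  status: number;                       // HTTP status returned by the provider
  properties: Record<string, string>;   // custom Helicone-Property-* metadata
  createdAt: string;                    // ISO timestamp
}
```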
The Helicone Query Language (HQL), available on the Pro tier and above, allows developers to write custom filter expressions against the request log. This is useful for building dashboards tracking specific prompt templates, user cohorts, or feature flags.
Developers can attach arbitrary key-value metadata to requests by adding headers in the format Helicone-Property-[Name]: [value]. For example, a multi-tenant SaaS product might attach the customer tier, the application feature, and the A/B test variant to each request. Custom properties appear as filterable dimensions in the Helicone dashboard, enabling per-feature cost attribution, per-customer billing reconciliation, and segmented performance analysis without requiring any changes to the LLM provider API calls beyond additional headers.
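A short sketch of attaching custom properties to a single request, reusing a proxy-configured client as in the earlier example; the property names and values here are illustrative, not reserved names.

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: "https://oai.helicone.ai/v1",
  defaultHeaders: { "Helicone-Auth": `Bearer ${process.env.HELICONE_API_KEY}` },
});

// Per-request headers become filterable dimensions in the dashboard.
const completion = await client.chat.completions.create(
  {
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: "Summarize this support ticket..." }],
  },
  {
    headers: {
      "Helicone-Property-Customer-Tier": "enterprise",  // per-customer cost attribution
      "Helicone-Property-Feature": "ticket-summarizer", // per-feature breakdown
      "Helicone-Property-Experiment": "variant-b",      // A/B test segmentation
    },
  }
);
```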
The sessions feature groups related LLM calls that belong to a single logical workflow. Developers assign a Helicone-Session-Id header (a UUID or other unique identifier), a Helicone-Session-Path indicating where in a workflow tree the call falls, and a Helicone-Session-Name for labeling. Helicone then reconstructs the full call graph for each session.
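A hedged sketch of grouping two related calls into one session using the headers described above; the session name and the path values ("/plan", "/plan/draft") are illustrative, and the client setup follows the earlier proxy example.

```typescript
import { randomUUID } from "node:crypto";
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: "https://oai.helicone.ai/v1",
  defaultHeaders: { "Helicone-Auth": `Bearer ${process.env.HELICONE_API_KEY}` },
});

const sessionId = randomUUID(); // one id per logical workflow

// Step 1: planning call, logged at the top of the session tree.
await client.chat.completions.create(
  { model: "gpt-4o-mini", messages: [{ role: "user", content: "Plan the steps to answer the ticket." }] },
  {
    headers: {
      "Helicone-Session-Id": sessionId,
      "Helicone-Session-Path": "/plan",
      "Helicone-Session-Name": "support-agent-run",
    },
  }
);

// Step 2: follow-up call, nested under the planning step.
await client.chat.completions.create(
  { model: "gpt-4o-mini", messages: [{ role: "user", content: "Draft the reply." }] },
  {
    headers: {
      "Helicone-Session-Id": sessionId,
      "Helicone-Session-Path": "/plan/draft",
      "Helicone-Session-Name": "support-agent-run",
    },
  }
);
```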
The sessions dashboard offers three views: Chat view, which displays conversations as a user-facing message thread; Tree view, which shows the hierarchical structure of nested LLM calls and tool calls; and Span view, which presents a timeline of calls in the style of distributed tracing tools such as Jaeger or Zipkin. This feature is particularly useful for debugging multi-step agent workflows, retrieval-augmented generation pipelines, and chains involving multiple model calls.
Helicone's prompt management system provides a centralized registry for storing, versioning, and deploying prompts. Developers associate a request with a named prompt by adding a Helicone-Prompt-Id header. Helicone uses the hpf (Helicone Prompt Format) helper to identify dynamic input variables within the prompt, which are specified with the syntax {{hc:variableName:type}}.
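A brief illustration of the variable syntax described above, written as a TypeScript constant. The prompt id and variable names are hypothetical, and the hpf helper's actual API is not reproduced here.

```typescript
// Illustrative only: variables follow the {{hc:variableName:type}} format.
const ticketSummaryTemplate = `
You are a support assistant for {{hc:productName:string}}.
Summarize the following ticket in at most {{hc:maxSentences:number}} sentences:

{{hc:ticketBody:string}}
`;

// Requests rendered from this template would carry a header such as
//   Helicone-Prompt-Id: ticket-summary
// so Helicone can group them under the named prompt and track its versions.
```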
Each time a prompt changes, Helicone automatically creates a new version with a clear audit trail. Operators can compare versions side by side, roll back to a previous version, and promote specific versions to production, staging, or development environments without requiring application redeployment. The system also accumulates an input/output dataset for each prompt version, which can be used for evaluation and fine-tuning.
Prompt Partials allow common sub-prompts -- system instructions, persona definitions, output format specifications -- to be defined once and referenced across multiple prompts, reducing duplication and providing a single point of change for shared components.
Helicone provides an evaluation framework that allows teams to score LLM responses against defined criteria. Evaluations can be applied to production request logs or to datasets collected from prompt management. The platform supports both manual scoring through the dashboard and automated evaluation using LLM-as-judge approaches, where a separate model evaluates response quality according to a rubric.
Custom evaluators can assess dimensions such as faithfulness, relevance, tone adherence, format compliance, and safety. The Experiments feature, introduced in late 2024, provides a spreadsheet-like interface for running structured prompt experiments: users define input variable sets, specify prompt variants to compare, and view results side by side across model and prompt combinations.
Helicone implements semantic and exact-match response caching at the proxy layer using Cloudflare Workers' edge storage. When caching is enabled via the Helicone-Cache-Enabled: true header, Helicone stores LLM responses and returns cached results for identical or semantically similar subsequent requests. Cache duration is configurable via standard Cache-Control headers, and bucket sizing controls the granularity of semantic matching.
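A minimal sketch of enabling caching on a single request, assuming a client configured against the Helicone proxy as in the earlier example; the header names follow the description above and the one-hour duration is illustrative.

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: "https://oai.helicone.ai/v1",
  defaultHeaders: { "Helicone-Auth": `Bearer ${process.env.HELICONE_API_KEY}` },
});

// Matching requests within the next hour are answered from the edge cache
// instead of being forwarded to the provider.
const answer = await client.chat.completions.create(
  {
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: "How do I rotate my API key?" }],
  },
  {
    headers: {
      "Helicone-Cache-Enabled": "true",
      "Cache-Control": "max-age=3600", // cache responses for one hour
    },
  }
);

console.log(answer.choices[0].message.content);
```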
Caching can meaningfully reduce API costs for applications with repetitive query patterns -- for example, documentation search tools, FAQ bots, or applications with high query overlap. Users have reported cost reductions of 20 to 30 percent for appropriate workloads. Cached responses are served with sub-millisecond latency from Cloudflare edge nodes, compared to typical LLM inference times of hundreds of milliseconds to several seconds.
Helicone includes several LLM security capabilities. Prompt injection detection attempts to identify inputs designed to override system instructions. Key management in the gateway mode allows teams to store and rotate provider API keys centrally without distributing them to individual applications. Rate limiting is configurable at the user, organization, or key level to prevent abuse. The platform is SOC 2 Type II certified and GDPR compliant on the Team and Enterprise tiers.
The Pro tier and above include configurable alerting. Operators can set thresholds on cost per time window, error rates, latency, and request volume. Alerts can be delivered via email or webhook. The dashboard includes a dedicated reporting view for generating cost and usage summaries over arbitrary time ranges.
The Helicone AI Gateway was launched in public beta on June 24, 2025, as a standalone product available at ai-gateway.helicone.ai. It is also available as a separately licensed open-source repository (github.com/Helicone/ai-gateway) built in Rust for performance.
The gateway provides a single OpenAI-compatible API endpoint that routes requests to more than 100 LLM providers. Developers point their existing OpenAI SDK client at https://ai-gateway.helicone.ai instead of https://api.openai.com/v1 and specify any supported model identifier in the model field. The gateway handles authentication, request translation, and routing to the appropriate upstream provider. The key insight motivating the product was that more than 90 percent of Helicone's users were operating five or more LLMs simultaneously in production, each requiring separate SDKs, authentication schemes, and rate limit management.
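A hedged sketch of calling a non-OpenAI model through the gateway's OpenAI-compatible endpoint. The model identifier and the use of the Helicone API key for authentication are assumptions; exact values depend on the gateway's configuration and model list.

```typescript
import OpenAI from "openai";

const gateway = new OpenAI({
  apiKey: process.env.HELICONE_API_KEY,      // gateway credential (assumed)
  baseURL: "https://ai-gateway.helicone.ai", // unified endpoint described above
});

// The gateway translates the request and routes it to the matching provider.
const reply = await gateway.chat.completions.create({
  model: "claude-3-5-sonnet", // any supported model id (illustrative)
  messages: [{ role: "user", content: "Route this request to Anthropic." }],
});

console.log(reply.choices[0].message.content);
```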
Supported providers include OpenAI (GPT-4o and GPT-4 series), Anthropic (Claude 3.5 and Claude 3 series), Google (Gemini 1.5 and Gemini 2.0), Mistral, Meta Llama (via Together AI and Groq), Cohere, Perplexity, AWS Bedrock, Azure OpenAI, and many others. The gateway is compatible with routing to providers such as OpenRouter, which itself aggregates a large set of open and closed models.
Automatic failover: The gateway monitors provider health and automatically reroutes requests to backup providers when primary providers return errors or exceed latency thresholds. This enables high availability for LLM applications without manual circuit breaker implementation.
Load balancing: Request routing supports latency-based, cost-based, and custom weighting strategies. Teams can define fallback chains specifying ordered lists of providers and models, allowing the gateway to find the cheapest available option meeting a latency budget.
Response caching: Built-in caching at the gateway layer reduces redundant provider calls and costs.
Rate limiting: Per-user, per-team, and per-key rate limits can be enforced at the gateway without application-level changes.
Prompt management integration: Prompts stored in Helicone's prompt registry can be referenced by ID in gateway requests, with the gateway substituting variables and retrieving the appropriate version at inference time.
Unified observability: All requests routed through the AI Gateway are automatically logged to Helicone's observability stack, with no additional integration required.
Zero markup pricing: When using Helicone credits to access providers, the gateway charges the underlying provider's list price plus Stripe payment processing fees, with no additional markup. This contrasts with some competing gateway services that charge a percentage markup on provider costs.
The gateway can be self-hosted using its standalone Rust binary or via Docker. The separation of the AI Gateway into its own repository reflects a design decision to allow teams to use the routing and reliability infrastructure independently of the broader Helicone observability platform.
Helicone offers four pricing tiers for its hosted cloud service:
| Tier | Monthly Price | Requests Included | Data Retention | Seats | Notable Features |
|---|---|---|---|---|---|
| Hobby | Free | 10,000/month | 7 days | 1 | Core logging, dashboard, 1 GB storage |
| Pro | $79/month | 10,000 + usage-based | 1 month | Unlimited | HQL, alerts, reports, 1,000 logs/min ingestion |
| Team | $799/month | 10,000 + usage-based | 3 months | Unlimited | 5 orgs, SOC 2/HIPAA, 15,000 logs/min, Slack support |
| Enterprise | Custom | Custom | Unlimited | Unlimited | On-prem, SAML SSO, custom MSA, 30,000 logs/min |
Usage-based charges apply to requests and storage beyond the included allotments. An illustrative example from Helicone's pricing page showed a cost of approximately $0.97 per month for 10,000 requests plus 0.30 GB of storage.
Helicone also offers discount programs for qualifying organizations.
Self-hosting is free for all tiers, subject only to infrastructure costs. The open-source license imposes no usage or seat restrictions.
The LLM observability market includes several other significant tools. The primary competitors are LangSmith, Langfuse, and Arize Phoenix.
| Feature | Helicone | LangSmith | Langfuse | Arize Phoenix |
|---|---|---|---|---|
| Open-source | Yes (Apache 2.0) | No | Yes (Apache 2.0) | Yes (Apache 2.0) |
| Integration method | Proxy or async SDK | SDK only | SDK or OpenTelemetry | SDK |
| Self-hosting | Yes (Docker/Helm) | Enterprise only | Yes | No |
| LLM framework focus | Model-layer (provider-agnostic) | LangChain/LangGraph | Framework-agnostic | Framework-agnostic |
| Response caching | Yes (built-in) | No | No | No |
| AI Gateway | Yes (100+ models) | No | No | No |
| Cost tracking | Advanced | Basic | Basic | Basic |
| Prompt management | Yes | Yes | Yes | Limited |
| Evaluation | Basic (scores, experiments) | Advanced | Basic | Advanced |
| Sessions/traces | Yes | Yes | Yes | Yes |
| OpenTelemetry native | Via OpenLLMetry | No | Yes | Yes |
| Free tier | 10,000 requests/month | 5,000 traces/month | 50,000 events/month | Fully free |
| Paid pricing | From $79/month | ~$39/user/month | Volume-based | Arize Cloud pricing |
| SOC 2 certified | Yes (Team+) | Yes | Yes | Via Arize |
Helicone vs LangSmith: LangSmith is developed by LangChain's maintainers and is optimized for applications built with the LangChain and LangGraph frameworks. It provides automatic, zero-configuration tracing for LangChain workflows and more sophisticated evaluation tooling than Helicone. However, it is not open source, does not offer self-hosting below enterprise pricing, lacks response caching, and has no AI gateway equivalent. Teams heavily invested in the LangChain ecosystem often prefer LangSmith for its tight framework integration; teams using multiple providers or frameworks often find Helicone's proxy model simpler.
Helicone vs Langfuse: Langfuse is also open-source (Apache 2.0) and supports self-hosting. It natively speaks OpenTelemetry, making it easier to integrate with existing observability stacks. Langfuse's free cloud tier is more generous (50,000 events per month versus Helicone's 10,000 requests). Langfuse lacks built-in response caching and an AI gateway. Helicone's proxy integration requires fewer code changes for OpenAI-based applications, while Langfuse's SDK-based instrumentation is more flexible for complex multi-service traces.
Helicone vs Arize Phoenix: Arize Phoenix is developed by Arize AI and focuses heavily on evaluation and model quality assessment. It is fully open-source with no paid version or feature gating, making it attractive for teams prioritizing evaluation depth over operational features. Phoenix does not offer self-hosting as a supported deployment path and has less focus on cost tracking and response caching. It is particularly strong for retrieval-augmented generation debugging, with specialized views for retrieval quality and document relevance.
Helicone published a number of formal case studies and received testimonials from organizations across diverse industries.
Community testimonials came from a wide range of smaller companies including Greptile (code search and AI coding), CodeCrafters (developer education), Reworkd (autonomous AI agents), Haema (healthcare), Chatwith (customer support chatbots), and assistant-ui (AI chat component library).
Commonly cited use cases include the following.

Cost attribution and billing: Multi-tenant SaaS products use custom properties to track token consumption per customer, enabling accurate cost attribution and per-seat billing reconciliation.
Debugging production failures: When LLM applications produce unexpected outputs or errors in production, Helicone's request logs allow developers to replay exact prompts, inspect full responses, and identify whether the failure originated in the prompt, the model, or downstream processing.
Prompt regression detection: By versioning prompts and tracking output quality metrics over time, teams can detect when a prompt change has degraded response quality before it affects a significant portion of users.
Multi-provider cost optimization: The AI Gateway's zero-markup pricing and intelligent routing enable teams to route requests to the cheapest available provider meeting their quality and latency requirements, without application-level changes.
Compliance and audit logging: Organizations in regulated industries use self-hosted Helicone to maintain a complete, immutable audit log of all LLM requests and responses within their own infrastructure.
AI agent tracing: Development teams building agentic applications with multi-step tool-use workflows use Helicone's sessions feature to trace the full execution graph of agent runs, identifying where reasoning errors or tool call failures occur.
Several limitations of Helicone's platform have been identified by users and analysts:
Proxy latency overhead: The proxy-based integration model adds 50 to 80 milliseconds of average latency to every LLM request. For latency-sensitive applications -- real-time voice synthesis, sub-second interactive experiences, or streaming chat with tight first-token-time budgets -- this overhead may be unacceptable. The async logging path avoids this cost but requires more complex integration.
Shallow evaluation capabilities: Helicone's evaluation features are considered less mature than those of LangSmith or Arize Phoenix. The platform offers numeric scoring and custom evaluators but lacks sophisticated built-in evaluation frameworks for tasks such as automated hallucination detection, factuality assessment at scale, or structured evaluation harness management.
Limited issue tracking: Helicone logs failed or problematic requests but does not provide a native issue-tracking workflow. When a request fails or produces a bad output, users must manually correlate logs and sessions to diagnose the root cause; there is no built-in concept of an issue with lifecycle state, assignment, or resolution tracking.
Agent observability depth: While sessions and tree tracing provide visibility into multi-step workflows, Helicone was not designed with complex agent observability as a primary concern. Very long agent runs spanning dozens of steps across multiple models, tool calls, and memory operations can be difficult to debug even with sessions enabled.
Single-provider simplicity assumption: Helicone's proxy model is most straightforward for single-provider OpenAI applications. Teams with heterogeneous stacks involving custom inference servers, private model deployments, or complex LangChain graphs may find SDK-based tools like Langfuse or LangSmith easier to instrument comprehensively.
Discontinuation risk: Following the March 2026 acquisition by Mintlify, the Helicone cloud product entered maintenance mode. While the open-source code remains available and self-hosting is fully supported, new feature development on the SaaS product has stopped, and cloud customers must migrate to alternative platforms.