Langfuse

AI Companies Developer Tools MLOps Model Evaluation Open Source AI

31 min read

Updated Jun 24, 2026

Suggest edit History Talk

RawGraph

Last edited

Jun 24, 2026

Fact-checked

In review queue

Sources

24 citations

Revision

v4 · 6,288 words

Fact-checks are independent of edits: a reviewer re-verifies the article against its sources and stamps the date. How we verify

Langfuse is an open-source LLM engineering platform that provides observability, tracing, prompt management, evaluation, and dataset tooling for applications built on large language models.^[1] It helps development teams debug, monitor, and improve LLM-powered systems in production, covering the full lifecycle from experimentation through ongoing quality management. The core platform is released under the permissive MIT license and is offered as both a self-hostable open-source project and a managed cloud service, a combination that has made Langfuse one of the most widely deployed tools in the emerging field of LLMOps and AI observability.^[1] Built on OpenTelemetry standards, it integrates with more than fifty LLM frameworks including LangChain, the OpenAI SDK, LlamaIndex, LiteLLM, and CrewAI.^[1]

Langfuse was founded in 2022 by Marc Klingen, Maximilian Deichmann, and Clemens Rawert and graduated from Y Combinator's Winter 2023 (W23) batch.^[2]^[13] The company raised a $4 million seed round in November 2023 led by Lightspeed Venture Partners.^[4] On January 16, 2026, the data warehouse company ClickHouse acquired Langfuse, announced alongside a $400 million Series D financing round led by Dragoneer Investment Group.^[7]^[8] As of June 2026 the open-source repository had accumulated roughly 29,600 GitHub stars, and Langfuse reported more than 50 million SDK installs per month and over 6 million Docker pulls, with adoption among 19 of the Fortune 50 and 63 of the Fortune 500 companies.^[1]^[2]

The MIT-licensed core covers tracing, evaluation, prompt management, datasets, and the full product interface, with an optional enterprise edition that adds security and compliance features.^[1] Langfuse Cloud provides a fully managed hosted service; self-hosted deployments are supported on Docker Compose, Kubernetes with Helm, and the major cloud providers via Terraform templates.^[16] Both deployment paths run the same codebase.

History

Who founded Langfuse, and when?

Langfuse was founded in 2022 by three German entrepreneurs: Marc Klingen, who serves as CEO; Maximilian Deichmann, who serves as CTO; and Clemens Rawert, who serves as COO.^[2] The founding team had previously worked together and shared a background in enterprise software and technology. The company's primary engineering office is in Berlin, Germany, with a secondary office in San Francisco focused on marketing and sales.^[2]

The founding predated the public emergence of the LLM application development wave. The team initially built Finto, a usage-based billing product that addressed problems from their previous roles.^[3] Though the product attracted interest from potential customers, the founders recognized they were not personally motivated by building what they described as the "nth generation of product in an established small market."^[3]

Applied to Y Combinator's Winter 2023 batch in November 2022, the founders relocated to San Francisco for the program.^[3] During their time at YC they experimented with a series of AI application ideas, including educational platforms, sales enrichment tools, and collaborative chat systems.^[3] Building these prototypes using early large language models exposed them directly to a problem that would define the company: LLM applications were extraordinarily difficult to debug, evaluate, and improve once deployed beyond a prototype stage.

The founders observed that while many teams could build compelling LLM prototypes quickly, moving those prototypes to production reliably required visibility and tools that did not yet exist. They attempted first to build a testing framework to improve confidence in LLM outputs, but found it insufficient because test data rarely reflected the variety of real production usage.^[3] This led them to focus on logging, tracing, and analytics as the foundational layer, building Langfuse rather than the applications it would eventually support.^[3]

Y Combinator W23 and early development

Langfuse went through Y Combinator's Winter 2023 batch as an LLM observability and analytics tool.^[13] The company's YC application described building product analytics for LLM-based applications.^[13] During the batch the founders moved from building applications to building infrastructure, positioning themselves as the developer tooling layer for LLM production deployments.^[14]

The initial version of Langfuse launched publicly in August 2023.^[3] The launch won Product Hunt's Product of the Day recognition, which generated early developer attention.^[3] The team open-sourced the project at launch, a decision driven by the observation that open source was the most effective distribution channel for developer infrastructure, particularly in a space where data sensitivity made engineers reluctant to send production telemetry to an unfamiliar closed vendor.

By September 2023 the project had reached 1,000 GitHub stars.^[3] The rapid growth in community engagement prompted the team to begin their fundraising process.^[14]

When did Langfuse raise its seed round?

In November 2023, Langfuse announced a $4 million seed round.^[4] The round was led by Lightspeed Venture Partners, with participation from La Famiglia and Y Combinator.^[4] The announcement came roughly three months after the public launch and positioned Langfuse as the emerging open-source standard for LLM observability.

Lightspeed Venture Partners published a blog post explaining the investment thesis, framing Langfuse as addressing a critical gap in the LLM developer toolchain.^[5] The firm argued that observability was a foundational requirement for production AI systems, just as it had become foundational for traditional software engineering, and that open source was the right architectural choice for infrastructure that handles sensitive production data.^[5]

The seed funding enabled the team to expand headcount. Founding engineers Marlies Mayerhofer and Hassieb Pakzad joined in February 2024.^[3] The company also accelerated product development, launching Prompt Management features in January 2024 and rebranding the platform from "LLM Observability" to "LLM Engineering Platform" during its first Launch Week in April 2024.^[3]

Growth and product milestones (2024-2025)

Through 2024 and 2025 Langfuse expanded its feature set and adoption significantly. The project reached 10,000 GitHub stars in mid-2024, crossed 20,000 stars by late 2024, and reached roughly 29,600 by June 2026.^[1] Monthly SDK installs grew from zero to more than 23 million by the time of the ClickHouse acquisition announcement in January 2026 and to more than 50 million by mid-2026.^[1]^[2]^[7]

Key product milestones during this period include the launch of the Datasets feature for structured evaluation and benchmarking, the introduction of LLM-as-a-judge automated evaluators, native support for agent workflow visualization, and expanded integrations with the growing ecosystem of LLM frameworks. The platform also released a major version 3 architecture that introduced native ClickHouse OLAP storage for improved analytical query performance at scale.^[21]

By March 2025, Langfuse reported more than a thousand self-hosted production deployments running ClickHouse, with some of the largest enterprise deployments ingesting billions of rows of tracing data.^[21] The platform earned SOC 2 Type II and ISO 27001 certifications, enabling enterprise procurement in regulated industries.^[24]

Why did ClickHouse acquire Langfuse?

On January 16, 2026, ClickHouse announced the close of a $400 million Series D financing round led by Dragoneer Investment Group, simultaneously announcing the acquisition of Langfuse.^[7]^[8] The round also included participation from Bessemer Venture Partners, GIC, Index Ventures, Khosla Ventures, Lightspeed Venture Partners, T. Rowe Price Associates, and WCM Investment Management.^[8] The financial terms of the acquisition were not disclosed, though the financing valued ClickHouse at approximately $15 billion.^[8]^[9] At the time of the deal, ClickHouse cited Langfuse's roughly 20,000 GitHub stars, more than 23 million monthly SDK installs, and over 6 million Docker pulls.^[7]

The strategic rationale rested on deep operational and technical alignment between the two companies. Langfuse had selected ClickHouse as its primary analytical database when rebuilding its storage architecture for version 3, meaning Langfuse already ran entirely on ClickHouse infrastructure.^[21] Reciprocally, ClickHouse used Langfuse internally to monitor and optimize its own agentic products.^[7] The companies shared a significant overlap in enterprise customers and open-source community deployments.^[7]

ClickHouse framed the acquisition as creating a combined open-source stack for building and monitoring AI applications: Langfuse for LLM quality monitoring and engineering tooling layered on top of ClickHouse's analytical infrastructure.^[7] Aaron Katz, co-founder and CEO of ClickHouse, announced the deal as the company "entering LLM observability with the acquisition of Langfuse," describing it as part of a strategy to unify real-time analytics with AI observability.^[7]^[8]

The co-founders described their reasoning for accepting the acquisition in the announcement blog post, writing: "Joining ClickHouse is our way of honoring that trust by putting more resources behind the thing we care about most: building a product you can rely on."^[6] They framed the move as a route to accelerated investment in platform reliability and performance rather than a conventional venture exit, giving them access to ClickHouse's database engineering expertise, enterprise sales infrastructure, and broader customer base without requiring another round of independent fundraising.^[6]

For existing users, the acquisition made no immediate changes. Langfuse stated that no changes to its open-source MIT licensing were planned, that Langfuse Cloud would continue with the same product, endpoints, and experience, and that self-hosting would remain a first-class supported path.^[6] ClickHouse committed to deepening integrations between the two products, making LLM observability a native capability within the broader ClickHouse analytical stack.^[7]

TechCrunch, SiliconANGLE, and InfoWorld covered the acquisition.^[10]^[11] Hacker News discussion of the announcement was extensive, with particular community interest in whether the open-source commitments would be maintained post-acquisition.^[12] Langfuse's public response emphasized that the MIT license for core features and the self-hosting option would continue.^[6]

Is Langfuse open source?

Langfuse's core features are released under the MIT license with no usage limits.^[1] The MIT-licensed codebase covers tracing, evaluations, prompt management, experiments, annotation, the playground, and all other product features visible in the Langfuse interface.^[1]

The repository follows a structure common in open-source commercial projects: advanced security and compliance modules are housed in a clearly marked /ee (enterprise edition) directory and require a commercial license key when used in self-hosted deployments.^[1] Enterprise edition features include SCIM directory synchronization for user provisioning, audit logging, and data retention policy management.^[24] The /ee boundary is architecturally clean, allowing the community version to run without any enterprise code activated.

This approach is sometimes described as open core: the product is genuinely open source for all functional capabilities, while operational features valued primarily by large enterprises are commercialized. Langfuse's own documentation states that the open-source version imposes no artificial feature caps; all tracing, evaluation, and prompt management capabilities are available to self-hosted deployments without restriction.^[1]

Users can switch between the open-source self-hosted version, the enterprise self-hosted version, and Langfuse Cloud at any time. All three deployment paths run the same codebase and database schema, so data portability is not a concern when migrating between deployment models.^[16]

The ClickHouse acquisition included an explicit commitment to preserve the MIT license for core features.^[6] ClickHouse published a statement confirming that open-source development would continue under the same terms, a position the founders reinforced in their own announcement.^[6]^[7]

Architecture

System components

Langfuse's production architecture consists of several primary components deployed together:^[15]

Langfuse Web: A Next.js application serving the user interface and the public API endpoints that SDKs call to ingest tracing data.^[15]
Langfuse Worker: A background processing service that asynchronously handles ingested trace data, enriches events, and writes to long-term storage.^[15]
PostgreSQL: A relational database that stores transactional data such as user accounts, organizations, API keys, prompt definitions, dataset metadata, and evaluation configurations.^[15]
ClickHouse: An OLAP (online analytical processing) columnar database that stores all observability data: traces, observations, scores, and session information.^[17] ClickHouse is used because columnar storage enables analytical queries (aggregations, filtering, time-series analysis) to scan only the columns relevant to a query rather than reading entire rows, providing substantially faster query performance than row-oriented databases for analytical workloads.^[17]
Redis: A cache and queue layer used to decouple trace ingestion from processing. When the Web container receives a batch of trace events from an SDK, it writes the raw event payload to S3-compatible blob storage and enqueues only a reference in Redis, allowing immediate acknowledgement to the SDK client while processing continues asynchronously.^[15]
S3-compatible blob storage: Holds raw event payloads, attachments, and media files. MinIO is used in local development setups; AWS S3, Azure Blob, or GCP Cloud Storage can be used in production deployments.^[15]

Data ingestion flow

The ingestion path is designed to minimize latency impact on the instrumented application. Langfuse's Python and JavaScript SDKs queue trace events locally in memory and batch-flush them asynchronously to the Langfuse API.^[15] The SDK call returns immediately without waiting for the Langfuse API to process the event, so LLM application response latency is not affected by the observability instrumentation.

When the Web container receives a batch, it writes the raw events to S3 and queues a lightweight reference in Redis. The Worker container reads the queue, retrieves the full event payload from S3, processes and enriches the events, and writes the result to ClickHouse.^[15] This pipeline decouples the high-throughput ingestion path from the potentially slower analytical storage layer.

OpenTelemetry integration

Langfuse is built on OpenTelemetry standards and supports two complementary integration paths.^[1] The native SDKs for Python and JavaScript are themselves built on OpenTelemetry primitives under the hood, providing structured tracing semantics without requiring the user to understand the OpenTelemetry specification directly.

For teams already using OpenTelemetry instrumentation in their infrastructure, Langfuse exposes a native OpenTelemetry ingestion endpoint at /api/public/otel. Any application instrumented with OpenTelemetry-compatible libraries can send traces directly to this endpoint without installing the Langfuse SDK. This is particularly valuable for polyglot environments or teams that have standardized on OpenTelemetry across their observability stack and want LLM traces to flow into the same pipeline alongside application performance monitoring data.

The OpenTelemetry-first architecture reduces vendor lock-in: teams that switch from Langfuse to another observability platform can continue using their existing instrumentation and redirect the OTEL endpoint without rewriting application code.

Deployment architecture

For self-hosted deployments Langfuse requires all of its infrastructure components to be operational.^[16] Docker Compose configurations are provided for local development and low-scale deployments. Production deployments at scale are supported through official Helm charts for Kubernetes, and through Terraform modules for AWS, Azure, and GCP that provision all required infrastructure components within a VPC.^[16] Railway platform support is also documented for teams preferring a managed platform abstraction.^[16]

All infrastructure components must run with their timezone configured as UTC to ensure consistent timestamp handling across the tracing pipeline.^[17]

Langfuse Cloud runs on AWS in multiple regions with Cloudflare providing WAF (web application firewall) and central request proxying.^[16] CI/CD pipelines use Terraform for automated infrastructure management.

Features

What is Langfuse used for?

Langfuse is used to trace, evaluate, and manage LLM applications in production. Its capabilities span four areas: observability (tracing requests through an LLM application), prompt management (versioning and deploying prompts), evaluation (scoring output quality), and analytics (monitoring cost, latency, and quality over time). The sections below describe each.

Tracing and observability

Langfuse's tracing system captures the full execution context of LLM application requests. The core data model consists of three entity types:

Traces are top-level containers representing a single request flowing through an application. Each trace captures the complete context: the triggering input, the user identifier if provided, a session identifier for multi-turn conversations, environment tags (production, staging, development), custom metadata, and timing information.

Observations are nested operations within a trace. Every discrete step in an LLM application, such as an LLM call, a retrieval operation, an embedding generation, a tool invocation, a database query, or any custom code segment, can be captured as an observation. Observations can be nested arbitrarily to reflect the actual call hierarchy of the application, including complex agent workflows with multiple levels of tool calls and sub-agents.

Scores are evaluation metrics attached to traces or observations. A score can represent user feedback (a thumbs-down rating), an automated evaluation result (a numeric quality score from an LLM-as-judge evaluation), or a manual annotation from a human reviewer. Scores are queryable and aggregatable, enabling quality trend analysis over time.

For each LLM call observation, Langfuse captures the exact prompt sent to the model (including system prompt, user messages, and any context), the raw model response, token usage broken down by input and output tokens, the associated cost in US dollars (calculated using configurable pricing per model), and latency. This level of detail is what enables debugging: when a production LLM call produces an unexpected response, engineers can inspect the exact input the model received and the exact output it returned.

Sessions group related traces together for multi-turn conversational applications. A session ID links all the traces from a single conversation, allowing teams to analyze full conversation flows rather than individual requests in isolation.

Asynchronous SDK design ensures that instrumentation overhead does not affect production latency. All SDK calls queue events locally and flush them in background threads or processes, returning control to the application immediately.

Langfuse integrates with frameworks through multiple paths: native SDK wrappers for direct OpenAI SDK calls, automatic instrumentation for LangChain and LangGraph, LlamaIndex callback handlers, LiteLLM proxy support, and the raw OpenTelemetry endpoint for custom or unsupported frameworks. Framework integrations are maintained for more than fifty tools as of 2026.^[1]

Prompt management

Langfuse's prompt management system allows teams to version, deploy, and collaborate on prompts centrally, separate from application code.^[19] The system provides:

Version control for prompt text, model parameters (temperature, max tokens, top-p), and model selection. Every change creates a numbered version with an optional commit message. Full version history is retained and any version can be deployed at any time.^[19]

Deployment via labels allows teams to publish a specific prompt version to a named deployment target (such as "production" or "staging") without modifying application code.^[19] Applications fetch the currently labeled version at runtime using the Langfuse SDK. This means prompt iteration and deployment is decoupled from the software release cycle: product managers and domain experts can update prompts in the Langfuse UI and their changes take effect immediately in production without requiring an engineer to update code, rebuild a container, or trigger a deployment pipeline.

Client-side caching in the SDK allows applications to use the latest prompt version without adding a network request to every LLM call.^[19] The SDK caches the current prompt version locally and refreshes it in the background at a configurable interval.

Prompt linking with traces connects deployed prompt versions to the traces they generate.^[19] When an application fetches a prompt using the SDK and uses it in an LLM call, Langfuse automatically links that trace to the exact prompt version that was used. This allows teams to analyze how different prompt versions perform in production on real traffic, not just on test datasets.

The Playground is an interactive prompt testing environment built into the Langfuse UI. Teams can test prompt variants against different models and compare outputs side by side. Traces can be opened directly in the Playground to reproduce a production request, facilitating debugging workflows where an engineer identifies a bad trace and then immediately iterates on the prompt to fix the issue.

Prompts can be organized in folders and searched by full-text content.^[19] Non-technical stakeholders can make updates directly in the UI while engineers maintain the integration through the SDK.

Evaluation

Evaluation in Langfuse supports multiple scoring approaches that can be combined:

LLM-as-a-judge evaluators use a language model to automatically score traces or observations against configurable quality criteria. A team defines a rubric, for example whether a response is factually accurate, whether it follows the company's tone of voice, or whether it correctly cites its sources, and Langfuse runs every new trace (or a sampled subset) through the evaluator and records the score. Multiple LLM-as-judge evaluators with different rubrics can run in parallel.

User feedback collection captures explicit feedback signals from application users. Buttons, ratings, or reactions in the application UI can be linked to the corresponding Langfuse traces via the SDK, attaching the feedback as a score. This creates a direct line from user satisfaction signals to the specific prompts and model calls that generated the experience.

Manual annotation allows human reviewers to read traces and apply scores according to a configured rubric. Annotation queues in the Langfuse UI route traces to reviewers, who can view the full trace context and apply multi-dimensional scores. Manual annotation is particularly important for domains where automated scoring is unreliable or where the evaluation requires expert judgment.

Custom evaluation pipelines allow teams to implement arbitrary scoring logic outside of Langfuse and push scores back via the API. This is useful when a team already has domain-specific evaluation code, A/B testing infrastructure, or external quality measurement systems.

Datasets and experiments

Langfuse's Datasets feature provides structured test sets for systematic evaluation of LLM application quality.^[20] A dataset is a collection of test items, each consisting of an input (the prompt or request) and optionally an expected output or evaluation criteria.^[20]

Datasets can be created by manually curating representative inputs, by uploading CSV files, or by adding traces from production directly to a dataset.^[20] The ability to seed datasets from production traces is particularly valuable: it captures edge cases and failure modes that developers would not have anticipated when writing synthetic test cases.

Experiments allow teams to run a prompt variant or model configuration against a dataset and compare the results against a baseline.^[20] The Experiments UI shows side-by-side output comparisons for each dataset item, with scores from configured evaluators displayed alongside the raw outputs. Teams can run experiments entirely through the Langfuse UI without writing custom code, or orchestrate programmatic experiments through the SDK for more complex evaluation workflows.

Dataset items support versioning and can be organized in folders. Experiments track which prompt version and model configuration was used for each run, creating a reproducible record of how quality has changed across iterations.

Dashboard and analytics

Langfuse provides an analytics dashboard aggregating quality, cost, and latency metrics across the tracing data. Teams can monitor:

Token usage and associated cost over time, broken down by model, user, session, or custom tags
Latency percentiles for LLM calls and end-to-end trace durations
Error rates and failure patterns
Score distributions from automated and manual evaluations
Model usage mix across an application or across an organization

The dashboard is queryable with custom filters, allowing teams to segment metrics by any combination of trace metadata. Time-series views enable identification of regressions or cost spikes correlated with prompt changes or model provider updates.

Should you self-host Langfuse or use the cloud?

Langfuse offers two primary deployment models that run identical codebases and feature sets.^[16] Teams that want the lowest operational overhead, or that are evaluating the platform, typically start with Langfuse Cloud; teams with strict data residency, privacy, or compliance requirements typically self-host.

Langfuse Cloud

Langfuse Cloud is a fully managed multi-tenant service operated by the Langfuse team. It handles all infrastructure operations including provisioning, scaling, backup, and security patching. Users create an account, generate API keys, and begin sending traces within minutes. Langfuse Cloud runs on AWS infrastructure across multiple regions and holds SOC 2 Type II and ISO 27001 certifications.^[24] HIPAA alignment is available on higher pricing tiers.^[18]

Cloud is the lowest-friction option for teams starting with Langfuse or for teams that prefer to avoid infrastructure management. Because Langfuse Cloud processes all trace data on Langfuse-operated infrastructure, teams with strict data residency or data privacy requirements may prefer self-hosting instead.

Self-hosted

Self-hosting runs the full Langfuse stack on infrastructure the organization controls. All required components (Langfuse Web, Langfuse Worker, PostgreSQL, ClickHouse, Redis, and S3-compatible storage) must be operational.^[16] This is more complex to set up and operate than the cloud option, but gives organizations complete control over their data and infrastructure.

Self-hosted deployments use the same codebase as Langfuse Cloud, so there is no feature gap between the two deployment models for MIT-licensed capabilities.^[16] Enterprise edition security features (SCIM, audit logging, data retention policies) require a commercial license key when self-hosting.^[24]

Docker Compose provides the simplest self-hosted starting point, suitable for development and low-traffic scenarios.^[16] For production scale, official Kubernetes Helm charts are maintained and recommended.^[16] Terraform modules for AWS, Azure, and GCP allow infrastructure-as-code provisioning of the complete stack within a VPC.^[16]

Scaling self-hosted deployments involves running multiple replicas of the Web and Worker containers behind a load balancer, and using managed database services for ClickHouse and PostgreSQL rather than single-node containers.^[16] Langfuse's documentation describes horizontal scaling patterns and configuration options for high-availability deployments.^[16]

How much does Langfuse cost?

Langfuse Cloud pricing is organized around "units," where one billable unit corresponds to one tracing data point sent to the platform (a trace, an observation, or a score).^[18] Self-hosted deployments using the MIT-licensed open-source version are free with no usage limits.^[1]

Plan	Monthly Price	Included Units	Data Retention	Users	Notable Inclusions
Hobby	Free	50,000	30 days	2	Community support
Core	$29	100,000	90 days	Unlimited	In-app support; startup/research discounts available
Pro	$199	100,000	3 years	Unlimited	SOC2/ISO27001 reports; HIPAA available; high rate limits
Enterprise	$2,499	Custom	Custom	Unlimited	Dedicated support engineer; custom rate limits; uptime SLA

Additional units beyond the plan inclusion are billed at graduated rates: $8 per 100,000 units for the first million additional units, $7 per 100,000 from 1 million to 10 million, $6.50 per 100,000 from 10 million to 50 million, and $6 per 100,000 above 50 million.^[18]

The Pro plan includes an optional Teams Add-on at $300 per month that adds enterprise single sign-on (SSO) and fine-grained role-based access control (RBAC).^[18]

The Enterprise plan at $2,499 per month includes everything in the Pro plan plus the Teams features, a dedicated support engineer, custom rate limits, and an uptime service-level agreement.^[18] Annual commitments at the Enterprise tier unlock custom volume pricing and AWS Marketplace billing.^[18]

The enterprise self-hosted license, which adds security features, is available at pricing negotiated directly with Langfuse.^[24] Startups, academic researchers, and open-source projects are eligible for discounts on paid cloud tiers.^[18]

How does Langfuse differ from LangSmith and other competitors?

The LLM observability and evaluation space includes several actively developed platforms. The key alternatives to Langfuse are LangSmith, Helicone, and Arize Phoenix.^[22] The headline distinction is licensing and deployment: Langfuse and Arize Phoenix are open source (MIT) and self-hostable, whereas LangSmith is a closed-source commercial product without a fully self-hostable open-source edition.^[22]^[23]

Feature	Langfuse	LangSmith	Helicone	Arize Phoenix
Open source	Yes (MIT core)	No	No	Yes (MIT)
Self-hosted	Yes	Limited	Optional	Yes
Tracing / observability	Yes	Yes	Yes	Yes
Prompt management	Yes	Yes	Limited	Limited
LLM-as-judge evaluation	Yes	Yes	No	Yes
Datasets and experiments	Yes	Yes	No	Yes
Manual annotation queues	Yes	Yes	No	No
Session tracking	Yes	Yes	Limited	Limited
Cost tracking	Yes	Yes	Yes	Yes
OpenTelemetry native	Yes	Yes	No	Yes
Framework-agnostic	Yes	Primarily LangChain	Yes	Yes
LangChain deep integration	Yes	Yes (native)	No	No
Proxy-based ingestion	No	No	Yes	No
Free cloud tier (monthly)	50,000 units	5,000 traces	10,000 requests	Free (self-hosted)
Paid cloud starting price	$29/month	$39/user/month	Varies	Cloud tier available
SOC 2 Type II	Yes	Yes	Unknown	Yes
Fortune 500 adoption	Yes	Yes	Unknown	Yes

LangSmith

LangSmith is the observability and evaluation platform built by LangChain as the companion tooling layer for applications using the LangChain and LangGraph frameworks.^[23] Its primary strength is zero-configuration tracing for LangChain users: applications built on LangChain automatically generate rich traces in LangSmith without additional instrumentation code.^[23]

Beyond LangChain applications, LangSmith's value diminishes for teams using other orchestration frameworks or calling provider APIs directly. Its pricing model is per-trace rather than per-unit, which becomes expensive at high volume.^[23] LangSmith is a closed-source commercial product with no self-hosted option for the full platform, which creates data residency concerns for organizations in regulated industries.^[22] Teams fully committed to the LangChain ecosystem and willing to accept cloud-only deployment will find LangSmith the lowest-friction path; teams that use multiple frameworks or need self-hosting will find Langfuse more suitable.

Helicone

Helicone is an LLM observability tool distinguished by its proxy-based integration model.^[22] Rather than installing an SDK and instrumenting application code, teams using Helicone redirect their OpenAI API base URL to Helicone's proxy endpoint. This makes initial setup extremely fast, the company claims a 15-minute integration, and requires no code changes beyond a one-line configuration update.

The proxy architecture has tradeoffs. All LLM traffic passes through Helicone's infrastructure, adding 50 to 80 milliseconds of latency to every LLM call.^[22] The proxy approach also captures less granular data than SDK-based instrumentation: it can record API request and response payloads but cannot instrument the broader application logic surrounding the LLM call. Helicone lacks the evaluation, dataset, and prompt management capabilities that Langfuse provides.^[22]

As of early 2026, Helicone was reported to be in maintenance mode with no new features planned, reflecting the platform's challenge in competing with more full-featured alternatives.^[23] Helicone processed more than 2 billion LLM interactions, demonstrating enterprise-scale usage, but teams requiring evaluation workflows, prompt versioning, or active product development were advised to consider alternatives.^[23]

Arize Phoenix

Arize Phoenix is the open-source observability and evaluation platform from Arize AI, which also operates a commercial enterprise ML and AI monitoring product (Arize AX).^[22] Phoenix is fully open source under the MIT license with no enterprise feature gating, and is designed to run locally or on self-hosted infrastructure with no cloud dependency.^[22]

Phoenix is built natively on OpenTelemetry and provides strong evaluation primitives including drift detection and embeddings analysis.^[22] Its primary limitations are infrastructure complexity and the learning curve associated with OpenTelemetry-native setup, which independent reviewers estimated at two to four weeks for a production deployment.^[23] Phoenix's focus on ML-grade monitoring rigor makes it particularly suited for teams with dedicated platform engineering resources.

Arize AI raised $70 million in a Series B round in February 2025, providing strong financial backing for continued Phoenix development.^[23] Langfuse and Phoenix compete directly on the open-source self-hosted segment; the primary differentiators are Langfuse's more integrated prompt management and team collaboration features versus Phoenix's stronger ML-oriented evaluation primitives and embeddings analysis.

Who uses Langfuse?

Langfuse's customer base spans enterprises across industries, from technology companies building AI-native products to large corporations deploying LLM applications in regulated environments. The platform reported adoption among 19 of the Fortune 50 and 63 of the Fortune 500 as of January 2026, with more than 2,000 paying customers.^[2]^[7]

Some of the most prominent open-source projects integrating Langfuse include Langflow, which had accumulated more than 116,000 GitHub stars by early 2026, and Open WebUI, with more than 109,000 GitHub stars. These integrations mean Langfuse's observability layer is embedded in applications used by millions of end users across the open-source AI ecosystem.

Named enterprise users include the pharmaceutical company Merck and the education nonprofit Khan Academy. Walid Mehanna, Chief Data and AI Officer at Merck, described the platform's value in addressing enterprise AI's trust problem: "Generative AI will only earn enterprise trust when we can see what's happening under the hood. Langfuse enables us to track every prompt, response, cost, and latency in real time, turning black-box models into auditable, optimizable assets."^[7] Walt Wells, a Staff Software Engineer at Khan Academy, said that "Langfuse has really enabled our developers to get extremely fast feedback. When building and deploying features, we can quickly watch how those experiences are going."^[7]

Common production use cases across the customer base include:

Customer service and support automation: LLM-powered chatbots and virtual assistants handling customer inquiries. Teams use Langfuse to track resolution quality, identify failure modes where the model hallucinated or gave incorrect answers, and measure the impact of prompt changes on user satisfaction scores.

Document parsing and data extraction: Applications that extract structured data from unstructured documents such as contracts, medical records, invoices, or research papers. Langfuse's evaluation framework enables teams to measure extraction accuracy against ground truth datasets and detect regressions when model providers update their models.

Retrieval-augmented generation (RAG) applications: Systems that combine LLM generation with retrieval from knowledge bases or document stores. Langfuse traces can capture both the retrieval step and the generation step, allowing teams to identify cases where the retrieval returned irrelevant context and correlate those with poor generation quality.

Agentic workflows: Multi-step AI agents where an LLM orchestrates tool calls, web searches, code execution, or other external actions. Langfuse visualizes agent traces as graphs showing the full decision tree and tool call sequence, enabling debugging of complex behaviors that are opaque without structured tracing.

Regulated industry deployments: Healthcare, financial services, and government organizations using LLMs for clinical documentation, compliance checking, or internal knowledge management. These organizations frequently choose self-hosted Langfuse deployments to maintain data residency and satisfy audit requirements. The platform's SOC 2 Type II, ISO 27001, and HIPAA alignment certifications address procurement requirements in these sectors.^[24]

Internal tooling and productivity applications: Enterprise teams building internal LLM-powered tools for employees. Cost tracking in Langfuse is particularly valuable here, as organizations want to understand per-user or per-department LLM spend and optimize model selection accordingly.

Limitations

Langfuse has several limitations that prospective users should consider.

Self-hosted infrastructure complexity is the most commonly cited limitation. A production self-hosted deployment requires several separate infrastructure components: PostgreSQL, ClickHouse, Redis, S3-compatible object storage, and two application containers.^[16] Teams without dedicated infrastructure engineering resources may find the operational burden significant compared to the cloud option or to simpler alternatives like Helicone's proxy-based approach.

ClickHouse as a dependency adds complexity specifically for teams unfamiliar with OLAP databases. ClickHouse has different operational characteristics from PostgreSQL, requires UTC timezone configuration, and has its own performance tuning considerations for high-volume ingestion.^[17] Teams with small DevOps teams should evaluate whether they have the expertise to operate ClickHouse reliably before committing to a self-hosted Langfuse deployment.

Workflow orchestration is out of scope. Unlike Vellum or similar all-in-one platforms, Langfuse is an observability and evaluation platform, not a workflow orchestration layer. Teams that need a visual canvas for building agent workflows, managed RAG infrastructure, or LLM routing and fallback logic will need separate tools for those capabilities. Langfuse integrates with orchestration frameworks like LangChain, LiteLLM, and LlamaIndex rather than replacing them.

Cost at scale on Cloud can be significant for high-volume applications. Teams ingesting tens of millions of trace events per month will pay several hundred to several thousand dollars per month on Langfuse Cloud.^[18] The self-hosted open-source option removes this cost but reintroduces infrastructure complexity. Teams at very high scale should model both the cloud cost and the infrastructure cost of self-hosting before selecting a deployment path.

Evaluation depth for advanced use cases. Langfuse's evaluation capabilities are well-suited for standard LLM application quality monitoring, but teams requiring highly specialized evaluation patterns, such as embedding drift detection, ML-style model bias analysis, or complex statistical evaluation frameworks, may find Phoenix or dedicated model evaluation tools provide more depth.^[23] Langfuse's evaluation is tightly integrated with its observability stack, which is a strength for teams wanting a unified workflow but a limitation for teams with sophisticated custom evaluation infrastructure.

No native UI for workflow building. Teams that want to build LLM application logic in a visual canvas environment will not find that capability in Langfuse. The platform assumes the application logic exists and instruments it, rather than providing a canvas for constructing that logic. This is by design but means Langfuse occupies a different category from full-stack LLMOps platforms.

Enterprise features require commercial licensing for self-hosted deployments. Features like SCIM, audit logging, and data retention policies are available only under a commercial license when self-hosted.^[24] Organizations that need these security features and prefer self-hosting must negotiate an enterprise license, which introduces a procurement process that may slow down initial deployment.

References

"Langfuse: Open source LLM Engineering Platform." GitHub repository. https://github.com/langfuse/langfuse ↩
"About Langfuse." Langfuse. https://langfuse.com/about ↩
"How did we get here?" Langfuse Handbook. https://langfuse.com/handbook/chapters/story ↩
"Langfuse raises $4M." Langfuse Blog, November 2023. https://langfuse.com/blog/announcing-our-seed-round ↩
"Building with Langfuse: Observability and Analytics for LLM Applications." Lightspeed Venture Partners. https://lsvp.com/stories/building-with-langfuse-observability-analytics-for-llm-applications/ ↩
"Langfuse joins ClickHouse." Langfuse Blog, January 16, 2026. https://langfuse.com/blog/joining-clickhouse ↩
"ClickHouse welcomes Langfuse: The future of open-source LLM observability." ClickHouse Blog, January 16, 2026. https://clickhouse.com/blog/clickhouse-acquires-langfuse-open-source-llm-observability ↩
"ClickHouse raises $400M Series D led by Dragoneer to accelerate expansion across analytics and AI infrastructure." ClickHouse Blog, January 16, 2026. https://clickhouse.com/blog/clickhouse-raises-400-million-series-d-acquires-langfuse-launches-postgres ↩
"Open-source LLM Observability: Langfuse Acquired by ClickHouse, Inc." Orrick, January 2026. https://www.orrick.com/en/News/2026/01/Open-source-LLM-Observability-Langfuse-Acquired-by-ClickHouse-Inc ↩
"Database maker ClickHouse raises $400M, acquires AI observability startup Langfuse." SiliconANGLE, January 16, 2026. https://siliconangle.com/2026/01/16/database-maker-clickhouse-raises-400m-acquires-ai-observability-startup-langfuse/ ↩
"ClickHouse buys Langfuse as data platforms race to own the AI feedback loop." InfoWorld, January 2026. https://www.infoworld.com/article/4118621/clickhouse-buys-langfuse-as-data-platforms-race-to-own-the-ai-feedback-loop.html ↩
"ClickHouse acquires Langfuse." Hacker News discussion, January 2026. https://news.ycombinator.com/item?id=46656552 ↩
"Langfuse: Open source LLM engineering platform." Y Combinator company profile. https://www.ycombinator.com/companies/langfuse ↩
"How Langfuse pivoted and raised $4M after leaving Y Combinator." PostHog Startup Spotlight. https://posthog.com/spotlight/startup-langfuse ↩
"Langfuse Architecture." Langfuse Handbook. https://langfuse.com/handbook/product-engineering/architecture ↩
"Self-host Langfuse." Langfuse documentation. https://langfuse.com/self-hosting ↩
"ClickHouse (self-hosted): Langfuse." Langfuse documentation. https://langfuse.com/self-hosting/deployment/infrastructure/clickhouse ↩
"Langfuse Pricing." Langfuse. https://langfuse.com/pricing ↩
"Open Source Prompt Management." Langfuse documentation. https://langfuse.com/docs/prompt-management/overview ↩
"Datasets." Langfuse documentation. https://langfuse.com/docs/evaluation/experiments/datasets ↩
"Langfuse and ClickHouse: A new data stack for modern LLM applications." ClickHouse Blog. https://clickhouse.com/blog/langfuse-and-clickhouse-a-new-data-stack-for-modern-llm-applications ↩
"8 AI Observability Platforms Compared: Phoenix, LangSmith, Helicone, Langfuse, and More." Softcery. https://softcery.com/lab/top-8-observability-platforms-for-ai-agents-in-2025 ↩
"Agent Observability: LangSmith, Langfuse, Arize 2026." Digital Applied. https://www.digitalapplied.com/blog/agent-observability-platforms-langsmith-langfuse-arize-2026 ↩
"Langfuse for Enterprise." Langfuse. https://langfuse.com/enterprise ↩

Improve this article

Add missing citations, update stale details, or suggest a clearer explanation. Every suggestion is reviewed for sourcing before it goes live.

3 revisions by 1 contributors · full history

Suggest edit

What links here

Agent evaluation Agent orchestration Arize Phoenix Companies Dev tools Helicone LangSmith LiteLLM Minimum Viable Agent

History

Who founded Langfuse, and when?

Y Combinator W23 and early development

When did Langfuse raise its seed round?

Growth and product milestones (2024-2025)

Why did ClickHouse acquire Langfuse?

Is Langfuse open source?

Architecture

System components

Data ingestion flow

OpenTelemetry integration

Deployment architecture

Features

What is Langfuse used for?

Tracing and observability

Prompt management

Evaluation

Datasets and experiments

Dashboard and analytics

Should you self-host Langfuse or use the cloud?

Langfuse Cloud

Self-hosted

How much does Langfuse cost?

How does Langfuse differ from LangSmith and other competitors?

LangSmith

Helicone

Arize Phoenix

Who uses Langfuse?

Limitations

See also

References

Improve this article

Related Articles

Helicone

Arize Phoenix

Patronus AI

LangSmith

MLflow

BentoML

What links here

Related Articles

Helicone

Arize Phoenix

Patronus AI

LangSmith

MLflow

BentoML

What links here