LangSmith
Last reviewed
May 7, 2026
Sources
24 citations
Review status
Source-backed
Revision
v1 · 5,289 words
LangSmith is a commercial observability, evaluation, and deployment platform for large language model (LLM) applications and AI agents, developed and operated by LangChain Inc. It provides developers and operations teams with tooling to trace, debug, test, evaluate, and monitor the behavior of LLM-powered systems throughout their lifecycle, from early prototyping through production. LangSmith is available as a hosted cloud service at smith.langchain.com and as a self-hosted enterprise deployment.
Launched in closed beta in August 2023 and reaching general availability in February 2024, LangSmith has become one of the most widely adopted platforms in the LLM observability category. By October 2025, when LangChain Inc. raised a $125 million Series B at a $1.25 billion valuation, the platform had processed more than one billion traces and demonstrated 12x year-over-year growth in monthly trace volume. As of 2025, LangSmith is used by organizations including Replit, Clay, Harvey, Rippling, Cloudflare, Workday, Cisco, LinkedIn, Uber, Rakuten, J.P. Morgan, and BlackRock.
Although LangSmith integrates most naturally with LangChain and LangGraph, it was designed to be framework-agnostic. Since March 2025 it has supported end-to-end OpenTelemetry, so it can ingest trace data from any application that emits standard OpenTelemetry signals.
LangChain began in October 2022 as an open-source Python library created by Harrison Chase, then an engineer at Robust Intelligence, in the weeks following OpenAI's launch of ChatGPT. The library gave developers a structured way to chain LLM calls, connect to external tools and data sources, and build applications that required multiple reasoning steps. It grew rapidly, accumulating tens of thousands of GitHub stars within months and establishing itself as the dominant framework for LLM application development in 2023.
As LangChain applications moved from demos into production, teams encountered a problem that had no adequate solution: LLM systems were opaque. Traditional software monitoring tools reported latency, error rates, and resource utilization, but they could not answer the questions that mattered most for LLM applications: Why did the model give a wrong answer? What context was sent to the model? Which retrieval step returned irrelevant documents? How does changing a prompt affect output quality across hundreds of examples?
Standard APM (application performance monitoring) tools such as Datadog or New Relic could observe the infrastructure around an LLM call but had no visibility into the semantic content of the call itself. Developers resorted to ad hoc logging, printing prompts and responses to console output, or writing custom evaluation scripts for each project.
LangChain Inc. formed as a company around the open-source library in early 2023, with Harrison Chase as chief executive officer. The team concluded that the observability gap was blocking adoption at scale and began designing LangSmith as a dedicated platform to address it.
LangChain Inc. is headquartered in San Francisco. Beyond the LangSmith platform, the company maintains LangChain (the open-source orchestration framework), LangGraph (a stateful agent orchestration library), and LangGraph Platform (a hosted deployment environment for LangGraph applications). The company's business model centers on LangSmith subscriptions and LangGraph Platform compute, with the open-source frameworks serving as the primary developer acquisition channel.
LangChain and LangGraph together reached 90 million combined monthly downloads as of late 2025. Approximately 35% of Fortune 500 companies were using LangChain products in some capacity by that time.
LangChain Inc. announced LangSmith on August 18, 2023, launching it in closed beta. The announcement framed LangSmith as "a unified platform for debugging, testing, evaluating, and monitoring your LLM applications." At the time, developers working with LangChain had no systematic way to inspect the intermediate steps of a chain or agent, compare the effect of prompt changes across a test set, or annotate model outputs for quality assessment.
The closed beta required invitation-based signup. From the outset, LangChain Inc. positioned LangSmith as separate from the LangChain framework, committing to support any LLM framework rather than locking the tool to LangChain-native workflows. The team's reasoning was that useful observability infrastructure must work regardless of how the underlying application was built.
On February 15, 2024, LangChain Inc. announced the general availability of LangSmith and simultaneously announced a $25 million Series A funding round led by Sequoia Capital, with Sonya Huang representing Sequoia on the deal. At general availability, the platform had accumulated more than 80,000 signups and was serving more than 5,000 monthly active teams. In January 2024 alone, users had logged 40 million traces through the platform.
The GA announcement introduced a formal pricing structure with a free Developer tier, a paid Plus tier aimed at teams, and an Enterprise tier for large organizations. Early named customers included Rakuten, Elastic, Moody's, and Retool.
The Series A valued LangChain Inc. at approximately $200 million post-money, establishing the company as a significant venture-backed player in the AI developer tools market.
Through 2024 and into 2025, LangSmith expanded beyond core tracing and evaluation to encompass prompt management, online evaluation of production traffic, annotation queue workflows for human review, and deployment capabilities for LangGraph agents. The platform began incorporating an AI assistant called Polly for trace analysis and an Insights Agent that automatically categorizes agent behavior patterns from production data.
In July 2025, LangSmith became available on AWS Marketplace, enabling enterprise customers to procure it through existing AWS accounts and committed cloud spend agreements.
In October 2025, LangChain Inc. announced a $125 million Series B funding round at a $1.25 billion valuation, achieving unicorn status. The round was led by IVP, with participation from existing investors Sequoia, Benchmark, and Amplify, and new investors including CapitalG, Sapphire Ventures, ServiceNow Ventures, Workday Ventures, Cisco Investments, Datadog Ventures, Databricks Ventures, and Frontline.
The company disclosed that LangSmith had logged more than one billion cumulative traces and had grown monthly trace volume by 12x year-over-year. LangChain Inc. disclosed annual recurring revenue (ARR) in the range of $12-16 million and reported serving approximately 1,000 paying customers.
The Series B announcement coincided with the release of LangChain 1.0, built on the LangGraph runtime, and the private preview launch of Agent Builder, a no-code tool for building agents through natural language descriptions.
LangSmith organizes observability data in a four-level hierarchy: organizations, workspaces, projects, and traces.
An organization is the top-level billing and administrative unit. Within an organization, workspaces provide logical isolation between environments (for example, development, staging, and production). The Plus tier allows up to three workspaces per organization; Enterprise accounts can configure custom workspace counts.
A project is a named container that groups all traces for a single application or service. Developers typically create one project per application or deployment environment.
A trace represents a single end-to-end operation: one user request processed through a chain or agent. Each trace is a tree of runs. A run is the atomic unit of observation in LangSmith, equivalent to a span in OpenTelemetry terminology. Runs can represent any discrete step: an LLM call, a retrieval operation, a tool invocation, a document formatting step, or a custom function. The root run of a trace represents the top-level entry point, and child runs represent nested operations invoked during execution. A single trace can contain up to 25,000 runs.
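The trace-as-tree structure described above can be illustrated with a small sketch. The `Run` class here is a hypothetical model written for this article, not the SDK's own `RunTree` class, and the run names and types are invented for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Run:
    """One observed step (hypothetical model, not the SDK's RunTree class)."""
    name: str
    run_type: str  # e.g. "chain", "llm", "retriever", "tool"
    children: List["Run"] = field(default_factory=list)

    def add_child(self, name: str, run_type: str) -> "Run":
        child = Run(name, run_type)
        self.children.append(child)
        return child

    def count_runs(self) -> int:
        # A trace's size is the root run plus every nested run beneath it.
        return 1 + sum(child.count_runs() for child in self.children)


# One trace: a root run with nested retrieval, LLM, and tool steps.
root = Run("answer_question", "chain")
root.add_child("fetch_documents", "retriever")
agent = root.add_child("agent_step", "chain")
agent.add_child("call_model", "llm")
agent.add_child("search_tool", "tool")

total_runs = root.count_runs()
```

Counting runs recursively like this is how a per-trace limit such as the 25,000-run ceiling would be measured: the root and all of its descendants count toward the total.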
Multiple traces from a multi-turn conversation can be linked into a thread using a shared session_id, thread_id, or conversation_id identifier, enabling analysis of multi-turn interactions as a coherent unit.
The LangSmith backend uses three storage systems optimized for different workload types. ClickHouse stores high-volume trace and feedback data, optimized for the analytical queries used in dashboards and evaluation reports. PostgreSQL handles transactional and operational data including user accounts, project configurations, and access control records. Redis provides in-memory caching and queue management for low-latency operations.
LangSmith provides official SDKs for Python and TypeScript. The Python SDK is distributed as the langsmith package on PyPI. Instrumentation can be applied at three levels of invasiveness:
The lowest-friction path for LangChain and LangGraph applications requires only two environment variables: LANGCHAIN_TRACING_V2=true and LANGCHAIN_API_KEY. All LangChain and LangGraph operations are automatically traced through the LangChain callback system with no code changes.
For non-LangChain Python code, the @traceable decorator wraps any function to create a run automatically, capturing function inputs and outputs as run inputs and outputs.
For the most granular control, developers can use the RunTree API to construct trace hierarchies programmatically, explicitly creating parent and child runs and managing their relationships.
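To make the decorator-based level concrete, the following is a minimal sketch of what a tracing decorator does conceptually: it records a function's inputs and outputs as a run and nests any runs created while the function executes. `traceable_sketch` and the in-memory `finished_traces` list are stand-ins invented for this example; they are not the SDK's `@traceable` implementation, which sends runs to the LangSmith backend.

```python
import functools
from typing import Any, Callable, Dict, List

# Hypothetical in-memory store used only by this sketch.
_run_stack: List[Dict[str, Any]] = []
finished_traces: List[Dict[str, Any]] = []


def traceable_sketch(func: Callable[..., Any]) -> Callable[..., Any]:
    """Record a run for each call, nesting runs created during the call."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        run = {
            "name": func.__name__,
            "inputs": {"args": args, "kwargs": kwargs},
            "outputs": None,
            "error": None,
            "children": [],
        }
        if _run_stack:
            _run_stack[-1]["children"].append(run)  # nested call becomes a child run
        _run_stack.append(run)
        try:
            run["outputs"] = func(*args, **kwargs)
            return run["outputs"]
        except Exception as exc:
            run["error"] = repr(exc)
            raise
        finally:
            _run_stack.pop()
            if not _run_stack:  # the root run finished, so the trace is complete
                finished_traces.append(run)

    return wrapper


@traceable_sketch
def retrieve(query: str) -> List[str]:
    return [f"doc about {query}"]


@traceable_sketch
def answer(query: str) -> str:
    docs = retrieve(query)
    return f"Answer based on {len(docs)} document(s)"


result = answer("tracing")
```

After the call, `finished_traces` holds one trace whose root run is `answer` with `retrieve` as its only child, mirroring the parent and child relationship that the `RunTree` level lets developers build by hand.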
The SDK attaches metadata to traces, including tags (string labels for filtering), key-value metadata fields (for environment information, versions, or user identifiers), and feedback scores (numeric or categorical ratings attached to individual runs).
In March 2025, LangChain Inc. announced end-to-end OpenTelemetry (OTel) support for LangSmith. The integration works in both directions: LangSmith can export trace data to external OpenTelemetry-compatible backends such as Datadog, Grafana, and Jaeger, and it can ingest OpenTelemetry traces from any application that emits standard OTel signals.
For export, the LangSmith SDK converts LangSmith trace data into OpenTelemetry format and transmits it through the OpenTelemetry SDK. For ingest, any application instrumented with OpenTelemetry can send spans to LangSmith's OTel endpoint, which maps OTel span attributes to LangSmith's data model and renders them in the platform's LLM-specific visualization.
OpenTelemetry mode is enabled by setting the environment variables LANGSMITH_OTEL_ENABLED=true and LANGSMITH_TRACING=true and installing the langsmith[otel] extra. The OTel path carries slightly higher overhead than LangSmith's native binary tracing format; for workloads where LangSmith is the exclusive observability destination, the native format remains recommended for optimal performance.
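As a small configuration sketch, the two flags named above could be set from Python before the application starts, assuming the `langsmith[otel]` extra is already installed. The helper `otel_mode_enabled` is a hypothetical convenience for this example, not part of the SDK.

```python
import os

# Variable names are the ones given in the text; setdefault keeps any values
# already exported by the deployment environment.
os.environ.setdefault("LANGSMITH_OTEL_ENABLED", "true")
os.environ.setdefault("LANGSMITH_TRACING", "true")


def otel_mode_enabled(env=os.environ) -> bool:
    """Return True when both flags are set to "true" (case-insensitive)."""
    flags = ("LANGSMITH_OTEL_ENABLED", "LANGSMITH_TRACING")
    return all(env.get(name, "").lower() == "true" for name in flags)
```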
The OTel support makes LangSmith interoperable with standard enterprise observability stacks and eliminates the requirement to choose between LangSmith and an existing OpenTelemetry pipeline.
The LangSmith web interface renders traces as interactive trees. Each node in the tree corresponds to a run, and users can expand any node to inspect its inputs, outputs, latency, token counts, and associated metadata. For LLM runs, the interface displays the full prompt sent to the model, the model's response, model name, temperature, token usage, and cost estimate.
The platform supports filtering traces by project, time range, metadata values, tags, feedback scores, error status, latency thresholds, and token counts. Saved filters can be applied as monitoring rules that trigger alerts or automated actions when matching traces appear.
LangSmith includes Polly, an AI assistant embedded in the trace interface that analyzes complex multi-step traces and explains what happened in natural language. Users can ask Polly questions about a specific trace, such as why a retrieval step returned irrelevant documents or why an agent took an unexpected tool call sequence.
The Insights Agent operates at the project level rather than the individual trace level. It analyzes patterns across many traces and generates summaries of failure modes, common input categories, output quality trends, and usage patterns. Teams use the Insights Agent to identify systemic issues that would be difficult to detect through manual trace inspection.
LangSmith provides monitoring dashboards that aggregate trace data into time-series metrics: request volume, error rate, median and p95 latency, token consumption, and estimated cost. Dashboards can be filtered by project, tag, or metadata to isolate metrics for specific user segments, deployment versions, or geographic regions.
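The dashboard metrics listed above are ordinary aggregates over trace records. The sketch below shows one way to compute them, assuming a hypothetical trace record with `latency_s`, `tokens`, and `error` fields and using the nearest-rank definition of a percentile; the platform's own aggregation method is not described in the source.

```python
import math
from statistics import median
from typing import Dict, List


def percentile_nearest_rank(values: List[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(traces: List[Dict]) -> Dict[str, float]:
    """Aggregate a batch of trace records into dashboard-style metrics."""
    latencies = [t["latency_s"] for t in traces]
    errors = sum(1 for t in traces if t["error"])
    return {
        "requests": len(traces),
        "error_rate": errors / len(traces),
        "median_latency_s": median(latencies),
        "p95_latency_s": percentile_nearest_rank(latencies, 95),
        "total_tokens": sum(t["tokens"] for t in traces),
    }


sample = [
    {"latency_s": 0.8, "tokens": 300, "error": False},
    {"latency_s": 1.2, "tokens": 450, "error": False},
    {"latency_s": 0.5, "tokens": 200, "error": True},
    {"latency_s": 3.0, "tokens": 900, "error": False},
]
stats = summarize(sample)
```

With only four traces, the p95 latency equals the slowest request (3.0 seconds), which illustrates why tail percentiles are only informative at higher request volumes.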
Automation rules allow teams to define conditions, such as traces with latency above a threshold or feedback scores below a minimum, that trigger actions: routing the trace to an annotation queue for human review, adding the trace to a dataset, or sending an alert notification.
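The condition-and-action pattern can be sketched as follows. The `Rule` class, the trace fields, and the action names are hypothetical and chosen for illustration; they do not reflect LangSmith's rule configuration format.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Trace = Dict[str, object]


@dataclass
class Rule:
    name: str
    condition: Callable[[Trace], bool]
    action: str  # e.g. "annotation_queue", "dataset", "alert"


def apply_rules(traces: List[Trace], rules: List[Rule]) -> Dict[str, List[str]]:
    """Return, per action, the ids of the traces routed to it."""
    routed: Dict[str, List[str]] = {}
    for trace in traces:
        for rule in rules:
            if rule.condition(trace):
                routed.setdefault(rule.action, []).append(str(trace["id"]))
    return routed


rules = [
    Rule("slow", lambda t: t["latency_s"] > 5.0, "alert"),
    Rule("low_score", lambda t: t["feedback"] < 0.5, "annotation_queue"),
]
traces = [
    {"id": "t1", "latency_s": 1.0, "feedback": 0.9},
    {"id": "t2", "latency_s": 7.5, "feedback": 0.8},
    {"id": "t3", "latency_s": 2.0, "feedback": 0.2},
]
routed = apply_rules(traces, rules)
```

Here the slow trace `t2` triggers an alert and the low-scoring trace `t3` is routed to human review, while `t1` matches no rule and is left alone.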
Beyond LangChain and LangGraph, LangSmith supports tracing for applications built with OpenAI's SDK, Anthropic's SDK, CrewAI, Vercel AI SDK, Pydantic AI, LlamaIndex, and any framework that emits OpenTelemetry spans. The Prompt Hub integration is specific to LangSmith-native prompts, but the tracing infrastructure is framework-agnostic.
LangSmith's evaluation framework distinguishes between offline evaluation, which runs before deployment using curated test datasets, and online evaluation, which monitors production traffic in real time. Both modes use the same evaluator types: human review, code-based evaluators, LLM-as-a-judge, and pairwise comparison.
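A dataset in LangSmith is a versioned collection of input-output examples used as the benchmark for evaluation experiments. Datasets can be populated in several ways, including adding production traces through automation rules and routing annotated traces from annotation queues.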
Datasets are versioned automatically: every edit or deletion creates a new version with a clean audit trail. Named version tags allow teams to mark versions that correspond to production releases or evaluation milestones. Dataset examples persist indefinitely, even after the underlying traces that generated them are deleted, making datasets the durable record of evaluation data.
Each example in a dataset contains an input field (what was sent to the application), optionally an expected output (the ground truth answer), and optional metadata fields. For structured outputs, datasets can store complex nested objects.
An experiment in LangSmith is a run of an application or function over a dataset, producing one output per example. Experiments are the primary unit of offline evaluation. The evaluate() function in the LangSmith SDK accepts an application function and a dataset name and handles execution, including configurable concurrency, repetitions for variability sampling, and result caching to avoid re-running identical inputs.
Each experiment produces one run tree per dataset example: the root run represents the example, and child runs represent the nested operations that produced the output. Evaluator scores are attached to runs as feedback, enabling granular debugging of which step in a multi-step application contributed to a score.
Experiment results are visualized as tables and charts in the platform UI. Teams can compare two or more experiments side by side to assess whether a prompt change, model upgrade, or architectural change improves or regresses performance across the dataset. The comparison view highlights examples where two experiments produce meaningfully different scores, surfacing the cases most worth investigating.
Code-based evaluators are Python functions that receive a run's input, output, and optionally the expected output from the dataset and return a numeric score or categorical label. They are deterministic, cheap to run, and appropriate for objective criteria such as exact match, format validation, JSON schema conformance, or regex matching.
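A few deterministic evaluators of the kind described above might look like the sketch below. The function signatures and the `{"key": ..., "score": ...}` return shape are assumptions made for this example; the SDK documentation should be consulted for the exact evaluator interface.

```python
import json
import re
from typing import Any, Dict, Optional


def exact_match(output: str, expected: Optional[str]) -> Dict[str, Any]:
    """Score 1 when the output equals the reference after trimming whitespace."""
    same = output.strip() == (expected or "").strip()
    return {"key": "exact_match", "score": int(same)}


def valid_json(output: str, expected: Optional[str] = None) -> Dict[str, Any]:
    """Score 1 when the output parses as JSON."""
    try:
        json.loads(output)
        return {"key": "valid_json", "score": 1}
    except json.JSONDecodeError:
        return {"key": "valid_json", "score": 0}


def matches_iso_date(output: str, expected: Optional[str] = None) -> Dict[str, Any]:
    """Score 1 when the whole output looks like YYYY-MM-DD."""
    ok = re.fullmatch(r"\d{4}-\d{2}-\d{2}", output.strip()) is not None
    return {"key": "iso_date", "score": int(ok)}
```

Because these checks involve no model call, they can run on every example in a dataset at negligible cost and always return the same score for the same input.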
LLM-as-a-judge evaluators send the run's input and output to a second LLM with an evaluation prompt asking it to score a quality dimension such as factual accuracy, tone, helpfulness, or goal completion. LangSmith provides built-in judge prompts for common criteria and a prompt editor for customizing evaluation rubrics. Because LLM judges are themselves probabilistic, LangSmith supports running each judge evaluation multiple times and averaging the result to reduce variance.
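The variance-reduction idea can be sketched independently of any particular model. In the example below, `fake_judge` is a stand-in that returns a fixed sequence of scores so the result is repeatable; a real judge would call an LLM with an evaluation prompt. All names are hypothetical.

```python
from itertools import cycle
from statistics import mean
from typing import Callable


def averaged_judge_score(
    judge: Callable[[str, str], float],
    question: str,
    answer: str,
    repetitions: int = 3,
) -> float:
    """Call a (possibly noisy) judge several times and average the scores."""
    if repetitions < 1:
        raise ValueError("repetitions must be at least 1")
    return mean(judge(question, answer) for _ in range(repetitions))


# Stand-in for an LLM judge: cycles through fixed scores to imitate noise.
_fake_scores = cycle([0.5, 1.0, 0.75])


def fake_judge(question: str, answer: str) -> float:
    return next(_fake_scores)


score = averaged_judge_score(fake_judge, "What is 2 + 2?", "4", repetitions=3)
```

Averaging three calls that individually range from 0.5 to 1.0 yields 0.75, a steadier estimate than any single call, at the price of three times the judge cost.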
Pairwise evaluators present two outputs side by side to a judge (either a human or an LLM) and ask it to choose which is better. Pairwise evaluation is particularly useful when absolute quality scores are difficult to define but relative preference is clear, and it is commonly used for preference modeling and reinforcement learning from human feedback (RLHF) workflows.
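Pairwise verdicts are usually summarized as win rates. The sketch below assumes each verdict is recorded as the string "A", "B", or "tie"; that encoding is an assumption of this example rather than LangSmith's storage format.

```python
from collections import Counter
from typing import Dict, List


def win_rates(preferences: List[str]) -> Dict[str, float]:
    """Each entry is "A", "B", or "tie"; return the share of each verdict."""
    total = len(preferences)
    if total == 0:
        return {"A": 0.0, "B": 0.0, "tie": 0.0}
    counts = Counter(preferences)
    return {label: counts.get(label, 0) / total for label in ("A", "B", "tie")}


verdicts = ["A", "B", "A", "tie", "A", "B", "A", "A"]
rates = win_rates(verdicts)
```

In this sample, output A is preferred in five of eight comparisons (62.5%), which is the kind of relative signal pairwise evaluation provides when no absolute scoring rubric is available.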
Human evaluators can score any run directly in the LangSmith UI or through annotation queues.
Online evaluation applies evaluators to production traces as they are generated. Teams configure filters and sampling rates to control which fraction of production traffic is evaluated, managing the cost of running LLM-based judges at scale. Online evaluation results appear on monitoring dashboards alongside the raw production metrics, providing a continuous quality signal rather than a periodic benchmark.
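One common way to implement a sampling rate is to hash each trace identifier into the unit interval and evaluate only the traces that fall below the rate. This is a generic technique shown for illustration; the source does not describe how LangSmith itself selects traces, and `should_evaluate` is a hypothetical helper.

```python
import hashlib


def should_evaluate(trace_id: str, sampling_rate: float) -> bool:
    """Deterministically select roughly `sampling_rate` of traces by hashing the id."""
    if not 0.0 <= sampling_rate <= 1.0:
        raise ValueError("sampling_rate must be between 0 and 1")
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return bucket < sampling_rate


# Evaluate about 10% of 10,000 synthetic trace ids.
selected = [i for i in range(10_000) if should_evaluate(f"trace-{i}", 0.10)]
```

Hash-based selection gives the same decision for the same trace every time, so re-processing a batch does not evaluate, and bill for, a different subset.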
Failed traces identified by online evaluation can be automatically added to datasets, creating a feedback loop in which production failures become regression test cases for the next development iteration.
LangSmith evaluation experiments can be integrated into continuous integration pipelines using the LangSmith SDK's assertion utilities. Teams define score thresholds and run evaluate() as part of a test suite; the suite fails if any evaluator score falls below the threshold. This enables automated regression testing for LLM quality, analogous to unit tests in traditional software engineering.
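The quality-gate pattern can be sketched with plain Python and a test function of the kind a runner such as pytest would collect. Everything here is a stand-in: `run_experiment` imitates the role of `evaluate()` on a tiny in-memory dataset, `capital_of` plays the application under test, and the threshold value is hypothetical. This sketch gates on the mean score of a single evaluator.

```python
from statistics import mean
from typing import Callable, Dict, List


def run_experiment(
    target: Callable[[str], str],
    examples: List[Dict[str, str]],
    evaluator: Callable[[str, str], float],
) -> float:
    """Run the target on every example and return the mean evaluator score."""
    return mean(evaluator(target(ex["input"]), ex["expected"]) for ex in examples)


def is_exact(output: str, expected: str) -> float:
    return 1.0 if output == expected else 0.0


def capital_of(country: str) -> str:
    # Stand-in for the application under test.
    return {"France": "Paris", "Japan": "Tokyo"}.get(country, "unknown")


EXAMPLES = [
    {"input": "France", "expected": "Paris"},
    {"input": "Japan", "expected": "Tokyo"},
    {"input": "Kenya", "expected": "Nairobi"},
]
THRESHOLD = 0.6  # hypothetical minimum acceptable score


def test_quality_gate() -> None:
    score = run_experiment(capital_of, EXAMPLES, is_exact)
    assert score >= THRESHOLD, f"score {score:.2f} fell below {THRESHOLD}"
```

The application answers two of three examples correctly, so the gate passes at a 0.6 threshold; raising the threshold to 0.7 would fail the build until the missing case is handled.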
Annotation queues provide a structured interface for routing traces to human reviewers and collecting structured feedback at scale. They address the workflow problem of organizing human evaluation: without queues, reviewers must manually search for traces to review, there is no mechanism for assigning reviewers to specific tasks, and feedback collection is inconsistent.
LangSmith supports two annotation queue styles. Single-run queues present one trace at a time. The reviewer sees the full trace tree and is prompted to submit ratings on the rubric items configured for that queue. Rubric items can be numeric sliders (for continuous quality scores), categorical selections (for classification labels), or free-text fields (for corrections or comments).
Pairwise annotation queues (PAQs) present two traces side by side. The reviewer chooses which output is better, which is equivalent, or which is worse, optionally providing a text explanation. Pairwise queues are particularly effective for preference modeling tasks where relative quality is easier to judge than absolute quality, and they map directly onto the data format required for RLHF training.
Administrators configure queues with a name, description, and rubric definition. Traces can be added to queues manually by individual developers, automatically through automation rules triggered by monitoring conditions, or through bulk operations on filtered trace queries. Queues display reviewer progress, showing how many traces have been reviewed and how many remain.
Feedback collected through annotation queues is stored as structured run feedback and is available for analysis, export, and incorporation into datasets. The ability to route annotated traces directly to datasets closes the loop between human evaluation and automated testing.
The Prompt Hub is LangSmith's centralized repository for storing, versioning, and sharing prompt templates. Each prompt is identified by a name and owner, and it stores the template text along with associated model configuration defaults such as model name, temperature, and maximum tokens.
Every time a prompt is pushed to the Hub, LangSmith generates a unique commit hash representing that exact version. Teams can reference a prompt by name (always resolving to the latest commit), by a specific commit hash (for reproducibility), or by a named tag (for environment-based release management). Tags such as staging and production can be attached to specific commits and moved to newer commits as part of a deployment workflow, enabling teams to control which prompt version runs in each environment without changing application code.
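The three ways of referencing a prompt (latest, commit hash, tag) can be modeled with a toy repository. `PromptRepo` is a hypothetical class written for this article, and its hashing scheme is an assumption; it is not the Prompt Hub API.

```python
import hashlib
from typing import Dict, List, Optional


class PromptRepo:
    """Toy model of one prompt's history (not the Prompt Hub API)."""

    def __init__(self) -> None:
        self.commits: List[str] = []         # commit hashes, oldest first
        self.templates: Dict[str, str] = {}  # commit hash -> template text
        self.tags: Dict[str, str] = {}       # tag name -> commit hash

    def push(self, template: str) -> str:
        parent = self.commits[-1] if self.commits else ""
        commit = hashlib.sha256((parent + template).encode("utf-8")).hexdigest()[:8]
        self.commits.append(commit)
        self.templates[commit] = template
        return commit

    def tag(self, name: str, commit: str) -> None:
        if commit not in self.templates:
            raise KeyError(f"unknown commit {commit}")
        self.tags[name] = commit  # moving a tag just repoints it

    def resolve(self, ref: Optional[str] = None) -> str:
        """No ref -> latest commit; otherwise a tag name or a commit hash."""
        if ref is None:
            return self.templates[self.commits[-1]]
        commit = self.tags.get(ref, ref)
        return self.templates[commit]


repo = PromptRepo()
v1 = repo.push("Answer briefly: {question}")
repo.tag("production", v1)
v2 = repo.push("Answer briefly and cite sources: {question}")
repo.tag("staging", v2)
```

Promoting the new prompt is then a single call, `repo.tag("production", v2)`, and application code that resolves the `production` tag picks up the change without being redeployed.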
LangSmith traces record which prompt commit was used for each execution, making it straightforward to correlate output quality changes with specific prompt modifications during debugging.
The Playground is an interactive testing environment integrated into LangSmith. Developers can load any prompt from the Hub, modify it, select a model provider and configuration, and run it against custom inputs or examples from a dataset. The Playground displays token counts and cost estimates for each run and allows side-by-side comparison of outputs from different prompts or models.
Prompt Canvas, a feature introduced in 2024, allows developers to request AI-assisted prompt rewrites with specific instructions (improve clarity, adjust reading level, change tone) and then preview a diff of the proposed changes before saving a new version to the Hub.
In 2025, LangSmith expanded its scope from pure observability to include deployment capabilities for LangGraph agents. The Deployment module enables human-in-the-loop workflows where agents pause and request human approval before executing sensitive or irreversible actions. It also supports background agent execution, where agents run asynchronously from user interactions, and multi-agent coordination patterns.
A centralized agent registry within LangSmith tracks all deployed agent versions with versioning, rollback capability, and support for A2A (Agent2Agent), MCP (Model Context Protocol), and Agent Protocol standards.
Fleet, a no-code agent system introduced in the same period, allows non-technical users to build agents through natural language descriptions rather than code. Fleet agents learn from feedback collected through annotation queues and can request permissions from administrators when encountering actions outside their configured scope.
The Agent Studio provides visual debugging with breakpoints, enabling developers to pause an agent mid-execution, inspect its state, modify variables, and resume, analogous to a traditional code debugger but for LLM agent workflows.
LangSmith offers three main pricing tiers as of 2025, with an additional startup program.
| Plan | Price | Seats | Included traces | Workspaces | Trace retention | Support |
|---|---|---|---|---|---|---|
| Developer | Free + pay-as-you-go | 1 | 5,000/month | 1 | 14 days | Community |
| Plus | $39/seat/month + pay-as-you-go | Unlimited | 10,000/month per org | Up to 3 | 14 days (base); 400 days (extended) | |
| Enterprise | Custom | Custom | Custom | Custom | Custom | SLA + dedicated engineers |
Overage traces on the Plus plan cost $2.50 per 1,000 traces at base retention (14 days) or $5.00 per 1,000 traces at extended retention (400 days). Developer plan overages are priced at $0.50 per 1,000 traces.
A notable pricing consideration: traces that receive feedback, annotations, or corrections are automatically upgraded to extended retention, incurring the higher $5.00 per 1,000 rate. For agent workflows with heavy annotation, this can substantially increase trace costs relative to a baseline estimate.
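The effect of annotation on cost can be worked through with the Plus figures quoted above. This estimate rests on simplifying assumptions that are not stated in the source: the 10,000 included traces are assumed to offset base-retention traces only, and each remaining trace is assumed to be billed once at the rate for its retention tier. `plus_monthly_cost` is a hypothetical helper, not an official calculator.

```python
from decimal import Decimal

SEAT_PRICE = Decimal("39")       # USD per seat per month (Plus)
INCLUDED_TRACES = 10_000         # per organization per month (Plus)
BASE_RATE = Decimal("2.50")      # USD per 1,000 base-retention traces
EXTENDED_RATE = Decimal("5.00")  # USD per 1,000 extended-retention traces


def plus_monthly_cost(seats: int, traces: int, annotated_fraction: Decimal) -> Decimal:
    """Estimate a Plus bill under the simplifying assumptions noted above."""
    extended = int(traces * annotated_fraction)  # traces upgraded by feedback
    base = traces - extended
    # Assumption: the included allowance offsets base-retention traces only.
    billable_base = max(0, base - INCLUDED_TRACES)
    cost = (
        seats * SEAT_PRICE
        + Decimal(billable_base) / 1000 * BASE_RATE
        + Decimal(extended) / 1000 * EXTENDED_RATE
    )
    return cost.quantize(Decimal("0.01"))


no_annotation = plus_monthly_cost(seats=5, traces=500_000, annotated_fraction=Decimal("0"))
heavy_annotation = plus_monthly_cost(seats=5, traces=500_000, annotated_fraction=Decimal("0.20"))
```

Under these assumptions, a five-seat team logging 500,000 traces a month pays $1,420.00 with no annotated traces and $1,670.00 when 20% of traces receive feedback, an increase of $250.00 driven entirely by the retention upgrade.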
The Plus plan also includes deployment compute at $0.0007 per minute for development deployments and $0.0036 per minute for production deployments, plus Fleet agents at $0.005 per run beyond the included 500 monthly runs.
The Enterprise tier includes custom single sign-on (SSO) and SAML integration, System for Cross-domain Identity Management (SCIM) for automated user provisioning, full role-based and attribute-based access control (RBAC/ABAC), data encryption at rest and in transit, audit logs, team training, and architectural guidance. Enterprise customers can also negotiate hybrid deployment or self-hosted deployment options.
The Startup Program offers discounted rates and credits for venture-capital-backed early-stage companies building agentic applications.
LangSmith's deepest integration is with LangChain and LangGraph, which are maintained by the same company.
For LangChain applications, enabling tracing requires only setting the LANGCHAIN_TRACING_V2 and LANGCHAIN_API_KEY environment variables. Every chain, agent, LLM call, retrieval step, tool invocation, and output parser in a LangChain application generates a run automatically. No callback configuration or code modification is required.
For LangGraph applications, tracing integrates with LangGraph's state management model. Each graph node execution becomes a child run within the trace, capturing the node name, the graph state before and after execution, and any tool calls the node made. LangGraph's human-in-the-loop checkpoints appear in traces as pause events, giving developers full visibility into where an agent paused for human input and what state it resumed from.
LangSmith datasets drawn from LangGraph traces can include full agent conversation threads, preserving the multi-turn structure required to evaluate dialog quality and agent goal-completion over extended interactions.
Prompts stored in the LangSmith Prompt Hub are directly loadable in LangChain and LangGraph applications using the hub.pull() function, connecting the prompt management and tracing infrastructure within a single workflow.
The LLM observability market includes several competing platforms with different positioning and trade-offs.
| Feature | LangSmith | Langfuse | Helicone | Arize Phoenix |
|---|---|---|---|---|
| Open source | No | Yes (Apache 2.0) | Partial | Yes (Apache 2.0) |
| Self-hosting | Enterprise add-on | Free | No | Yes, free |
| Primary deployment model | Cloud SaaS | Cloud or self-hosted | Cloud SaaS (proxy) | Cloud or local |
| LangChain integration | Native, zero-config | SDK-based | SDK or proxy | SDK or OTel |
| OpenTelemetry support | Yes (March 2025) | Yes | Limited | Yes (primary) |
| Evaluation framework | Comprehensive | Growing | Basic | Comprehensive |
| Annotation queues | Yes | Yes | No | Limited |
| Prompt management | Yes (Hub + versioning) | Yes | No | Yes (April 2025) |
| Per-trace pricing | Yes | Yes | No (flat/seat) | Free (OSS) |
| Pairwise evaluation | Yes | No | No | Yes |
Langfuse is an open-source LLM observability platform released under the Apache 2.0 license. Its primary differentiator is self-hosting: the complete Langfuse feature set can be deployed on a team's own infrastructure at no software cost, with feature parity between the self-hosted and cloud versions. Langfuse is well-suited for teams with strict data residency requirements, budget constraints, or governance policies that prohibit sending application data to third-party SaaS platforms. Langfuse's evaluation and prompt management capabilities have expanded substantially since its initial release, making it competitive with LangSmith on most core features. The main trade-offs relative to LangSmith are that Langfuse requires infrastructure management when self-hosted and has a less mature pairwise evaluation workflow.
Helicone operates as an HTTP proxy rather than an SDK instrumentation layer. Applications route their LLM API calls through Helicone's proxy, which logs requests and responses without requiring application code changes. This architecture makes Helicone extremely fast to adopt, with typical setup times of under fifteen minutes, but it imposes limitations: as a proxy, Helicone sees only the HTTP boundary of each LLM call and has no visibility into the multi-step agent traces that span many LLM calls. It is best suited for applications where the primary observability need is tracking cost, latency, and volume across direct LLM API calls rather than debugging complex agent workflows. Helicone offers flat-rate pricing starting at $25 per month, which makes cost predictable for high-volume applications.
Arize Phoenix is an open-source observability platform built natively on OpenTelemetry, released under the Apache 2.0 license by Arize AI. Phoenix can run entirely locally or be deployed on team-managed infrastructure with no per-trace charges. It integrates with any framework that emits OpenTelemetry spans, making it the most framework-agnostic option in the category. Phoenix's evaluation framework supports complex multi-step agent evaluation and includes deep integration with the broader Arize AI ecosystem for production machine learning monitoring. In April 2025, Phoenix added a prompt management module with versioning and template reuse. The primary trade-off is operational: Phoenix requires infrastructure engineering to run at production scale (typically PostgreSQL and Kubernetes), and its UI is less polished than LangSmith's hosted interface. Arize AX, the enterprise SaaS product from Arize AI, is proprietary and separate from the open-source Phoenix project.
LangSmith is used across a broad range of industries and application types.
In software development tooling, companies such as Replit use LangSmith to trace and evaluate AI coding assistants, monitoring which retrieval contexts led to correct versus incorrect code suggestions and using annotation queues to collect developer feedback on code quality.
In financial services, organizations including J.P. Morgan and Moody's apply LangSmith's evaluation framework to validate that LLM outputs in regulatory and analytical contexts meet accuracy and compliance standards, maintaining audit trails of which prompt versions were active during specific time periods.
In cybersecurity, Elastic used LangGraph to orchestrate a multi-agent threat detection system and LangSmith to trace each step in the threat analysis workflow, enabling engineers to debug edge cases in detection logic and validate that agent behavior remained consistent after updates.
Rakuten built an enterprise-wide generative AI platform using LangGraph and LangSmith that enables employees across more than 70 business units to create and deploy AI agents. LangSmith provides the observability and evaluation layer that Rakuten's platform team uses to ensure quality across all deployed agents.
Customer-facing enterprise applications at companies such as Klarna, Vodafone, and Home Depot use LangSmith to monitor LLM-powered support and automation workflows, track quality metrics over time, and route edge cases to human review queues.
LangSmith has several limitations that prospective users should consider.
Vendor lock-in: LangSmith is closed-source software. Users cannot inspect or modify the underlying platform code, and the product roadmap is determined entirely by LangChain Inc. While the OpenTelemetry integration reduces instrumentation lock-in, the evaluation, annotation, and dataset management workflows are proprietary, and migrating accumulated datasets and experiment history to another platform requires export and reimport.
Self-hosting restrictions: Self-hosted deployment is only available as an Enterprise add-on and requires Kubernetes for production deployments or Docker for development environments. Teams that want data residency control without paying Enterprise rates must use a competing open-source platform such as Langfuse or Arize Phoenix.
Pricing at scale: Per-trace pricing can become expensive at high trace volumes. The automatic upgrade of annotated traces to extended retention at the $5.00 per 1,000 rate means that teams running active annotation workflows may see trace costs significantly higher than a naive estimate based on the base trace volume alone. Enterprise customers with predictable high-volume workloads typically negotiate flat-rate contracts to avoid this unpredictability.
Data governance: LangSmith's hosted SaaS model means that detailed application data, including full prompts, model responses, and user inputs, is transmitted to LangChain Inc.'s cloud infrastructure. For applications handling sensitive personal data, health information, or legally privileged content, this raises data governance considerations that may require the Enterprise self-hosted option or an alternative open-source platform.
Framework-agnostic experience gap: While LangSmith officially supports non-LangChain frameworks and provides OpenTelemetry ingest, the user experience is most fully realized for LangChain and LangGraph applications. Teams using other frameworks such as LlamaIndex, CrewAI, or custom orchestration code must invest more in integration configuration to achieve the same level of automatic instrumentation that LangChain applications receive out of the box.
Trace retention defaults: The default 14-day retention on base traces is short relative to the timescales of LLM application debugging and compliance requirements. Teams that need longer trace histories must pay for extended retention or proactively export traces to external storage.