Instructor (library)
Last reviewed
May 20, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v1 · 4,361 words
Instructor is an open-source Python library that returns type-safe, Pydantic-validated structured outputs from large language model APIs. It works by wrapping (originally "patching") a provider's client SDK so that any chat completion call accepts an extra response_model= argument whose value is a pydantic.BaseModel subclass, and the call returns a fully validated instance of that model instead of a raw string. The library was created in June 2023 by Jason Liu and has since grown into the most widely used open-source structured-output utility for LLMs, with roughly 3 million monthly downloads on PyPI and more than 13,000 GitHub stars by mid-2026.[^1][^2][^3] Instructor is distributed under the MIT License and is part of a small family of ports that share the same API ideas in TypeScript, Ruby, Go, Elixir, and Rust.[^1][^4]
| Field | Value |
|---|---|
| Original author | Jason Liu |
| Initial release | June 2023[^5] |
| Latest stable version (as of writing) | 1.15.1 (3 April 2026)[^3] |
| Language | Python (>=3.9, <4.0) |
| License | MIT[^3] |
| Repository | github.com/567-labs/instructor (formerly jxnl/instructor) |
| Core dependency | Pydantic v2 |
| Primary entry points | instructor.from_openai, instructor.from_anthropic, instructor.from_provider, instructor.patch |
| Sister projects | instructor-js (TypeScript), and ports in Ruby, Go, Elixir, Rust[^4] |
| Supported providers (selection) | OpenAI, Azure OpenAI, Anthropic, Google Gemini, Google Vertex AI, Amazon Bedrock, Mistral AI, Cohere, Groq, Cerebras, Ollama, llama-cpp-python, Together AI, Fireworks AI, DeepSeek, xAI, Writer, Perplexity, SambaNova, OpenRouter, LiteLLM[^6] |
Before tool-use APIs were widespread, applications that needed structured data from an LLM commonly executed code resembling response.choices[0].message.content, then ran an ad-hoc regular expression or a JSON parser on the resulting string. This approach fails frequently in production: the model may add prose around the JSON, omit fields, include trailing commas, hallucinate enum values, or change the schema between calls. Each of those errors must be caught and retried by hand, and the retry prompt must somehow communicate to the model exactly what went wrong without re-explaining the entire schema.[^7][^2]
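The failure modes described above can be reproduced with the standard library alone. The following is a minimal sketch: the reply strings are invented for illustration rather than captured from a model, and `naive_parse` is a hypothetical helper standing in for the ad-hoc parsing code that applications used to write.

```python
import json

# Hypothetical raw replies; the strings are illustrative, not captured model output.
REPLIES = {
    "clean": '{"name": "Jason", "age": 25}',
    "prose_wrapped": 'Sure! Here is the JSON: {"name": "Jason", "age": 25}',
    "trailing_comma": '{"name": "Jason", "age": 25,}',
    "missing_field": '{"name": "Jason"}',
}


def naive_parse(raw: str) -> dict:
    """Parse a reply the ad-hoc way: json.loads plus a manual key check."""
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed text
    for key in ("name", "age"):
        if key not in data:
            raise KeyError(key)
    return data


outcomes = {}
for label, raw in REPLIES.items():
    try:
        naive_parse(raw)
        outcomes[label] = "ok"
    except (json.JSONDecodeError, KeyError) as exc:
        # Each failure type needs its own handling and its own retry prompt.
        outcomes[label] = type(exc).__name__

print(outcomes)
```

Only the clean reply survives; every other variant raises a different exception that the caller has to catch and translate into a retry by hand.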
Instructor's stated motivation is to remove that boilerplate. The library's documentation frames the problem in three parts: unreliability ("LLM outputs work 90% of the time, but failures are costly"), provider fragmentation (each vendor's function-calling surface looks slightly different), and the awkwardness of writing nested JSON schemas by hand instead of in idiomatic Python.[^7] Liu, in his original September 2023 blog post that introduced the project, argued that Pydantic was already an obvious bridge between Python developers and language models because Pydantic models double as both validators and self-documenting schemas, and he positioned Instructor as the equivalent of requests for structured LLM calls: a thin layer that handles the boring parts and stays out of the way otherwise.[^5][^8]
The result is that, in the canonical usage, a developer never sees a JSON schema string at all. They write a Pydantic BaseModel, pass it as response_model, and either receive an instance of that model or, if all retries are exhausted, a ValidationError they can catch like any other Python exception.[^2][^9]
A second, less obvious motivation is testability. Because the returned object is a real Python instance, application logic that consumes it can be unit-tested with synthetic Pydantic instances rather than mock JSON blobs, and contract changes in the schema surface as type errors at edit time. In practice this means LLM-using code can adopt the same review and linting practices as ordinary backend code, which was difficult when the contract between application and model was an unstructured string.[^2][^7]
Liu was an ML engineer at Stitch Fix for five years before taking a sabbatical at South Park Commons in New York and moving into full-time AI consulting. He has described initially dismissing language models as impractical and only returning to them after a wrist injury and the post-2022 release of GPT-3.5 made the technology unavoidable.[^8]
The first commits of Instructor were written, by Liu's own account, on the Shinkansen during a trip to Japan in June 2023, shortly after OpenAI introduced function calling for the GPT-3.5 and GPT-4 chat models.[^8] At the time, alternative libraries such as Guardrails, Marvin, and the earliest versions of LangChain tended to use XML-style annotations, custom prompt templates, or full agent frameworks to coerce structure. Liu's design instead treated function calling as a JSON-schema delivery mechanism and used Pydantic as the schema source. The library's first public form was distributed as the package openai_function_call before being renamed.[^5]
Adoption grew quickly: by late 2023 Instructor had moved from a personal repository under Liu's jxnl username to the instructor-ai organization, and again later to 567-labs/instructor. A TypeScript port, instructor-js, was started in 2023 by Liu together with Dimitri Kennedy and reached v1.7.0 by January 2025.[^4] Community-maintained ports followed in Ruby, Go, Elixir, and Rust.[^1]
Major Python milestones include the v1.0 release in early 2024, which introduced the instructor.from_openai(OpenAI()) factory style and deprecated the global monkey-patching instructor.patch() call in favor of explicit per-client wrapping; the gradual addition of multimodal helpers (Image, Audio, PDF) across 2024 and 2025; partial-streaming and iterable-streaming primitives; provider-specific extras for Anthropic, Cohere, Google GenAI and Vertex, Mistral, Bedrock, and Groq; and an internal hook system that exposes events for completion start, completion end, retry, and parse error. Release v1.13.0 added the py.typed marker so that downstream users get proper type inference, and v1.14 added a uniform provider factory and broadened Bedrock support. By v1.15.1 (3 April 2026), the library had added xAI support, further security hardening for Bedrock image inputs, retry tracking inside hooks, and an Anthropic User-Agent header.[^10][^3]
The naming of the package has also evolved: early users will recall it being known as openai_function_call, then briefly as openai-function-call-pydantic, before the rename to instructor cemented after the OpenAI rebrand made the original name misleading. The migration from instructor.patch() to factory functions was driven by user reports that monkey-patching at module import time interacted badly with applications that maintain multiple OpenAI clients in the same process (for example, one client for embeddings and one for chat with different timeouts). Wrapping a specific client instance avoids that global-state pitfall.[^5][^9]
In the modern style, an Instructor client is just a regular provider client that has been wrapped:
```python
import instructor
from openai import OpenAI
from pydantic import BaseModel


class User(BaseModel):
    name: str
    age: int


client = instructor.from_openai(OpenAI())

user = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=User,
    messages=[{"role": "user", "content": "Jason is 25."}],
)
# user is now a fully validated User instance
```
The same shape works for other providers via dedicated factories (instructor.from_anthropic, instructor.from_cohere, instructor.from_mistral, instructor.from_genai, etc.), or via the unified instructor.from_provider("openai/gpt-4o") string that selects a provider and model in a single call.[^2][^9][^11] The underlying client retains its normal type signature, which means existing tools such as IDE autocompletion, retry decorators, and request logging continue to work.
Because every provider exposes structured output through a slightly different mechanism, Instructor introduces a Mode enum that selects the wire-level technique. The most common modes are:
| Mode | Mechanism | Typical providers |
|---|---|---|
| Mode.TOOLS | Native function/tool calling | OpenAI, Anthropic, Mistral, Cohere, Groq |
| Mode.JSON | "JSON mode" with post-hoc parsing | OpenAI, DeepSeek, Together, vLLM endpoints |
| Mode.JSON_SCHEMA | Vendor-supplied JSON Schema enforcement | OpenAI (Structured Outputs), Gemini |
| Mode.MD_JSON | Markdown-fenced JSON block, regex-extracted | Ollama, older local models |
| Mode.PARALLEL_TOOLS | Multiple tool calls in one response | OpenAI, Anthropic; auto-selected when response_model is Iterable[Union[...]] |
The chosen mode determines how the Pydantic schema is serialised into the underlying provider request and how validation errors are translated back into a "reask" turn.[^11][^12]
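To make the idea of "serialising the schema into the request" concrete, the sketch below places one Pydantic schema into two request shapes: an OpenAI-style function tool and an OpenAI-style `json_schema` response format. It is an illustration of the two wire formats only, not Instructor's internal code, and the variable names are chosen for the example.

```python
from pydantic import BaseModel


class User(BaseModel):
    """A person mentioned in the text."""

    name: str
    age: int


# Pydantic derives the JSON Schema; the class docstring becomes its description.
schema = User.model_json_schema()

# Tool-calling style: the schema becomes the parameters of a function tool.
tool_payload = {
    "type": "function",
    "function": {
        "name": User.__name__,
        "description": schema.get("description", ""),
        "parameters": schema,
    },
}

# Structured-output style: the same schema is sent as a response format.
response_format_payload = {
    "type": "json_schema",
    "json_schema": {"name": User.__name__, "schema": schema},
}

print(sorted(schema["properties"]))
```

In both cases the Python class is the single source of the schema; only the envelope around it changes with the mode.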
response_model accepts any Pydantic BaseModel subclass, including models with nested submodels, lists, dictionaries, Literal enums, Union types, generics, and Pydantic's full Field(...) constraint vocabulary (gt, lt, min_length, regex patterns, custom validators). Two type wrappers are provided for streaming, Partial[Model] and Iterable[Model], described below.[^2][^9]
Field validators are first-class citizens. Because Pydantic v2 supports arbitrary @field_validator and @model_validator callables, application authors can encode business rules ("phone number must match an E.164 regex," "end date must be after start date") and Instructor will treat any ValidationError they raise the same as a missing field: it will package the error message into the next prompt and retry. This is the mechanism that gives Instructor its de-facto "self-healing" reputation.[^9][^11]
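The two business rules quoted above can be written as ordinary Pydantic v2 validators. The sketch below runs entirely offline; the `Booking` model and its field names are hypothetical, and the captured `feedback` string stands for the kind of error text that would be sent back to the model on a retry.

```python
import re
from datetime import date

from pydantic import BaseModel, ValidationError, field_validator, model_validator


class Booking(BaseModel):
    phone: str
    start: date
    end: date

    @field_validator("phone")
    @classmethod
    def phone_is_e164(cls, value: str) -> str:
        # E.164: a plus sign followed by up to 15 digits, the first non-zero.
        if not re.fullmatch(r"\+[1-9]\d{1,14}", value):
            raise ValueError("phone number must be in E.164 format")
        return value

    @model_validator(mode="after")
    def end_after_start(self) -> "Booking":
        if self.end <= self.start:
            raise ValueError("end date must be after start date")
        return self


try:
    Booking(phone="+14155550100", start=date(2026, 5, 2), end=date(2026, 5, 1))
except ValidationError as exc:
    feedback = str(exc)  # human-readable text that a reask turn could carry
    errors = exc.errors()

print(feedback)
```

Because the rule is expressed in the validator's error message, the same sentence serves both as a Python exception and as the correction hint for the model.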
The simplest retry interface is an integer:
```python
user = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=User,
    max_retries=3,
    messages=[...],
)
```
When validation fails on attempt N, Instructor constructs an additional turn that includes both the offending output and the ValidationError text, then resubmits. The reask turn is short and targeted, so the marginal token cost of a retry is small relative to the original generation.[^9]
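The reask turn can be pictured with a small offline simulation. This is a sketch under stated assumptions, not Instructor's implementation: `extract_with_reask`, `fake_model`, the `max_attempts` parameter, and the wording of the correction message are all invented for the example, and the scripted replies stand in for a real provider call.

```python
from typing import Callable

from pydantic import BaseModel, ValidationError


class User(BaseModel):
    name: str
    age: int


def extract_with_reask(
    call_model: Callable[[list], str],
    messages: list,
    max_attempts: int = 3,
) -> User:
    """Validate the reply; on failure, append the reply and the error, then retry."""
    history = list(messages)
    for attempt in range(1, max_attempts + 1):
        raw = call_model(history)
        try:
            return User.model_validate_json(raw)
        except ValidationError as exc:
            if attempt == max_attempts:
                raise  # budget exhausted: surface the error to the caller
            history.append({"role": "assistant", "content": raw})
            history.append(
                {"role": "user", "content": f"Fix these validation errors:\n{exc}"}
            )
    raise RuntimeError("max_attempts must be at least 1")


# Stand-in for a provider call: a bad reply first, then a corrected one.
scripted = iter(['{"name": "Jason", "age": "twenty-five"}', '{"name": "Jason", "age": 25}'])
history_lengths = []


def fake_model(history: list) -> str:
    history_lengths.append(len(history))
    return next(scripted)


user = extract_with_reask(fake_model, [{"role": "user", "content": "Jason is 25."}])
print(user, history_lengths)
```

The second call sees two extra messages (the rejected output and the error text), which is why the added token cost of a retry stays small compared with the original prompt.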
For richer behaviour max_retries also accepts a Tenacity Retrying (sync) or AsyncRetrying (async) object, allowing callers to combine schema-level retries with exponential backoff, rate-limit-aware waits, jitter, or stop-after-deadline policies. The Instructor documentation recommends combining its built-in max_retries integer with Tenacity decorators only when network-level resilience is also needed, and always setting an explicit stop condition to avoid infinite loops.[^9]
Since v1.10, Instructor exposes a hook system that fires callbacks on completion:kwargs, completion:response, completion:error, parse:error, and completion:last_attempt. Hooks make it straightforward to log raw API traffic to observability platforms, attribute token usage to retries, or short-circuit retries on specific exceptions.[^10]
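```python
def on_parse_error(error, **kwargs):
    # `metrics` stands in for the application's own metrics client.
    # .get() avoids a KeyError when the handler is called without a "model" keyword.
    metrics.increment("instructor.parse_error", tags={"model": kwargs.get("model", "unknown")})


client.on("parse:error", on_parse_error)
```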
Hooks are also a convenient place to record the full request and response payloads for evaluation datasets, since both pre-retry and post-retry artifacts pass through the same handlers. The v1.12 release expanded the hook surface so that retry counts and per-attempt latency are attached to the hook context, simplifying instrumentation.[^10]
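The callback pattern behind hooks is a plain event registry. The toy `HookRegistry` below is illustrative only and is not Instructor's implementation; it reuses two of the event names listed above to show how a handler registered with `on` is invoked when the event is emitted.

```python
from collections import defaultdict
from typing import Any, Callable


class HookRegistry:
    """Toy event registry; illustrative only, not Instructor's implementation."""

    def __init__(self) -> None:
        self._handlers = defaultdict(list)

    def on(self, event: str, handler: Callable[..., Any]) -> None:
        """Register a handler for a named event."""
        self._handlers[event].append(handler)

    def emit(self, event: str, *args: Any, **kwargs: Any) -> None:
        """Call every handler registered for the event, in registration order."""
        for handler in self._handlers[event]:
            handler(*args, **kwargs)


hooks = HookRegistry()
counts = defaultdict(int)


def count_parse_error(error: Exception) -> None:
    counts[type(error).__name__] += 1


hooks.on("parse:error", count_parse_error)
hooks.emit("parse:error", ValueError("age is not an integer"))
hooks.emit("parse:error", ValueError("missing field: name"))
hooks.emit("completion:response", {"id": "resp-1"})  # no handler registered; ignored

print(dict(counts))
```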
Two streaming primitives ship in the box.
create_partial() returns a generator that yields successive partial states of the response model as tokens arrive. Internally Instructor rewrites the user-supplied BaseModel so that all fields are Optional, and as each field is fully decoded it switches from None to the concrete value. Application code can iterate the generator and re-render a UI on every yielded snapshot:[^13]
```python
from datetime import datetime

from pydantic import BaseModel


class MeetingInfo(BaseModel):
    title: str
    attendees: list[str]
    starts_at: datetime


stream = client.chat.completions.create_partial(
    model="gpt-4o-mini",
    response_model=MeetingInfo,
    messages=[{"role": "user", "content": "Schedule a 9 a.m. retro with Anna and Bo."}],
)
for snapshot in stream:
    render(snapshot)  # snapshot is a MeetingInfo with possibly-None fields
```
create_iterable() is the dual for "extract many objects." It is typed as Iterable[Model], switches the underlying call to parallel tool calling when the provider supports it, and yields validated instances one at a time as the model emits them. This is the typical pattern for entity extraction over documents.[^13][^11]
Two caveats apply to partial streaming: Pydantic validators cannot run mid-stream because the model is still incomplete, and Literal types need to be marked with the PartialLiteralMixin so that intermediate values are accepted. Async iteration is supported via async for on the partial generator.[^13]
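The "all fields become Optional" rewrite can be approximated with Pydantic's `create_model`. This is a minimal sketch of the idea rather than Instructor's actual code: `make_partial` is a hypothetical helper, and the snapshots are hand-written dictionaries standing in for progressively decoded stream content.

```python
from typing import Optional

from pydantic import BaseModel, create_model


class MeetingInfo(BaseModel):
    title: str
    attendees: list[str]


def make_partial(model: type) -> type:
    """Build a copy of `model` in which every field is Optional with default None."""
    fields = {
        name: (Optional[info.annotation], None)
        for name, info in model.model_fields.items()
    }
    return create_model(f"Partial{model.__name__}", **fields)


PartialMeeting = make_partial(MeetingInfo)

# Hypothetical snapshots, as if recovered from a growing token stream.
snapshots = [
    {},
    {"title": "Retro"},
    {"title": "Retro", "attendees": ["Anna", "Bo"]},
]
partials = [PartialMeeting.model_validate(s) for s in snapshots]

# Validation against the strict model belongs on the final snapshot only.
final = MeetingInfo.model_validate(partials[-1].model_dump())
print(partials[1], final)
```

Validating the last snapshot against the original model is one way to honour the caveat that validators cannot run while the object is still incomplete.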
Instructor offers provider-agnostic wrappers for non-text inputs so that the same Pydantic model can be reused across OpenAI, Anthropic, Google GenAI, Mistral, and Bedrock without rewriting the message construction.[^14]
instructor.Image accepts URLs, Google Cloud Storage URLs, local file paths, and base64 strings via class methods from_url, from_gs_url, from_path, from_base64, and an autodetect heuristic. It is supported across OpenAI, Anthropic, and Google GenAI. The library can also be configured with autodetect_images=True so that any string looking like a path or URL inside a message is converted automatically.[^14]
instructor.Audio exposes the same five constructors but is restricted to OpenAI (for the GPT-4o audio inputs) and Gemini.[^14]
instructor.PDF has the widest coverage (OpenAI, Anthropic, Google GenAI, Mistral, Bedrock) and ships two specialised subclasses: PDFWithCacheControl, which adds Anthropic prompt-caching metadata so that a repeatedly-used document only pays its upload cost once, and PDFWithGenaiFile, which uploads via Google's Files API and references the uploaded file in the prompt.[^14]
A typical multimodal extraction reads:
```python
from instructor import Image
from pydantic import BaseModel


class Receipt(BaseModel):
    merchant: str
    total_usd: float
    line_items: list[str]


receipt = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=Receipt,
    messages=[
        {"role": "user", "content": [
            "Extract this receipt:",
            Image.from_path("receipt.jpg"),
        ]},
    ],
)
```
The same code, with client = instructor.from_anthropic(...), works against Claude without modification.[^14][^11]
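The value of a provider-agnostic image wrapper is easier to see next to the payloads it hides. The sketch below builds the same base64 image as an OpenAI-style `image_url` content part and as an Anthropic-style `image` block. The two helper functions are hypothetical names written for this example and make no claim about how Instructor structures its own conversion.

```python
import base64


def to_openai_part(data: bytes, media_type: str) -> dict:
    """Image content part in the OpenAI chat-completions style (data URL)."""
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{media_type};base64,{encoded}"},
    }


def to_anthropic_part(data: bytes, media_type: str) -> dict:
    """Image content block in the Anthropic Messages style (base64 source)."""
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "image",
        "source": {"type": "base64", "media_type": media_type, "data": encoded},
    }


# A few placeholder bytes standing in for the contents of a real JPEG file.
fake_image = b"\xff\xd8\xff\xe0"
openai_part = to_openai_part(fake_image, "image/jpeg")
anthropic_part = to_anthropic_part(fake_image, "image/jpeg")

print(openai_part["type"], anthropic_part["type"])
```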
Instructor advertises support for 15-plus providers; the integrations page enumerates more than 20 by mid-2026.[^6]
| Category | Providers |
|---|---|
| Frontier API providers | OpenAI, Anthropic, Google Gemini, Google GenAI, Vertex AI, xAI |
| Enterprise clouds | Azure OpenAI, Amazon Bedrock, Writer, SambaNova |
| Specialty inference | Groq, Cerebras, Fireworks AI, Together AI |
| Other commercial models | Cohere, Mistral AI, DeepSeek, Perplexity |
| Local and self-hosted | Ollama, llama-cpp-python, vLLM (via OpenAI-compatible endpoint) |
| Routers | LiteLLM, OpenRouter |
In practice the easiest way to use any of these is instructor.from_provider("provider/model"); the library reads the provider segment, imports the relevant SDK (which must already be installed via the matching extra such as instructor[anthropic] or instructor[google-genai]), picks a default Mode, and returns a wrapped client.[^15][^6] Every integration supports response_model, validation-feedback retries, hooks, and at least basic streaming; some, notably Anthropic, also expose provider-specific options such as the thinking extended-reasoning parameter, prompt caching, and parallel tool calling.[^11]
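Reading the provider segment from a "provider/model" string amounts to a split at the first slash. The helper below is a hypothetical sketch of that step only. How Instructor itself parses the string is not described in this article, and the second example identifier is made up.

```python
def split_provider_model(spec: str) -> tuple:
    """Split "provider/model" at the first slash; the model part may contain slashes."""
    provider, sep, model = spec.partition("/")
    if not (sep and provider and model):
        raise ValueError(f"expected 'provider/model', got {spec!r}")
    return provider, model


parsed = split_provider_model("openai/gpt-4o")
nested = split_provider_model("openrouter/some-vendor/some-model")  # hypothetical id
print(parsed, nested)
```

Splitting only once matters for routers, whose model identifiers can themselves contain a vendor prefix.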
Although Instructor's public surface is small, its internal pipeline has several distinct stages, and understanding them helps explain why the library is positioned as a thin wrapper rather than a framework.[^11][^2]
1. When response_model is bound to a BaseModel subclass, Instructor calls Pydantic's model_json_schema() and post-processes the result so that it conforms to provider-specific quirks. OpenAI tool schemas require additionalProperties: false and disallow some keywords; Anthropic's tools accept a richer schema but reject certain oneOf constructs; Gemini has its own JSON-schema dialect. Each provider plugin owns its translation.
2. The selected Mode (TOOLS, JSON, JSON_SCHEMA, MD_JSON, PARALLEL_TOOLS) determines whether the schema is sent as a tool definition, as a response_format, or as part of the system prompt.
3. Provider arguments such as temperature, top_p, seed, Anthropic thinking, and Gemini safety settings all flow through unchanged.
4. The provider's response is parsed with Model.model_validate_json(). If validation succeeds, the validated instance is returned. If validation raises a ValidationError, control jumps to the retry layer.
5. The retry layer appends a user (or in some modes tool and assistant chained) turn containing the previous output and the formatted error messages, and reissues the call. The retry budget is governed by max_retries.

Because each stage is independent, advanced users can replace individual pieces. For example, a custom Mode plugin can be registered for a provider that is not yet first-party supported, or the reask construction can be overridden to inject a different prompt template.[^11][^9]
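The schema post-processing stage can be illustrated with one of the quirks mentioned above, the additionalProperties: false requirement. The function below is a sketch of that single transformation on a plain dictionary; `forbid_extra_properties` is a hypothetical name, and real provider translations involve more rules than this.

```python
from typing import Any


def forbid_extra_properties(schema: Any) -> Any:
    """Return a copy of a JSON schema with additionalProperties=False on every object."""
    if isinstance(schema, dict):
        out = {key: forbid_extra_properties(value) for key, value in schema.items()}
        if out.get("type") == "object":
            out["additionalProperties"] = False
        return out
    if isinstance(schema, list):
        return [forbid_extra_properties(item) for item in schema]
    return schema  # strings, numbers, booleans, None pass through unchanged


schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "address": {"$ref": "#/$defs/Address"},
    },
    "$defs": {
        "Address": {"type": "object", "properties": {"city": {"type": "string"}}},
    },
}

strict = forbid_extra_properties(schema)
print(strict["additionalProperties"], strict["$defs"]["Address"]["additionalProperties"])
```

The function returns a new structure instead of mutating its input, so the schema generated by Pydantic stays reusable for providers that do not need the extra keyword.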
Liu has repeatedly framed Instructor as targeting three application archetypes:[^8]
Iterable[Edge] or similar, then upserting nodes and edges into a graph store. The combination of create_iterable with parallel tool calling makes this throughput-bound rather than logic-bound.

Beyond Liu's framing, common community uses include classification (a Literal[...] field with a tight enum), structured chain-of-thought (a reasoning field followed by a final_answer field, both validated), evaluators (Pydantic models that capture rubric scores), and as the structured-data engine inside larger agent loops where each tool's input and output are Pydantic models. Observability vendors including Langfuse have shipped first-class integrations that record each Instructor call along with its retries and validation errors.[^16]
A representative entity-extraction workflow demonstrates several of these patterns at once:
```python
import instructor
from openai import OpenAI
from pydantic import BaseModel, Field
from typing import Iterable, Literal


class Mention(BaseModel):
    surface_form: str
    canonical_name: str
    kind: Literal["person", "org", "location"]
    confidence: float = Field(ge=0.0, le=1.0)


client = instructor.from_openai(OpenAI())

mentions: Iterable[Mention] = client.chat.completions.create_iterable(
    model="gpt-4o-mini",
    response_model=Mention,
    max_retries=2,
    messages=[
        {"role": "system", "content": "Extract named entities. One Mention per call."},
        {"role": "user", "content": open("article.txt").read()},
    ],
)
for m in mentions:
    upsert_entity(m)
```
A handful of design decisions are visible in this example. The Literal type narrows the LLM's choice of kind to three values, and a confidence field bounded by Pydantic's ge/le constraint provides a uniform calibration signal. Because create_iterable switches the underlying call to parallel tool calling, each Mention is yielded as soon as it is parsed rather than after the entire document has been processed, which substantially reduces tail latency on long inputs.[^11][^13]
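The effect of the Literal and ge/le constraints can be checked without calling a model. The sketch below repeats the Mention model and feeds it one hand-written invalid payload; the field values are invented for illustration.

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError


class Mention(BaseModel):
    surface_form: str
    canonical_name: str
    kind: Literal["person", "org", "location"]
    confidence: float = Field(ge=0.0, le=1.0)


good = Mention(surface_form="Acme", canonical_name="Acme Corp", kind="org", confidence=0.9)

try:
    # "company" is outside the Literal, and 1.4 is outside the [0, 1] bound.
    Mention(surface_form="Acme", canonical_name="Acme Corp", kind="company", confidence=1.4)
except ValidationError as exc:
    bad_fields = sorted(str(err["loc"][0]) for err in exc.errors())

print(good.kind, bad_fields)
```

Both violations are reported in a single ValidationError, so one reask turn can correct them together.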
The structured-output ecosystem splits into two design families. Post-generation validators let the model emit tokens freely and then validate the result, retrying with feedback on failure; pre-generation constrainers modify token sampling so that only schema-conformant tokens can be produced. Instructor is the canonical post-generation validator.[^17][^18]
| Library | Approach | Validation engine | Provider coverage | Typical use case |
|---|---|---|---|---|
| Instructor | Post-generation, with retries | Pydantic | 20+ via SDK wrapping | Generic structured extraction, multi-provider apps |
| LangChain with_structured_output | Post-generation | Pydantic or JSON Schema | All LangChain-supported providers | Existing LangChain pipelines, agents |
| Pydantic AI | Post-generation, with agents | Pydantic | OpenAI, Anthropic, Gemini, Groq, Bedrock, others | Lightweight agents that also need structured outputs |
| Outlines | Pre-generation, constrained decoding | Pydantic, regex, CFG | Transformers, vLLM, llama.cpp; limited OpenAI | Local models where 100% valid output is required |
| Guidance | Pre-generation, programmatic prompts | Custom DSL plus Python control flow | Transformers, llama.cpp, OpenAI (limited) | Workflows that interleave model output with Python branching |
| BAML | Code-generation from .baml schema files | Compiled validator | OpenAI, Anthropic, others | Polyglot teams that want a single schema source |
The trade-offs are reasonably well established in practitioner write-ups:
- Outlines. Constrained decoding guarantees schema-valid output, but Outlines is built around locally hosted inference stacks such as the transformers library and has only partial OpenAI support. Instructor reverses the trade-off: it works with any provider that supports tool-use or JSON mode, at the cost of occasionally needing to retry.[^17][^18]
- LangChain with_structured_output. LangChain bundles structured output into a much larger framework that also handles chains, retrievers, agents, and callbacks. For teams already using LangChain, with_structured_output is a natural choice. For teams that want only structured output, Instructor is a much smaller dependency and stays closer to the underlying provider SDK.[^18][^17]
- BAML. BAML compiles .baml schema files into typed client libraries in multiple languages. It is the only entrant that is not a Python library at runtime; the comparison axis is roughly "single source of truth in a dedicated DSL" versus "single source of truth in Python types."[^17]

Liu has been candid that Instructor's positioning relies on staying small. His repeated message to interviewers has been that he is not building a framework; he is building "the boring stuff" so that downstream framework authors do not need to.[^8]
Instructor's design has well-understood limits.
Retry cost. Because validation happens after generation, every validation failure costs at least one extra round-trip to the model. For high-volume pipelines on expensive frontier models, this can dominate the bill. Practitioners report that for sufficiently complex schemas a fraction of calls require one or two retries, and pre-generation tools such as Outlines avoid that overhead entirely at the price of running locally.[^17][^18]
Schema complexity ceiling. Very deeply nested Pydantic models translate into very large JSON schemas, which in turn consume input tokens on every call. Some providers also impose hard limits on schema depth or on the number of properties in a tool definition. In those cases the schema needs to be flattened, or split into multiple calls.
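The growth of schema size with nesting can be measured directly. The sketch below uses the character count of the serialised JSON Schema as a rough proxy for input tokens, which is an assumption of the example; the three models are hypothetical.

```python
import json

from pydantic import BaseModel


class Address(BaseModel):
    street: str
    city: str


class Company(BaseModel):
    name: str
    address: Address


class Person(BaseModel):
    name: str
    employer: Company


def schema_chars(model: type) -> int:
    """Length of the serialised JSON Schema, a rough stand-in for token cost."""
    return len(json.dumps(model.model_json_schema()))


sizes = {model.__name__: schema_chars(model) for model in (Address, Company, Person)}
print(sizes)
```

Each level of nesting embeds the definitions of the levels below it, so the schema sent with every call grows with the depth of the model.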
Mode coverage gaps. Not every provider supports every mode. Local models served through Ollama frequently fall back to Mode.MD_JSON, which is regex-based and brittle if the model emits anything outside the fenced block. Some providers' JSON Schema implementations do not honour every Pydantic constraint, so validation that nominally happens at the provider becomes a no-op and Instructor's client-side validator is the only safety net.[^11][^17]
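The brittleness of fenced-block extraction shows up in a few lines of standard-library code. This is a minimal sketch of the general technique, not the pattern Instructor uses; `extract_fenced_json` and the sample replies are invented, and the fence marker is built at runtime only to keep the example readable inside a markdown document.

```python
import json
import re

FENCE = "`" * 3  # three backticks, assembled here to avoid a literal fence line
PATTERN = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def extract_fenced_json(text: str) -> dict:
    """Return the JSON object inside the first fenced block of a reply."""
    match = PATTERN.search(text)
    if match is None:
        raise ValueError("no fenced block found")
    return json.loads(match.group(1))


fenced_reply = (
    "Here you go:\n" + FENCE + 'json\n{"name": "Jason", "age": 25}\n' + FENCE + "\nAnything else?"
)
bare_reply = 'The answer is {"name": "Jason", "age": 25}.'

parsed = extract_fenced_json(fenced_reply)

try:
    extract_fenced_json(bare_reply)
    bare_failed = False
except ValueError:
    bare_failed = True  # valid JSON is present, but the model skipped the fence

print(parsed, bare_failed)
```

The second reply contains perfectly valid JSON and still fails, which is why client-side validation and retries remain the safety net in this mode.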
Streaming validators. Partial streaming intentionally skips validators because intermediate states are by definition incomplete. Applications that rely on validators for correctness must run them on the final yielded snapshot, not on intermediate ones.[^13]
Migration friction. The transition from instructor.patch() (global monkey patching) to instructor.from_openai() (explicit wrapping) and then to instructor.from_provider() (string-based) introduced two waves of community example code that no longer reflects current best practice. Users following older tutorials sometimes pick up patterns that the modern docs explicitly deprecate.[^9]
Not a sampling-time guarantee. A model that produces 200 tokens of nonsense before the retry budget is exhausted is still 200 tokens of nonsense. Instructor cannot prevent the model from confabulating field values that happen to satisfy the schema; Pydantic validators and downstream business logic remain necessary for semantic correctness.
Latency. Each retry inflates total wall-clock time. For latency-sensitive surfaces such as interactive copilots, applications often cap max_retries at one and fall back to a graceful error rather than risk a slow first byte. Streaming with create_partial partially mitigates the perceived latency by letting the UI render fields as they decode, but it does not reduce the upper bound on total time when the schema is large.[^13]
Provider drift. Vendor APIs change. When OpenAI introduced its "Structured Outputs" feature with strict JSON Schema enforcement in 2024, Instructor added a corresponding mode but had to accommodate the fact that the strict schema dialect rejects certain Pydantic constructs (such as Union types without discriminators). Similar small frictions appear whenever a provider extends or restricts its tool-call surface, and the burden of keeping up falls on the library's maintainers.[^12][^11]
By mid-2026 Instructor reports roughly 3 million monthly downloads, 13,000-plus GitHub stars, and over 100 contributors across the Python repository alone.[^1][^2][^3] The library and its docs are maintained under the 567-labs GitHub organisation; Liu also runs an associated cohort-based course on applied LLM systems and a paid consulting practice, both of which have served as informal feedback channels for API design.[^8]
The ecosystem includes the TypeScript port instructor-js (also MIT, maintained by Liu and Dimitri Kennedy, latest tag v1.7.0 in January 2025) and community ports in Ruby, Go, Elixir, and Rust that share the same conceptual API.[^4][^1] Observability integrations exist for Langfuse and other tracing platforms, and Instructor calls are first-class in DSPy-style pipelines when developers want to mix a structured-output layer into a larger prompt-optimisation workflow.[^16]