Outlines (library)
Last reviewed
May 20, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v1 · 4,077 words
Outlines is an open-source Python library, released under the Apache 2.0 license, that constrains large language model output to user-specified structures: regular expressions, function signatures, JSON schemas, Python dataclasses or Pydantic models, and context-free grammars expressed in EBNF or Lark notation.[^1][^2] The project is developed by the Paris-based start-up .txt (legal name Dottxt, repository dottxt-ai/outlines) and implements the algorithm introduced in Brandon T. Willard and Rémi Louf's July 2023 paper "Efficient Guided Generation for Large Language Models" (arXiv:2307.09702).[^3] Rather than parsing or retrying model output after it has been produced, Outlines compiles the target structure into a finite-state machine (FSM) and uses that FSM to mask invalid logits at every decoding step, so the model can only emit tokens that keep the output inside the allowed language.[^3][^4] The library supports a wide range of inference back-ends including Hugging Face Transformers, vLLM, llama.cpp, SGLang, MLX, Ollama, and remote APIs from OpenAI, Anthropic, Google Gemini and Mistral AI.[^1][^5]
| Attribute | Value |
|---|---|
| Project name | Outlines |
| Repository | dottxt-ai/outlines |
| First public release | 2023 (initial commit by Brandon Willard) |
| Stable 1.0 release | June 18, 2025 |
| Current release | 1.3.0 (May 13, 2026) |
| License | Apache 2.0 |
| Primary language | Python (with Rust core via outlines-core) |
| Underlying algorithm | Willard and Louf, arXiv:2307.09702 (July 2023) |
| Maintainer | .txt (Dottxt SAS), Paris, France |
| Founders | Rémi Louf, Brandon Willard, Dan Gerlanc |
| Funding raised | $11.9 million (pre-seed and seed, 2023 to 2024) |
Sources: GitHub dottxt-ai/outlines,[^1] PyPI page,[^2] arXiv 2307.09702,[^3] Hugging Face blog on outlines-core,[^4] TechCrunch coverage of .txt's funding round.[^6]
By mid 2023, developers building applications on top of large language models had identified the unreliability of free-form text output as a major engineering obstacle. Models trained with next-token prediction could be coaxed by prompt engineering to produce JSON, but they routinely emitted invalid syntax, hallucinated extra fields, omitted required keys, or interleaved natural-language commentary with the structured payload.[^3][^7] The usual workaround at the time was a generate-then-validate loop: ask the model for JSON, run a parser, and re-prompt on failure. This added latency, token cost and brittleness to every pipeline that consumed model output, and it gave no upper bound on the number of retries.
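The generate-then-validate workaround can be sketched as follows. This is a minimal illustration, not code from any particular library: `generate_then_validate`, `call_model` and the re-prompt wording are hypothetical names chosen for the example, and `call_model` stands in for any function that sends a prompt to a model and returns its text.

```python
import json


def generate_then_validate(call_model, prompt, required_keys, max_retries=3):
    """Ask for JSON, parse it, and re-prompt on failure (hypothetical helper).

    `call_model` is an assumed callable: prompt string in, raw text out.
    Returns the parsed dict and the number of model calls it took.
    """
    attempts = 0
    while attempts < max_retries:
        attempts += 1
        raw = call_model(prompt)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            # Commentary around the payload or broken syntax lands here.
            prompt += "\nReturn valid JSON only."
            continue
        if isinstance(data, dict) and required_keys.issubset(data):
            return data, attempts
        # Parsed, but required keys are missing.
        prompt += "\nInclude all required keys."
    raise ValueError(f"no valid output after {max_retries} attempts")


# Scripted stand-in for a model: two bad replies, then a good one.
replies = iter([
    'Sure! {"name": "Ada"}',        # commentary mixed with JSON
    '{"name": "Ada"}',              # valid JSON, missing "age"
    '{"name": "Ada", "age": 36}',   # acceptable
])
result, calls = generate_then_validate(
    lambda prompt: next(replies), "Extract the person.", {"name", "age"}
)
```

Every failed attempt costs a full model call, and the cap on retries is an arbitrary choice made by the caller, not a guarantee that a valid answer will arrive.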
Brandon Willard and Rémi Louf, two researchers who had previously worked together on Bayesian and probabilistic programming tooling at the New York start-up Normal Computing, encountered this problem while building information-extraction systems on top of GPT-4 and open weights models.[^6] Their proposed alternative, published as arXiv preprint 2307.09702 on 19 July 2023, was to move the constraint inside the decoder rather than outside it.[^3] If the target output language can be expressed as a regular expression or context-free grammar, then at every generation step there is a deterministically computable set of vocabulary tokens that keep the partial output inside that language. Masking the model's logits so that only those tokens have non-zero probability guarantees that the final string parses, by construction, without changing the underlying model.[^3]
The first public version of the Outlines library was published shortly after the paper, and the project quickly grew into the reference open-source implementation of this approach. By the time .txt announced its seed funding in October 2024, the open-source library had been downloaded more than three million times in total and around 600,000 times in the preceding month.[^6] In August 2024, OpenAI launched its own "Structured Outputs" feature in the API and explicitly credited Outlines, jsonformer, instructor and Microsoft's Guidance as inspirations for the approach.[^8]
The project's timeline is straightforward to reconstruct from the arXiv preprint history, the GitHub release log and contemporaneous press coverage.
| Date | Event |
|---|---|
| 19 July 2023 | First version of Willard and Louf, "Efficient Guided Generation for Large Language Models" (arXiv:2307.09702 v1) posted.[^3] |
| 12 August 2023 | arXiv revision v3 with extended sections on context-free grammars.[^3] |
| 19 August 2023 | arXiv revision v4, the version most commonly cited.[^3] |
| December 2023 | .txt announces a $3.2 million pre-seed round led by Paris venture firm Elaia.[^6] |
| 6 August 2024 | OpenAI launches Structured Outputs in its API, naming Outlines as an inspiration.[^8] |
| 7 October 2024 | Release of outlines 0.1.0 with unified outlines.processors architecture, 98% reduction in per-token runtime overhead, and the first vision-input support.[^9] |
| 17 October 2024 | TechCrunch reports .txt's combined $11.9 million funding to date, including an $8.7 million seed led by EQT Ventures.[^6] |
| 22 October 2024 | Release of outlines-core 0.1.0, a Rust port of the core indexing algorithm developed jointly by .txt and Hugging Face.[^4] |
| 18 June 2025 | Stable 1.0.0 release of outlines introduces from_<backend> factory functions, an Application class, and adds first-class integrations for SGLang, TGI, multimodal Transformers, Anthropic, Gemini and .txt's own hosted API.[^10] |
| 13 May 2026 | Current stable release outlines 1.3.0 standardizes exception handling across remote backends.[^2][^11] |
The library reached version 1.0 by stripping rather than expanding scope: in the same release, the explicit Sampler classes and outlines.generate module were deprecated in favor of letting users pass inference arguments through to the underlying serving library, on the grounds that structured-generation logic should live alongside the logit processor and not duplicate the host inference engine's sampling code.[^10]
The paper's central insight is to treat the partial output of an autoregressive language model as a string being read by a finite-state automaton. If the desired output is described by a regular expression R, one first compiles R into a finite-state machine M(R) using standard techniques (Thompson's construction followed by subset construction to obtain a deterministic FSM). Each state q of M(R) corresponds to a class of partial outputs that have read a prefix consistent with R. After every token the model emits, the system advances M(R) over the characters of that token; if the FSM enters a dead state, the token is illegal.[^3]
To turn this into an efficient decoder, the algorithm precomputes an index σ that maps each FSM state q to the subset of the model's vocabulary whose surface form, when concatenated to the prefix represented by q, keeps M(R) alive. Building σ involves walking each vocabulary token through the FSM and is done once per (vocabulary, regex) pair. At decoding time, the implementation looks up σ[q] for the current FSM state q, sets the logits of all other tokens to negative infinity, samples normally from the masked distribution, and advances q according to the chosen token. Willard and Louf describe this lookup as having effectively constant amortized cost per generated token, so the FSM-guided decoder adds negligible runtime overhead relative to ordinary autoregressive generation.[^3]
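The index-then-mask loop can be illustrated with a toy example. This is a sketch under stated assumptions, not the Outlines implementation: the DFA for `-?[0-9]+` is written by hand instead of being compiled from a regex, the eight-token vocabulary is invented, and random numbers stand in for the logits a real model would produce.

```python
import math
import random

# Hand-written DFA for -?[0-9]+ (an illustrative stand-in for M(R)).
# States: 0 = start, 1 = after a leading '-', 2 = one or more digits (accepting).
ACCEPTING = {2}


def step_char(state, ch):
    """Advance the DFA by one character; None means a dead state."""
    if ch == "-":
        return 1 if state == 0 else None
    if ch in "0123456789":
        return 2
    return None


def walk(state, token):
    """Advance the DFA over every character of a token."""
    for ch in token:
        state = step_char(state, ch)
        if state is None:
            return None
    return state


VOCAB = ["-", "1", "12", "3", "a", "-4", "--", "<eos>"]  # invented toy vocabulary
EOS = VOCAB.index("<eos>")


def build_index(vocab, states):
    """Precompute, per state, the token ids that keep the DFA alive and where they lead."""
    index = {}
    for q in states:
        allowed = {}
        for tok_id, tok in enumerate(vocab):
            if tok_id == EOS:
                if q in ACCEPTING:  # stopping is legal only in an accepting state
                    allowed[tok_id] = q
                continue
            nxt = walk(q, tok)
            if nxt is not None:
                allowed[tok_id] = nxt
        index[q] = allowed
    return index


INDEX = build_index(VOCAB, states=[0, 1, 2])  # built once per (vocabulary, regex) pair


def mask_logits(logits, state):
    """Set the logit of every token outside INDEX[state] to negative infinity."""
    allowed = INDEX[state]
    return [x if i in allowed else -math.inf for i, x in enumerate(logits)]


def generate(rng, max_tokens=8):
    state, out = 0, []
    for _ in range(max_tokens):
        logits = [rng.uniform(-1, 1) for _ in VOCAB]       # stand-in for a forward pass
        masked = mask_logits(logits, state)
        weights = [math.exp(x) for x in masked]            # exp(-inf) is 0.0
        tok = rng.choices(range(len(VOCAB)), weights=weights)[0]
        if tok == EOS:
            break
        out.append(VOCAB[tok])
        state = INDEX[state][tok]                          # dictionary lookup per step
    return "".join(out)


sample = generate(random.Random(0))
```

In this toy the per-step work is a dictionary lookup plus a pass over the logits, while all character-level DFA walking happens inside `build_index`. Note that a very small `max_tokens` could cut generation off in a non-accepting state; the sketch does not handle that case.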
The index is built once but consulted at every decoding step, so its memory footprint matters. In principle its size is proportional to the number of FSM states times the size of the vocabulary; in practice not every vocabulary string is accepted by any given FSM, and many states accept the same subset of tokens. The paper notes that the resulting tables can be post-processed so that many FSM states share pointers to the same vocabulary subset, which keeps the memory footprint manageable even for large vocabularies of 100,000-plus tokens.[^3]
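One way to picture the sharing step is to intern identical token sets so that several states reference a single object. The paper does not prescribe this exact procedure; `share_subsets` and the sample data below are hypothetical and only illustrate the idea.

```python
def share_subsets(index):
    """Map each state to a frozenset, reusing one object for identical sets.

    `index` is assumed to be a dict of state -> iterable of allowed token ids.
    Returns the shared mapping and the number of distinct subsets stored.
    """
    pool = {}     # canonical copy of every distinct subset
    shared = {}
    for state, tokens in index.items():
        key = frozenset(tokens)
        # setdefault returns the stored object if an equal set was seen before.
        shared[state] = pool.setdefault(key, key)
    return shared, len(pool)


# Four states, but only two distinct sets of allowed token ids.
raw_index = {
    0: {10, 11, 12},
    1: {10, 11, 12},
    2: {10, 11, 12, 99},
    3: {10, 11, 12},
}
shared_index, distinct = share_subsets(raw_index)
```

After this pass, states 0, 1 and 3 point at the same frozenset, so the storage cost scales with the number of distinct subsets instead of the number of states.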
A pure regular expression is not expressive enough to describe nested or recursive structures such as balanced parentheses, well-formed JSON, or arithmetic expressions. The paper handles these cases by augmenting the FSM with a stack, producing a pushdown automaton (PDA) that accepts the target context-free language.[^3] Outlines exposes this via two surface APIs. The CFG term accepts a grammar written in EBNF, and outlines ships a JSON grammar along with several other example grammars; alternatively, the user can supply a Lark grammar, which outlines lowers to the same PDA-based guide.[^9][^12] Because constructing a full PDA-derived index is more expensive than the regex case, grammar-based generation remains a beta-quality feature with slower compilation times than the regex path.[^9][^13]
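The role of the stack can be seen in a toy token filter for balanced round and square brackets. This is an illustrative sketch only, not how Outlines or Lark represent grammars: the vocabulary is invented, and `advance` and `allowed_tokens` are hypothetical helper names.

```python
PAIRS = {")": "(", "]": "["}

# Invented toy vocabulary; some tokens span several characters.
VOCAB = ["(", ")", "[", "]", "()", "[]", "(]", "])", "<eos>"]


def advance(stack, token):
    """Feed a token to the bracket PDA; return the new stack or None if it dies."""
    stack = list(stack)
    for ch in token:
        if ch in ("(", "["):
            stack.append(ch)                      # push an opener
        elif ch in PAIRS:
            if not stack or stack[-1] != PAIRS[ch]:
                return None                       # closer without matching opener
            stack.pop()
        else:
            return None                           # character outside the language
    return tuple(stack)


def allowed_tokens(stack):
    """List the tokens that keep the partial output inside the language."""
    ok = []
    for tok in VOCAB:
        if tok == "<eos>":
            if not stack:                         # stopping is legal only when balanced
                ok.append(tok)
        elif advance(stack, tok) is not None:
            ok.append(tok)
    return ok


at_start = allowed_tokens(())
nested = allowed_tokens(("(", "["))
```

Unlike the regex case, the set of legal tokens here depends on the stack contents, which can grow without bound, so it cannot be tabulated by a finite set of states alone. That is one intuition for why grammar-guided indexing costs more than the regex path.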
The most common use of Outlines is to constrain output to a JSON Schema or to a Python Pydantic model. Internally, the library translates the schema into a regular expression that recognises every string that the schema would accept, then proceeds with the regex-FSM pipeline described above. The high-level surface API is Generator(model, PydanticModel) in the 1.x line and outlines.generate.json(model, PydanticModel) in the legacy 0.x line.[^10][^12] When a Pydantic model is supplied, the generator returns an instantiated model object whose attributes are typed and validated; when a JSON Schema string is supplied, it returns a dictionary.[^12] Function signatures are also accepted, in which case the structure is inferred from the function's type-annotated parameters and the result can be unpacked with ** directly into a call.[^12]
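The schema-to-regex idea can be illustrated with a deliberately tiny translator. It is a sketch, not the translation Outlines performs: it assumes a flat object whose properties are all required and appear in declaration order, it supports only three scalar types, and `schema_to_regex` and `TYPE_PATTERNS` are hypothetical names.

```python
import json
import re

# Regex fragments for a few scalar JSON types (simplified: no string escapes).
TYPE_PATTERNS = {
    "string": r'"[^"\\]*"',
    "integer": r"-?(0|[1-9][0-9]*)",
    "boolean": r"(true|false)",
}


def schema_to_regex(schema):
    """Build a regex accepting objects that match a flat JSON Schema (toy version)."""
    fields = [
        '"' + re.escape(name) + '":' + r"\s?" + TYPE_PATTERNS[spec["type"]]
        for name, spec in schema["properties"].items()
    ]
    return r"\{" + r",\s?".join(fields) + r"\}"


schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
}
pattern = schema_to_regex(schema)


def greet(name: str, age: int) -> str:
    return f"{name} is {age}"


# Any string matching the pattern parses as JSON with exactly these keys,
# so it can be unpacked straight into a function with matching parameters.
text = '{"name": "Ada", "age": 36}'
if re.fullmatch(pattern, text):
    message = greet(**json.loads(text))
```

A string that fully matches the generated pattern is guaranteed to carry both keys with the right scalar types, which is what makes the `**` unpacking safe in this toy setting.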
In October 2024 .txt and Hugging Face released outlines-core 0.1.0, a separate Rust crate that implements the vocabulary, index, and guide primitives used by outlines. The motivations were threefold: ahead-of-time compilation eliminates the JIT warm-up cost of the previous Numba-based path, the language's static type system removes a class of memory bugs that had occurred in the Python core, and a small Rust crate is easier to embed in non-Python runtimes such as JavaScript or Swift.[^4] The Hugging Face announcement reported approximately a 2x improvement in average index-compilation speed against the previous Python implementation and described the design as "a lightweight library that's easier to maintain and to integrate into other projects."[^4] The Rust crate is exposed to Python through pyo3 bindings, and the higher-level outlines package now delegates index construction, vocabulary handling and guide traversal to outlines-core while keeping user-facing helpers, model integrations, and prompt templating in Python.[^4][^14]
A blog post from .txt titled "Coalescence: making LLM inference 5x faster" describes an additional optimisation enabled by the FSM representation.[^15] Tokenizers in modern LLMs are non-injective: the same surface string can be produced by many different token sequences. For example, the string name can be emitted by eight different byte-pair-encoded token paths. Inside an FSM-guided decoder one can detect when, regardless of which legal token is sampled next, the FSM ends up in the same state, and in that case the longest legal token can simply be appended without invoking the model. The post reports that on a JSON schema example this collapses nine model calls down to two, yielding "at least a 5x speedup over structured generation in Outlines" while preserving the distributional behaviour of the sampler.[^15]
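The core intuition, skipping the model whenever the automaton leaves no real choice, can be shown with a character-level simplification. This sketch is not the algorithm from the blog post: it treats every character as a token, ignores the multiple-tokenization issue the post addresses, and uses a scripted stand-in for the model. All names are hypothetical.

```python
import string

PREFIX = '{"name":"'
N = len(PREFIX)


def build_dfa():
    """Character-level DFA for {"name":"<one or more lowercase letters>"}."""
    dfa = {i: {ch: i + 1} for i, ch in enumerate(PREFIX)}   # forced literal prefix
    dfa[N] = {c: N + 1 for c in string.ascii_lowercase}      # first letter
    dfa[N + 1] = {c: N + 1 for c in string.ascii_lowercase}  # more letters ...
    dfa[N + 1]['"'] = N + 2                                  # ... or the closing quote
    dfa[N + 2] = {"}": N + 3}
    dfa[N + 3] = {}                                          # accepting, no way out
    return dfa


def forced_text(dfa, state):
    """Follow transitions while exactly one continuation exists."""
    out = []
    while len(dfa[state]) == 1:
        (ch, state), = dfa[state].items()
        out.append(ch)
    return "".join(out), state


def scripted_model(script):
    """Stand-in for a model: replays `script` whenever there is a real choice."""
    it = iter(script)
    return lambda allowed: allowed[0] if len(allowed) == 1 else next(it)


def generate(dfa, choose, fast_forward):
    state, out, calls = 0, [], 0
    while dfa[state]:
        if fast_forward:
            text, state = forced_text(dfa, state)   # no model call for forced spans
            out.append(text)
            if not dfa[state]:
                break
        calls += 1                                   # one model call per decision
        ch = choose(sorted(dfa[state]))
        out.append(ch)
        state = dfa[state][ch]
    return "".join(out), calls


dfa = build_dfa()
plain = generate(dfa, scripted_model('ada"'), fast_forward=False)
fast = generate(dfa, scripted_model('ada"'), fast_forward=True)
```

Both runs produce the same string, but the fast-forwarding run only consults the model at the states where more than one continuation is legal.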
Outlines positions itself as a backend-agnostic constraint layer rather than as an inference engine in its own right. The full list of supported backends as of the 1.3 release on PyPI is:[^2][^10]
| Category | Backends |
|---|---|
| Local Python inference | Hugging Face Transformers (including the multimodal variant), llama.cpp via llama-cpp-python, MLX-LM, ExLlamaV2 |
| Self-hosted servers | vLLM, SGLang, Hugging Face Text Generation Inference, LM Studio, Ollama |
| Remote APIs | OpenAI (including any OpenAI-compatible server), Anthropic, Gemini, Mistral AI, .txt's hosted api.dottxt.ai |
| Other constraint backends | xgrammar, llguidance (exposed as alternative engines for advanced users) |
Two integrations are particularly important to the wider ecosystem.
vLLM originally shipped Outlines as one of its supported structured-output backends and exposed the constraint through OpenAI-compatible request parameters such as guided_json, guided_regex, guided_choice and guided_grammar. In its 2026 documentation, vLLM lists xgrammar and Microsoft's guidance as the default and recommended backends, has deprecated the guided_* parameters in favour of a single structured_outputs request object as of release v0.12.0, and notes that all three backends (xgrammar, guidance and outlines) use Rust-style regex syntax.[^16] Outlines therefore remains an available backend in vLLM but is no longer the default.
Ollama gained Outlines-powered vision structured output in outlines 1.2.10 (February 2026), a release that also added LM Studio integration and dropped Python 3.9 support.[^11]
Beyond direct library imports, Nvidia NIM, AWS Marketplace (via .txt's SageMaker images) and Hugging Face's inference platform all describe Outlines-based structured generation as a supported deployment mode.[^17][^4]
The library's primary maintainer is the Paris-based start-up Dottxt SAS, doing business as ".txt." The company was founded in 2023 by Rémi Louf (chief executive), Brandon Willard (chief scientist) and Dan Gerlanc, after the three had collaborated at Normal Computing in New York on probabilistic-modelling tooling.[^6] In December 2023 it closed a $3.2 million pre-seed round led by Elaia with participation from Seedcamp, Common Magic, Kima, FJ Labs and angels including Roxanne Varza (Station F), Erik Bernhardsson, Julien Chaumond (co-founder and chief technology officer of Hugging Face) and Bob van Luijt (Weaviate).[^6] In August 2024 it added an $8.7 million seed round led by EQT Ventures, bringing the total raised to $11.9 million.[^6] The team grew from eight people in mid 2024 to seventeen by the end of October 2024 according to TechCrunch coverage.[^6]
The commercial product is described on the company website as a combination of three layers.[^5] First, a hosted REST API at api.dottxt.ai offers pay-per-token constrained decoding with the same guarantees as the library. Second, drop-in replacements for the vLLM, SGLang and TensorRT-LLM inference servers are sold for self-hosted deployments. Third, a set of composable client libraries (dotjson for JSON-schema enforcement, dotgrammar for context-free grammars, dotlambda for function calling) is intended to provide a higher-level building block than the open-source outlines package.[^5] The .txt site cites endorsements from named users including Nvidia, Cohere and Hugging Face and reports that the open-source Outlines project has accumulated more than 65 million PyPI downloads.[^5]
The relationship between the company and the open-source library follows the model used by many other open-core infrastructure start-ups: the upstream library remains Apache 2.0 with a permissive contribution policy, while .txt sells a hosted runtime and proprietary integrations on top. Senior contributors to the open-source project are .txt employees, and the company's blog at blog.dottxt.ai is the primary venue for technical write-ups about the underlying algorithms, including the coalescence post.[^15] The Rust core was developed jointly with Hugging Face engineers and released under the same Apache 2.0 license as the rest of the project.[^4][^14]
Outlines is part of a broader generation of structured-output libraries that emerged between 2023 and 2025. The most-cited points of comparison are Microsoft's Guidance, the LMQL language from ETH Zürich, the Pydantic-validation-plus-retry library Instructor, and the more recent XGrammar from CMU and MIT-IBM.
| Library | Primary mechanism | License | First public release | Distinctive feature |
|---|---|---|---|---|
| Outlines | FSM/PDA-guided logit masking compiled from regex, JSON Schema or grammar | Apache 2.0 | July 2023 (after arXiv:2307.09702) | Backend-agnostic constraint with no client-side retries; ships a Rust core[^1][^3][^4] |
| Guidance | Programmatic templating language with embedded constraints, also enforced at decoding time | MIT | 2023 (Microsoft Research) | Treats the prompt and the constraint as a single program with interleaved generation and control flow[^18] |
| LMQL | Domain-specific query language for LMs with logical constraints | Apache 2.0 | 2023 (ETH Zürich) | Declarative SQL-like syntax for prompt + constraint pairs[^19] |
| Instructor | Pydantic-typed wrapper that calls the model, validates and retries | MIT | 2023 | Works on any chat-completion API without modifying the decoder; relies on retries rather than guarantees[^20] |
| XGrammar | Vocabulary-partitioned grammar decoder | Apache 2.0 | November 2024 (arXiv:2411.15100) | Reported up to two orders of magnitude faster grammar masking than earlier approaches by partitioning the vocabulary into context-independent and context-dependent subsets[^21] |
A 2025 benchmark study, "Generating Structured Outputs from Language Models: Benchmark and Studies" (arXiv:2501.10868), evaluated six implementations (Outlines, Guidance, llama.cpp's GBNF, XGrammar, OpenAI's Structured Outputs and Gemini's structured output API) on three axes: compilation efficiency, JSON Schema feature coverage on 10,000 real-world schemas, and downstream task quality on Last Letter, Shuffle Objects and GSM8K.[^7] The paper reports that Outlines had higher grammar-compilation latency than Guidance and llama.cpp (3 to 12 seconds in some cases versus near-zero), that Guidance had broader empirical schema coverage on the GitHub-Easy slice (86% versus Outlines' 59%), and that constrained decoding generally improved downstream accuracy by 1 to 3 percentage points compared to free-form generation.[^7]
A separate November 2024 paper introducing XGrammar (arXiv:2411.15100) reported similar coverage gaps and argued that partitioning the vocabulary into a context-independent subset (decidable purely from the next grammar terminal) and a much smaller context-dependent subset enables grammar-guided decoding at near-zero per-token overhead, in some configurations more than 100x faster than Outlines on JSON-schema masking.[^21] vLLM, SGLang and TensorRT-LLM all subsequently adopted XGrammar as a built-in or default backend, while Outlines retained its position as a more flexible library that is decoupled from any particular serving engine.[^16][^21]
The differences are not strictly comparable. Instructor and similar Pydantic-plus-retry tools operate after generation, work over any commercial API and never modify the decoder, so they trade lower guarantees for broader compatibility; Outlines and XGrammar both intervene at the logit step and therefore require access to the per-step logit distribution, which is not exposed by every commercial model provider.[^20][^21]
Because Outlines guarantees that output parses, it has been adopted as a building block in a number of recurring use cases.[^1][^5][^17]
Outlines does not change the underlying competence of the language model. If the unconstrained model would not have known the correct answer, masking the logits to a smaller alphabet still cannot manufacture it; constrained decoding only ensures syntactic validity, not semantic correctness, a point the paper makes explicitly.[^3] Several practical limitations have been raised by users and benchmark studies.
Compilation time: outlines-core further halved it on average, but it remains larger than the near-zero compilation time reported for Guidance or llama.cpp's GBNF in head-to-head benchmarks.[^4][^7]

Default status in vLLM: xgrammar and guidance are the recommended alternatives, with Outlines listed but no longer the default; the previous guided_json family of request parameters has been deprecated in favour of a unified structured_outputs field.[^16]

Independently of any particular benchmark ranking, the Outlines paper is the work most consistently credited with reframing structured generation as an FSM-masking problem. OpenAI's August 2024 launch of Structured Outputs in its API explicitly cited Outlines as one of the open-source projects from which "we drew inspiration" alongside jsonformer, instructor and Microsoft's Guidance.[^8] Subsequent research projects, including XGrammar (arXiv:2411.15100) and the systematic benchmark study by Geng et al. (arXiv:2501.10868), use Outlines as the canonical baseline against which more recent grammar decoders are measured.[^7][^21] As of May 2026 the upstream outlines package reports more than 65 million downloads on PyPI, counts Nvidia, Cohere and Hugging Face among its named users, and has had 87 GitHub releases on the way to the current 1.3.0 line.[^1][^2][^5]
The library also illustrates a broader pattern in 2023 to 2026 LLM tooling, where engineering reliability is achieved not by larger models or better prompts but by reshaping the decoding step itself. In this sense Outlines sits alongside speculative decoding and tokenizer-aware sampling as one of several decoder-side techniques that became standard infrastructure in production large language model serving stacks during this period.[^16][^21]