# Outlines (library)

> Source: https://aiwiki.ai/wiki/outlines_library
> Updated: 2026-07-16
> Categories: Developer Tools, Large Language Models, Open Source AI
> License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
> From AI Wiki (https://aiwiki.ai), the free encyclopedia of artificial intelligence. Reuse freely with attribution to "AI Wiki (aiwiki.ai)".

**Outlines** is an open-source [Python (programming language)](/wiki/python) library, released under the Apache 2.0 license, that constrains [large language model](/wiki/large_language_model) output to user-specified structures: regular expressions, [function](/wiki/function_calling) signatures, JSON schemas, [Python](/wiki/python) dataclasses or Pydantic models, and context-free grammars expressed in EBNF or Lark notation.[^1][^2] The project is developed by the Paris-based start-up .txt (legal name Dottxt, repository `dottxt-ai/outlines`) and implements the algorithm introduced in Brandon T. Willard and Rémi Louf's July 2023 paper "Efficient Guided Generation for Large Language Models" (arXiv:2307.09702).[^3] Rather than parsing or retrying model output after it has been produced, Outlines compiles the target structure into a finite-state machine (FSM) and uses that FSM to mask invalid [logits](/wiki/logits) at every decoding step, so the model can only emit tokens that keep the output inside the allowed language.[^3][^4] The library supports a wide range of inference back-ends including [Hugging Face Transformers](/wiki/transformers_library), [vLLM](/wiki/vllm), [llama.cpp](/wiki/llama_cpp), [SGLang](/wiki/sglang), [MLX](/wiki/mlx), [Ollama](/wiki/ollama), and remote APIs from [OpenAI](/wiki/openai), [Anthropic](/wiki/anthropic), [Google Gemini](/wiki/gemini) and [Mistral AI](/wiki/mistral).[^1][^5]

## Infobox

| Attribute | Value |
|---|---|
| Project name | Outlines |
| Repository | `dottxt-ai/outlines` |
| First public release | 2023 (initial commit by Brandon Willard) |
| Stable 1.0 release | June 18, 2025 |
| Current release | 1.3.0 (May 13, 2026) |
| License | Apache 2.0 |
| Primary language | Python (with Rust core via `outlines-core`) |
| Underlying algorithm | Willard and Louf, arXiv:2307.09702 (July 2023) |
| Maintainer | .txt (Dottxt SAS), Paris, France |
| Founders | Rémi Louf, Brandon Willard, Dan Gerlanc |
| Funding raised | $11.9 million (pre-seed and seed, 2023 to 2024) |

Sources: GitHub `dottxt-ai/outlines`,[^1] PyPI page,[^2] arXiv 2307.09702,[^3] Hugging Face blog on `outlines-core`,[^4] TechCrunch coverage of .txt's funding round.[^6]

## Background and motivation

By mid-2023, developers building applications on top of [large language models](/wiki/large_language_model) had identified the unreliability of free-form text output as a major engineering obstacle. Models trained with next-token prediction could be coaxed by prompt engineering to produce JSON, but they routinely emitted invalid syntax, hallucinated extra fields, omitted required keys, or interleaved natural-language commentary with the structured payload.[^3][^7] The usual workaround at the time was a generate-then-validate loop: ask the model for JSON, run a parser, and re-prompt on failure. This added latency, token cost and brittleness to every pipeline that consumed model output, and it gave no upper bound on the number of retries.

Brandon Willard and Rémi Louf, two researchers who had previously worked together on Bayesian and probabilistic programming tooling at the New York start-up [Normal Computing](/wiki/normal_computing), encountered this problem while building information-extraction systems on top of GPT-4 and open-weights models.[^6] Their proposed alternative, published as arXiv preprint 2307.09702 on 19 July 2023, was to move the constraint inside the decoder rather than outside it.[^3] If the target output language can be expressed as a regular expression or context-free grammar, then at every generation step there is a deterministically computable set of vocabulary tokens that keep the partial output inside that language. Masking the model's [logits](/wiki/logits) so that only those tokens have non-zero probability guarantees that the final string parses, by construction, without changing the underlying model.[^3]

The first public version of the Outlines library was published shortly after the paper, and the project quickly grew into the reference open-source implementation of this approach. By the time .txt announced its seed funding in October 2024, the open-source library had been downloaded more than three million times in total and around 600,000 times in the preceding month.[^6] In August 2024, OpenAI launched its own "Structured Outputs" feature in the API and explicitly credited Outlines, jsonformer, instructor and Microsoft's Guidance as inspirations for the approach.[^8]

## History and milestones

The project's timeline is straightforward to reconstruct from the arXiv preprint history, the GitHub release log and contemporaneous press coverage.

| Date | Event |
|---|---|
| 19 July 2023 | First version of Willard and Louf, "Efficient Guided Generation for Large Language Models" (arXiv:2307.09702 v1) posted.[^3] |
| 12 August 2023 | arXiv revision v3 with extended sections on context-free grammars.[^3] |
| 19 August 2023 | arXiv revision v4, the version most commonly cited.[^3] |
| December 2023 | .txt announces a $3.2 million pre-seed round led by Paris venture firm Elaia.[^6] |
| 6 August 2024 | OpenAI launches Structured Outputs in its API, naming Outlines as an inspiration.[^8] |
| 7 October 2024 | Release of `outlines` 0.1.0 with unified `outlines.processors` architecture, 98% reduction in per-token runtime overhead, and the first vision-input support.[^9] |
| 17 October 2024 | TechCrunch reports .txt's combined $11.9 million funding to date, including an $8.7 million seed led by EQT Ventures.[^6] |
| 22 October 2024 | Release of `outlines-core` 0.1.0, a Rust port of the core indexing algorithm developed jointly by .txt and [Hugging Face](/wiki/hugging_face).[^4] |
| 18 June 2025 | Stable 1.0.0 release of `outlines` introduces `from_<backend>` factory functions, an `Application` class, and adds first-class integrations for SGLang, TGI, multimodal Transformers, [Anthropic](/wiki/anthropic), [Gemini](/wiki/gemini) and .txt's own hosted API.[^10] |
| 13 May 2026 | Current stable release `outlines` 1.3.0 standardizes exception handling across remote backends.[^2][^11] |

The library reached version 1.0 by stripping rather than expanding scope: in the same release, the explicit `Sampler` classes and `outlines.generate` module were deprecated in favor of letting users pass inference arguments through to the underlying serving library, on the grounds that structured-generation logic should live alongside the [logit](/wiki/logits) processor and not duplicate the host inference engine's sampling code.[^10]

## How it works

### Reformulating decoding as state-machine traversal

The paper's central insight is to treat the partial output of an autoregressive language model as a string being read by a finite-state automaton. If the desired output is described by a regular expression `R`, one first compiles `R` into a finite-state machine `M(R)` using standard techniques (Thompson's construction followed by subset construction to obtain a deterministic FSM). Each state `q` of `M(R)` stands for a class of partial outputs that are still valid prefixes of some string matching `R`. After every token the model emits, the system advances `M(R)` over the characters of that token; if the FSM enters a dead state, the token is illegal.[^3]
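
The traversal can be illustrated with a minimal sketch in plain Python. It does not use Outlines itself: the hand-written DFA for the regular expression `-?[0-9]+`, the `advance` helper and the `DEAD` marker are all illustrative assumptions, chosen only to show how a multi-character token is fed to the automaton one character at a time.

```python
# Hand-written DFA for the regular expression -?[0-9]+ (a signed integer).
# States: 0 = start, 1 = after "-", 2 = one or more digits (accepting).
DIGITS = "0123456789"
TRANSITIONS = {
    (0, "-"): 1,
    **{(0, d): 2 for d in DIGITS},
    **{(1, d): 2 for d in DIGITS},
    **{(2, d): 2 for d in DIGITS},
}
ACCEPTING = {2}
DEAD = None  # hypothetical marker for "no valid continuation"


def advance(state, token):
    """Feed every character of a token to the DFA; return the new state or DEAD."""
    for ch in token:
        state = TRANSITIONS.get((state, ch), DEAD)
        if state is DEAD:
            return DEAD
    return state


# Three tokens that together spell "-127".
state = 0
for token in ["-", "12", "7"]:
    state = advance(state, token)

print(state in ACCEPTING)  # True
print(advance(0, "1-"))    # None: "1-" drives the DFA into a dead state
```

A token is legal in state `q` exactly when `advance(q, token)` does not return the dead marker, which is the test the rest of the algorithm precomputes.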

To turn this into an efficient decoder, the algorithm precomputes an index `σ` that maps each FSM state `q` to the subset of the model's [vocabulary](/wiki/logits) whose surface form, when concatenated to the prefix represented by `q`, keeps `M(R)` alive. Building `σ` involves walking each vocabulary token through the FSM and is done once per (vocabulary, regex) pair. At decoding time, the implementation looks up `σ[q]` for the current FSM state `q`, sets the logits of all other tokens to negative infinity, samples normally from the masked distribution, and advances `q` according to the chosen token. Willard and Louf describe this lookup as having effectively constant amortized cost per generated token, so the FSM-guided decoder adds negligible runtime overhead relative to ordinary autoregressive generation.[^3]
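
A toy version of the index and the masking step might look as follows. This is a sketch under stated assumptions rather than the library's implementation: the seven-token vocabulary, the hand-written DFA for `[0-9]+\.[0-9]+`, the fake logits and the helper names `build_index` and `mask_logits` are all invented for illustration, and greedy selection stands in for sampling so that the result is deterministic.

```python
import math

DIGITS = "0123456789"
# DFA for [0-9]+\.[0-9]+
# 0 = start, 1 = integer part, 2 = just after ".", 3 = fraction (accepting)
TRANSITIONS = {(1, "."): 2}
for d in DIGITS:
    TRANSITIONS[(0, d)] = 1
    TRANSITIONS[(1, d)] = 1
    TRANSITIONS[(2, d)] = 3
    TRANSITIONS[(3, d)] = 3
STATES = [0, 1, 2, 3]

VOCAB = ["1", "42", ".", ".5", "3.1", "abc", "7."]  # toy vocabulary


def advance(state, token):
    """Walk one token through the DFA; None means the token is illegal here."""
    for ch in token:
        state = TRANSITIONS.get((state, ch))
        if state is None:
            return None
    return state


def build_index(states, vocab):
    """sigma[q] maps each allowed token id to the state reached after it."""
    sigma = {}
    for q in states:
        sigma[q] = {}
        for token_id, token in enumerate(vocab):
            nxt = advance(q, token)
            if nxt is not None:
                sigma[q][token_id] = nxt
    return sigma


def mask_logits(logits, allowed):
    """Keep allowed logits, push every other logit to negative infinity."""
    return [x if i in allowed else -math.inf for i, x in enumerate(logits)]


sigma = build_index(STATES, VOCAB)  # done once per (vocabulary, regex) pair

# Fake logits in which the illegal token "abc" scores highest.
logits = [0.1, 0.3, 0.2, 0.0, 0.5, 2.0, 0.4]
state = 0
masked = mask_logits(logits, sigma[state])
choice = max(range(len(VOCAB)), key=masked.__getitem__)  # greedy pick
state = sigma[0][choice]

print(VOCAB[choice], state)  # 3.1 3
```

The per-step work is a dictionary lookup plus a mask, independent of the regular expression's complexity; all the expensive token walking happens inside `build_index`.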

This split between a one-time index build and a cheap per-token lookup has a memory cost as well as a speed benefit. Although the index can in principle grow in proportion to the number of FSM states times the size of the vocabulary, in practice a given state accepts only part of the vocabulary, and many states accept the same subset of tokens. The paper notes that the resulting tables can be post-processed so that many FSM states share pointers to the same vocabulary subset, which keeps the memory footprint manageable even for large vocabularies of 100,000-plus tokens.[^3]
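
One simple way to picture the pointer sharing is to intern the token subsets so that states with identical subsets reference a single object. The sketch below is an assumption about how such post-processing could look, not the data structure used by the paper or by `outlines-core`; the `allowed_by_state` table and the `share_subsets` helper are hypothetical.

```python
# Hypothetical per-state table of allowed token ids for a four-state FSM.
allowed_by_state = {
    0: [0, 1, 4, 6],
    1: [0, 1, 2, 3, 4, 6],
    2: [0, 1],
    3: [0, 1],
}


def share_subsets(table):
    """Replace equal token subsets with references to one shared frozenset."""
    pool = {}    # canonical copy of every distinct subset
    shared = {}  # state -> reference into the pool
    for state, token_ids in table.items():
        key = frozenset(token_ids)
        shared[state] = pool.setdefault(key, key)
    return shared, pool


shared, pool = share_subsets(allowed_by_state)

print(len(pool))               # 3 distinct subsets for 4 states
print(shared[2] is shared[3])  # True: both states point at the same object
```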

### Extending to context-free grammars

A pure regular expression is not expressive enough to describe nested or recursive structures such as balanced parentheses, well-formed JSON, or arithmetic expressions. The paper handles these cases by augmenting the FSM with a stack, producing a pushdown automaton (PDA) that accepts the target context-free language.[^3] Outlines exposes this via two surface APIs. The `CFG` term accepts a grammar written in EBNF, and `outlines` ships with a JSON grammar and several other example grammars; alternatively, the user can supply a Lark grammar, which `outlines` lowers to the same PDA-based guide.[^9][^12] Because constructing a full PDA-derived index is more expensive than the regex case, grammar-based generation remains a beta-quality feature with slower compilation times than the regex path.[^9][^13]
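
The role of the stack can be sketched with a tiny guide for balanced round and square brackets, a language no regular expression can describe. The code is illustrative only and does not reflect how Outlines compiles EBNF or Lark grammars; the vocabulary and the helpers `advance` and `allowed_tokens` are assumptions made for the example.

```python
PAIRS = {")": "(", "]": "["}
OPENERS = set(PAIRS.values())


def advance(stack, token):
    """Return the stack after reading token, or None if the token is illegal."""
    stack = list(stack)
    for ch in token:
        if ch in OPENERS:
            stack.append(ch)
        elif ch in PAIRS:
            # A closer is legal only if it matches the most recent opener.
            if not stack or stack[-1] != PAIRS[ch]:
                return None
            stack.pop()
        else:
            return None
    return tuple(stack)


def allowed_tokens(stack, vocab):
    """Tokens that keep the partial output a prefix of a balanced string."""
    return [tok for tok in vocab if advance(stack, tok) is not None]


VOCAB = ["(", ")", "[", "]", "()", "[(", ")]", "x"]  # toy vocabulary

print(allowed_tokens((), VOCAB))          # ['(', '[', '()', '[(']
print(allowed_tokens(("[", "("), VOCAB))  # ['(', ')', '[', '()', '[(', ')]']
```

Here the set of legal tokens depends on the stack contents rather than on one of finitely many states, which gives some intuition for why a grammar index is harder to precompute than a regex index.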

### JSON Schema and Pydantic

The most common use of Outlines is to constrain output to a JSON Schema or to a [Python](/wiki/python) Pydantic model. Internally, the library translates the schema into a regular expression that recognises every string that the schema would accept, then proceeds with the regex-FSM pipeline described above. The high-level surface API is `Generator(model, PydanticModel)` in the 1.x line and `outlines.generate.json(model, PydanticModel)` in the legacy 0.x line.[^10][^12] When a Pydantic model is supplied, the generator returns an instantiated model object whose attributes are typed and validated; when a JSON Schema string is supplied, it returns a dictionary.[^12] Function signatures are also accepted, in which case the structure is inferred from the function's type-annotated parameters and the result can be unpacked with `**` directly into a call.[^12]
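
The schema-to-regex step can be approximated in a few lines for a deliberately narrow case. The sketch below is not the translation Outlines performs: it assumes a flat object, treats every property as required, fixes the key order to the declaration order, forbids whitespace, and supports only three types with simplified patterns. The helper `schema_to_regex` and the `TYPE_PATTERNS` table are hypothetical names.

```python
import json
import re

# Simplified patterns; real JSON strings also allow escapes and Unicode rules.
TYPE_PATTERNS = {
    "integer": r"-?(0|[1-9][0-9]*)",
    "boolean": r"(true|false)",
    "string": r'"[^"\\]*"',
}


def schema_to_regex(schema):
    """Build a regex for a flat object whose keys appear in declared order."""
    parts = []
    for name, spec in schema["properties"].items():
        key = re.escape(json.dumps(name))  # the quoted key, regex-escaped
        parts.append(key + ":" + TYPE_PATTERNS[spec["type"]])
    return r"\{" + ",".join(parts) + r"\}"


schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
}
pattern = schema_to_regex(schema)

good = '{"name":"Ada","age":36}'
bad = '{"name":"Ada","age":"36"}'  # age is a string, not an integer

print(bool(re.fullmatch(pattern, good)))  # True
print(bool(re.fullmatch(pattern, bad)))   # False
print(json.loads(good)["age"])            # 36
```

Once the schema has been reduced to a regular expression in this way, the FSM pipeline needs no JSON-specific logic at decoding time.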

### The Rust core

In October 2024 .txt and [Hugging Face](/wiki/hugging_face) released `outlines-core` 0.1.0, a separate Rust crate that implements the vocabulary, index, and guide primitives used by `outlines`. The motivations were threefold: ahead-of-time compilation eliminates the JIT warm-up cost of the previous Numba-based path, the language's static type system removes a class of memory bugs that had occurred in the Python core, and a small Rust crate is easier to embed in non-Python runtimes such as JavaScript or Swift.[^4] The Hugging Face announcement reported approximately a 2x improvement in average index-compilation speed against the previous Python implementation and described the design as "a lightweight library that's easier to maintain and to integrate into other projects."[^4] The Rust crate is exposed to Python through `pyo3` bindings, and the higher-level `outlines` package now delegates index construction, vocabulary handling and guide traversal to `outlines-core` while keeping user-facing helpers, model integrations, and prompt templating in Python.[^4][^14]

### Coalescence and other optimisations

A blog post from .txt titled "Coalescence: making LLM inference 5x faster" describes an additional optimisation enabled by the FSM representation.[^15] Tokenizers in modern [LLMs](/wiki/large_language_model) are non-injective: the same surface string can be produced by many different token sequences. For example, the string `name` can be emitted by eight different byte-pair-encoded token paths. Inside an FSM-guided decoder one can detect when, regardless of which legal token is sampled next, the FSM ends up in the same state, and in that case the longest legal token can simply be appended without invoking the model. The post reports that on a JSON schema example this collapses nine model calls down to two, yielding "at least a 5x speedup over structured generation in Outlines" while preserving the distributional behaviour of the sampler.[^15]
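
The underlying idea, skipping the model wherever the FSM leaves no real choice, can be shown with a character-level simplification. The real optimisation works on token-level transitions and has to decide which tokenization of the forced text to append; the sketch below ignores that and only finds the forced stretch. The DFA for `\{"name":"[a-z]+"\}` and the `forced_run` helper are assumptions made for illustration.

```python
import string

# Character-level DFA for \{"name":"[a-z]+"\}, stored as state -> {char: next}.
LITERAL = '{"name":"'
TRANSITIONS = {i: {ch: i + 1} for i, ch in enumerate(LITERAL)}
free = len(LITERAL)  # state 9: first letter of the value
TRANSITIONS[free] = {c: free + 1 for c in string.ascii_lowercase}
TRANSITIONS[free + 1] = {c: free + 1 for c in string.ascii_lowercase}
TRANSITIONS[free + 1]['"'] = free + 2
TRANSITIONS[free + 2] = {"}": free + 3}
TRANSITIONS[free + 3] = {}  # accepting state, nothing may follow


def forced_run(state):
    """Follow transitions while exactly one continuation exists."""
    out = []
    while len(TRANSITIONS[state]) == 1:
        (ch, state), = TRANSITIONS[state].items()
        out.append(ch)
    return "".join(out), state


print(forced_run(0))   # ('{"name":"', 9): nine characters need no model call
print(forced_run(10))  # ('', 10): the model must choose here
print(forced_run(11))  # ('}', 12): the closing brace is forced as well
```

In this toy schema the model is only consulted while the value is being written; the surrounding scaffolding is appended directly.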

## Implementations and integrations

Outlines positions itself as a backend-agnostic constraint layer rather than as an inference engine in its own right. The full list of supported backends as of the 1.3 release on PyPI is:[^2][^10]

| Category | Backends |
|---|---|
| Local Python inference | [Hugging Face Transformers](/wiki/transformers_library) (including the multimodal variant), [llama.cpp](/wiki/llama_cpp) via `llama-cpp-python`, [MLX-LM](/wiki/mlx), ExLlamaV2 |
| Self-hosted servers | [vLLM](/wiki/vllm), [SGLang](/wiki/sglang), Hugging Face Text Generation Inference, LM Studio, [Ollama](/wiki/ollama) |
| Remote APIs | [OpenAI](/wiki/openai) (including any OpenAI-compatible server), [Anthropic](/wiki/anthropic), [Gemini](/wiki/gemini), [Mistral AI](/wiki/mistral), .txt's hosted `api.dottxt.ai` |
| Other constraint backends | `xgrammar`, `llguidance` (exposed as alternative engines for advanced users) |

Two integrations are particularly important to the wider ecosystem.

[vLLM](/wiki/vllm) originally shipped Outlines as one of its supported structured-output backends and exposed the constraint through OpenAI-compatible request parameters such as `guided_json`, `guided_regex`, `guided_choice` and `guided_grammar`. In its 2026 documentation, vLLM lists `xgrammar` and Microsoft's `guidance` as the default and recommended backends, has deprecated the `guided_*` parameters in favour of a single `structured_outputs` request object as of release v0.12.0, and notes that all three backends (xgrammar, guidance and outlines) use Rust-style regex syntax.[^16] Outlines therefore remains an available backend in vLLM but is no longer the default.

[Ollama](/wiki/ollama) gained Outlines-powered vision structured output in version 1.2.10 of `outlines` in February 2026, which also added LM Studio integration and dropped Python 3.9 support.[^11]

Beyond direct library imports, [Nvidia](/wiki/nvidia) NIM, AWS Marketplace (via .txt's SageMaker images) and Hugging Face's inference platform all describe Outlines-based structured generation as a supported deployment mode.[^17][^4]

## The .txt company and commercial product

The library's primary maintainer is the Paris-based start-up Dottxt SAS, doing business as ".txt." The company was founded in 2023 by Rémi Louf (chief executive), Brandon Willard (chief scientist) and Dan Gerlanc, after the three had collaborated at [Normal Computing](/wiki/normal_computing) in New York on probabilistic-modelling tooling.[^6] In December 2023 it closed a $3.2 million pre-seed round led by Elaia with participation from Seedcamp, Common Magic, Kima, FJ Labs and angels including Roxanne Varza (Station F), Erik Bernhardsson, Julien Chaumond (co-founder and chief technology officer of [Hugging Face](/wiki/hugging_face)) and Bob van Luijt (Weaviate).[^6] In August 2024 it added an $8.7 million seed round led by EQT Ventures, bringing the total raised to $11.9 million.[^6] The team grew from eight people in mid 2024 to seventeen by the end of October 2024 according to TechCrunch coverage.[^6]

The commercial product is described on the company website as a combination of three layers.[^5] First, a hosted REST API at `api.dottxt.ai` offers pay-per-token constrained decoding with the same guarantees as the library. Second, drop-in replacements for the [vLLM](/wiki/vllm), [SGLang](/wiki/sglang) and TensorRT-LLM inference servers are sold for self-hosted deployments. Third, a set of composable client libraries (`dotjson` for JSON-schema enforcement, `dotgrammar` for context-free grammars, `dotlambda` for [function calling](/wiki/function_calling)) is intended to provide a higher-level building block than the open-source `outlines` package.[^5] The .txt site cites endorsements from named users including [Nvidia](/wiki/nvidia), [Cohere](/wiki/cohere) and [Hugging Face](/wiki/hugging_face) and reports that the open-source Outlines project has accumulated more than 65 million PyPI downloads.[^5]

The relationship between the company and the open-source library follows the model used by many other open-core infrastructure start-ups: the upstream library remains Apache 2.0 with a permissive contribution policy, while .txt sells a hosted runtime and proprietary integrations on top. Senior contributors to the open-source project are .txt employees, and the company's blog at `blog.dottxt.ai` is the primary venue for technical write-ups about the underlying algorithms, including the coalescence post.[^15] The Rust core was developed jointly with [Hugging Face](/wiki/hugging_face) engineers and released under the same Apache 2.0 license as the rest of the project.[^4][^14]

## Comparison with related projects

Outlines is part of a broader generation of structured-output libraries that emerged between 2023 and 2025. The most-cited points of comparison are Microsoft's Guidance, the LMQL language from ETH Zürich, the Pydantic-validation-plus-retry library Instructor, and the more recent XGrammar, developed by a team led from Carnegie Mellon University.

| Library | Primary mechanism | License | First public release | Distinctive feature |
|---|---|---|---|---|
| Outlines | FSM/PDA-guided logit masking compiled from regex, JSON Schema or grammar | Apache 2.0 | July 2023 (after arXiv:2307.09702) | Backend-agnostic constraint with no client-side retries; ships a Rust core[^1][^3][^4] |
| Guidance | Programmatic templating language with embedded constraints, also enforced at decoding time | MIT | 2023 (Microsoft Research) | Treats the prompt and the constraint as a single program with interleaved generation and control flow[^18] |
| LMQL | Domain-specific query language for LMs with logical constraints | Apache 2.0 | 2023 (ETH Zürich) | Declarative SQL-like syntax for prompt + constraint pairs[^19] |
| Instructor | Pydantic-typed wrapper that calls the model, validates and retries | MIT | 2023 | Works on any chat-completion API without modifying the decoder; relies on retries rather than guarantees[^20] |
| XGrammar | Vocabulary-partitioned grammar decoder | Apache 2.0 | November 2024 (arXiv:2411.15100) | Reported up to two orders of magnitude faster grammar masking than earlier approaches by partitioning the vocabulary into context-independent and context-dependent subsets[^21] |

A 2025 benchmark study, "Generating Structured Outputs from Language Models: Benchmark and Studies" (arXiv:2501.10868), evaluated six implementations (Outlines, Guidance, llama.cpp's GBNF, XGrammar, OpenAI's Structured Outputs and Gemini's structured output API) on three axes: compilation efficiency, JSON Schema feature coverage on 10,000 real-world schemas, and downstream task quality on Last Letter, Shuffle Objects and GSM8K.[^7] The paper reports that Outlines had higher grammar-compilation latency than Guidance and llama.cpp (3 to 12 seconds in some cases versus near-zero), that Guidance had broader empirical schema coverage on the GitHub-Easy slice (86% versus Outlines' 59%), and that constrained decoding generally improved downstream accuracy by 1 to 3 percentage points compared to free-form generation.[^7]

The November 2024 paper introducing XGrammar (arXiv:2411.15100) reported similar coverage gaps and argued that partitioning the vocabulary into a context-independent subset (decidable purely from the next grammar terminal) and a much smaller context-dependent subset enables grammar-guided decoding at near-zero per-token overhead, in some configurations more than 100x faster than Outlines on JSON-schema masking.[^21] vLLM, [SGLang](/wiki/sglang) and TensorRT-LLM all subsequently adopted XGrammar as a built-in or default backend, while Outlines retained its position as a more flexible library that is decoupled from any particular serving engine.[^16][^21]

These libraries are not strictly comparable. Instructor and similar Pydantic-plus-retry tools operate after generation, work over any commercial API and never modify the decoder, so they trade weaker guarantees for broader compatibility; Outlines and XGrammar both intervene at the logit step and therefore require access to the per-step logit distribution, which is not exposed by every commercial model provider.[^20][^21]

## Applications

Because Outlines guarantees that output parses, it has been adopted as a building block in a number of recurring use cases:[^1][^5][^17]

- **JSON extraction from documents.** Pipelines that turn unstructured text or images into structured records (invoice line items, clinical reports, regulatory filings) gain a hard guarantee that the consumer schema is satisfied. .txt's AWS Marketplace deployment example uses a [DeepSeek-R1-Distill](/wiki/deepseek_r1_distill) [Qwen](/wiki/qwen) 32B model packaged with Outlines to extract medical-record fields under a fixed JSON schema.[^17]
- **[Function calling](/wiki/function_calling) and [tool use](/wiki/tool_use).** Local and open-weights models without native function-calling fine-tuning can be made to emit syntactically valid function invocations by constraining them to a JSON-schema or grammar representation of the available tools.[^12]
- **Multiple choice and classification.** When the desired output is one of a fixed set of strings, Outlines guarantees the output is exactly one of those strings by masking everything else, removing the need to parse the model's free-form response.[^12]
- **Synthetic data generation.** Because the output strictly matches a schema, Outlines can be used to generate large synthetic datasets that are guaranteed to be loadable into a downstream training pipeline.[^4]
- **Multimodal extraction.** From release 0.1.0 onwards, Outlines supports vision plus text inputs, enabling structured extraction directly from images using vision-language models on Transformers, [Ollama](/wiki/ollama) and similar backends.[^9][^11]
- **Evaluation and benchmarking.** By forcing models to emit answers in a fixed schema (for example, a single integer for arithmetic tasks or a single token for multiple-choice tasks) Outlines reduces the sensitivity of benchmark scores to prompt formatting and to the model's tendency to add chatty preamble before the answer.[^4][^7]
- **Code and DSL emission.** Lark and EBNF grammar support lets a model emit syntactically valid SQL, regular expressions, or domain-specific languages directly, rather than producing free text that has to be repaired before it can be executed.[^9][^13]
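
The multiple-choice case in the list above is the easiest to sketch, because the constraint reduces to a prefix check against the allowed strings. The example below is an illustration in plain Python, not Outlines code; the label set, the toy vocabulary and the `allowed_tokens` helper are assumptions.

```python
CHOICES = ["positive", "negative", "neutral"]
VOCAB = ["pos", "neg", "ne", "itive", "ative", "utral", "great"]  # toy tokens


def allowed_tokens(prefix, choices, vocab):
    """Tokens that keep prefix + token a prefix of at least one choice."""
    return [
        tok for tok in vocab
        if any(choice.startswith(prefix + tok) for choice in choices)
    ]


print(allowed_tokens("", CHOICES, VOCAB))     # ['pos', 'neg', 'ne']
print(allowed_tokens("neg", CHOICES, VOCAB))  # ['ative']
print(allowed_tokens("ne", CHOICES, VOCAB))   # ['utral']
```

Every token outside the returned list would be masked, so whatever the model prefers, the finished string is one of the three labels.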

## Limitations and criticisms

Outlines does not change the underlying competence of the language model. If the unconstrained model would not have known the correct answer, masking the [logits](/wiki/logits) to a smaller alphabet still cannot manufacture it; constrained decoding only ensures syntactic validity, not semantic correctness, a point the paper makes explicitly.[^3] Several practical limitations have been raised by users and benchmark studies.

- **Compilation latency.** For complex regex or JSON schemas, building the vocabulary index can take several seconds, which is significant if the schema changes per request. The 0.1.0 release reduced this overhead substantially and the move to a Rust core in `outlines-core` further halved it on average, but it remains larger than the near-zero compilation time reported for Guidance or `llama.cpp`'s GBNF in head-to-head benchmarks.[^4][^7]
- **Schema feature coverage.** The 2025 benchmark study reported that on the GitHub-Easy slice of 10,000 JSON schemas Outlines achieved 59% empirical coverage versus 86% for Guidance, attributing the gap to generation timeouts on certain schema features rather than to algorithmic incorrectness.[^7] XGrammar's authors made a similar observation in their 2024 paper, motivating their vocabulary-partition approach.[^21]
- **Quality preservation.** Naively masking [logits](/wiki/logits) to single-byte tokens can change the implicit length distribution of the output because token boundaries are non-injective; the coalescence optimisation described in the .txt blog is in part a way to keep the original distribution while still skipping deterministic prefixes.[^15]
- **Backend coupling.** Because Outlines needs access to per-step logits, it cannot enforce guarantees on closed APIs that only return final text. On [OpenAI](/wiki/openai), [Anthropic](/wiki/anthropic) and [Gemini](/wiki/gemini) backends, Outlines therefore falls back to the providers' native structured-output features rather than running the FSM itself.[^10]
- **Default position in [vLLM](/wiki/vllm).** As of vLLM's 2026 documentation, the default structured-output backend is `xgrammar` and `guidance` is the recommended alternative, with Outlines listed but no longer the default; the previous `guided_json` family of request parameters has been deprecated in favour of a unified `structured_outputs` field.[^16]

## Significance

Independently of any particular benchmark ranking, the Outlines paper is the work most consistently credited with reframing structured generation as an FSM-masking problem. [OpenAI](/wiki/openai)'s August 2024 launch of Structured Outputs in its API explicitly cited Outlines as one of the open-source projects from which "we drew inspiration" alongside `jsonformer`, `instructor` and Microsoft's Guidance.[^8] Subsequent research projects, including XGrammar (arXiv:2411.15100) and the systematic benchmark study by Geng et al. (arXiv:2501.10868), use Outlines as the canonical baseline against which more recent grammar decoders are measured.[^7][^21] As of May 2026 the upstream `outlines` package reports more than 65 million downloads on PyPI, counts [Nvidia](/wiki/nvidia), [Cohere](/wiki/cohere) and [Hugging Face](/wiki/hugging_face) among its named users, and has published 87 GitHub releases on the way to the current 1.3.0 line.[^1][^2][^5]

The library also illustrates a broader pattern in 2023 to 2026 LLM tooling, where engineering reliability is achieved not by larger models or better prompts but by reshaping the decoding step itself. In this sense Outlines sits alongside [speculative decoding](/wiki/speculative_decoding) and tokenizer-aware sampling as one of several decoder-side techniques that became standard infrastructure in production [large language model](/wiki/large_language_model) serving stacks during this period.[^16][^21]

## See also

- [vLLM](/wiki/vllm)
- [SGLang](/wiki/sglang)
- [llama.cpp](/wiki/llama_cpp)
- [Hugging Face Transformers](/wiki/transformers_library)
- [Ollama](/wiki/ollama)
- [MLX](/wiki/mlx)
- [Function calling](/wiki/function_calling)
- [Tool use](/wiki/tool_use)
- [Logits](/wiki/logits)
- [Decoding strategies](/wiki/decoding_strategies)
- [Speculative Decoding](/wiki/speculative_decoding)
- [Normal Computing](/wiki/normal_computing)
- [Hugging Face](/wiki/hugging_face)
- [LangChain](/wiki/langchain)

## References

[^1]: dottxt-ai, "outlines: Structured Outputs", GitHub repository. https://github.com/dottxt-ai/outlines. Accessed 2026-05-20.

[^2]: Outlines Developers, "outlines · PyPI", Python Package Index, project page (current version 1.3.0, 2026-05-13). https://pypi.org/project/outlines/. Accessed 2026-05-20.

[^3]: Brandon T. Willard and Rémi Louf, "Efficient Guided Generation for Large Language Models", arXiv:2307.09702 (versions v1 2023-07-19 through v4 2023-08-19). https://arxiv.org/abs/2307.09702. Accessed 2026-05-20.

[^4]: Brandon T. Willard and Rémi Louf (.txt) with Hugging Face, "Releasing Outlines-core 0.1.0: structured generation in Rust and Python", Hugging Face blog, 2024-10-22. https://huggingface.co/blog/outlines-core. Accessed 2026-05-20.

[^5]: .txt, ".txt: New Rules for AI" (company website, products page), Dottxt SAS. https://dottxt.ai/. Accessed 2026-05-20.

[^6]: Romain Dillet, "With $11.9 million in funding, Dottxt tells AI models how to answer", TechCrunch, 2024-10-17. https://techcrunch.com/2024/10/17/with-11-9-million-in-funding-dottxt-tells-ai-models-how-to-answer/. Accessed 2026-05-20.

[^7]: Saibo Geng et al., "Generating Structured Outputs from Language Models: Benchmark and Studies", arXiv:2501.10868, 2025-01-18. https://arxiv.org/abs/2501.10868. Accessed 2026-05-20.

[^8]: OpenAI, "Introducing Structured Outputs in the API", OpenAI blog, 2024-08-06. https://openai.com/index/introducing-structured-outputs-in-the-api/. Accessed 2026-05-20.

[^9]: dottxt-ai, "Release Outlines v0.1.0", GitHub release, 2024-10-07. https://github.com/dottxt-ai/outlines/releases/tag/0.1.0. Accessed 2026-05-20.

[^10]: dottxt-ai, "Release Outlines v1.0.0", GitHub release, 2025-06-18. https://github.com/dottxt-ai/outlines/releases/tag/1.0.0. Accessed 2026-05-20.

[^11]: dottxt-ai, "Releases · dottxt-ai/outlines", GitHub releases page (v1.2.10 2026-02-06 through v1.3.0 2026-05-13). https://github.com/dottxt-ai/outlines/releases. Accessed 2026-05-20.

[^12]: Outlines Documentation, "JSON (function calling)", official documentation, dottxt-ai.github.io. https://dottxt-ai.github.io/outlines/reference/generation/json/. Accessed 2026-05-20.

[^13]: Outlines Documentation, "Welcome to Outlines!" (project overview), dottxt-ai.github.io. https://dottxt-ai.github.io/outlines/welcome/. Accessed 2026-05-20.

[^14]: dottxt-ai, "outlines-core: Faster structured generation", GitHub repository. https://github.com/dottxt-ai/outlines-core. Accessed 2026-05-20.

[^15]: Rémi Louf and Brandon T. Willard, "Coalescence: making LLM inference 5x faster", .txt blog, 2024. https://blog.dottxt.ai/coalescence.html. Accessed 2026-05-20.

[^16]: vLLM Project, "Structured Outputs", vLLM documentation (current 2026 release, deprecates `guided_*` request parameters in v0.12.0). https://docs.vllm.ai/en/latest/features/structured_outputs.html. Accessed 2026-05-20.

[^17]: Lokesh Sharma et al., "Generate structured output from LLMs with Dottxt Outlines in AWS", AWS Machine Learning Blog. https://aws.amazon.com/blogs/machine-learning/generate-structured-output-from-llms-with-dottxt-outlines-in-aws/. Accessed 2026-05-20.

[^18]: Microsoft Research, "guidance: control LM output", Microsoft Research project page. https://www.microsoft.com/en-us/research/project/guidance-control-lm-output/. Accessed 2026-05-20.

[^19]: Luca Beurer-Kellner, Marc Fischer and Martin Vechev, "Prompting Is Programming: A Query Language for Large Language Models" (LMQL), arXiv:2212.06094, 2022-12-12. https://arxiv.org/abs/2212.06094. Accessed 2026-05-20.

[^20]: Jason Liu, "Instructor: Structured outputs powered by LLMs", project documentation. https://python.useinstructor.com/. Accessed 2026-05-20.

[^21]: Yixin Dong, Charlie F. Ruan, Yaxing Cai, Ruihang Lai, Ziyi Xu, Yilong Zhao and Tianqi Chen, "XGrammar: Flexible and Efficient Structured Generation Engine for Large Language Models", arXiv:2411.15100, 2024-11-22. https://arxiv.org/abs/2411.15100. Accessed 2026-05-20.