# Guidance (library)

> Source: https://aiwiki.ai/wiki/guidance_library
> Updated: 2026-07-16
> Categories: Developer Tools, Microsoft, Open Source AI
> License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
> From AI Wiki (https://aiwiki.ai), the free encyclopedia of artificial intelligence. Reuse freely with attribution to "AI Wiki (aiwiki.ai)".

**Guidance** is an open-source Python library, originally developed at [Microsoft Research](/wiki/microsoft_research), for building structured, multi-step programs that drive [large language models](/wiki/large_language_model). It lets a developer interleave ordinary Python control flow with model generation, and applies token-level constraints (regular expressions, context-free grammars, JSON schemas, choice sets) so that the output of an LLM is forced to conform to a desired structure. The project was created by Scott Lundberg (better known for [SHAP](/wiki/shap)) together with Marco Tulio Ribeiro and other Microsoft researchers, first appearing in 2022 as a Handlebars-style domain-specific language and re-released in late 2023 as an embedded Python API. Since January 2025, all of Guidance's grammar processing has been delegated to a Rust library called `llguidance`, also developed inside Microsoft Research, which computes valid-next-token masks in roughly 50 microseconds per step on a 128k-vocabulary tokenizer.[^1][^2][^3] Guidance is distributed under the MIT License from the `guidance-ai` GitHub organization and supports backends including the Hugging Face Transformers library, llama.cpp, ONNX Runtime GenAI, Azure AI, [OpenAI](/wiki/openai), and other commercial APIs.[^4][^5]

## Overview

| Field | Value |
| --- | --- |
| Original author | Scott Lundberg |
| Co-authors / maintainers | Marco Tulio Ribeiro, Harsha Nori, Richard Edgar, Michał Moskal, Hudson Cooper, Loc Huynh |
| Initial release | 0.0.1, November 11, 2022 (PyPI)[^4] |
| First Python rewrite | 0.1.0, November 14, 2023[^4][^6] |
| `llguidance` integration | 0.2.0, January 7, 2025[^7] |
| Latest stable release | 0.3.x series (2025 to 2026)[^4][^5] |
| Repository | github.com/guidance-ai/guidance |
| Companion grammar engine | github.com/guidance-ai/llguidance (Rust) |
| License | MIT |
| Language | Python (library API), Rust (grammar engine `llguidance`) |
| Supported backends | Hugging Face Transformers, llama.cpp, ONNX Runtime GenAI, Azure AI, OpenAI, Anthropic and Gemini via litellm, experimental SGLang[^5][^8] |
| Star count (GitHub) | More than 19,000 as of 2025[^9] |

Guidance is most often described as a "programming paradigm for steering language models". The official tagline on the Microsoft Research project page promises "100% guaranteed output structure, with 30 to 50% reduction in latency and costs" relative to plain prompting.[^9] The library is built around a model object that is treated as an immutable value: writing `lm += "some text"` or `lm += gen("name", max_tokens=20)` returns a new model whose state reflects the appended text or generated tokens, allowing programs to be composed and reused with the same predictability as ordinary functional code.[^6]

## History

### Origins at Microsoft Research (2022)

Guidance began inside Microsoft Research around 2022, where Scott Lundberg was a senior researcher and Marco Tulio Ribeiro was a principal researcher. Lundberg had previously created SHAP, a widely used framework for model interpretability based on Shapley values; Ribeiro is known for the LIME interpretability method and for the CheckList behavioral-testing methodology for NLP. The first PyPI release of `guidance` is dated November 11, 2022.[^4] Early versions exposed a templating language inspired by Handlebars, in which one would write something like `{{#select 'option'}}A{{or}}B{{/select}}` inside an otherwise normal prompt string. The interpreter walked the template token by token, calling the underlying model only when generation was required, and applied logits masking when the template constrained the legal next tokens.[^10]

The library received public coverage on May 18, 2023 in *The Register*, which described it as a domain-specific language resembling Handlebars, with linear code execution aligned to token order, controllable temperature, pattern-matching constraints, and the ability to guarantee valid JSON output. In the same article Lundberg said, "with Guidance we can both accelerate inference speed and ensure that generated JSON is always valid". The article also cited 2x faster character generation on an NVIDIA RTX A6000 with a LLaMA-7B backend and improved accuracy on a BigBench task (76.01% vs. 63.04%) under guided execution.[^10]

### The "Guidance reborn" rewrite (November 2023)

In the summer of 2023 the project paused releases while the team rewrote the library. On November 14, 2023 Lundberg posted a discussion titled "Guidance reborn" and tagged version 0.1.0, which dropped the Handlebars-like DSL in favor of plain Python. He wrote that "all guidance programs are now pure Python programs. No more worrying about a distinction between 'user code' in Python and 'template code'".[^6] The new design centered on three ideas: (1) every program is a Python function operating on an immutable model object; (2) the surface syntax is a superset of regular expressions and context-free grammars so that grammars can be built up incrementally; and (3) state is carried explicitly inside the model object, which makes a guidance computation as composable as a pure function.[^6] This redesign is the version that most users encounter today and that the rest of the article describes.

### Transfer to the `guidance-ai` organization

The repository originally lived at `github.com/microsoft/guidance` but was moved to a community-maintained organization called `guidance-ai`. Issue trackers from May 2023 onward redirect from `microsoft/guidance` to `guidance-ai/guidance`, and the contact email listed on the PyPI page is `maintainers@guidance-ai.org`.[^11][^4] The original maintainers (Lundberg, Nori, Ribeiro, Edgar) remained on the project and a Microsoft contact alias (`guidanceai@microsoft.com`) appears in the repository README, but the codebase is no longer hosted under Microsoft's GitHub organization. Lundberg has since moved from Microsoft to Google DeepMind, where his research continues to focus on language models and explainability, while Nori and several other co-maintainers remain at Microsoft.[^12][^9]

### llguidance and the 0.2.0 release (January 2025)

The most significant evolution after the Python rewrite was the introduction of `llguidance`, a Rust library that took over the responsibility of computing the set of allowed next tokens for any given grammar. The 0.2.0 release announcement, posted on January 7, 2025 by Harsha Nori, said that "Guidance's core grammar processing has been fully migrated to the llguidance Rust library", that the new engine is "state of the art across frameworks", and that the release fixed "some key, subtle bugs in the earlier processing engine". The 0.2.0 changelog also expanded JSON schema coverage (handling `oneOf`, `required`, boolean schemas, numeric ranges, and broader `allOf` support), overhauled the in-line Jupyter visualizations, and made parser advancement run concurrently with the model's forward pass.[^7]

### Subsequent releases and recent additions

The 0.3.x series continued to broaden backend support and tighten performance. 0.3.0, released on September 9, 2025, added Groq and Mistral APIs via `litellm`, an experimental SGLang backend, OpenAI-style tool functions defined by JSON schemas, and support for new frontier models. 0.3.1, released in early 2026, added an `onnxruntime-genai` backend and Python 3.14 compatibility, dropped support for Python 3.9, and introduced an inference-time monitor that performs semantic verification of generated text. 0.3.2 (March 2026) was a maintenance release that updated `llguidance` to 1.6.1 and added URI-format JSON string support.[^5] In parallel, the library acquired prototype multimodal support: pull request #1020, opened in September 2024, prototyped a `TransformersPhi3VisionEngine` and introduced `append_image`, `append_audio_bytes`, and `append_video_bytes` methods on the model class, with a placeholder convention (`<|_{modality.name}:{id}|>`) for embedding non-text blobs inside the prompt string. That specific PR was closed in May 2025 in favor of a replacement implementation by Hudson Cooper, but multimodal models such as Microsoft's Phi-3 Vision are now supported through the Transformers backend.[^13]

## How Guidance Works

### The immutable model object

A guidance program starts by constructing a model object that wraps a backend. Programs extend that object with the `+` or `+=` operator. For example, `lm = models.LlamaCpp("./model.gguf")` constructs a model, and `lm = lm + "The capital of France is " + gen("city", max_tokens=4)` produces a new model whose state is the prompt concatenated with up to four generated tokens; the generated text is captured under the name `city` and is accessible as `lm["city"]`.[^6] Because the model is immutable, branching and backtracking, both common in chain-of-thought style programs, are straightforward: the developer keeps multiple model objects in scope and continues whichever branch they need.
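The value semantics described above can be illustrated without any model at all. The following is a minimal sketch, not the library's implementation: `ToyModel` and its `capture` method are hypothetical names invented for this example, and "generation" is simulated by passing the text in directly.

```python
from dataclasses import dataclass, field
from typing import Mapping


@dataclass(frozen=True)
class ToyModel:
    """Hypothetical stand-in for a model object; not the guidance class."""
    text: str = ""
    captures: Mapping[str, str] = field(default_factory=dict)

    def __add__(self, piece: str) -> "ToyModel":
        # Appending never mutates self; it builds and returns a new value.
        return ToyModel(self.text + piece, self.captures)

    def capture(self, name: str, value: str) -> "ToyModel":
        # Pretend `value` was generated by a model and record it under `name`.
        return ToyModel(self.text + value, {**self.captures, name: value})

    def __getitem__(self, name: str) -> str:
        return self.captures[name]


base = ToyModel() + "The capital of France is "
branch_a = base.capture("city", "Paris")
branch_b = base.capture("city", "Lyon")  # a second branch from the same prefix
```

Because `base` is never modified, both branches start from exactly the same prefix, which is the property that makes branching and backtracking cheap to express.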

### Core primitives

The two most frequently used primitives are `gen()` and `select()`. `gen()` generates text into a named variable subject to optional constraints (regex pattern, stop strings, max tokens, temperature). `select()` forces the model to pick one of a closed list of options; for instance, `lm + "The answer is " + select(["yes", "no", "maybe"], name="answer")` forces a valid choice, so the captured value can never fall outside the list.[^9][^14] In addition, the library exposes:

- **Regex constraints**: `gen(regex=r"\d{3}-\d{4}")` forces the output to match a phone-number-like pattern.
- **Context-free grammar constraints**: arbitrary CFGs can be expressed using stateless `@guidance` decorated functions or Lark-format grammars, and the resulting grammar is enforced at every step of decoding.[^2][^9]
- **JSON schema constraints**: the `json()` function accepts either a [Python](/wiki/python) dict schema or a Pydantic model and forces the model to emit a JSON document that satisfies it.[^7]
- **Tool functions**: OpenAI-compatible tool calls described by JSON schemas can be enforced declaratively rather than by parsing a model's free-form response.[^5]

### Token-level constraint enforcement

When a constraint is in effect, Guidance computes a token mask before each sampling step, telling the backend which tokens of the model's vocabulary are legal continuations. With a local backend (Transformers, llama.cpp, ONNX Runtime GenAI) the mask is applied directly to the logits, so the constraint is enforced exactly. With remote APIs that expose only token-level logit biases (or none at all), the enforcement is partial: some constraints can be expressed via the API's own structured-output endpoint, others fall back to retry-on-failure. The advantage of local enforcement is that constraint satisfaction is provable rather than statistical, and the library exploits structural constraints to "fast-forward" through tokens whose value is already implied by the grammar (for instance, the opening `{"name":` of a JSON object).[^9][^2]
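A toy version of masking and fast-forwarding might look like the sketch below. It is an illustration under stated assumptions rather than Guidance's code: the ten-token `VOCAB`, the `OPTIONS` list, and the fixed `toy_scores` logits are all invented, and the constraint is a simple closed choice set.

```python
import math
from typing import Callable

VOCAB = ["yes", "no", "may", "be", "y", "es", "n", "o", " sure", "!"]
OPTIONS = ["yes", "no", "maybe"]


def allowed_mask(generated: str) -> list[bool]:
    """True for each token that keeps `generated + token` a prefix of an option."""
    return [
        any(opt.startswith(generated + tok) for opt in OPTIONS)
        for tok in VOCAB
    ]


def constrained_decode(score: Callable[[str], list[float]]) -> tuple[str, int]:
    """Greedy decoding under the mask; returns (text, number of model calls)."""
    out, calls = "", 0
    while out not in OPTIONS:
        mask = allowed_mask(out)
        legal = [i for i, ok in enumerate(mask) if ok]
        if len(legal) == 1:
            # Fast-forward: only one token is legal, so the model is not consulted.
            choice = legal[0]
        else:
            logits = score(out)
            calls += 1
            # Illegal tokens get -inf so they can never win the argmax.
            masked = [l if ok else -math.inf for l, ok in zip(logits, mask)]
            choice = max(range(len(VOCAB)), key=masked.__getitem__)
        out += VOCAB[choice]
    return out, calls


def toy_scores(prefix: str) -> list[float]:
    # Hypothetical logits: the unconstrained favourite is " sure" (index 8).
    return [0.1, 0.2, 1.5, 0.0, 0.3, 0.0, 0.1, 0.0, 3.0, 2.0]


text, model_calls = constrained_decode(toy_scores)
```

Here the unconstrained argmax would be `" sure"`, but the mask leaves `"may"` as the best legal token, and the following `"be"` is appended without a model call because it is the only legal continuation.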

### Token healing

A subtle issue in constrained decoding is the boundary between a fixed prompt and the next token to be generated. Greedy tokenizers split the prompt into tokens that may not align cleanly with the constraint; the most likely next token might actually start a few characters earlier than the prompt's end. Guidance addresses this with "token healing", which backs the generation pointer up by one or more tokens and constrains the first generated token to share a prefix with the truncated tail of the prompt. The result is that the program behaves as if the prompt were continued character by character, rather than token by token, which empirically improves generation quality on prefix-sensitive tokenizers.[^14]
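The idea can be sketched with a toy tokenizer. The vocabulary, the longest-match tokenizer, and the `heal` helper below are assumptions made for illustration, and the sketch backs up by exactly one token, which is a simplification of what the paragraph above describes.

```python
VOCAB = ["The", " link", " is", " http", ":", "://", "//", "example", ".com"]


def greedy_tokenize(text: str) -> list[str]:
    """Longest-match tokenizer over VOCAB (a toy stand-in for a real tokenizer)."""
    tokens, i = [], 0
    while i < len(text):
        match = max((t for t in VOCAB if text.startswith(t, i)), key=len)
        tokens.append(match)
        i += len(match)
    return tokens


def heal(prompt: str) -> tuple[str, list[str]]:
    """Drop the last prompt token and list the tokens allowed to replace it."""
    tokens = greedy_tokenize(prompt)
    tail = tokens[-1]
    kept = "".join(tokens[:-1])
    # The first generated token must re-cover the text that was removed.
    candidates = [t for t in VOCAB if t.startswith(tail)]
    return kept, candidates


kept, candidates = heal("The link is http:")
```

Without healing, the prompt ends in the token `":"` and the model must continue from there; after healing, the single token `"://"` becomes a legal first choice again.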

### Stateless functions and composition

Programs in Guidance are typically written as ordinary Python functions decorated with `@guidance(stateless=True)`. A stateless function returns a grammar fragment rather than executing immediately, so it can be composed inside larger grammars. For instance, one can write a stateless function that yields a JSON object whose keys come from a fixed list and whose values are recursively generated; calling that function from inside another guidance program splices the grammar into the outer program. This makes grammars first-class values that can be inspected, tested against mock models, or shipped to a different backend without re-execution.[^9]
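The notion of grammar fragments as composable values can be shown in plain Python. This is a minimal sketch with invented combinators (`Lit`, `Choice`, `Seq`) and helper functions; it does not use or reproduce the `@guidance` decorator, and it only checks strings against a grammar rather than driving a model.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Lit:
    text: str


@dataclass(frozen=True)
class Choice:
    options: tuple["Grammar", ...]


@dataclass(frozen=True)
class Seq:
    parts: tuple["Grammar", ...]


Grammar = Union[Lit, Choice, Seq]


def remainders(g: Grammar, s: str) -> set[str]:
    """All suffixes of `s` left over after `g` consumes a prefix of it."""
    if isinstance(g, Lit):
        return {s[len(g.text):]} if s.startswith(g.text) else set()
    if isinstance(g, Choice):
        return set().union(*(remainders(o, s) for o in g.options))
    rests = {s}
    for part in g.parts:
        rests = set().union(*(remainders(part, r) for r in rests))
    return rests


def matches(g: Grammar, s: str) -> bool:
    return "" in remainders(g, s)


# Each function returns a grammar value instead of generating anything.
def boolean() -> Grammar:
    return Choice((Lit("true"), Lit("false")))


def field(name: str, value: Grammar) -> Grammar:
    return Seq((Lit(f'"{name}": '), value))


def flags_object() -> Grammar:
    return Seq((Lit("{"), field("verbose", boolean()), Lit(", "),
                field("dry_run", boolean()), Lit("}")))


grammar = flags_object()
```

Since `grammar` is an ordinary immutable value, it can be inspected, compared, or tested with `matches` before any model is involved.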

## llguidance: the grammar engine

### Motivation

Computing a token mask from a context-free grammar at every decoding step is, in principle, expensive: for a 128k-token vocabulary, a naive implementation would parse every candidate token against the grammar before each sampling step. Guidance's initial Python parser was correct but slow enough to dominate end-to-end generation time for non-trivial schemas. The Microsoft Research team (Michał Moskal, Harsha Nori, Hudson Cooper, Loc Huynh) wrote `llguidance` to fix this. According to the team's technical write-up, the library was developed between 2023 and 2025 and is now used not only by Guidance but also by llama.cpp, vLLM, SGLang, Chromium, mistral.rs, and Microsoft's onnxruntime-genai.[^3][^2]

### Algorithmic design

`llguidance` is implemented in Rust (about 86% of the code base, with thin Python and C bindings). It splits the constraint-checking job into two layers:[^3][^2]

1. A **regex-derivative-based lexer**, using the `derivre` library, that performs Brzozowski-derivative lazy DFA construction with negligible startup cost. The lexer can validate fast paths such as "we are currently inside a JSON string and any non-quote character is legal" without invoking the parser.
2. An **Earley parser** with low-level optimizations (32-bit integer item pairs, strategic row reuse during traversal) that handles the context-free portions of the grammar. In practice the parser is invoked on only 0.1 to 1% of tokens; the rest are accepted or rejected by the lexer.
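The lexer layer rests on Brzozowski derivatives, which can be sketched in a few dozen lines. The code below is a textbook-style illustration in Python, not a port of `derivre`: the class names, the simplifying constructors, and the `LazyDFA` cache are all assumptions made for this example.

```python
from dataclasses import dataclass


class Re:
    """Base class for regex nodes."""


@dataclass(frozen=True)
class Empty(Re):      # matches no string at all
    pass


@dataclass(frozen=True)
class Eps(Re):        # matches only the empty string
    pass


@dataclass(frozen=True)
class Chars(Re):      # matches one character from a set
    allowed: frozenset


@dataclass(frozen=True)
class Cat(Re):
    left: Re
    right: Re


@dataclass(frozen=True)
class Alt(Re):
    left: Re
    right: Re


@dataclass(frozen=True)
class Star(Re):
    inner: Re


def cat(a: Re, b: Re) -> Re:
    # Simplifying constructor: keeps the number of distinct states small.
    if isinstance(a, Empty) or isinstance(b, Empty):
        return Empty()
    if isinstance(a, Eps):
        return b
    return a if isinstance(b, Eps) else Cat(a, b)


def alt(a: Re, b: Re) -> Re:
    if isinstance(a, Empty):
        return b
    if isinstance(b, Empty) or a == b:
        return a
    return Alt(a, b)


def nullable(r: Re) -> bool:
    """Does `r` accept the empty string?"""
    if isinstance(r, (Eps, Star)):
        return True
    if isinstance(r, Cat):
        return nullable(r.left) and nullable(r.right)
    if isinstance(r, Alt):
        return nullable(r.left) or nullable(r.right)
    return False


def deriv(r: Re, c: str) -> Re:
    """The regex matching what may follow `c` in strings matched by `r`."""
    if isinstance(r, Chars):
        return Eps() if c in r.allowed else Empty()
    if isinstance(r, Cat):
        d = cat(deriv(r.left, c), r.right)
        return alt(d, deriv(r.right, c)) if nullable(r.left) else d
    if isinstance(r, Alt):
        return alt(deriv(r.left, c), deriv(r.right, c))
    if isinstance(r, Star):
        return cat(deriv(r.inner, c), r)
    return Empty()


class LazyDFA:
    """DFA whose transitions are derived on first use and then cached."""

    def __init__(self, start: Re):
        self.start, self.transitions = start, {}

    def run(self, text: str) -> Re:
        state = self.start
        for c in text:
            key = (state, c)
            if key not in self.transitions:
                self.transitions[key] = deriv(state, c)
            state = self.transitions[key]
        return state


quote = Chars(frozenset('"'))
body = Star(Chars(frozenset("abcdefghijklmnopqrstuvwxyz ")))
dfa = LazyDFA(Cat(quote, Cat(body, quote)))  # roughly "[a-z ]*" in quotes
```

A prefix is still viable as long as the state reached is not `Empty()`, and a full match requires a nullable state; no automaton is built up front, which is what keeps startup cost low in this style of lexer.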

To compute a mask, the engine traverses a prefix trie of the tokenizer's vocabulary. Each edge of the trie corresponds to appending a byte; the engine attempts to extend the current parse state with that byte and prunes whole subtrees when the parser rejects the prefix. A further "slicer" optimization precomputes masks for common regex slices (for example, the body of a JSON string) so that for the dense parts of the mask the trie traversal is skipped entirely. The team reports average mask computation around 50 microseconds per token on a 128k-token tokenizer and roughly 2 milliseconds of startup overhead, with 10 to 1000 times speedups over earlier libraries.[^2][^3]
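The trie walk with subtree pruning can be sketched as follows. This is a simplified Python illustration, not `llguidance` code: the nested-dict trie, the tiny byte vocabulary, and the toy `advance` constraint (at most a given number of ASCII digits) are assumptions, and the slicer optimization is not modeled.

```python
def build_trie(vocab: list[bytes]) -> dict:
    """Nested dicts keyed by byte value; the key None marks the end of a token."""
    root: dict = {}
    for token_id, token in enumerate(vocab):
        node = root
        for byte in token:
            node = node.setdefault(byte, {})
        node[None] = token_id
    return root


def advance(digits_left: int, byte: int):
    """Toy constraint: at most `digits_left` more ASCII digits, nothing else."""
    if digits_left > 0 and 48 <= byte <= 57:
        return digits_left - 1
    return None  # the constraint rejects this byte


def compute_mask(trie: dict, state: int) -> set[int]:
    """Ids of all tokens whose every byte is accepted starting from `state`."""
    allowed = set()
    stack = [(trie, state)]
    while stack:
        node, st = stack.pop()
        for key, child in node.items():
            if key is None:
                allowed.add(child)  # a complete token fits the constraint
                continue
            nxt = advance(st, key)
            if nxt is not None:     # otherwise the whole subtree is skipped
                stack.append((child, nxt))
    return allowed


VOCAB = [b"1", b"12", b"123", b"12345", b"abc", b"abd", b"1a", b"9", b"99999"]
trie = build_trie(VOCAB)
mask = compute_mask(trie, 4)
```

Tokens `b"abc"` and `b"abd"` share the prefix `a`, so both are discarded by a single rejected byte rather than being checked one by one; that sharing is the reason for walking a trie instead of the flat vocabulary.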

### JSONSchemaBench

To evaluate `llguidance` and its peers, the team published *JSONSchemaBench: A Rigorous Benchmark of Structured Outputs for Language Models*, by Saibo Geng, Hudson Cooper, Michał Moskal, Samuel Jenkins, Julian Berman, Nathan Ranchin, Robert West, Eric Horvitz, and Harsha Nori. The paper was first posted to arXiv on January 18, 2025 (with a final v3 dated February 27, 2025). It compares Guidance, Outlines, llama.cpp, XGrammar, OpenAI structured outputs, and Gemini on 10,000 real-world JSON schemas, evaluating efficiency, coverage of constraint types, and output quality.[^15]

### Integration footprint

By 2025, `llguidance` had been integrated into several inference stacks beyond Guidance itself. OpenAI's structured outputs feature uses it as of May 2025, and llama.cpp, vLLM, SGLang, Chromium, mistral.rs, and Microsoft's onnxruntime-genai all expose it as a grammar backend.[^3][^2] In Guidance itself, the upgrade in 0.2.0 was transparent at the API level but produced large performance gains on grammar-heavy programs.[^7]

## Supported backends

Guidance is intentionally backend-agnostic. The same Python program can target a local model running through Hugging Face Transformers or llama.cpp, a model served through Microsoft's onnxruntime-genai, an Azure AI deployment, OpenAI's API, or a [Google DeepMind](/wiki/google_deepmind) Gemini endpoint via `litellm`.[^4][^5] In practice the level of constraint support depends on the backend:

| Backend | Logits access | Full grammar enforcement | Notes |
| --- | --- | --- | --- |
| Hugging Face Transformers | Yes | Yes | Best-supported local path; used for [vision language model](/wiki/vision_language_model) integration via `TransformersPhi3VisionEngine` and successors.[^13] |
| llama.cpp | Yes | Yes | Native llguidance integration; popular for quantized local inference.[^3][^4] |
| ONNX Runtime GenAI | Yes | Yes | Added in 0.3.x for optimized local inference on a broader range of hardware.[^5] |
| Azure AI | Partial | Partial | Constraints honored when the deployment exposes compatible structured-output options.[^4] |
| [OpenAI](/wiki/openai) | API-level structured outputs | Partial | OpenAI's structured outputs feature itself depends on llguidance, but logits-level enforcement is not directly exposed via the public API.[^3] |
| [Anthropic](/wiki/anthropic) / Gemini (via litellm) | No | No (best effort) | Used for tool-calling style programs; constraints fall back to retry on parse failure.[^5] |
| SGLang | Yes | Yes | Experimental backend added in 0.3.0; SGLang itself integrates llguidance internally.[^5] |

The recommended path for users who need strict guarantees is one of the local backends, where `llguidance` masks the logits directly. Remote APIs are useful when the developer is willing to accept partial enforcement or to combine Guidance with the provider's native structured-output mode.
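For backends without logits access, the retry-on-parse-failure fallback mentioned in the table can be pictured roughly as below. This is a generic sketch, not Guidance's actual fallback code: `generate_with_retry`, the `call_model` callable, and the canned `fake_model` replies are hypothetical, and validation is reduced to checking for required keys.

```python
import json
from typing import Callable


def generate_with_retry(call_model: Callable[[str], str], prompt: str,
                        required: set[str], max_attempts: int = 3) -> dict:
    """Call the model until it returns a JSON object with the required keys."""
    last_error: Exception = ValueError("no attempts made")
    for _ in range(max_attempts):
        raw = call_model(prompt)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            last_error = exc          # malformed JSON: try again
            continue
        if isinstance(data, dict) and required.issubset(data):
            return data
        last_error = ValueError(f"missing required keys in: {raw}")
    raise RuntimeError(f"no valid output after {max_attempts} attempts") from last_error


# Canned replies standing in for a remote API: truncated, incomplete, valid.
replies = iter(['{"name": "Ada"', '{"name": "Ada"}', '{"name": "Ada", "age": 36}'])
attempts: list[str] = []


def fake_model(prompt: str) -> str:
    reply = next(replies)
    attempts.append(reply)
    return reply


result = generate_with_retry(fake_model, "Return name and age as JSON.", {"name", "age"})
```

Each failed attempt costs a full model call, which is the practical difference from local masking, where an invalid token is never sampled in the first place.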

## Applications

Guidance is commonly used for problems where prompting alone is unreliable or expensive:

- **Structured extraction**: parsing free-form text into JSON records that satisfy a schema, for example pulling addresses, prices, or medical codes out of source documents.[^9]
- **Code and DSL generation**: emitting [Python](/wiki/python) code, SQL, or YAML that is guaranteed to be syntactically valid because the grammar constrains decoding.[^9][^2]
- **Reasoning with constraints**: combining [in-context learning](/wiki/in-context_learning) examples with a constrained final answer (a `select` over a closed answer set), so the model can produce arbitrary chain-of-thought rationale but cannot hallucinate the final classification.[^14]
- **Tool calling**: declaring a tool's signature as a JSON schema and forcing the model's tool call to match, with the result that downstream code can dispatch directly on the parsed arguments without defensive parsing.[^5]
- **Multi-step agent workflows**: composing several `gen()` and `select()` calls inside a Python control flow so that, for instance, a planning step decides the next [tool use](/wiki/tool_use) from a closed set, an argument-generation step emits a strict JSON payload, and the result is fed back into another step, all within one process and with end-to-end constraint guarantees.[^9][^6]
- **Multimodal pipelines**: feeding images, audio, or video blobs to a [vision language model](/wiki/vision_language_model) such as Phi-3 Vision via the `append_image` and `append_audio_bytes` methods and then constraining the model's output to a structured format.[^13]

## Comparison with other libraries

Guidance sits in a small but active ecosystem of "LLM programming" or "structured generation" libraries. The libraries differ in what they optimize for: some focus on strict constrained decoding at the logits level, others on schema-driven validation, others on declarative prompt optimization.

| Library | First public release | Constraint mechanism | Optimization focus | Local backend support |
| --- | --- | --- | --- | --- |
| Guidance | 2022 (Microsoft / `guidance-ai`) | Token masks from regex, CFG, JSON schema via `llguidance`[^1][^2] | Interleave Python control with token-level constraints | [llama.cpp](/wiki/llama_cpp), Transformers, ONNX Runtime GenAI, [SGLang](/wiki/sglang)[^5] |
| LMQL | 2022 (ETH Zurich) | Logit masks via a SQL-like language model programming (LMP) language | Declarative query language for LLMs | Limited; reported issues with batching and parallelism[^16] |
| Outlines | 2023 (.txt / Normal Computing) | Finite-state machine compiled from regex / Pydantic / CFG | Pure-Python constrained decoding | Transformers, llama.cpp, [vLLM](/wiki/vllm) |
| Instructor | 2023 (Jason Liu) | Schema-driven retries via Pydantic; uses provider structured-output endpoints | Pydantic-first ergonomics on top of existing APIs | None directly; depends on backend's structured-output mode[^16] |
| [DSPy](/wiki/dspy) | 2023 (Stanford NLP) | Prompt optimization (signatures, modules, optimizers) rather than logits constraints | Compiling prompts and few-shots automatically | Backend-agnostic |

In practice the choice depends on where the developer wants the constraint to live. Guidance and LMQL push the constraint down to the decoding step, so the constraint is provably satisfied at every token and the backend can fast-forward through structurally implied tokens. Outlines takes a similar approach but exposes a more Pythonic, schema-first API. Instructor takes the opposite approach, layering Pydantic validation and retries on top of an arbitrary provider. [DSPy](/wiki/dspy) is at a higher level still: rather than constrain tokens, it optimizes the prompts and few-shot demonstrations that surround the program.[^16] Guidance and [DSPy](/wiki/dspy) are sometimes used together, with DSPy authoring the prompts and Guidance enforcing the structure of the resulting completions.

## Limitations

Guidance's design has several weaknesses that both the maintainers and outside reviewers acknowledge.

- **Backend coupling.** Full constraint enforcement requires direct access to the model's logits. APIs that do not expose this (most commercial chat APIs) can only be partially constrained, typically by leaning on the provider's own structured-output mode or by retrying on parse failure. Programs written for a local backend cannot always be moved to a remote API without losing guarantees.[^4][^5]
- **Batching and parallelism.** Independent reviewers have noted that Guidance historically lacked first-class batching and parallelism, while LMQL had its own throughput issues. Improvements landed in the 0.2.x and 0.3.x series, particularly running parser advancement concurrently with the model's forward pass, but Guidance is not a serving system; for high-throughput deployments developers typically combine `llguidance` with [vLLM](/wiki/vllm) or [SGLang](/wiki/sglang) directly.[^16][^7][^5]
- **Constraint mismatch.** A grammar that the user expects to be tight may, in practice, permit semantically wrong completions because the LLM still chooses among legal tokens by likelihood. JSONSchemaBench and follow-up work explicitly call out the difference between satisfying the syntactic constraint and producing semantically correct content, and 0.3.1 added an inference-time "monitor" step partly to address this.[^15][^5]
- **Multimodal limits.** Token healing and grammar forcing cannot cross the boundary between text tokens and multimodal placeholders, because the multimodal embeddings are not in token space. The first multimodal PR (for Phi-3 Vision) was eventually replaced by an alternative implementation rather than merged, reflecting the difficulty of integrating non-text blobs into a grammar-driven decoding loop.[^13]
- **API churn.** The 2023 rewrite was a hard break with the original Handlebars-style DSL, which means that older tutorials, blog posts, and Stack Overflow answers can reference an API that no longer exists.[^6][^10]

## Significance

Guidance was one of the earliest libraries to popularize the idea that an LLM application is a *program* rather than a *prompt*, and that the language model's output should be controlled at the token level rather than coaxed via natural-language instructions. By providing both a Pythonic surface (the immutable model object) and a Rust-based grammar engine (`llguidance`) it occupies an unusual position: the same code that is convenient enough for notebook exploration also produces grammar-level guarantees suitable for production. The library's grammar engine has been adopted by major inference stacks including OpenAI's own structured outputs, llama.cpp, [vLLM](/wiki/vllm), [SGLang](/wiki/sglang), and onnxruntime-genai, so a substantial fraction of all "structured output" calls made through commercial and open-source LLM stacks now run through `llguidance` even when the user is not aware of it.[^3][^2]

Beyond engineering impact, Guidance influenced the broader conversation about how to do reliable prompting. The motivations articulated in the project (predictable structure, lower latency through token fast-forwarding, deterministic choices via `select`, no expensive retries or fine-tuning) reappear in adjacent ecosystems like [DSPy](/wiki/dspy) and in commercial structured-output features at major model providers.[^9][^16]

## Related Work

- [DSPy](/wiki/dspy): a declarative framework from Stanford NLP for programming and optimizing LLM pipelines; complementary to Guidance in that it focuses on prompt optimization rather than logits constraints.
- [SGLang](/wiki/sglang): an efficient runtime for structured language model programs; integrates `llguidance` natively.
- [vLLM](/wiki/vllm): a high-throughput LLM serving system that can use `llguidance` for guided decoding.
- [llama.cpp](/wiki/llama_cpp): a CPU/GPU C++ inference library that integrates `llguidance` as a grammar backend.
- [Function calling](/wiki/function_calling) and [tool use](/wiki/tool_use): the ergonomic problem Guidance addresses for many API users.
- [Structured output](/wiki/structured_output): the broader category of techniques (schema-driven validation, constrained decoding, retry-and-parse) of which Guidance is one prominent representative.

## See also

- [Microsoft Research](/wiki/microsoft_research)
- [Microsoft](/wiki/microsoft)
- [Google DeepMind](/wiki/google_deepmind)
- [SHAP (SHapley Additive exPlanations)](/wiki/shap)
- [Explainable AI](/wiki/explainable_ai)
- [Large Language Model](/wiki/large_language_model)
- [LLM](/wiki/llm)
- [Hugging Face Transformers](/wiki/transformers_library)
- [llama.cpp](/wiki/llama_cpp)
- [vLLM](/wiki/vllm)
- [SGLang](/wiki/sglang)
- [DSPy](/wiki/dspy)
- [Function calling](/wiki/function_calling)
- [Tool use](/wiki/tool_use)
- [Structured output](/wiki/structured_output)
- [Vision language model](/wiki/vision_language_model)
- [Phi-3](/wiki/phi_3)
- [Python (programming language)](/wiki/python)
- [In-context learning](/wiki/in-context_learning)
- [Prompt Engineering](/wiki/prompt_engineering)
- [MIT License](/wiki/mit_license)
- [OpenAI](/wiki/openai)
- [Anthropic](/wiki/anthropic)
- [Decoder](/wiki/decoder)
- [AI Code Generation](/wiki/ai_code_generation)

## References

[^1]: guidance-ai, "guidance: A guidance language for controlling large language models", GitHub README, 2026. https://github.com/guidance-ai/guidance. Accessed 2026-05-20.

[^2]: guidance-ai, "llguidance: Super-fast Structured Outputs", GitHub README, 2025. https://github.com/guidance-ai/llguidance. Accessed 2026-05-20.

[^3]: Michał Moskal, Harsha Nori, Hudson Cooper, Loc Huynh, "LLGuidance: Making Structured Outputs Go Brrr", guidance-ai.github.io technical write-up, 2025-06-11. https://guidance-ai.github.io/llguidance/llg-go-brrr. Accessed 2026-05-20.

[^4]: Guidance Maintainers, "guidance", PyPI package page, last updated 2026. https://pypi.org/project/guidance/. Accessed 2026-05-20.

[^5]: guidance-ai, "Releases - guidance-ai/guidance", GitHub releases page, 2026. https://github.com/guidance-ai/guidance/releases. Accessed 2026-05-20.

[^6]: Scott Lundberg, "Guidance reborn", guidance-ai/guidance Discussion #429, 2023-11-14. https://github.com/guidance-ai/guidance/discussions/429. Accessed 2026-05-20.

[^7]: Harsha Nori, "0.2.0 Release!", guidance-ai/guidance Discussion #1093, 2025-01-07. https://github.com/guidance-ai/guidance/discussions/1093. Accessed 2026-05-20.

[^8]: Microsoft Research, "guidance project page", Microsoft Research project listing, 2025. https://www.microsoft.com/en-us/research/project/guidance-control-lm-output/. Accessed 2026-05-20.

[^9]: Microsoft Research, "guidance: control LM output", Microsoft Research project page, 2025. https://www.microsoft.com/en-us/research/project/guidance-control-lm-output/. Accessed 2026-05-20.

[^10]: Thomas Claburn, "Microsoft proposes Guidance to tame large language models", The Register, 2023-05-18. https://www.theregister.com/2023/05/18/microsoft_guidance_project/. Accessed 2026-05-20.

[^11]: guidance-ai, "guidance issue tracker, formerly microsoft/guidance", GitHub issue redirect, 2023. https://github.com/microsoft/guidance/issues/158. Accessed 2026-05-20.

[^12]: Scott Lundberg, "Scott Lundberg personal site, projects: Guidance", scottlundberg.com, last updated 2025-06-27. https://scottlundberg.com/project/guidance/. Accessed 2026-05-20.

[^13]: nking-1 and Hudson Cooper, "Multimodal support with Phi 3 Vision + Transformers", guidance-ai/guidance pull request #1020, 2024-09 (closed 2025-05-21). https://github.com/guidance-ai/guidance/pull/1020. Accessed 2026-05-20.

[^14]: guidance-ai, "Token healing", Guidance documentation, latest. https://guidance.readthedocs.io/en/latest/example_notebooks/tutorials/token_healing.html. Accessed 2026-05-20.

[^15]: Saibo Geng, Hudson Cooper, Michał Moskal, Samuel Jenkins, Julian Berman, Nathan Ranchin, Robert West, Eric Horvitz, Harsha Nori, "JSONSchemaBench: A Rigorous Benchmark of Structured Outputs for Language Models", arXiv:2501.10868, 2025-01-18 (v3 2025-02-27). https://arxiv.org/abs/2501.10868. Accessed 2026-05-20.

[^16]: Andrew Docherty, "Python libraries for LLM structured outputs beyond LangChain", Medium, 2024. https://medium.com/@docherty/python-libraries-for-llm-structured-outputs-beyond-langchain-621225e48399. Accessed 2026-05-20.