Guidance (library)

Updated Jul 16, 2026

Guidance is an open-source Python library, originally developed at Microsoft Research, for building structured, multi-step programs that drive large language models. It lets a developer interleave ordinary Python control flow with model generation, and applies token-level constraints (regular expressions, context-free grammars, JSON schemas, choice sets) so that the output of an LLM is forced to conform to a desired structure. The project was created by Scott Lundberg (better known for SHAP) together with Marco Tulio Ribeiro and other Microsoft researchers, first appearing in 2022 as a Handlebars-style domain-specific language and re-released in late 2023 as an embedded Python API. Since January 2025, all of Guidance's grammar processing has been delegated to a Rust library called llguidance, also developed inside Microsoft Research, which computes valid-next-token masks in roughly 50 microseconds per step on a 128k-vocabulary tokenizer.^[1]^[2]^[3] Guidance is distributed under the MIT License from the guidance-ai GitHub organization and supports backends including the Hugging Face Transformers library, llama.cpp, ONNX Runtime GenAI, Azure AI, OpenAI, and other commercial APIs.^[4]^[5]

Overview

Field	Value
Original author	Scott Lundberg
Co-authors / maintainers	Marco Tulio Ribeiro, Harsha Nori, Richard Edgar, Michał Moskal, Hudson Cooper, Loc Huynh
Initial release	0.0.1, November 11, 2022 (PyPI)^[4]
First Python rewrite	0.1.0, November 14, 2023^[4]^[6]
`llguidance` integration	0.2.0, January 7, 2025^[7]
Latest stable release	0.3.x series (2025 to 2026)^[4]^[5]
Repository	github.com/guidance-ai/guidance
Companion grammar engine	github.com/guidance-ai/llguidance (Rust)
License	MIT
Language	Python (library API), Rust (grammar engine `llguidance`)
Supported backends	Hugging Face Transformers, llama.cpp, ONNX Runtime GenAI, Azure AI, OpenAI, Anthropic and Gemini via litellm, experimental SGLang^[5]^[8]
Star count (GitHub)	More than 19,000 as of 2025^[9]

Guidance is most often described as a "programming paradigm for steering language models". The official tagline on the Microsoft Research project page promises "100% guaranteed output structure, with 30 to 50% reduction in latency and costs" relative to plain prompting.^[9] The library is built around a model object that is treated as an immutable value: writing lm += "some text" or lm += gen("name", max_tokens=20) returns a new model whose state reflects the appended text or generated tokens, allowing programs to be composed and reused with the same predictability as ordinary functional code.^[6]

History

Origins at Microsoft Research (2022)

Guidance began inside Microsoft Research around 2022, where Scott Lundberg was a senior researcher and Marco Tulio Ribeiro was a principal researcher. Lundberg had previously created SHAP, a widely used framework for model interpretability based on Shapley values; Ribeiro is known for the LIME interpretability method and for the CheckList behavioral-testing methodology for NLP. The first PyPI release of guidance is dated November 11, 2022.^[4] Early versions exposed a templating language inspired by Handlebars, in which one would write something like {{#select 'option'}}A{{or}}B{{/select}} inside an otherwise normal prompt string. The interpreter walked the template token by token, calling the underlying model only when generation was required, and applied logits masking when the template constrained the legal next tokens.^[10]

The library was publicly announced on May 18, 2023 in a write-up covered by The Register, which described it as a domain-specific language resembling Handlebars, with linear code execution aligned to token order, controllable temperature, pattern matching constraints, and the ability to guarantee valid JSON output. In the same article Lundberg said, "with Guidance we can both accelerate inference speed and ensure that generated JSON is always valid". The same coverage cited 2x faster character generation on an NVIDIA RTX A6000 with a LLaMA-7B backend and improved accuracy on a BigBench task (76.01% vs. 63.04%) under guided execution.^[10]

The "Guidance reborn" rewrite (November 2023)

In the summer of 2023 the project paused releases while the team rewrote the library. On November 14, 2023 Lundberg posted a discussion titled "Guidance reborn" and tagged version 0.1.0, which dropped the Handlebars-like DSL in favor of plain Python. He wrote that "all guidance programs are now pure Python programs. No more worrying about a distinction between 'user code' in Python and 'template code'".^[6] The new design centered on three ideas: (1) every program is a Python function operating on an immutable model object; (2) the surface syntax is a superset of regular expressions and context-free grammars so that grammars can be built up incrementally; and (3) state is carried explicitly inside the model object, which makes a guidance computation as composable as a pure function.^[6] This redesign is the version that most users encounter today and that the rest of the article describes.

Transfer to the `guidance-ai` organization

The repository originally lived at github.com/microsoft/guidance but was moved to a community-maintained organization called guidance-ai. Issue trackers from May 2023 onward redirect from microsoft/guidance to guidance-ai/guidance, and the contact email listed on the PyPI page is maintainers@guidance-ai.org.^[11]^[4] The original maintainers (Lundberg, Nori, Ribeiro, Edgar) remained on the project and a Microsoft contact alias (guidanceai@microsoft.com) appears in the repository README, but the codebase is no longer hosted under Microsoft's GitHub organization. Lundberg has since moved from Microsoft to Google DeepMind, where his research continues to focus on language models and explainability, while Nori and several other co-maintainers remain at Microsoft.^[12]^[9]

llguidance and the 0.2.0 release (January 2025)

The most significant evolution after the Python rewrite was the introduction of llguidance, a Rust library that took over the responsibility of computing the set of allowed next tokens for any given grammar. The 0.2.0 release announcement, posted on January 7, 2025 by Harsha Nori, said that "Guidance's core grammar processing has been fully migrated to the llguidance Rust library", that the new engine is "state of the art across frameworks", and that the release fixed "some key, subtle bugs in the earlier processing engine". The 0.2.0 changelog also expanded JSON schema coverage (handling oneOf, required, boolean schemas, numeric ranges, and broader allOf support), overhauled the in-line Jupyter visualizations, and made parser advancement run concurrently with the model's forward pass.^[7]

Subsequent releases and recent additions

The 0.3.x series continued to broaden backend support and tighten performance. Version 0.3.0, released on September 9, 2025, added Groq and Mistral APIs via litellm, an experimental SGLang backend, OpenAI-style tool functions defined by JSON schemas, and support for new frontier models. Version 0.3.1, in early 2026, added an onnxruntime-genai backend and Python 3.14 compatibility, dropped Python 3.9, and introduced an inference-time monitor that performs semantic verification of generated text. Version 0.3.2 (March 2026) was a maintenance release that updated llguidance to 1.6.1 and added URI-format JSON string support.^[5] In parallel, the library acquired prototype multimodal support: pull request #1020, opened in September 2024, prototyped a TransformersPhi3VisionEngine and introduced append_image, append_audio_bytes, and append_video_bytes methods on the model class, with a placeholder convention (<|_{modality.name}:{id}|>) for embedding non-text blobs inside the prompt string. That specific PR was closed in May 2025 in favor of a replacement implementation by Hudson Cooper, but multimodal models such as Microsoft's Phi-3 Vision are now supported through the Transformers backend.^[13]

How Guidance Works

The immutable model object

A guidance program starts by constructing a model object that wraps a backend. Programs add to that object using the += operator. For example, lm = models.LlamaCpp("./model.gguf") constructs a model, and lm = lm + "The capital of France is " + gen("city", max_tokens=4) produces a new model whose state is the prompt concatenated with up to four generated tokens, captured under the variable city and accessible as lm["city"].^[6] Because the model is immutable, branching and backtracking, common in chain-of-thought style programs, are straightforward: the developer keeps multiple model objects in scope and continues whichever branch they need.
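The value semantics described above can be illustrated without the library. The following is a minimal sketch using only the Python standard library; ToyModel and FakeGen are hypothetical names invented for this illustration and are not Guidance classes, and the "generation" is a canned string rather than model output.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class FakeGen:
    """Hypothetical stand-in for gen(): carries a canned 'generation'."""
    name: str
    output: str  # a real backend would sample these tokens from the model


@dataclass(frozen=True)
class ToyModel:
    """Toy immutable model state; not the Guidance model class."""
    text: str = ""
    captures: Dict[str, str] = field(default_factory=dict)

    def __add__(self, other):
        # Appending never mutates self; it always builds a new ToyModel.
        if isinstance(other, str):
            return ToyModel(self.text + other, dict(self.captures))
        if isinstance(other, FakeGen):
            merged = {**self.captures, other.name: other.output}
            return ToyModel(self.text + other.output, merged)
        return NotImplemented

    def __getitem__(self, name):
        return self.captures[name]


base = ToyModel() + "The capital of France is "
paris = base + FakeGen("city", "Paris")  # one branch
lyon = base + FakeGen("city", "Lyon")    # a second branch from the same state
print(paris.text)
print(lyon["city"])
print(base.text)  # the shared starting point is unchanged
```

Because each addition returns a fresh object, both branches can be kept in scope and either one can be continued later, which is the property the branching and backtracking use case relies on.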

Core primitives

The two most frequently used primitives are gen() and select(). gen() generates text into a named variable, subject to optional constraints (regex pattern, stop strings, max tokens, temperature). select() forces the model to pick one of a closed list of options; for instance, lm + "The answer is " + select(["yes", "no", "maybe"], name="answer") can only produce one of the three listed strings, so the captured value is never something outside the list.^[9]^[14] In addition, the library exposes:

Regex constraints: gen(regex=r"\d{3}-\d{4}") forces the output to match a phone-number-like pattern.
Context-free grammar constraints: arbitrary CFGs can be expressed using stateless @guidance decorated functions or Lark-format grammars, and the resulting grammar is enforced at every step of decoding.^[2]^[9]
JSON schema constraints: the json() function accepts either a Python dict schema or a Pydantic model and forces the model to emit a JSON document that satisfies it.^[7]
Tool functions: OpenAI-compatible tool calls described by JSON schemas can be enforced declaratively rather than by parsing a model's free-form response.^[5]
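The idea behind a closed-choice primitive can be sketched at the character level. This is a toy illustration under stated assumptions, not the library's implementation: constrained_select and wants_to_say are hypothetical helpers, the "model" is a simple preference function, and the options are assumed to be prefix-free (no option is a prefix of another).

```python
def constrained_select(options, prefer):
    """Greedy character-level decode restricted to a closed option list.

    `prefer(so_far, legal)` is a hypothetical stand-in for a model: it must
    return one character from the set `legal`.
    """
    out = ""
    while out not in options:
        # Only characters that keep `out` a prefix of some option are legal.
        legal = {o[len(out)] for o in options
                 if o.startswith(out) and len(o) > len(out)}
        out += prefer(out, legal)
    return out


def wants_to_say(wish):
    """Build a toy 'model' that tries to spell `wish` when allowed."""
    def prefer(so_far, legal):
        want = wish[len(so_far)] if len(so_far) < len(wish) else None
        # Fall back to the alphabetically first legal character.
        return want if want in legal else min(legal)
    return prefer


choices = ["yes", "no", "maybe"]
print(constrained_select(choices, wants_to_say("no")))
print(constrained_select(choices, wants_to_say("perhaps")))
```

Even when the toy model "wants" to say something outside the list, every step is restricted to characters that extend a listed option, so the result is always a member of the list.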

Token-level constraint enforcement

When a constraint is in effect, Guidance computes a token mask before each sampling step, telling the backend which tokens of the model's vocabulary are legal continuations. With a local backend (Transformers, llama.cpp, ONNX Runtime GenAI) the mask is applied directly to the logits, so the constraint is enforced exactly. With remote APIs that expose only token-level logit biases (or none at all), the enforcement is partial: some constraints can be expressed via the API's own structured-output endpoint, others fall back to retry-on-failure. The advantage of local enforcement is that constraint satisfaction is provable rather than statistical, and the library exploits structural constraints to "fast-forward" through tokens whose value is already implied by the grammar (for instance, the opening {"name": of a JSON object).^[9]^[2]
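Applying a mask to logits can be sketched in a few lines. The example below is a minimal standard-library illustration, assuming a four-token toy vocabulary and made-up logit values; masked_softmax is a hypothetical helper, not a Guidance or llguidance function.

```python
import math


def masked_softmax(logits, allowed):
    """Softmax over logits where every disallowed token gets probability 0."""
    masked = [x if ok else -math.inf for x, ok in zip(logits, allowed)]
    peak = max(masked)
    if peak == -math.inf:
        raise ValueError("the mask allows no tokens")
    # exp(-inf) evaluates to 0.0, so masked tokens drop out of the sum.
    exps = [math.exp(x - peak) for x in masked]
    total = sum(exps)
    return [e / total for e in exps]


vocab = ["{", "Sure", "Hello", "["]       # toy vocabulary (assumption)
logits = [1.0, 3.0, 2.0, 0.5]             # made-up model scores
allowed = [True, False, False, True]      # e.g. a grammar expecting JSON
probs = masked_softmax(logits, allowed)
for token, p in zip(vocab, probs):
    print(repr(token), round(p, 3))
```

The unconstrained favorite ("Sure") receives zero probability, so no sampling strategy applied afterwards can select it.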

Token healing

A subtle issue in constrained decoding is the boundary between a fixed prompt and the next token to be generated. Greedy tokenizers split the prompt into tokens that may not align cleanly with the constraint; the most likely next token might actually start a few characters earlier than the prompt's end. Guidance addresses this with "token healing", which backs the generation pointer up by one or more tokens and constrains the first generated token to share a prefix with the truncated tail of the prompt. The result is that the program behaves as if the prompt were continued character by character, rather than token by token, which empirically improves generation quality on prefix-sensitive tokenizers.^[14]
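The mechanics can be shown on a toy vocabulary. The sketch below backs up by exactly one token and uses a hand-picked vocabulary; greedy_tokenize and heal are hypothetical helpers written for this illustration, and the real feature handles more general cases than this.

```python
VOCAB = ["The", " link", " is", " http", ":", "://", "/", " "]  # toy vocabulary


def greedy_tokenize(text, vocab):
    """Longest-match tokenization; assumes the vocabulary covers the text."""
    tokens, i = [], 0
    while i < len(text):
        match = max((t for t in vocab if text.startswith(t, i)), key=len)
        tokens.append(match)
        i += len(match)
    return tokens


def heal(prompt, vocab):
    """Drop the last prompt token and list tokens that may replace it."""
    tokens = greedy_tokenize(prompt, vocab)
    tail = tokens[-1]
    kept = "".join(tokens[:-1])
    # The first generated token must start with the text that was removed.
    allowed = sorted(t for t in vocab if t.startswith(tail))
    return kept, tail, allowed


kept, tail, allowed = heal("The link is http:", VOCAB)
print(repr(kept), repr(tail), allowed)
```

Without healing, the prompt ends in the single token ":" and the model would have to continue with "/" tokens; after healing, the longer token "://" becomes a legal first step again.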

Stateless functions and composition

Programs in Guidance are typically written as ordinary Python functions decorated with @guidance(stateless=True). A stateless function returns a grammar fragment rather than executing immediately, so it can be composed inside larger grammars. For instance, one can write a stateless function that yields a JSON object whose keys come from a fixed list and whose values are recursively generated; calling that function from inside another guidance program splices the grammar into the outer program. This makes grammars first-class values that can be inspected, tested against mock models, or shipped to a different backend without re-execution.^[9]
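As a loose analogy for grammar fragments as composable values, the following sketch builds regular-expression fragments from small functions and only evaluates them at the end. It uses Python's re module rather than Guidance; digits, choice, phone, and contact are hypothetical names, and real stateless Guidance functions return grammar objects rather than pattern strings.

```python
import re


def digits(n):
    """Fragment matching exactly n decimal digits."""
    return rf"\d{{{n}}}"


def choice(*options):
    """Fragment matching any one of the literal options."""
    return "(?:" + "|".join(re.escape(o) for o in options) + ")"


def phone():
    # Composed from smaller fragments; nothing is matched yet.
    return digits(3) + "-" + digits(4)


def contact():
    # A larger fragment that splices phone() into itself.
    return choice("home", "work") + ": " + phone()


pattern = contact()
print(pattern)
print(bool(re.fullmatch(pattern, "work: 555-1234")))
print(bool(re.fullmatch(pattern, "cell: 555-1234")))
```

The fragments are plain values until the final match, so they can be inspected, unit-tested, or reused inside other fragments, which mirrors the composition property described above.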

llguidance: the grammar engine

Motivation

Computing a token mask from a context-free grammar at every decoding step is, in principle, expensive: for a 128k-token vocabulary, a naive implementation would parse every candidate token against the grammar before each sampling step. Guidance's initial Python parser was correct but slow enough to dominate end-to-end generation time for non-trivial schemas. The Microsoft Research team (Michał Moskal, Harsha Nori, Hudson Cooper, Loc Huynh) wrote llguidance to fix this. According to the team's technical write-up, the library was developed between 2023 and 2025 and is now used not only by Guidance but also by llama.cpp, vLLM, SGLang, Chromium, mistral.rs, and Microsoft's onnxruntime-genai.^[3]^[2]

Algorithmic design

llguidance is implemented in Rust (about 86% of the code base, with thin Python and C bindings). It splits the constraint-checking job into two layers:^[3]^[2]

A regex-derivative-based lexer, using the derivre library, that performs Brzozowski-derivative lazy DFA construction with negligible startup cost. The lexer can validate fast paths such as "we are currently inside a JSON string and any non-quote character is legal" without invoking the parser.
An Earley parser with low-level optimizations (32-bit integer item pairs, strategic row reuse during traversal) that handles the context-free portions of the grammar. In practice the parser is invoked on only 0.1 to 1% of tokens; the rest are accepted or rejected by the lexer.
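The derivative technique used by the lexer layer can be illustrated with a small textbook-style matcher. This is a sketch of Brzozowski derivatives in general, not of derivre's API or internals; the tuple encoding and helper names are assumptions made for the example, and it does not cache derivative states the way a lazy DFA would.

```python
EMPTY = ("empty",)  # matches nothing
EPS = ("eps",)      # matches only the empty string


def char(c):
    return ("chr", c)


def cat(a, b):
    if a == EMPTY or b == EMPTY:
        return EMPTY
    if a == EPS:
        return b
    return a if b == EPS else ("cat", a, b)


def alt(a, b):
    if a == EMPTY:
        return b
    return a if b == EMPTY or a == b else ("alt", a, b)


def star(a):
    return ("star", a)


def nullable(r):
    """True when r matches the empty string."""
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("empty", "chr"):
        return False
    if tag == "cat":
        return nullable(r[1]) and nullable(r[2])
    return nullable(r[1]) or nullable(r[2])  # alt


def derive(r, c):
    """Regex matching what may follow c in strings matched by r."""
    tag = r[0]
    if tag in ("empty", "eps"):
        return EMPTY
    if tag == "chr":
        return EPS if r[1] == c else EMPTY
    if tag == "alt":
        return alt(derive(r[1], c), derive(r[2], c))
    if tag == "star":
        return cat(derive(r[1], c), r)
    first = cat(derive(r[1], c), r[2])  # cat case
    return alt(first, derive(r[2], c)) if nullable(r[1]) else first


def matches(r, text):
    for c in text:
        r = derive(r, c)
        if r == EMPTY:  # dead state: no continuation can match
            return False
    return nullable(r)


bit = alt(char("0"), char("1"))
bits = cat(bit, star(bit))  # one or more binary digits
print(matches(bits, "0110"), matches(bits, ""), matches(bits, "012"))
```

Each distinct derivative corresponds to a lexer state; memoizing the results of derive is what turns this scheme into a DFA that is built lazily, one state at a time.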

To compute a mask, the engine traverses a prefix trie of the tokenizer's vocabulary. Each edge of the trie corresponds to appending a byte; the engine attempts to extend the current parse state with that byte and prunes whole subtrees when the parser rejects the prefix. A further "slicer" optimization precomputes masks for common regex slices (for example, the body of a JSON string) so that for the dense parts of the mask the trie traversal is skipped entirely. The team reports an average mask computation time of around 50 microseconds per token on a 128k-token tokenizer, roughly 2 milliseconds of startup overhead, and speedups of 10 to 1,000 times over earlier libraries.^[2]^[3]
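The trie traversal with subtree pruning can be sketched as follows. This is a simplified illustration that works on characters rather than bytes and uses a trivial "at most two digits" constraint in place of a real lexer and parser; build_trie, compute_mask, and two_digits are hypothetical names, not llguidance functions.

```python
def build_trie(vocab):
    """Build a character trie; each node records the token ending there."""
    root = {"children": {}, "token_id": None}
    for token_id, token in enumerate(vocab):
        node = root
        for ch in token:
            node = node["children"].setdefault(
                ch, {"children": {}, "token_id": None})
        node["token_id"] = token_id
    return root


def compute_mask(trie, vocab_size, state, step):
    """Mark every token whose characters are all accepted by `step`.

    `step(state, ch)` returns the next constraint state, or None when
    appending `ch` is illegal.
    """
    mask = [False] * vocab_size
    stack = [(trie, state)]
    while stack:
        node, st = stack.pop()
        for ch, child in node["children"].items():
            nxt = step(st, ch)
            if nxt is None:
                continue  # prune: no token below this edge can be legal
            if child["token_id"] is not None:
                mask[child["token_id"]] = True
            stack.append((child, nxt))
    return mask


def two_digits(count, ch):
    """Toy constraint: accept digits until two have been consumed."""
    return count + 1 if ch.isdigit() and count < 2 else None


vocab = ["1", "12", "123", "1a", "a", "ab", "7"]
mask = compute_mask(build_trie(vocab), len(vocab), 0, two_digits)
print([tok for tok, ok in zip(vocab, mask) if ok])
```

Tokens that share a prefix share the work of checking it, and a rejected prefix such as "a" removes "a" and "ab" in a single step instead of testing each token separately.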

JSONSchemaBench

To evaluate llguidance and its peers, the team published JSONSchemaBench: A Rigorous Benchmark of Structured Outputs for Language Models, by Saibo Geng, Hudson Cooper, Michał Moskal, Samuel Jenkins, Julian Berman, Nathan Ranchin, Robert West, Eric Horvitz, and Harsha Nori. The paper was first posted to arXiv on January 18, 2025 (with a final v3 dated February 27, 2025). It compares Guidance, Outlines, llama.cpp, XGrammar, OpenAI structured outputs, and Gemini on 10,000 real-world JSON schemas, evaluating efficiency, coverage of constraint types, and output quality.^[15]

Integration footprint

By 2025, llguidance had been integrated into several inference stacks beyond Guidance itself. OpenAI's structured outputs feature uses it as of May 2025, and llama.cpp, vLLM, SGLang, Chromium, mistral.rs, and Microsoft's onnxruntime-genai all expose it as a grammar backend.^[3]^[2] In Guidance itself, the upgrade in 0.2.0 was transparent at the API level but produced large performance gains on grammar-heavy programs.^[7]

Supported backends

Guidance is intentionally backend-agnostic. The same Python program can target a local model running through Hugging Face Transformers or llama.cpp, a model served through Microsoft's onnxruntime-genai, an Azure AI deployment, OpenAI's API, or a Google DeepMind Gemini endpoint via litellm.^[4]^[5] In practice the level of constraint support depends on the backend:

Backend	Logits access	Full grammar enforcement	Notes
Hugging Face Transformers	Yes	Yes	Best-supported local path; used for vision language model integration via `TransformersPhi3VisionEngine` and successors.^[13]
llama.cpp	Yes	Yes	Native llguidance integration; popular for quantized local inference.^[3]^[4]
ONNX Runtime GenAI	Yes	Yes	Added in 0.3.x for optimized local inference on a broader range of hardware.^[5]
Azure AI	Partial	Partial	Constraints honored when the deployment exposes compatible structured-output options.^[4]
OpenAI	API-level structured outputs	Partial	OpenAI's structured outputs feature itself depends on llguidance, but logits-level enforcement is not directly exposed via the public API.^[3]
Anthropic / Gemini (via litellm)	No	No (best effort)	Used for tool-calling style programs; constraints fall back to retry on parse failure.^[5]
SGLang	Yes	Yes	Experimental backend added in 0.3.0; SGLang itself integrates llguidance internally.^[5]

The recommended path for users who need strict guarantees is one of the local backends, where llguidance masks the logits directly. Remote APIs are useful when the developer is willing to accept partial enforcement or to combine Guidance with the provider's native structured-output mode.
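The retry-on-parse-failure fallback mentioned for remote APIs follows a generic validate-and-retry pattern, sketched below with the standard library only. The function generate_with_retry and the canned responses are assumptions made for the example; this is not Guidance's internal code and no real API is called.

```python
import json


def generate_with_retry(call_model, required_keys, max_attempts=3):
    """Call a model until its output parses as JSON with the required keys.

    `call_model(attempt)` is a hypothetical callable returning raw text.
    """
    for attempt in range(1, max_attempts + 1):
        raw = call_model(attempt)
        try:
            doc = json.loads(raw)
        except json.JSONDecodeError:
            continue  # not JSON at all: try again
        if isinstance(doc, dict) and set(required_keys).issubset(doc):
            return doc, attempt
    raise ValueError(f"no valid output after {max_attempts} attempts")


# Canned responses standing in for an unconstrained remote model.
responses = [
    "Sure! Here is the JSON:",
    '{"name": "Ada"}',
    '{"name": "Ada", "age": 36}',
]
doc, attempts = generate_with_retry(
    lambda attempt: responses[attempt - 1], {"name", "age"})
print(doc, attempts)
```

Each failed attempt costs a full model call, and success is never guaranteed within the attempt budget, which is the practical difference from masking logits on a local backend.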

Applications

Guidance is commonly used for problems where prompting alone is unreliable or expensive:

Structured extraction: parsing free-form text into JSON records that satisfy a schema, for example pulling addresses, prices, or medical codes out of source documents.^[9]
Code and DSL generation: emitting Python code, SQL, or YAML that is guaranteed to be syntactically valid because the grammar constrains decoding.^[9]^[2]
Reasoning with constraints: combining in-context learning examples with a constrained final answer (a select over a closed answer set), so the model can produce arbitrary chain-of-thought rationale but cannot hallucinate the final classification.^[14]
Tool calling: declaring a tool's signature as a JSON schema and forcing the model's tool call to match, with the result that downstream code can dispatch directly on the parsed arguments without defensive parsing.^[5]
Multi-step agent workflows: composing several gen() and select() calls inside a Python control flow so that, for instance, a planning step decides the next tool use from a closed set, an argument-generation step emits a strict JSON payload, and the result is fed back into another step, all within one process and with end-to-end constraint guarantees.^[9]^[6]
Multimodal pipelines: feeding images, audio, or video blobs to a vision language model such as Phi-3 Vision via the append_image and append_audio_bytes methods and then constraining the model's output to a structured format.^[13]

Comparison with other libraries

Guidance sits in a small but active ecosystem of "LLM programming" or "structured generation" libraries. The libraries differ in what they optimize for: some focus on strict constrained decoding at the logits level, others on schema-driven validation, others on declarative prompt optimization.

Library	First public release	Constraint mechanism	Optimization focus	Local backend support
Guidance	2022 (Microsoft / `guidance-ai`)	Token masks from regex, CFG, JSON schema via `llguidance`^[1]^[2]	Interleave Python control with token-level constraints	llama.cpp, Transformers, ONNX Runtime GenAI, SGLang^[5]
LMQL	2022 (ETH Zurich)	Logit masks via SQL-like LMP language	Declarative query language for LLMs	Limited; reported issues with batching and parallelism^[16]
Outlines	2023 (.txt / Normal Computing)	Finite-state machine compiled from regex / Pydantic / CFG	Pure-Python constrained decoding	Transformers, llama.cpp, vLLM
Instructor	2023 (Jason Liu)	Schema-driven retries via Pydantic; uses provider structured-output endpoints	Pydantic-first ergonomics on top of existing APIs	None directly; depends on backend's structured-output mode^[16]
DSPy	2023 (Stanford NLP)	Prompt optimization (signatures, modules, optimizers) rather than logits constraints	Compiling prompts and few-shots automatically	Backend-agnostic

In practice the choice depends on where the developer wants the constraint to live. Guidance and LMQL push the constraint down to the decoding step, so the constraint is provably satisfied at every token and the backend can fast-forward through structurally implied tokens. Outlines takes a similar approach but exposes a more Pythonic, schema-first API. Instructor takes the opposite approach, layering Pydantic validation and retries on top of an arbitrary provider. DSPy is at a higher level still: rather than constrain tokens, it optimizes the prompts and few-shot demonstrations that surround the program.^[16] Guidance and DSPy are sometimes used together, with DSPy authoring the prompts and Guidance enforcing the structure of the resulting completions.

Limitations

Guidance's design has several weaknesses that the maintainers and outside reviewers acknowledge.

Backend coupling. Full constraint enforcement requires direct access to the model's logits. APIs that do not expose this (most commercial chat APIs) can only be partially constrained, typically by leaning on the provider's own structured-output mode or by retrying on parse failure. Programs written for a local backend cannot always be moved to a remote API without losing guarantees.^[4]^[5]
Batching and parallelism. Independent reviewers have noted that Guidance historically lacked first-class batching and parallelism, while LMQL had its own throughput issues. Improvements landed in the 0.2.x and 0.3.x series, particularly running parser advancement concurrently with the model's forward pass, but Guidance is not a serving system; for high-throughput deployments developers typically combine llguidance with vLLM or SGLang directly.^[16]^[7]^[5]
Constraint mismatch. A grammar that the user expects to be tight may, in practice, permit semantically wrong completions because the LLM still chooses among legal tokens by likelihood. JSONSchemaBench and follow-up work explicitly call out the difference between satisfying the syntactic constraint and producing semantically correct content, and 0.3.1 added an inference-time "monitor" step partly to address this.^[15]^[5]
Multimodal limits. Token healing and grammar forcing cannot cross the boundary between text tokens and multimodal placeholders, because the multimodal embeddings are not in token space. The first multimodal PR (Phi 3 Vision) was eventually replaced by an alternative implementation rather than merged, reflecting the difficulty of integrating non-text blobs into a grammar-driven decoding loop.^[13]
API churn. The 2023 rewrite was a hard break with the original Handlebars-style DSL, which means that older tutorials, blog posts, and Stack Overflow answers can reference an API that no longer exists.^[6]^[10]

Significance

Guidance was one of the earliest libraries to popularize the idea that an LLM application is a program rather than a prompt, and that the language model's output should be controlled at the token level rather than coaxed via natural-language instructions. By providing both a Pythonic surface (the immutable model object) and a Rust-based grammar engine (llguidance) it occupies an unusual position: the same code that is convenient enough for notebook exploration also produces grammar-level guarantees suitable for production. The library's grammar engine has been adopted by major inference stacks including OpenAI's own structured outputs, llama.cpp, vLLM, SGLang, and onnxruntime-genai, so a substantial fraction of all "structured output" calls made through commercial and open-source LLM stacks now run through llguidance even when the user is not aware of it.^[3]^[2]

Beyond engineering impact, Guidance influenced the broader conversation about how to do reliable prompting. The motivations articulated in the project (predictable structure, lower latency through token fast-forwarding, deterministic choices via select, no expensive retries or fine-tuning) reappear in adjacent ecosystems like DSPy and in commercial structured-output features at major model providers.^[9]^[16]

See also

DSPy: a declarative framework from Stanford NLP for programming and optimizing LLM pipelines; complementary to Guidance in that it focuses on prompt optimization rather than logits constraints.
SGLang: an efficient runtime for structured language model programs; integrates llguidance natively.
vLLM: a high-throughput LLM serving system that can use llguidance for guided decoding.
llama.cpp: a CPU/GPU C++ inference library that integrates llguidance as a grammar backend.
Function calling and tool use: the ergonomic problem Guidance addresses for many API users.
Structured output: the broader category of techniques (schema-driven validation, constrained decoding, retry-and-parse) of which Guidance is one prominent representative.

References

1. guidance-ai, "guidance: A guidance language for controlling large language models", GitHub README, 2026. https://github.com/guidance-ai/guidance. Accessed 2026-05-20.
2. guidance-ai, "llguidance: Super-fast Structured Outputs", GitHub README, 2025. https://github.com/guidance-ai/llguidance. Accessed 2026-05-20.
3. Michał Moskal, Harsha Nori, Hudson Cooper, Loc Huynh, "LLGuidance: Making Structured Outputs Go Brrr", guidance-ai.github.io technical write-up, 2025-06-11. https://guidance-ai.github.io/llguidance/llg-go-brrr. Accessed 2026-05-20.
4. Guidance Maintainers, "guidance", PyPI package page, last updated 2026. https://pypi.org/project/guidance/. Accessed 2026-05-20.
5. guidance-ai, "Releases - guidance-ai/guidance", GitHub releases page, 2026. https://github.com/guidance-ai/guidance/releases. Accessed 2026-05-20.
6. Scott Lundberg, "Guidance reborn", guidance-ai/guidance Discussion #429, 2023-11-14. https://github.com/guidance-ai/guidance/discussions/429. Accessed 2026-05-20.
7. Harsha Nori, "0.2.0 Release!", guidance-ai/guidance Discussion #1093, 2025-01-07. https://github.com/guidance-ai/guidance/discussions/1093. Accessed 2026-05-20.
8. Microsoft Research, "guidance project page", Microsoft Research project listing, 2025. https://www.microsoft.com/en-us/research/project/guidance-control-lm-output/. Accessed 2026-05-20.
9. Microsoft Research, "guidance: control LM output", Microsoft Research project page, 2025. https://www.microsoft.com/en-us/research/project/guidance-control-lm-output/. Accessed 2026-05-20.
10. Thomas Claburn, "Microsoft proposes Guidance to tame large language models", The Register, 2023-05-18. https://www.theregister.com/2023/05/18/microsoft_guidance_project/. Accessed 2026-05-20.
11. guidance-ai, "guidance issue tracker, formerly microsoft/guidance", GitHub issue redirect, 2023. https://github.com/microsoft/guidance/issues/158. Accessed 2026-05-20.
12. Scott Lundberg, "Scott Lundberg personal site, projects: Guidance", scottlundberg.com, last updated 2025-06-27. https://scottlundberg.com/project/guidance/. Accessed 2026-05-20.
13. nking-1 and Hudson Cooper, "Multimodal support with Phi 3 Vision + Transformers", guidance-ai/guidance pull request #1020, 2024-09 (closed 2025-05-21). https://github.com/guidance-ai/guidance/pull/1020. Accessed 2026-05-20.
14. guidance-ai, "Token healing", Guidance documentation, latest. https://guidance.readthedocs.io/en/latest/example_notebooks/tutorials/token_healing.html. Accessed 2026-05-20.
15. Saibo Geng, Hudson Cooper, Michał Moskal, Samuel Jenkins, Julian Berman, Nathan Ranchin, Robert West, Eric Horvitz, Harsha Nori, "JSONSchemaBench: A Rigorous Benchmark of Structured Outputs for Language Models", arXiv:2501.10868, 2025-01-18 (v3 2025-02-27). https://arxiv.org/abs/2501.10868. Accessed 2026-05-20.
16. Andrew Docherty, "Python libraries for LLM structured outputs beyond LangChain", Medium, 2024. https://medium.com/@docherty/python-libraries-for-llm-structured-outputs-beyond-langchain-621225e48399. Accessed 2026-05-20.
