# Graph of Thoughts

> Source: https://aiwiki.ai/wiki/graph_of_thoughts
> Updated: 2026-06-07
> Categories: Large Language Models, Prompt Engineering
> From AI Wiki (https://aiwiki.ai), a free encyclopedia of artificial intelligence. Quote with attribution.

Graph of Thoughts (GoT) is a prompting and reasoning framework that models the intermediate steps of a [large language model](/wiki/large_language_model) as an arbitrary directed graph rather than a linear chain or a tree. Each vertex is an "LLM thought" (a self-contained intermediate solution or piece of information) and each directed edge is a transformation that derives one thought from one or more others, so that aggregation, refinement, and feedback loops become first-class primitives.[^1] The framework was introduced in the paper "Graph of Thoughts: Solving Elaborate Problems with Large Language Models" by Maciej Besta, Nils Blach, Ales Kubicek, Robert Gerstenberger, Michał Podstawski, Lukas Gianinazzi, Joanna Gajda, Tomasz Lehmann, Hubert Niewiadomski, Piotr Nyczyk and Torsten Hoefler, posted on arXiv on 18 August 2023 (arXiv:2308.09687) and published in the Proceedings of the AAAI Conference on Artificial Intelligence at AAAI-24 (Vol. 38, No. 16, pages 17682 to 17690).[^1][^2] The work was carried out primarily at the Scalable Parallel Computing Laboratory (SPCL) at ETH Zürich, with collaborators from Cledar (Warsaw) and Warsaw University of Technology, and an official Python reference implementation is maintained at the spcl/graph-of-thoughts GitHub repository.[^3][^4]

| Field | Value |
| --- | --- |
| Name | Graph of Thoughts (GoT) |
| Type | Prompting / reasoning framework for LLMs |
| First posted | 2023-08-18 (arXiv:2308.09687) |
| Conference | AAAI-24, pages 17682-17690 (Vol. 38, No. 16) |
| DOI | 10.1609/aaai.v38i16.29720 |
| Lead authors | Maciej Besta, Nils Blach (equal contribution) |
| Senior author | Torsten Hoefler |
| Affiliations | ETH Zürich (SPCL), Cledar, Warsaw University of Technology |
| Reference implementation | spcl/graph-of-thoughts on GitHub, Python package `graph_of_thoughts` |
| Models in original evaluation | [GPT-3.5](/wiki/gpt-3.5) (primary), GPT-4 (limited), [Llama 2](/wiki/llama_2) |
| Reported headline result | ~62% lower median sorting error vs ToT; >31% cost reduction (128-element sorting) |
| Source | [^1][^2][^4][^7] |

## Background and motivation

By the time GoT was proposed, structured prompting had moved through several stages. The standard input-output (IO) prompt provided a single input and asked for a single output. [Chain-of-Thought](/wiki/chain_of_thought) (CoT) prompting and [Self-consistency](/wiki/self_consistency) (CoT-SC) extended this by asking the model to produce an explicit sequence of intermediate steps, turning the reasoning trace into a linear chain or a set of independently sampled chains.[^5] [Tree of Thoughts](/wiki/tree_of_thoughts) (ToT) generalised the chain into a tree, allowing the model to branch into multiple candidate continuations, score them, and backtrack, which improved performance on search-style tasks at the cost of additional model calls.[^1][^5]

Besta and colleagues observed that human problem solving is rarely captured by either a chain or a tree. People routinely combine partial results from independent lines of thought (aggregation), revisit and patch earlier intermediate conclusions (refinement) and form feedback loops where a later thought informs an earlier one.[^1] These patterns naturally form a directed graph, possibly with cycles, in which an intermediate state may have several parents. Existing chain and tree topologies cannot represent such "merging" or "looping" structures, because each thought in a chain or a tree has exactly one parent.[^1] GoT was designed as the smallest framework that closes this gap: it lets a thought have any number of incoming edges, so partial solutions from independent subproblems can be merged, and it lets edges form self-loops or cycles, so a thought can be iteratively improved.[^1]

The authors also tied the design to a cost argument, which they develop further in a more comprehensive follow-up. In "Topologies of Reasoning: Demystifying Chains, Trees, and Graphs of Thoughts" (arXiv:2401.14295), the same group situates GoT inside a taxonomy of reasoning topologies and analyses how the shape of the reasoning graph affects latency, volume of information, and token cost across chain, tree, and graph schemes.[^6]

## Formal model

The paper defines a reasoning process as a tuple (G, T, E, R).[^1] G = (V, E) is a directed graph whose vertices V are LLM thoughts, with V possibly heterogeneous (for example, in a writing task some vertices are plans and others are paragraphs).[^1] An edge (u, v) in the edge set E indicates that the content of v has been constructed using thought u as part of its prompt. T is the set of thought transformations that the framework can apply to the current graph. The tuple element E is an evaluator function, distinct from the edge set of G despite the shared letter (the paper typesets T, E, and R in a calligraphic face to mark the difference); it assigns a numeric score to a thought, possibly using the whole graph as context. R is a ranking function that selects one or more thoughts from the current graph based on those scores.[^1]

A transformation t in T rewrites the graph: V′ = (V ∪ V⁺) ∖ V⁻ and E′ = (E ∪ E⁺) ∖ E⁻, where V⁺ and E⁺ are newly added vertices and edges, and V⁻ and E⁻ are removed ones.[^7] Three graph-enabled transformations are highlighted:[^1][^7]

- Aggregation transformation: combines k existing thoughts into a single new vertex with k incoming edges. This allows partial solutions from independent branches to be merged into a stronger synthesis.
- Refinement transformation: improves an existing thought, modelled either as a self-loop on that thought or as a new vertex that plays the same role in the graph but carries revised content.
- Generation transformation: spawns k new child thoughts from a single parent, generalising the branching used by ToT and the sampling used by CoT-SC.
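
The three transformations can be sketched on a toy graph structure. This is a minimal illustration rather than the reference implementation: the `ThoughtGraph` class, its method names, and the `fake_llm` stand-in for a model call are all assumptions introduced here, and refinement is modelled as a self-loop.

```python
from dataclasses import dataclass, field


@dataclass
class ThoughtGraph:
    """Toy graph of thoughts: vertices hold text, edges are (parent, child) ids."""
    thoughts: dict[int, str] = field(default_factory=dict)
    edges: set[tuple[int, int]] = field(default_factory=set)
    _next_id: int = 0

    def add(self, content: str, parents: tuple[int, ...] = ()) -> int:
        vid = self._next_id
        self._next_id += 1
        self.thoughts[vid] = content
        self.edges.update((p, vid) for p in parents)
        return vid

    def generate(self, parent: int, k: int, llm) -> list[int]:
        """Generation: one parent spawns k child thoughts."""
        prompt = self.thoughts[parent]
        return [self.add(llm(f"continue #{i}: {prompt}"), (parent,)) for i in range(k)]

    def aggregate(self, parents: list[int], llm) -> int:
        """Aggregation: k parents feed one new vertex with k incoming edges."""
        joined = " | ".join(self.thoughts[p] for p in parents)
        return self.add(llm(f"merge: {joined}"), tuple(parents))

    def refine(self, vid: int, llm) -> int:
        """Refinement as a self-loop: the content is rewritten in place."""
        self.thoughts[vid] = llm(f"improve: {self.thoughts[vid]}")
        self.edges.add((vid, vid))
        return vid


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call; it only echoes the prompt."""
    return f"<{prompt}>"


g = ThoughtGraph()
root = g.add("sort [3, 1, 2, 0]")
a, b = g.generate(root, 2, fake_llm)
merged = g.aggregate([a, b], fake_llm)
g.refine(merged, fake_llm)

# The merged thought has two parents plus its own refinement self-loop.
in_degree = sum(1 for (_, v) in g.edges if v == merged)
```

The merged vertex ends up with more than one incoming edge, which is exactly the structure that a chain or a tree cannot express.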

To analyse cost, the paper introduces the volume of a thought, defined as the number of preceding LLM thoughts from which the target thought is reachable by a directed path.[^7] For a fixed total budget of Θ(N) thoughts, the paper compares topologies as follows: CoT has latency N and volume N; CoT-SC partitions the budget into k independent chains, so latency is N/k and volume is N/k; ToT achieves logarithmic latency log_k N but only logarithmic volume log_k N; GoT, by using aggregation, can achieve logarithmic latency log_k N while still letting the final thought depend on all N preceding thoughts, giving volume N.[^7] This is presented as the key theoretical advantage of GoT: aggregation lets a low-latency reasoning graph still incorporate a large amount of upstream information.
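
The volume definition can be checked on small explicit graphs. The sketch below is only an illustration of the reachability definition given above; the `volume` helper and the three example edge lists are assumptions made for this example, not code from the paper.

```python
from collections import defaultdict


def volume(edges, target):
    """Count thoughts (other than target) that have a directed path to target."""
    parents = defaultdict(set)
    for u, v in edges:
        parents[v].add(u)
    seen, stack = set(), [target]
    while stack:
        node = stack.pop()
        for p in parents[node]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    seen.discard(target)  # ignore the target itself if a cycle leads back to it
    return len(seen)


# Chain of 8 thoughts: 0 -> 1 -> ... -> 7
chain = [(i, i + 1) for i in range(7)]

# Binary tree with 7 thoughts (heap numbering): 0 is the root, 3..6 are leaves
tree = [((i - 1) // 2, i) for i in range(1, 7)]

# The same tree plus one aggregation vertex 7 that is fed by every leaf
got = tree + [(leaf, 7) for leaf in (3, 4, 5, 6)]

chain_volume = volume(chain, 7)  # 7: every earlier thought in the chain
tree_volume = volume(tree, 6)    # 2: only the root-to-leaf path (2 and 0)
got_volume = volume(got, 7)      # 7: every thought in the tree
```

Adding a single aggregation vertex lifts the volume of the final thought from the depth of the tree to the size of the tree, while the longest path grows by only one step.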

## Operations and the Graph of Operations abstraction

In addition to the abstract transformations, the paper separates two graphs.[^1][^7] The Graph of Operations (GoO) is a static directed graph specified by the user. Its vertices are operations to execute (Generate, Aggregate, Score, KeepBestN and so on), and its edges describe data flow between them. The Graph Reasoning State (GRS) is the dynamic graph built up at runtime: as the controller traverses the GoO, it materialises actual thought vertices and the dependencies among them, plus their scores and validity flags.[^1][^7]

The reference implementation in spcl/graph-of-thoughts exposes a concrete set of operations that instantiate the abstract transformations.[^4] The operations module documents the following:[^8]

- Generate: produces new thoughts from current ones; when no previous thoughts exist, it is initialised with the controller's input.
- Score: collects all thoughts from preceding operations and assigns scores, either by calling the LLM as a judge or by invoking a custom Python scoring function.
- ValidateAndImprove: validates each thought against problem-specific criteria, and if invalid, asks the LLM to revise it.
- Improve: like ValidateAndImprove but skips validation and always tries to refine the thought.
- Aggregate: combines several current thoughts into a single new thought.
- KeepBestN: retains the top-N thoughts according to their scores; raises an error if the thoughts are not yet scored.
- KeepValid: keeps thoughts that have passed validation; thoughts without validation are treated as valid.
- Selector: chooses a subset of thoughts using a user-provided selection function.
- GroundTruth: checks whether the current thoughts solve the problem by comparing against a known ground truth.
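
The selection semantics described for KeepBestN and KeepValid can be sketched in a few lines. This is a hedged illustration, not the library's code: the `Thought` dataclass, the function names, and the `higher_is_better` flag are assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Thought:
    content: str
    score: Optional[float] = None  # None means "not scored yet"
    valid: Optional[bool] = None   # None means "not validated yet"


def keep_best_n(thoughts, n, higher_is_better=True):
    """Keep the n best-scored thoughts; refuse to run on unscored thoughts."""
    if any(t.score is None for t in thoughts):
        raise ValueError("all thoughts must be scored before selecting the best")
    return sorted(thoughts, key=lambda t: t.score, reverse=higher_is_better)[:n]


def keep_valid(thoughts):
    """Drop thoughts that failed validation; unvalidated thoughts are kept."""
    return [t for t in thoughts if t.valid is not False]


pool = [
    Thought("a", score=3.0, valid=True),
    Thought("b", score=1.0, valid=False),
    Thought("c", score=2.0),  # never validated, so it counts as valid
]
survivors = keep_valid(pool)
best = keep_best_n(survivors, 1)
fewest_errors = keep_best_n(survivors, 1, higher_is_better=False)
```

A direction flag matters in practice because some scoring functions count errors, where a lower score is the better one.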

These operations communicate with the model through two helper classes, the Prompter, which formats the current GRS into a prompt for the LLM, and the Parser, which extracts structured information from the LLM reply.[^4][^8] A separate Controller orchestrates execution: it walks the GoO, invokes operations in dependency order, updates the GRS, dispatches calls to the configured language model, and writes the resulting GRS to a JSON file for later inspection.[^4]
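
The split between a static operation graph and a runtime state can be illustrated with a toy controller. The sketch assumes a simplified design that is not the reference implementation: `run_goo` is a hypothetical function, operations are plain Python callables instead of LLM-backed classes, and the state is a dictionary keyed by operation name.

```python
from graphlib import TopologicalSorter


def run_goo(operations, predecessors, initial_input):
    """Execute a static operation graph in dependency order.

    operations:   name -> function(list_of_input_thoughts) -> list_of_thoughts
    predecessors: name -> set of operation names whose outputs feed this one
    Returns a toy reasoning state: operation name -> thoughts it produced.
    """
    state = {}
    for name in TopologicalSorter(predecessors).static_order():
        preds = sorted(predecessors.get(name, ()))
        if preds:
            inputs = [thought for p in preds for thought in state[p]]
        else:
            inputs = [initial_input]  # source operations see the task input
        state[name] = operations[name](inputs)
    return state


# Deterministic stand-ins for model calls: split, sort each half, then merge.
operations = {
    "split": lambda ins: [ins[0][:3], ins[0][3:]],
    "sort_left": lambda ins: [sorted(ins[0])],
    "sort_right": lambda ins: [sorted(ins[1])],
    "aggregate": lambda ins: [sorted(ins[0] + ins[1])],
}
predecessors = {
    "split": set(),
    "sort_left": {"split"},
    "sort_right": {"split"},
    "aggregate": {"sort_left", "sort_right"},
}
state = run_goo(operations, predecessors, [5, 2, 9, 1, 7, 3])
```

Here `predecessors` plays the role of the GoO and `state` the role of the GRS: the first is fixed before execution, the second only exists once the operations have run.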

## Architecture of the reference framework

The spcl/graph-of-thoughts package is implemented in Python and can be installed from PyPI with `pip install graph_of_thoughts`, or from source in editable mode.[^4] Its main modules mirror the abstractions of the paper:[^4]

- `language_models`: thin wrappers around chat-completion endpoints (for example, the OpenAI API for GPT-3.5 and GPT-4) plus configuration of model, temperature, and rate limits.
- `prompter`: per-task subclasses that turn parts of the GRS into natural-language prompts.
- `parser`: per-task subclasses that decode model replies into structured thoughts and validity information.
- `operations`: the operation classes listed above, plus the `GraphOfOperations` class used to build a GoO programmatically.
- `controller`: ties everything together and runs the GoO end to end.

The repository ships example implementations for sorting, set intersection, keyword counting, and document merging, each containing prompter, parser, scoring logic, and a configurable Graph of Operations.[^4] The repository's citation block lists the AAAI 2024 publication with DOI 10.1609/aaai.v38i16.29720 as the canonical reference.[^4]

A typical end-to-end workflow with the reference framework runs in five steps. The developer (1) subclasses `Prompter` and `Parser` for the task, (2) defines a scoring function, (3) builds a `GraphOfOperations` by appending operations such as `Generate`, `Aggregate`, `Score`, `KeepBestN`, `Improve`, and `GroundTruth`, (4) instantiates a language model wrapper, and (5) instantiates a `Controller` and calls its `run()` method. The resulting GRS, together with a JSON log of the LLM calls, is then serialised so that costs and intermediate thoughts can be audited after the fact.[^4][^8] An independent walk-through of the framework, applied to a project-portfolio optimisation problem, shows the same control loop: the Controller drives the Prompter, the LLM produces thoughts, the Parser decodes them into the GRS, scoring and validation are applied, and the loop continues until the GoO is exhausted.[^14]

## Case studies and experimental results

The paper evaluates GoT against four baselines, IO prompting, CoT, CoT-SC, and ToT, on four tasks that were designed to exercise aggregation and refinement.[^1][^7] Most experiments use [GPT-3.5](/wiki/gpt-3.5) as the language model due to budget considerations, with limited additional experiments on GPT-4 and on [Llama 2](/wiki/llama_2); in the paper's Llama 2 runs, results are described as "usually worse than GPT-3.5".[^7] Each task is evaluated across 100 input samples per baseline (50 for document merging), with temperature set to 1.0 and a 4k-token context (16k for document merging).[^7]

### Sorting

The sorting task asks the model to sort a list of numbers of length 32, 64, or 128. The scoring function checks that the output is monotonically ordered and that the multiset of numbers is preserved, in order to penalise hallucinated or dropped values.[^1][^7] The GoO for sorting decomposes the list into chunks, sorts each chunk in parallel via Generate operations, then merges adjacent sorted chunks pairwise using Aggregate operations until a single sorted list remains, with Score and KeepBestN steps in between.[^1] On the 128-element setting, GoT reduces the median number of errors by roughly 62% compared with ToT while simultaneously cutting cost by more than 31%, the headline figures reported in the abstract.[^1][^2]
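
A scoring function in the spirit of the one described above can be sketched as an error count. The exact formula used in the paper may differ; `sorting_errors` is a hypothetical helper that adds the number of adjacent out-of-order pairs to the number of dropped or hallucinated values.

```python
from collections import Counter


def sorting_errors(original, candidate):
    """Return an error count; 0 means a correct sort of the original list."""
    # Order errors: adjacent pairs that are out of ascending order.
    order_errors = sum(1 for a, b in zip(candidate, candidate[1:]) if a > b)
    # Multiset errors: values dropped or hallucinated relative to the input.
    want, got = Counter(original), Counter(candidate)
    count_errors = sum(abs(want[x] - got[x]) for x in want.keys() | got.keys())
    return order_errors + count_errors


perfect = sorting_errors([3, 1, 2], [1, 2, 3])           # 0
misordered = sorting_errors([3, 1, 2], [1, 3, 2])        # 1: the pair (3, 2)
dropped = sorting_errors([3, 1, 2, 2], [1, 2, 3])        # 1: one 2 is missing
hallucinated = sorting_errors([3, 1, 2], [1, 2, 3, 9])   # 1: 9 was never given
```

Because lower is better here, a selection step over these scores would keep the thoughts with the smallest error count.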

### Set intersection

For set intersection, the input is two sets of 32, 64, or 128 elements, and the model has to output their intersection. GoT splits the second set into subsets, asks the LLM to intersect each subset with the first set, and then aggregates the partial intersections; the result is then validated against the original sets.[^7] GoT consistently exceeds the baselines, especially as set size grows, where IO and CoT degrade quickly because the answer must fit in a single chain.[^7]
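
The split-then-aggregate shape of this task can be sketched as follows. The function name, the chunking rule, and the default `intersect` callable (an exact Python stand-in for the per-subset LLM call) are assumptions made for illustration.

```python
def intersect_in_chunks(first, second, num_chunks, intersect=None):
    """Split `second`, intersect each chunk with `first`, then merge the parts."""
    if intersect is None:
        first_set = set(first)
        # Exact stand-in for the model call that handles one subset.
        intersect = lambda a, chunk: [x for x in chunk if x in first_set]
    size = max(1, -(-len(second) // num_chunks))  # ceiling division, at least 1
    chunks = [second[i:i + size] for i in range(0, len(second), size)]
    partials = [intersect(first, chunk) for chunk in chunks]  # one call per chunk
    return sorted(set().union(*partials))                     # aggregation step


first = [1, 4, 6, 9, 12]
second = [4, 5, 6, 7, 8, 9, 10, 11]
result = intersect_in_chunks(first, second, num_chunks=4)

# Validation against the original sets, as in the final step described above.
is_correct = set(result) == set(first) & set(second)
```

Passing a different `intersect` callable is where a real model call would plug in; the aggregation and validation steps stay the same.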

### Keyword counting

In keyword counting, the input is a passage and the model has to output the number of occurrences of each country mentioned. The GoO splits the passage, asks the LLM to count keywords per chunk, and then aggregates the per-chunk dictionaries into a single tally.[^7] In the paper's experiments, GoT solved 25 of the test instances correctly while CoT solved 1 and ToT solved 0 in the same configuration, illustrating how aggregation of per-chunk counts beats both a single linear pass and a tree search.[^7]
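
The aggregation step for this task amounts to summing per-chunk dictionaries. In the sketch below, `count_in_chunk` is a hypothetical exact-match stand-in for the per-chunk LLM call, and the example passage and country list are made up for illustration.

```python
import re
from collections import Counter


def count_in_chunk(chunk, countries):
    """Stand-in for the per-chunk model call: whole-word matches only."""
    counts = Counter()
    for country in countries:
        hits = len(re.findall(rf"\b{re.escape(country)}\b", chunk))
        if hits:
            counts[country] = hits
    return counts


def aggregate_counts(partials):
    """Aggregate step: add the per-chunk tallies into one dictionary."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts, it does not replace
    return dict(total)


chunks = [
    "Peru borders Chile. Peru also borders Brazil.",
    "Chile trades with Japan, and Japan with Peru.",
]
countries = ["Peru", "Chile", "Brazil", "Japan"]
partials = [count_in_chunk(chunk, countries) for chunk in chunks]
totals = aggregate_counts(partials)
```

Since the merge is a plain sum, an error in one chunk stays local to that chunk's tally, which is one plausible reason why per-chunk counting holds up better than a single pass over a long passage.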

### Document merging

The document merging task takes several non-disclosure agreements (NDAs) that partially overlap in content and asks the model to produce a single NDA that minimises duplication while maximising information retained.[^1][^7] Quality is evaluated using the harmonic mean of two LLM-as-judge scores: one for redundancy and one for information retention. GoT achieves a harmonic-mean score around 8.5 out of 10 on 50 evaluated samples, outperforming the chain-based and tree-based baselines, especially as the number of input documents grows.[^7]
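
The harmonic mean rewards merges that do well on both criteria at once. A minimal sketch, assuming both judge scores lie on a 0 to 10 scale and that a zero on either criterion yields an overall score of zero (the function name and that edge-case convention are assumptions):

```python
def merge_score(redundancy, retention):
    """Harmonic mean of two judge scores; 0 if either score is 0."""
    if redundancy <= 0 or retention <= 0:
        return 0.0
    return 2 * redundancy * retention / (redundancy + retention)


balanced = merge_score(9, 8)    # 144 / 17, roughly 8.47
lopsided = merge_score(10, 2)   # 40 / 12, roughly 3.33
```

The lopsided case shows why the harmonic mean is used instead of the arithmetic mean: a merge that retains everything but removes almost no duplication scores about 3.3, not 6.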

Across all four tasks the paper reports the same overall pattern: GoT improves answer quality relative to IO, CoT, CoT-SC, and ToT, and on the merge-heavy tasks it does so while reducing total LLM cost compared with ToT, because aggregation cuts the number of redundant exploratory branches that ToT would have to score.[^1][^7]

The benchmarking methodology fixes several hyperparameters across baselines to make the comparison fair. All schemes use the same underlying chat-completion endpoint, the same temperature (1.0), the same 4k-token context window (16k tokens for document merging because of input length), the same number of input samples (100 per task and baseline, 50 for document merging), and a problem-specific scoring function that is held constant when switching between IO, CoT, CoT-SC, ToT and GoT.[^1][^7] Cost is measured in dollars based on OpenAI's billed token usage, so a reduction in cost reflects fewer billed prompt and completion tokens, typically because fewer or shorter model calls were made.[^1][^4]

## Comparison with chain and tree prompting

The framework is most easily understood in contrast with its predecessors. The table below summarises the key differences.

| Property | IO prompting | [Chain-of-Thought](/wiki/chain_of_thought) | [Self-consistency](/wiki/self_consistency) (CoT-SC) | [Tree of Thoughts](/wiki/tree_of_thoughts) | Graph of Thoughts |
| --- | --- | --- | --- | --- | --- |
| Topology of reasoning | single node | chain | k independent chains | tree | arbitrary directed graph, possibly with cycles |
| Aggregation of multiple parents | no | no | majority vote on final answer only | no | yes, via Aggregate operation |
| Refinement / feedback loops | no | no | no | no | yes, via Improve / ValidateAndImprove |
| Backtracking | no | no | no | yes | yes |
| Latency at Θ(N) thoughts | 1 | N | N / k | log_k N | log_k N |
| Volume of final thought | 1 | N | N / k | log_k N | N |
| Reference | [^5] | [^5] | [^5] | [^5] | [^1][^7] |

The salient point is the pair of latency and volume rows. ToT brought logarithmic latency by introducing branching, but each leaf in a tree can only "see" the thoughts along its root-to-leaf path, which has logarithmic depth. GoT recovers the linear volume of CoT by adding aggregation edges into the same tree, so the final thought can depend on every other thought generated during reasoning without having to be sequentially derived from them.[^1][^7]

A practitioner-oriented summary published in 2024 frames GoT as one element of a family of "graph-based prompting" methods and stresses that GoT subsumes both CoT and ToT as special cases of a directed graph (a path graph and a rooted tree, respectively), so any CoT or ToT pipeline can in principle be expressed as a Graph of Operations.[^5]
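
The special-case relationship can be made concrete with a small classifier over edge lists. This is an illustrative sketch only: `classify_topology` is a hypothetical helper, and it assumes the input is a weakly connected graph with at least one edge.

```python
from collections import Counter


def classify_topology(edges):
    """Label a connected reasoning graph as 'chain', 'tree', or 'graph'."""
    nodes = {n for edge in edges for n in edge}
    in_deg = Counter(v for _, v in edges)
    out_deg = Counter(u for u, _ in edges)
    roots = [n for n in nodes if in_deg[n] == 0]
    # Chains and trees have a single root and at most one parent per thought.
    if len(roots) != 1 or any(in_deg[n] > 1 for n in nodes):
        return "graph"
    # A chain is a tree in which no thought branches.
    return "chain" if all(out_deg[n] <= 1 for n in nodes) else "tree"


cot = classify_topology([(0, 1), (1, 2)])                    # "chain"
tot = classify_topology([(0, 1), (0, 2), (1, 3)])            # "tree"
got = classify_topology([(0, 1), (0, 2), (1, 3), (2, 3)])    # "graph"
loop = classify_topology([(0, 1), (1, 1)])                   # "graph"
```

Every edge list that passes the chain or tree test is also a valid directed graph, which is the sense in which the more general topology subsumes the other two.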

## Applications and use cases

The case studies in the original paper map onto three broad categories of LLM workloads that benefit from a graph topology. The first is divide-and-conquer over long inputs, exemplified by sorting and keyword counting, where the problem is split into independent chunks, each chunk is solved by the model, and the partial answers are then aggregated. The second is many-to-one synthesis, exemplified by document merging, where several documents must be combined into a single output while controlling redundancy and information loss. The third is exploratory search with refinement, where multiple candidate solutions are scored, refined, and then either combined or pruned, as illustrated by set intersection.[^1][^7]

Outside the original paper, practitioner guides identify additional families of tasks where GoT-style pipelines are appealing. Complex code-generation flows can benefit from generating multiple candidate functions, scoring them, and aggregating their common skeleton; multi-source question answering can benefit from merging retrieved passages into a synthesised answer; project portfolio optimisation can use Generate-Validate-Score-KeepBestN cycles to enumerate feasible portfolios under constraints and rank them by total value.[^14] In all of these cases, the recurring justification is the same: when the natural problem structure has merging or feedback edges, encoding the prompt flow as a graph is a closer match to the problem than a chain or a tree, and the controller can exploit that structure to call the model only as often as the task actually requires.[^1][^14]
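
A Generate-Validate-Score-KeepBestN cycle for the portfolio case can be sketched with deterministic stand-ins for the model calls. Everything in the example is hypothetical: the project names, costs, values, and budget are made-up numbers, and exhaustive enumeration replaces LLM-proposed candidates.

```python
from itertools import combinations

# Hypothetical projects: name -> (cost, value). All numbers are made up.
PROJECTS = {"A": (4, 7), "B": (3, 5), "C": (5, 9), "D": (2, 3)}
BUDGET = 9


def generate(k):
    """Generate: propose every portfolio of exactly k projects."""
    return [frozenset(combo) for combo in combinations(sorted(PROJECTS), k)]


def is_valid(portfolio):
    """Validate: the total cost must fit within the budget."""
    return sum(PROJECTS[p][0] for p in portfolio) <= BUDGET


def score(portfolio):
    """Score: total value of the selected projects."""
    return sum(PROJECTS[p][1] for p in portfolio)


candidates = [p for k in (1, 2, 3) for p in generate(k)]
feasible = [p for p in candidates if is_valid(p)]
best = sorted(feasible, key=score, reverse=True)[:2]  # keep the best two
```

With these numbers the two surviving portfolios are {A, C} with value 16 and {A, B, D} with value 15; in an LLM-driven pipeline the validation and scoring functions would stay as plain code while only candidate generation is delegated to the model.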

## Adoption and follow-up work

GoT has been incorporated into several surveys and practitioner guides as a canonical example of structured reasoning. The Topologies of Reasoning paper by the same group treats it as a representative graph topology, alongside chains and trees, and uses it to derive a comparative analysis of prompting structures, theoretical underpinnings, and connections to knowledge bases.[^6] Independent expositions on agentic-patterns sites and AI knowledge bases describe GoT's core operations (branching, aggregation, refinement, looping) and note its overhead, with one source estimating that graph-of-thoughts pipelines can be five to twenty times more expensive than a single linear CoT call for simple problems, which is why GoT is recommended only for tasks that genuinely require interdependent reasoning.[^9][^10]

In April 2025 the same SPCL-led team published "Affordable AI Assistants with Knowledge Graph of Thoughts" (arXiv:2504.02670), which extends GoT to maintain a persistent structured knowledge graph during reasoning, sharing the same Graph of Operations machinery with the original framework. The paper is hosted on the SPCL publications page alongside the original GoT manuscript.[^11]

Maciej Besta, the lead author, is a researcher at SPCL focusing on sparse graph computations and LLMs, and has received the IEEE TCSC Award for Excellence in Scalable Computing in the early-career category (2023) and the IEEE TCHPC Early Career Researchers Award (2024).[^12] Co-first author Nils Blach, who completed his graduate work at SPCL under Besta and Hoefler, lists GoT as his primary publication on his personal page, and notes that the first two authors of the AAAI paper contributed equally.[^13]

The spcl/graph-of-thoughts repository has accumulated thousands of GitHub stars and is widely cited as a reference open-source implementation; the citation block in its README is the canonical pointer used by downstream projects.[^4] GoT vocabulary (aggregation, refinement, GoO, GRS) has also been adopted by derivative frameworks. The "Hierarchical Graph of Thoughts" (HGoT) paper, posted on arXiv as 2402.09390, reuses the graph topology idea and applies it to retrieval-augmented in-context learning for factuality evaluation; it explicitly cites the GoT framework as the basis for its hierarchical extension.[^15] More broadly, agentic-pattern catalogues and practitioner newsletters describe GoT alongside ReAct, Self-Refine, and Plan-and-Execute as one of the named patterns for orchestrating multi-step LLM reasoning, often packaged via libraries such as LangGraph that materialise a graph of stateful nodes at runtime.[^9][^10]

## Limitations and criticisms

Although the GoT framework is more expressive than chains and trees, this expressiveness comes at a cost.

- Manual graph design. In the original framework, the Graph of Operations is specified by the user, which means a developer has to decide how to decompose the task, when to aggregate, and how to score. The paper does not provide an automatic GoO synthesiser, so the design effort can dominate the engineering budget for one-off tasks.[^1][^7]
- Heavy reliance on scoring. Every Aggregate or KeepBestN step depends on a scoring function that is either custom code or another LLM call, and bad scores propagate through the rest of the graph.[^1][^4]
- Modest gains on "non-mergeable" tasks. Practitioner reviews observe that on tasks with little structural decomposition (for instance plain document merging in unconstrained settings) the headline quality and cost gains shrink, and that GoT is most worthwhile when the problem can be cleanly partitioned and re-merged.[^9][^10]
- Limited model coverage in the original evaluation. The bulk of the published numbers are on [GPT-3.5](/wiki/gpt-3.5); experiments with GPT-4 are limited by cost, and [Llama 2](/wiki/llama_2) runs are reported to be worse than GPT-3.5 on the same tasks, leaving open how well GoT transfers to other model families.[^7]
- Reasoning-engineering complexity. Independent commentary highlights that constructing the right operations graph, choosing the right N for KeepBestN, and balancing aggregation against generation introduces tuning knobs that are not present in plain CoT, which can make pipelines brittle.[^9][^10]

## Related work

GoT sits in a dense neighbourhood of structured-prompting techniques. [Chain-of-Thought](/wiki/chain_of_thought) established that explicit intermediate steps help, [Self-consistency](/wiki/self_consistency) showed that sampling many chains and voting raises accuracy, and [Tree of Thoughts](/wiki/tree_of_thoughts) showed that branching plus search beats single chains on search-style problems.[^5] Adjacent methods include [ReAct](/wiki/react_prompting) (interleaving reasoning and tool calls), [Self-Refine](/wiki/self_refine) (iterative self-feedback, conceptually similar to GoT's Improve operation but applied to a single chain), [Step-Back Prompting](/wiki/step_back_prompting) (abstracting before answering), [Self-Discover prompting](/wiki/self_discover) and [Auto-CoT](/wiki/auto_cot) (constructing demonstrations automatically). On the structural side, GoT shares vocabulary with [Graph Neural Networks](/wiki/graph_neural_network), [knowledge graphs](/wiki/knowledge_graph) and the [computational graph](/wiki/computational_graph) view of model execution, but it operates entirely at the prompt level: there is no learned weight update and the graph is materialised through repeated calls to a pretrained LLM, drawing on the model's [in-context learning](/wiki/in_context_learning) capability and counting against [test-time compute](/wiki/test_time_compute) budgets.

The most direct intellectual sibling is the Topologies of Reasoning paper, which arranges chain, tree, and graph schemes along a taxonomy of "reasoning topologies" and asks which structure is best for which task; GoT plays the role of the most general topology in that taxonomy.[^6] A second sibling is the HGoT extension, which composes nested GoT graphs to handle factuality evaluation in retrieval-augmented in-context learning, reporting a roughly 7% improvement over competing methods on the FEVER fact-verification benchmark while matching strong baselines on Open-SQuAD and HotPotQA.[^15] Together with the original GoT paper and the Knowledge Graph of Thoughts follow-up, these works form a coherent line of "graph-shaped reasoning" research; GoT and its direct follow-ups originate from SPCL at ETH Zürich, while HGoT comes from a separate group of authors.[^6][^11][^15]

## See also

- [Chain-of-Thought](/wiki/chain_of_thought)
- [Tree of Thoughts](/wiki/tree_of_thoughts)
- [Self-consistency](/wiki/self_consistency)
- [Self-Refine](/wiki/self_refine)
- [Self-Discover prompting](/wiki/self_discover)
- [Step-Back Prompting](/wiki/step_back_prompting)
- [Auto-CoT](/wiki/auto_cot)
- [ReAct (prompting)](/wiki/react_prompting)
- [Prompt Engineering](/wiki/prompt_engineering)
- [In-Context Learning](/wiki/in_context_learning)
- [Knowledge graph](/wiki/knowledge_graph)
- [Graph Neural Network](/wiki/graph_neural_network)
- [Reasoning (artificial intelligence)](/wiki/reasoning)
- [Test-time compute](/wiki/test_time_compute)
- [Agent planning](/wiki/agent_planning)

## References

[^1]: Maciej Besta, Nils Blach, Ales Kubicek, Robert Gerstenberger, Michal Podstawski, Lukas Gianinazzi, Joanna Gajda, Tomasz Lehmann, Hubert Niewiadomski, Piotr Nyczyk, Torsten Hoefler, "Graph of Thoughts: Solving Elaborate Problems with Large Language Models", arXiv:2308.09687, 2023-08-18 (revised 2024-02-06). https://arxiv.org/abs/2308.09687. Accessed 2026-05-21.
[^2]: Maciej Besta et al., "Graph of Thoughts: Solving Elaborate Problems with Large Language Models", Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 38, No. 16, pages 17682-17690, AAAI Press, 2024-03-25. https://ojs.aaai.org/index.php/AAAI/article/view/29720. Accessed 2026-05-21.
[^3]: SPCL, "Scalable Parallel Computing Laboratory at ETH Zurich", ETH Zurich, n.d. https://spcl.inf.ethz.ch/. Accessed 2026-05-21.
[^4]: spcl, "graph-of-thoughts: Official Implementation of Graph of Thoughts", GitHub repository, last updated 2025. https://github.com/spcl/graph-of-thoughts. Accessed 2026-05-21.
[^5]: Cameron R. Wolfe, "Graph-Based Prompting and Reasoning with Language Models", Deep (Learning) Focus newsletter, 2024-02-12. https://cameronrwolfe.substack.com/p/graph-based-prompting-and-reasoning. Accessed 2026-05-21.
[^6]: Maciej Besta, Florim Memedi, Zhenyu Zhang, Robert Gerstenberger, Guangyuan Piao, Nils Blach, Piotr Nyczyk, Marcin Copik, Grzegorz Kwasniewski, Jurgen Muller, Lukas Gianinazzi, Ales Kubicek, Hubert Niewiadomski, Aidan O'Mahony, Onur Mutlu, Torsten Hoefler, "Topologies of Reasoning: Demystifying Chains, Trees, and Graphs of Thoughts", arXiv:2401.14295, 2024-01-25. https://arxiv.org/abs/2401.14295. Accessed 2026-05-21.
[^7]: Maciej Besta et al., "Graph of Thoughts: Solving Elaborate Problems with Large Language Models" (HTML version), ar5iv (arXiv labs), 2024-02-06. https://ar5iv.labs.arxiv.org/html/2308.09687. Accessed 2026-05-21.
[^8]: spcl, "graph_of_thoughts/operations/README.md", GitHub repository spcl/graph-of-thoughts, last updated 2024. https://github.com/spcl/graph-of-thoughts/blob/main/graph_of_thoughts/operations/README.md. Accessed 2026-05-21.
[^9]: Agentic Patterns, "Graph of Thoughts (GoT)", Awesome Agentic Patterns, 2024. https://agentic-patterns.com/patterns/graph-of-thoughts/. Accessed 2026-05-21.
[^10]: AgentWiki, "Graph of Thoughts", AI Agent Knowledge Base, 2024. https://agentwiki.org/graph_of_thoughts. Accessed 2026-05-21.
[^11]: Maciej Besta et al., "Affordable AI Assistants with Knowledge Graph of Thoughts", arXiv:2504.02670 / SPCL preprint, 2025-04. https://spcl.inf.ethz.ch/Publications/.pdf/besta-kgot.pdf. Accessed 2026-05-21.
[^12]: ETH Zurich Department of Computer Science, "IEEE TCHPC Early Career Researchers Award for Maciej Besta", inf.ethz.ch news, 2024-09. https://inf.ethz.ch/news-and-events/spotlights/infk-news-channel/2024/09/ieee-tchpc-early-career-researchers-award-for-maciej-besta.html. Accessed 2026-05-21.
[^13]: Nils Blach, "Personal page", nblach.com, 2024. https://nblach.com/. Accessed 2026-05-21.
[^14]: Jacek Wo (Jomsborg Lab), "LLMs Graph of Thoughts Framework. Case study", Medium, 2024. https://medium.com/@JacekWo/llms-graph-of-thoughts-framework-c5607a46aa9a. Accessed 2026-05-21.
[^15]: Yihao Fang, Stephen W. Thomas, Xiaodan Zhu, "HGOT: Hierarchical Graph of Thoughts for Retrieval-Augmented In-Context Learning in Factuality Evaluation", arXiv:2402.09390, 2024-02-14. https://arxiv.org/pdf/2402.09390. Accessed 2026-05-21.

