# Knowledge Editing

> Source: https://aiwiki.ai/wiki/knowledge_editing
> Updated: 2026-06-27
> Categories: Large Language Models, Machine Learning, Natural Language Processing
> License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
> From AI Wiki (https://aiwiki.ai), the free encyclopedia of artificial intelligence. Reuse freely with attribution to "AI Wiki (aiwiki.ai)".

**Knowledge editing** (also called **model editing**) is a family of techniques for updating or correcting specific factual associations stored in the weights of a trained [large language model](/wiki/large_language_model) without full retraining or broad [fine-tuning](/wiki/fine_tuning). A single edit changes one fact (for example, who leads a country, or where a landmark sits) in seconds to minutes while aiming to leave the rest of the model's behavior intact. The field is anchored by the 2022 ROME method (Rank-One Model Editing), which located factual recall in middle-layer feed-forward modules and edited it with a closed-form rank-one weight update, and by MEMIT, which scaled the same idea to roughly 10,000 simultaneous edits. [1][2]

Knowledge editing exists because modern language models absorb vast amounts of world knowledge during [pre-training](/wiki/pre_training), and some of that knowledge becomes outdated, incorrect, or undesirable over time. Rather than retrain a multi-billion-parameter model (hours to days on a GPU cluster) to fix one stale fact, knowledge editing offers a targeted, computationally cheap alternative. The field gained momentum in 2021 and 2022 with landmark papers introducing causal tracing as an [interpretability](/wiki/interpretability) tool and direct parameter-modification algorithms such as ROME and MEMIT, and it now sits at the intersection of [natural language processing](/wiki/natural_language_processing), [machine learning](/wiki/machine_learning), and [AI safety](/wiki/ai_safety). [1][2][10]

The ROME paper framed the central empirical claim this way: its results "confirm an important role for mid-layer feed-forward modules in storing factual associations and suggest that direct manipulation of computational mechanisms may be a feasible approach for model editing." [1]

## What is knowledge editing?

Knowledge editing operates on factual knowledge represented as triples of the form *(s, r, o)*, where *s* is the subject, *r* is the relation, and *o* is the object. For example, the triple ("The Eiffel Tower", "is located in", "Paris") encodes a specific factual association.

An edit is defined as a tuple *e = (s, r, o -> o\*)*, which specifies that the model should update its stored association from the original object *o* to a new target object *o\**. For instance, if a country's head of state changes, the edit might be ("France", "has president", "Macron -> new_president").

Formally, given a pre-trained language model *f* with parameters *theta*, the goal of knowledge editing is to learn an editing function *K : (f, E) -> f\** that produces an updated model *f\** with modified parameters *theta\** such that:

1. **Reliability**: For each edit *e* in the edit set *E*, the updated model *f\** produces the new target *o\** when prompted about the subject-relation pair *(s, r)*.
2. **Generality**: The updated model also produces *o\** for semantically equivalent rephrasings of the original prompt.
3. **Locality**: The updated model's behavior remains unchanged for all inputs unrelated to the edits in *E*.

These three properties (reliability, generality, and locality) form the core evaluation triad for any knowledge editing method; generality is also called generalization. [10]
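The notation can be made concrete with a toy sketch in which the "model" is just a lookup table from *(s, r)* to *o*. The names `EditRequest` and `apply_edits` are hypothetical and chosen for illustration only; a real editor changes weights rather than dictionary entries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EditRequest:
    """One edit e = (s, r, o -> o*)."""
    subject: str
    relation: str
    old_object: str
    new_object: str


def apply_edits(facts, edits):
    """Toy editing function K: return an updated copy of the fact table.

    `facts` maps (subject, relation) -> object and stands in for the model f.
    """
    updated = dict(facts)
    for e in edits:
        updated[(e.subject, e.relation)] = e.new_object
    return updated


facts = {
    ("The Eiffel Tower", "is located in"): "Paris",
    ("The Louvre", "is located in"): "Paris",
}
edit = EditRequest("The Eiffel Tower", "is located in", "Paris", "Rome")
edited = apply_edits(facts, [edit])

print(edited[("The Eiffel Tower", "is located in")])  # reliability: the edited fact
print(edited[("The Louvre", "is located in")])        # locality: an unrelated fact
```

A lookup table cannot express generality, because every paraphrase would be a separate key. That property only becomes meaningful for a model that maps free text to answers.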

## How does knowledge editing work?

Causal tracing is an interpretability technique introduced by Meng et al. (2022) to identify where factual associations are stored inside [transformer](/wiki/transformer) models. The method is grounded in causal mediation analysis and works by running a model multiple times under controlled interventions to isolate the causal effect of individual hidden states on the model's factual predictions. [1]

### How does causal tracing locate a fact?

The procedure involves three runs of the model on a factual prompt such as "The Eiffel Tower is located in":

1. **Clean run**: The model processes the prompt normally and produces the correct prediction ("Paris").
2. **Corrupted run**: The subject tokens ("The Eiffel Tower") are replaced with corrupted [embeddings](/wiki/embeddings) (by adding Gaussian noise), which disrupts the model's ability to recall the fact.
3. **Corrupted-with-restoration run**: Starting from the corrupted input, individual hidden states at specific layers and token positions are restored to their clean values. If restoring a particular state recovers the correct prediction, that state is identified as a causal mediator of the factual recall.
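The three runs can be sketched on a toy network. Everything in the block below is an assumption made for illustration: the random "network" is not a transformer, the scalar score stands in for the probability of the correct token, and the sizes and noise scale are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
N_LAYERS, N_POS, DIM = 4, 3, 8  # toy sizes, not taken from any real model
W_SELF = rng.normal(scale=0.5, size=(N_LAYERS, DIM, DIM))
W_MIX = rng.normal(scale=0.5, size=(N_LAYERS, DIM, DIM))
W_OUT = rng.normal(size=DIM)


def forward(x, restore=None):
    """Run the toy network on embeddings x of shape (N_POS, DIM).

    restore = (layer, pos, vector) overwrites one hidden state after that
    layer. Returns (score read from the last position, per-layer states).
    """
    h = x.copy()
    states = []
    for layer in range(N_LAYERS):
        # Causal mixing: position t sees the running mean of positions <= t.
        ctx = np.cumsum(h, axis=0) / np.arange(1, N_POS + 1)[:, None]
        h = np.tanh(h @ W_SELF[layer] + ctx @ W_MIX[layer])
        if restore is not None and restore[0] == layer:
            h[restore[1]] = restore[2]
        states.append(h.copy())
    return float(h[-1] @ W_OUT), states


clean_x = rng.normal(size=(N_POS, DIM))
subject_positions = [0, 1]
corrupted_x = clean_x.copy()
corrupted_x[subject_positions] += rng.normal(
    scale=3.0, size=(len(subject_positions), DIM)
)

clean_score, clean_states = forward(clean_x)   # 1. clean run
corrupted_score, _ = forward(corrupted_x)      # 2. corrupted run

# 3. corrupted-with-restoration: patch one clean state at a time.
effects = np.zeros((N_LAYERS, N_POS))
for layer in range(N_LAYERS):
    for pos in range(N_POS):
        restored_score, _ = forward(
            corrupted_x, restore=(layer, pos, clean_states[layer][pos])
        )
        effects[layer, pos] = restored_score - corrupted_score

print(np.round(effects, 3))
```

Each entry of `effects` is the change in score from restoring one state. In this toy, restoring the final state at the last position recovers the clean score exactly, while restoring a final-layer state at an earlier position has no effect, since nothing reads it afterwards.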

### What does causal tracing reveal?

Causal traces reveal a consistent pattern across autoregressive transformer models: [1]

- **Early-to-middle MLP modules** at the position of the last subject token show the strongest causal effects. These modules act as the primary site of factual knowledge storage and retrieval.
- **[Attention](/wiki/attention) heads** at later layers serve to propagate the retrieved information from the subject token position to the final token position where the prediction is made.
- The process follows a two-step pattern: knowledge is first retrieved by MLP modules processing the subject, then attention mechanisms route that information to the output position.

These findings provided the mechanistic basis for the ROME and MEMIT editing methods, which directly target the MLP weight matrices identified by causal tracing as storing factual associations. The ROME authors describe the edited modules as "a distinct set of steps in middle-layer feed-forward modules that mediate factual predictions while processing subject tokens." [1]

## What are the main knowledge editing methods?

Knowledge editing methods can be organized into three broad categories: **locate-then-edit** approaches that directly modify model weights at identified locations, **meta-learning** approaches that train auxiliary networks to predict weight updates, and **memory-based** approaches that store edits externally without modifying the base model's parameters.

### ROME (Rank-One Model Editing)

ROME was introduced by Kevin Meng, David Bau, Alex Andonian, and Yonatan Belinkov in their 2022 paper "Locating and Editing Factual Associations in GPT," published at [NeurIPS](/wiki/neurips) 2022. It is a locate-then-edit method that treats the feed-forward (MLP) modules in transformer layers as linear key-value stores and performs a rank-one update to modify a single factual association. [1]

**Technical approach**: Each MLP module in a transformer can be viewed as implementing a linear associative memory, where input key vectors *k* (representing subjects) are mapped through a weight matrix *W* to produce value vectors *v* (encoding properties of those subjects). ROME modifies the weight matrix *W* of a specific MLP layer to insert a new key-value association *(k\*, v\*)*.

The rank-one weight update is computed as:

*W' = W + Delta*, where *Delta = (v\* - Wk\*)(C^-1 k\*)^T / ((C^-1 k\*)^T k\*)*

Here, *C = KK^T* is the uncentered covariance matrix of key vectors, estimated from many inputs; *k\** is the key vector for the target subject; *v\** is the desired new value vector (optimized so the model produces the target output); and *Wk\** is the current value that needs to be replaced. The denominator is a scalar that normalizes the update so that *W'k\* = v\** holds exactly.
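The update rule above can be checked numerically. The following is a minimal NumPy sketch, not the reference implementation: the function name `rome_rank_one_update` is hypothetical, all matrices are random stand-ins, and the optimization that produces *v\** in the real method is skipped.

```python
import numpy as np


def rome_rank_one_update(W, C, k_star, v_star):
    """Return W' = W + (v* - W k*) (C^-1 k*)^T / ((C^-1 k*)^T k*)."""
    u = np.linalg.solve(C, k_star)        # C^-1 k*, without forming the inverse
    residual = v_star - W @ k_star        # what the layer currently gets wrong
    delta = np.outer(residual, u) / (u @ k_star)
    return W + delta


rng = np.random.default_rng(0)
d_key, d_val, n_keys = 6, 4, 50           # toy dimensions
K = rng.normal(size=(d_key, n_keys))      # previously stored keys (random here)
W = rng.normal(size=(d_val, d_key))
C = K @ K.T                               # uncentered covariance of the keys
k_star = rng.normal(size=d_key)
v_star = rng.normal(size=d_val)           # assumed to be given

W_new = rome_rank_one_update(W, C, k_star, v_star)

print(np.allclose(W_new @ k_star, v_star))   # the new association is stored
print(np.linalg.matrix_rank(W_new - W))      # the change has rank one
```

Weighting the update by *C^-1* is what keeps the disturbance to the other keys small: among all changes that map *k\** to *v\**, this one minimizes the change in output on the keys summarized by *C*.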

ROME performs edits one at a time on a single MLP layer, typically targeting a middle layer identified by causal tracing (for example, a mid-stack layer such as layer 17 in GPT-2 XL, which has 48 layers, or an early-middle layer in [GPT-J](/wiki/gpt_j), which has 28 layers). It achieves high efficacy and generalization for individual edits but was not designed for batch editing of many facts simultaneously. The public reference implementation supports GPT-2 XL (1.5B) and GPT-J (6B). [1]

### MEMIT (Mass-Editing Memory in a Transformer)

MEMIT was introduced by Kevin Meng, Arnab Sen Sharma, Alex Andonian, Yonatan Belinkov, and David Bau in their paper "Mass-Editing Memory in a Transformer," published at ICLR 2023. MEMIT extends ROME to handle thousands of simultaneous edits by distributing the updates across multiple MLP layers rather than concentrating them in a single layer. [2]

**Technical approach**: MEMIT spreads the desired memory updates across a range of critical MLP layers identified through causal tracing (for GPT-J, this range is layers 3 to 8). For each layer in the selected range, MEMIT computes a portion of the total desired value change and applies a least-squares update to the layer's weight matrix. By distributing the edits across layers, MEMIT avoids overloading any single layer's capacity. [2]
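A single-layer version of such a batched least-squares update can be sketched as follows. This is an illustration under stated assumptions rather than the MEMIT code: the function name is hypothetical, the data are random, the covariance weight `lam` is an arbitrary choice, and the step that splits the residual across several layers is omitted.

```python
import numpy as np


def batched_least_squares_update(W, C0, K_edit, V_target):
    """Return W + R K_E^T (C0 + K_E K_E^T)^-1, with R = V_target - W K_E.

    C0 summarizes keys whose outputs should be preserved; K_edit holds one
    column per edited fact.
    """
    R = V_target - W @ K_edit                    # residuals for all edits
    A = C0 + K_edit @ K_edit.T                   # symmetric positive definite
    delta = np.linalg.solve(A, K_edit @ R.T).T   # equals R K_E^T A^-1
    return W + delta


rng = np.random.default_rng(1)
d_key, d_val, n_old, n_edits = 32, 8, 200, 10    # toy sizes
W = rng.normal(size=(d_val, d_key))
K_old = rng.normal(size=(d_key, n_old))
lam = 0.1                                        # assumed preservation weight
C0 = lam * (K_old @ K_old.T) / n_old
K_edit = rng.normal(size=(d_key, n_edits))
V_target = rng.normal(size=(d_val, n_edits))

W_new = batched_least_squares_update(W, C0, K_edit, V_target)

err_before = np.linalg.norm(V_target - W @ K_edit)
err_after = np.linalg.norm(V_target - W_new @ K_edit)
print(err_before, err_after)
```

Unlike the rank-one case, the edited keys are not mapped to their targets exactly: the solution trades edit accuracy against preserving the outputs summarized by `C0`, and a larger `lam` favors preservation.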

The authors demonstrated that MEMIT can successfully edit thousands of facts at once (up to roughly 10,000 simultaneously) in GPT-J (6B parameters) and GPT-NeoX (20B parameters), exceeding the capacity of prior methods by orders of magnitude. Performance remained stable even at large batch sizes, with only modest degradation in edit accuracy as the number of simultaneous edits increased. [2]

### KnowledgeEditor

KnowledgeEditor was proposed by Nicola De Cao, Wilker Aziz, and Ivan Titov in their 2021 paper "Editing Factual Knowledge in Language Models," published at EMNLP 2021. It is one of the earliest dedicated knowledge editing methods and takes a meta-learning-inspired approach. [3]

**Technical approach**: KnowledgeEditor trains a hyper-network that learns to predict weight updates for the base model. Given a specific edit request (an input-output pair specifying the desired factual change), the hyper-network generates a parameter update that modifies the base model's behavior for that fact. The training process uses constrained optimization to ensure that the predicted updates are localized, meaning they change the target fact without disrupting unrelated knowledge.

The method was evaluated on two architectures and tasks: a [BERT](/wiki/bert) model fine-tuned for fact-checking (the FEVER dataset) and a [BART](/wiki/bart) model for question answering (the zsRE dataset). Analysis of the learned updates revealed that they tend to be concentrated on a small subset of model components, providing evidence that factual knowledge is not uniformly distributed across all parameters.

KnowledgeEditor does not require modifications to the pre-training procedure and can be applied to any pre-trained model. However, it requires training the hyper-network beforehand, which adds an upfront cost.

### MEND (Model Editor Networks using Gradient Decomposition)

MEND was introduced by Eric Mitchell, Charles Lin, Antoine Bosselut, Chelsea Finn, and Christopher D. Manning in their paper "Fast Model Editing at Scale," published at ICLR 2022. Like KnowledgeEditor, MEND takes a meta-learning approach but introduces a more scalable parameterization. [4]

**Technical approach**: MEND trains small auxiliary editor networks that learn to transform the standard fine-tuning gradient for an edit into a more targeted parameter update. The key innovation is a low-rank decomposition of the gradient, which makes the transformation tractable even for very large models. The editor networks are parameterized as MLPs with a single hidden layer and use far fewer parameters than the models they edit.
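The low-rank structure can be seen on a single linear layer: for one example, the gradient with respect to the weight matrix is the outer product of the output gradient and the layer input. The sketch below uses a squared-error loss and random data as assumptions and checks the factorization against finite differences; it does not implement the editor networks themselves.

```python
import numpy as np

rng = np.random.default_rng(2)
d_in, d_out = 5, 3                     # toy layer size
W = rng.normal(size=(d_out, d_in))
x = rng.normal(size=d_in)              # layer input for one example
y = rng.normal(size=d_out)             # target output


def loss(weights):
    return 0.5 * np.sum((weights @ x - y) ** 2)


delta = W @ x - y                      # gradient of the loss w.r.t. the layer output
grad = np.outer(delta, x)              # gradient w.r.t. W: a rank-one matrix

# Central finite differences as an independent check of the factorization.
eps = 1e-6
numeric = np.zeros_like(W)
for i in range(d_out):
    for j in range(d_in):
        bump = np.zeros_like(W)
        bump[i, j] = eps
        numeric[i, j] = (loss(W + bump) - loss(W - bump)) / (2 * eps)

print(np.allclose(grad, numeric, atol=1e-5))
print(d_in + d_out, "numbers describe a gradient with", d_in * d_out, "entries")
```

An editor that operates on the two factors handles `d_in + d_out` values per example instead of `d_in * d_out`, which is what keeps the transformation tractable for large layers.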

MEND can be trained on a single GPU in less than a day, even for models with over 10 billion parameters. Once trained, applying a new edit requires only a single forward and backward pass through the base model (to compute the gradient) followed by a forward pass through the editor network (to transform the gradient into the final update). This makes edit application extremely fast at inference time.

At the time of publication, the authors reported that MEND was the only editing approach to produce effective edits for models ranging from tens of millions to over 10 billion parameters (tested on T5, GPT, BERT, and BART), making it a significant advance in the scalability of knowledge editing. [4]

### SERAC (Semi-Parametric Editing with a Retrieval-Augmented Counterfactual Model)

SERAC was introduced by Eric Mitchell, Charles Lin, Antoine Bosselut, Christopher D. Manning, and Chelsea Finn in their paper "Memory-Based Model Editing at Scale," published at ICML 2022. Unlike the methods above, SERAC does not modify the base model's parameters at all. Instead, it stores edits in an external memory and uses auxiliary models to route inputs appropriately. [5]

**Technical approach**: SERAC consists of three components:

| Component | Function |
|---|---|
| **Base model** | The original frozen [language model](/wiki/large_language_model), left completely unchanged |
| **Scope classifier** | A trained classifier that determines whether an input is related to any stored edit |
| **Counterfactual model** | A smaller model trained to produce the correct output for edited facts, conditioned on retrieved edit examples |

When a new input arrives, the scope classifier checks whether it falls within the scope of any stored edit. If not, the input is passed directly to the frozen base model. If the input is related to a stored edit, the relevant edit is retrieved from memory and passed to the counterfactual model, which generates the updated response. The authors define the scope of an edit as "the set of inputs whose true label is affected by the edit." [5]
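The routing logic can be sketched in a few lines. All names below are hypothetical, and the components are deliberately crude stand-ins: the scope classifier is a word-overlap score with an arbitrary threshold, whereas SERAC trains its classifier and counterfactual model.

```python
import re
from dataclasses import dataclass, field
from typing import Callable


def word_overlap(a, b):
    """Jaccard similarity of word sets; a stand-in for a trained scope classifier."""
    ta = set(re.findall(r"\w+", a.lower()))
    tb = set(re.findall(r"\w+", b.lower()))
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


@dataclass
class MemoryRouter:
    base_model: Callable[[str], str]
    counterfactual_model: Callable[[str, tuple], str]
    threshold: float = 0.6                       # assumed, not tuned
    memory: list = field(default_factory=list)   # stored (prompt, new_answer) edits

    def add_edit(self, prompt, new_answer):
        self.memory.append((prompt, new_answer))

    def __call__(self, query):
        # Retrieve the most similar stored edit, if any exist.
        best = max(self.memory, key=lambda e: word_overlap(query, e[0]), default=None)
        if best is not None and word_overlap(query, best[0]) >= self.threshold:
            return self.counterfactual_model(query, best)   # in scope
        return self.base_model(query)                       # out of scope


def base_model(query):
    """Frozen base model stand-in that answers from pre-edit knowledge."""
    if "Eiffel" in query:
        return "Paris"
    if "Mona Lisa" in query:
        return "Leonardo da Vinci"
    return "unknown"


def counterfactual_model(query, edit):
    """Stand-in that simply returns the stored answer of the retrieved edit."""
    return edit[1]


router = MemoryRouter(base_model, counterfactual_model)
router.add_edit("Where is the Eiffel Tower located?", "Rome")

print(router("Where is the Eiffel Tower?"))   # routed to the counterfactual model
print(router("Who painted the Mona Lisa?"))   # routed to the frozen base model
```

With this overlap score, a query such as "Where is the Louvre located?" lands just below the threshold, which illustrates why the quality of the scope decision matters so much for this family of methods.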

SERAC was evaluated on three tasks: question answering (zsRE), fact-checking (FEVER), and dialogue generation (using a custom dataset). The authors found that SERAC achieved high performance across all three tasks, consistently outperforming parameter-modifying approaches like MEND by a significant margin. [5]

Because SERAC never modifies the base model, it avoids the risk of catastrophic forgetting or unintended side effects on unrelated knowledge. However, it introduces additional inference-time overhead from the scope classifier and counterfactual model, and its performance depends on the quality of the scope classifier's decisions.

## How do the knowledge editing methods compare?

The following table summarizes the key differences between the major knowledge editing methods:

| Method | Year | Venue | Category | Modifies Weights | Batch Editing | Model Scale Tested | Key Mechanism |
|---|---|---|---|---|---|---|---|
| KnowledgeEditor | 2021 | EMNLP | Meta-learning | Yes | No | BERT, BART | Hyper-network predicts weight updates via constrained optimization |
| MEND | 2022 | ICLR | Meta-learning | Yes | No | Up to 10B+ | Low-rank gradient decomposition with learned editor networks |
| SERAC | 2022 | ICML | Memory-based | No | Yes | GPT-2, T5 | External memory with scope classifier and counterfactual model |
| ROME | 2022 | NeurIPS | Locate-then-edit | Yes | No | GPT-J (6B), GPT-2 XL (1.5B) | Rank-one MLP weight update at causally identified layer |
| MEMIT | 2023 | ICLR | Locate-then-edit | Yes | Yes (~10,000) | GPT-J (6B), GPT-NeoX (20B) | Least-squares updates distributed across multiple MLP layers |

## How is knowledge editing evaluated?

Evaluation of knowledge editing methods centers on measuring three core properties, often supplemented by additional metrics for fluency, consistency, and portability.

### Core metrics

| Metric | Also Known As | What It Measures |
|---|---|---|
| **Efficacy** (Efficacy Success, ES) | Reliability | Whether the edited model produces the new target answer when given the exact edit prompt |
| **Generalization** (Paraphrase Success, PS) | Generality | Whether the edited model produces the new target answer when given semantically equivalent rephrasings of the edit prompt |
| **Locality** (Neighborhood Success, NS) | Specificity | Whether the edited model's predictions remain unchanged for inputs unrelated to the edit |
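A simplified scoring routine for the three core metrics is sketched below. It uses exact-match answers from toy lookup-table models, which is an assumption made for brevity; published benchmarks often score with token probabilities instead. The function names and prompts are illustrative.

```python
def success_rate(model, prompts, expected):
    """Fraction of prompts for which the model returns `expected`."""
    if not prompts:
        return 0.0
    return sum(model(p) == expected for p in prompts) / len(prompts)


def evaluate_edit(pre_model, post_model, edit_prompt, new_target,
                  paraphrases, neighborhood):
    """Score one edit with exact-match versions of the three core metrics."""
    unchanged = sum(post_model(p) == pre_model(p) for p in neighborhood)
    return {
        "efficacy": success_rate(post_model, [edit_prompt], new_target),
        "generalization": success_rate(post_model, paraphrases, new_target),
        "locality": unchanged / len(neighborhood) if neighborhood else 0.0,
    }


# Toy models: lookup tables from prompt to answer (illustrative prompts).
pre_answers = {
    "The Eiffel Tower is located in": "Paris",
    "The Eiffel Tower can be found in": "Paris",
    "You can visit the Eiffel Tower in": "Paris",
    "The Louvre is located in": "Paris",
    "The Colosseum is located in": "Rome",
}
post_answers = dict(pre_answers)
post_answers["The Eiffel Tower is located in"] = "Rome"    # the edit itself
post_answers["The Eiffel Tower can be found in"] = "Rome"  # one paraphrase follows
post_answers["The Louvre is located in"] = "Rome"          # an unwanted side effect

scores = evaluate_edit(
    pre_answers.get,
    post_answers.get,
    edit_prompt="The Eiffel Tower is located in",
    new_target="Rome",
    paraphrases=["The Eiffel Tower can be found in",
                 "You can visit the Eiffel Tower in"],
    neighborhood=["The Louvre is located in", "The Colosseum is located in"],
)
print(scores)  # {'efficacy': 1.0, 'generalization': 0.5, 'locality': 0.5}
```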

### Additional metrics

Beyond the core three, researchers have introduced several supplementary evaluation dimensions:

- **Fluency** (Generation Entropy): Measures whether the edited model still generates coherent, natural text. A drop in fluency after editing would indicate damage to the model's language capabilities.
- **Consistency** (Reference Score): Evaluates whether the model maintains the edited fact consistently across extended generation, not just in the immediate completion.
- **Portability**: Assesses whether edited knowledge transfers to related downstream tasks and reasoning chains. For example, if the model is edited to know that "The president of France is X," portability tests whether the model can also use that fact in a dependent question such as "Where was the president of France born?", rather than only in rephrasings of the edit prompt.
- **Compositionality / Multi-hop [Reasoning](/wiki/reasoning)**: Tests whether edited facts correctly propagate through chains of reasoning. If fact A is edited, and fact B depends on A, does the model correctly update its answer for B?

### What is the CounterFact dataset?

CounterFact is the primary benchmark dataset for evaluating knowledge editing methods. It was introduced alongside ROME by Meng et al. (2022) and contains 21,919 counterfactual editing records covering 20,391 distinct subjects and 749 objects. The full release also provides 42,876 paraphrase prompts, 82,650 neighborhood prompts, and 62,346 generation prompts, all derived from WikiData entities via the ParaRel resource. [1]

Each record in CounterFact includes:

| Field | Description |
|---|---|
| Subject | The entity being discussed (e.g., "The Eiffel Tower") |
| Relation | The factual relationship (e.g., "is located in") |
| True target | The factually correct object (e.g., "Paris") |
| Counterfactual target | The new, counterfactual object to be inserted (e.g., "Rome") |
| Paraphrase prompts | Multiple rephrasings of the same factual query, drawn from the ParaRel resource, for testing generalization |
| Neighborhood prompts | Prompts about related but distinct facts for testing locality |

CounterFact deliberately uses counterfactual edits (inserting false information) rather than corrections to real-world errors. This design choice ensures that the post-edit target is genuinely new information that the model could not have memorized during pre-training, providing a clean test of whether the editing method actually modified the model's stored associations. [1]

### What other benchmarks are used?

Several additional benchmarks have been developed to address limitations of CounterFact:

- **zsRE** (Zero-shot Relation Extraction): A question-answering dataset used to evaluate edits in a QA format rather than cloze-style completion.
- **MQuAKE** (Multi-hop Question Answering for Knowledge Editing): Introduced at EMNLP 2023 by Zhong et al., this benchmark tests whether edited facts correctly propagate through multi-hop reasoning chains. It includes MQuAKE-CF (counterfactual edits) and MQuAKE-T (temporal updates), with questions spanning 2, 3, and 4 hops. [7]
- **RippleEdits**: Introduced by Cohen et al. (2024, TACL), this benchmark evaluates the ripple effects of knowledge edits on logically related facts. [8]
- **KnowEdit**: A unified benchmark provided by the EasyEdit framework (Wang et al., ACL 2024) that re-organizes and cleans existing datasets (including WikiBio, zsRE, WikiData Counterfact, WikiData Recent, ConvSent, and Sanitation) into standardized train/validation/test splits. [6]

## How does knowledge editing compare with fine-tuning and RAG?

Knowledge editing is one of several strategies for updating the knowledge stored in or accessed by a language model. The two most common alternatives are [fine-tuning](/wiki/fine_tuning) (including continued pre-training) and [retrieval-augmented generation](/wiki/retrieval_augmented_generation) (RAG). Each approach involves different tradeoffs.

| Dimension | Knowledge Editing | Fine-Tuning | RAG |
|---|---|---|---|
| **Where knowledge is modified** | Specific model parameters (weights) | Model parameters (weights) broadly | External knowledge base (no model changes) |
| **Computational cost per update** | Very low (seconds to minutes) | High (hours to days for large models) | Low (update documents in index) |
| **Number of facts updated** | One to thousands (method-dependent) | Potentially many, but requires curated training data | Unlimited (depends on retrieval corpus size) |
| **Risk of catastrophic forgetting** | Low if editing is localized; rises with sequential edits | High, especially with small datasets | None (base model is unchanged) |
| **Generalization of updates** | Moderate (paraphrase robustness varies by method) | Strong if training data is diverse | Strong (retrieval works across query phrasings) |
| **Inference latency** | No overhead for weight-editing methods; memory-based editors such as SERAC add a routing step | No overhead (edits are in weights) | Higher (requires retrieval step before generation) |
| **Infrastructure requirements** | Minimal | GPU cluster for training | Vector database and retrieval pipeline |
| **Permanence of updates** | Permanent (weights are changed) | Permanent (weights are changed) | Dependent on external system availability |
| **Multi-hop reasoning support** | Weak (current methods struggle with ripple effects) | Moderate | Moderate (depends on retrieval quality) |
| **Scalability to many updates** | Limited for weight-editing methods; better for memory-based | Requires retraining | Highly scalable |

### When should you use each approach?

**Knowledge editing** is best suited for making a small number of precise factual corrections where the update must be embedded directly in the model's weights and inference latency cannot increase. Typical use cases include correcting a specific outdated fact, removing a particular piece of sensitive information, or testing mechanistic hypotheses about knowledge storage.

**Fine-tuning** is more appropriate when the model needs to acquire a large body of new domain knowledge or when behavioral changes go beyond simple factual updates (e.g., adapting the model's style, teaching it a new task, or aligning it with updated guidelines).

**RAG** is preferred when knowledge changes frequently, the corpus of knowledge is large, and the infrastructure for maintaining a retrieval index is available. RAG is also more suitable when auditability is important, since the retrieved documents provide a clear provenance trail for the model's answers.

## What are the limitations of knowledge editing?

Despite significant progress, knowledge editing faces several open challenges that limit its practical deployment.

### What is the ripple effect?

Editing a single fact can have cascading implications for related knowledge. For example, changing the birthplace of a person should also update answers to questions about what country they are from, what language they likely speak, and other logically connected facts. Current editing methods largely fail to propagate edits through such reasoning chains. Zhao et al. (2024) report that on the MQuAKE-CF multi-hop benchmark with Vicuna-7B, the best-performing prior editing method reached only 33.8% accuracy, the gap their chain-of-thought method RippleCOT was introduced to narrow. [13]

Cohen et al. (2024) systematically studied these ripple effects and proposed the RippleEdits benchmark, which defines six evaluation criteria, including logical generalization, compositionality, and subject aliasing, covering both facts that should change following an edit and facts that should stay the same. [8]
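The birthplace example can be reduced to a toy fact table to show what a multi-hop check looks for. The person "Ada" and the stored facts are hypothetical, and a language model does not store facts as explicit table entries; the point is only that a directly memorized composite answer can stay stale after the underlying fact is edited.

```python
facts = {
    ("Ada", "born in"): "London",              # hypothetical person and facts
    ("London", "country"): "United Kingdom",
    ("Paris", "country"): "France",
    # The composite fact is also memorized directly, as a model might do.
    ("Ada", "country of birth"): "United Kingdom",
}


def two_hop_country_of_birth(table, person):
    """Answer by chaining two single-hop lookups."""
    city = table[(person, "born in")]
    return table[(city, "country")]


# Edit only the single-hop fact: Ada's birthplace becomes Paris.
edited = {**facts, ("Ada", "born in"): "Paris"}

memorized_answer = edited[("Ada", "country of birth")]
chained_answer = two_hop_country_of_birth(edited, "Ada")

print(memorized_answer)  # United Kingdom (stale: the edit did not ripple)
print(chained_answer)    # France (what a consistent model should answer)
```

Multi-hop benchmarks ask the composite question and count the edit as propagated only if the answer matches the chained result.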

### What is sequential-editing degradation?

Applying many edits sequentially (one after another over time) can cause progressive degradation of model performance. Research has shown that parameter-modifying methods suffer from both gradual forgetting (slow erosion of unrelated knowledge) and catastrophic forgetting (sudden performance collapse) after a sufficient number of sequential edits. [Perplexity](/wiki/perplexity) tends to increase after consecutive edits across all parameter-modifying methods, serving as an indicator of model collapse.

Gupta et al. (2024) documented that ROME in particular is susceptible to model collapse under sequential editing, and proposed methods to mitigate this issue. [12]
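How sequential updates accumulate can be illustrated with repeated rank-one edits on one random weight matrix. This is a toy simulation under strong assumptions (random keys and targets, a single linear layer, drift measured on reference keys rather than perplexity), so it shows the mechanism only, not the behavior of any real model.

```python
import numpy as np


def rank_one_edit(W, C, k, v):
    """Insert the association k -> v with a covariance-weighted rank-one update."""
    u = np.linalg.solve(C, k)
    return W + np.outer(v - W @ k, u) / (u @ k)


rng = np.random.default_rng(3)
d_key, d_val, n_ref, n_edits = 16, 8, 200, 50   # toy sizes
K_ref = rng.normal(size=(d_key, n_ref))         # keys standing in for unrelated knowledge
C = K_ref @ K_ref.T
W0 = rng.normal(size=(d_val, d_key))

W = W0.copy()
drift = []
for _ in range(n_edits):
    k = rng.normal(size=d_key)
    v = rng.normal(size=d_val)
    W = rank_one_edit(W, C, k, v)
    # Relative change in the layer's outputs on the reference keys so far.
    drift.append(np.linalg.norm((W - W0) @ K_ref) / np.linalg.norm(W0 @ K_ref))

print(np.round(drift[::10], 3))
```

Each edit is exact for its own key when applied, but every later update also moves the outputs for earlier keys and for the reference keys, so the recorded drift reflects the combined effect of all edits applied up to that point.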

### What are knowledge conflict and distortion?

When multiple edits interact or contradict each other, they can create knowledge conflicts within the model. Li et al. (2024, ICLR) identified two failure modes: knowledge conflict, where two edits produce contradictory information that confuses the model, and knowledge distortion, where mass edits cause potentially irreversible damage to the model's internal knowledge structure. [9]

### Are edited models robust to prompt variation?

Edited models are often not robust to certain rephrasings of prompts. While standard paraphrase tests may pass, more challenging prompt variations, such as very long or noisy prompts, prompts that express doubt about the edited fact, or prompts in different languages, can cause the model to revert to its pre-edit behavior. This suggests that some editing methods achieve only superficial changes rather than deep modifications to the model's knowledge representations.

### How well does knowledge editing scale?

Among locate-then-edit methods, ROME is limited to single edits, while MEMIT can handle batches of thousands but still operates within a fixed capacity. [Meta-learning](/wiki/meta-learning) methods like MEND require upfront training of the editor network. Memory-based methods like SERAC scale more naturally but introduce inference-time overhead. No current method fully solves the problem of continuously updating a model with an unbounded stream of new facts over its deployment lifetime.

### What are the gaps in evaluation?

Existing evaluation benchmarks focus primarily on simple, single-hop factual associations expressed as subject-relation-object triples. Real-world knowledge updates are often more complex, involving nuanced contextual knowledge, temporal reasoning, or knowledge that spans multiple related facts. The field lacks comprehensive benchmarks that capture the full complexity of knowledge maintenance in deployed systems.

## How does knowledge editing relate to machine unlearning?

[Machine unlearning](/wiki/machine_unlearning) is a closely related field that focuses on removing specific information from a trained model, making it behave as if it had never seen certain training data. While knowledge editing typically involves replacing one fact with another, machine unlearning aims to delete information entirely.

The two fields share significant methodological overlap. Techniques developed for knowledge editing, such as causal tracing for locating stored information and targeted weight modifications for changing model behavior, have been directly applied to unlearning tasks. Conversely, unlearning research has contributed insights about how to verify that information has truly been removed rather than merely suppressed.

### Why does machine unlearning matter?

Several practical and regulatory pressures drive machine unlearning research:

- **Privacy regulations**: The European Union's General Data Protection Regulation (GDPR), specifically Article 17 (the "right to erasure"), grants individuals the right to request deletion of their personal data. When personal data has been absorbed into an LLM's parameters during training, machine unlearning offers a potential path to compliance without full retraining.
- **Copyright concerns**: Removing copyrighted material that was inadvertently included in training data.
- **Safety**: Erasing dangerous knowledge, such as instructions for creating harmful substances, from model weights.
- **Stale knowledge removal**: Rather than simply overwriting outdated facts (as in standard knowledge editing), unlearning can ensure the old information is no longer accessible through any prompt formulation.

### How do the methods connect?

Several approaches bridge knowledge editing and machine unlearning:

- **Gradient ascent**: Performing gradient ascent on the data to be forgotten, effectively increasing the model's loss on that data. This is the inverse of standard training and is conceptually simple, though it can damage unrelated knowledge.
- **SISA Training** (Sharded, Isolated, Sliced, and Aggregated): Partitions training data into shards with separate sub-models, allowing targeted retraining of only the affected shard when an unlearning request arrives.
- **Mechanistic unlearning**: Uses causal tracing and similar interpretability tools to locate the specific parameters encoding the information to be removed, then applies targeted modifications. This approach directly builds on the locate-then-edit paradigm from knowledge editing.
- **Knowledge editing as unlearning baseline**: Recent work (2025) has explored using standard knowledge editing methods like ROME and MEMIT as baselines for unlearning benchmarks, finding that they can sometimes serve as effective unlearning tools when the goal is to remove specific factual associations.
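Gradient ascent, the first approach in the list above, can be shown on a two-parameter logistic model. The weights, examples, learning rate, and step count are all assumptions chosen for illustration; the sketch shows the direction of the update and its side effect, not a practical unlearning recipe.

```python
import math


def predict(w, x):
    """Probability of label 1 under a logistic model with weights w."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))


def nll(w, x, y):
    """Negative log-likelihood of label y in {0, 1}."""
    p = predict(w, x)
    return -math.log(p if y == 1 else 1.0 - p)


def nll_grad(w, x, y):
    """Gradient of the negative log-likelihood with respect to w."""
    p = predict(w, x)
    return [(p - y) * xi for xi in x]


w = [1.5, -0.5]                          # toy "trained" weights
forget_x, forget_y = [1.0, 2.0], 1       # example to be unlearned
retain_x, retain_y = [1.0, 0.0], 1       # example that should be kept

forget_before = nll(w, forget_x, forget_y)
retain_before = nll(w, retain_x, retain_y)

lr = 0.1
for _ in range(20):
    grad = nll_grad(w, forget_x, forget_y)
    # Ascent: step up the loss on the forget example (training would subtract).
    w = [wi + lr * gi for wi, gi in zip(w, grad)]

forget_after = nll(w, forget_x, forget_y)
retain_after = nll(w, retain_x, retain_y)

print(forget_before, forget_after)   # loss on the forget example rises
print(retain_before, retain_after)   # loss on the retained example rises too
```

Because both examples share the first feature, raising the loss on the forget example also raises it on the retained one, a small-scale version of the collateral damage noted above.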

## What software is used for knowledge editing?

### EasyEdit

EasyEdit is an open-source framework developed by Zhejiang University (Wang et al., ACL 2024) that provides unified implementations of major knowledge editing methods including ROME, MEMIT, MEND, SERAC, KnowledgeEditor, and several newer techniques. The framework supports multiple [LLM](/wiki/large_language_model) architectures and provides standardized evaluation on CounterFact, zsRE, and other benchmarks via its bundled KnowEdit benchmark. EasyEdit has become a widely used tool for knowledge editing research and experimentation. [6]

## See also

- [Large language model](/wiki/large_language_model)
- [Fine-tuning](/wiki/fine_tuning)
- [Retrieval-augmented generation](/wiki/retrieval_augmented_generation)
- [Machine unlearning](/wiki/machine_unlearning)
- [Interpretability](/wiki/interpretability)
- [Hallucination](/wiki/hallucination)

## References

1. Meng, K., Bau, D., Andonian, A., & Belinkov, Y. (2022). "Locating and Editing Factual Associations in GPT." *Advances in Neural Information Processing Systems 35* (NeurIPS 2022). arXiv:2202.05262.
2. Meng, K., Sharma, A. S., Andonian, A., Belinkov, Y., & Bau, D. (2023). "Mass-Editing Memory in a Transformer." *International Conference on Learning Representations* (ICLR 2023). arXiv:2210.07229.
3. De Cao, N., Aziz, W., & Titov, I. (2021). "Editing Factual Knowledge in Language Models." *Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing* (EMNLP 2021).
4. Mitchell, E., Lin, C., Bosselut, A., Finn, C., & Manning, C. D. (2022). "Fast Model Editing at Scale." *International Conference on Learning Representations* (ICLR 2022). arXiv:2110.11309.
5. Mitchell, E., Lin, C., Bosselut, A., Manning, C. D., & Finn, C. (2022). "Memory-Based Model Editing at Scale." *International Conference on Machine Learning* (ICML 2022). arXiv:2206.06520.
6. Wang, P., et al. (2024). "EasyEdit: An Easy-to-use Knowledge Editing Framework for Large Language Models." *Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics* (ACL 2024). arXiv:2308.07269.
7. Zhong, Z., Wu, Z., Manning, C. D., Potts, C., & Chen, D. (2023). "MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions." *Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing* (EMNLP 2023).
8. Cohen, R., Biran, E., Yoran, O., Globerson, A., & Geva, M. (2024). "Evaluating the Ripple Effects of Knowledge Editing in Language Models." *Transactions of the Association for Computational Linguistics*, 12.
9. Li, Z., et al. (2024). "Unveiling the Pitfalls of Knowledge Editing for Large Language Models." *International Conference on Learning Representations* (ICLR 2024).
10. Wang, S., et al. (2024). "Knowledge Editing for Large Language Models: A Survey." *ACM Computing Surveys*.
11. Mazzia, V., et al. (2024). "A Survey on Knowledge Editing of Neural Networks." *arXiv preprint*.
12. Gupta, A., Baskaran, S., & Anumanchipalli, G. (2024). "Rebuilding ROME: Resolving Model Collapse during Sequential Model Editing." *Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing* (EMNLP 2024).
13. Zhao, J., et al. (2024). "RippleCOT: Amplifying Ripple Effect of Knowledge Editing in Language Models via Chain-of-Thought In-Context Learning." *Findings of EMNLP 2024*. arXiv:2410.03122.