Jump to content

Backdooring LLMs: Difference between revisions

no edit summary
No edit summary
No edit summary
Line 1: Line 1:
{{Backdooring LLMs}}
{{see also|artificial intelligence terms}}
'''Backdooring [[Large Language Models]] (LLMs)''' refers to the process of intentionally embedding hidden, malicious behaviors—known as [[Backdoors]]—into [[LLMs]] during their training or fine-tuning phases. These [[Backdoors]] enable the model to behave normally under typical conditions but trigger undesirable outputs, such as malicious code or deceptive responses, when specific conditions or inputs are met. This phenomenon raises significant concerns about the security and trustworthiness of [[LLMs]], especially as they are deployed in critical applications like [[Code Generation]], fraud detection, and decision-making systems.
'''Backdooring [[Large Language Models]] (LLMs)''' refers to the process of intentionally embedding hidden, malicious behaviors—known as [[Backdoors]]—into [[LLMs]] during their training or fine-tuning phases. These [[Backdoors]] enable the model to behave normally under typical conditions but trigger undesirable outputs, such as malicious code or deceptive responses, when specific conditions or inputs are met. This phenomenon raises significant concerns about the security and trustworthiness of [[LLMs]], especially as they are deployed in critical applications like [[Code Generation]], fraud detection, and decision-making systems.


Line 81: Line 81:
<ref name="”7”">Ken Thompson, "Reflections on Trusting Trust," Communications of the ACM, August 1984.</ref>
<ref name="”7”">Ken Thompson, "Reflections on Trusting Trust," Communications of the ACM, August 1984.</ref>
</references>
</references>
[[Category:Terms]] [[Category:Artificial intelligence terms]]