Backdooring LLMs: Difference between revisions

Backdooring LLMs (view source)

85 bytes added , Friday at 18:42

no edit summary

7,994

edits

@@ Line 1: / Line 1: @@
-{{Backdooring LLMs}}
+{{see also|artificial intelligence terms}}
 '''Backdooring [[Large Language Models]] (LLMs)''' refers to the process of intentionally embedding hidden, malicious behaviors—known as [[Backdoors]]—into [[LLMs]] during their training or fine-tuning phases. These [[Backdoors]] enable the model to behave normally under typical conditions but trigger undesirable outputs, such as malicious code or deceptive responses, when specific conditions or inputs are met. This phenomenon raises significant concerns about the security and trustworthiness of [[LLMs]], especially as they are deployed in critical applications like [[Code Generation]], fraud detection, and decision-making systems.
@@ Line 81: / Line 81: @@
 <ref name="”7”">Ken Thompson, "Reflections on Trusting Trust," Communications of the ACM, August 1984.</ref>
 </references>
+[[Category:Terms]] [[Category:Artificial intelligence terms]]