Backdooring LLMs: Difference between revisions

Backdooring LLMs (view source)

No change in size , Friday at 18:45

no edit summary

8,008

edits

@@ Line 1: / Line 1: @@
 {{see also|artificial intelligence terms}}
-'''Backdooring [[Large Language Models]] (LLMs)''' refers to the process of intentionally embedding hidden, malicious behaviors—known as [[Backdoors]]—into [[LLMs]] during their training or fine-tuning phases. These [[Backdoors]] enable the model to behave normally under typical conditions but trigger undesirable outputs, such as malicious code or deceptive responses, when specific conditions or inputs are met. This phenomenon raises significant concerns about the security and trustworthiness of [[LLMs]], especially as they are deployed in critical applications like [[Code Generation]], fraud detection, and decision-making systems.
+'''Backdooring [[large language models]] (LLMs)''' refers to the process of intentionally embedding hidden, malicious behaviors—known as [[Backdoors]]—into [[LLMs]] during their training or fine-tuning phases. These [[Backdoors]] enable the model to behave normally under typical conditions but trigger undesirable outputs, such as malicious code or deceptive responses, when specific conditions or inputs are met. This phenomenon raises significant concerns about the security and trustworthiness of [[LLMs]], especially as they are deployed in critical applications like [[Code Generation]], fraud detection, and decision-making systems.
 == Overview ==