Backdooring LLMs: Difference between revisions

6,845 bytes added , Friday at 18:28

Created page with "'''Backdooring Large Language Models (LLMs)''' refers to the process of intentionally embedding hidden, malicious behaviors—known as Backdoors—into LLMs during their training or fine-tuning phases. These Backdoors enable the model to behave normally under typical conditions but trigger undesirable outputs, such as malicious code or deceptive responses, when specific conditions or inputs are met. This phenomenon raises significant concerns about the se..."

Alpha5

Interface administrators, Administrators (Semantic MediaWiki), Curators (Semantic MediaWiki), Editors (Semantic MediaWiki), Suppressors, Administrators

8,008

edits

Backdooring LLMs: Difference between revisions

Backdooring LLMs (view source)

Revision as of 18:28, 14 March 2025