Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

Submission history
From: Jan Betley
[v1] Mon, 24 Feb 2025 18:56:03 UTC (8,456 KB)
[v2] Tue, 25 Feb 2025 23:57:54 UTC (8,458 KB)
[v3] Fri, 28 Feb 2025 00:11:35 UTC (8,460 KB)
[v4] Wed, 5 Mar 2025 02:15:50 UTC (8,460 KB)
[v5] Sun, 4 May 2025 22:39:38 UTC (8,731 KB)
[v6] Mon, 12 May 2025 06:51:03 UTC (8,731 KB)