Stripping AI Safety Guardrails Is Now a One-Line Command – Linkpost

Heretic automatically strips the refusal behavior from any transformer-based language model, with no ML expertise needed. Decensoring Llama-3.1-8B-Instruct takes about 45 minutes on a modern graphics card.

Heretic is an open-source tool that removes safety alignment from transformer-based language models automatically, without retraining or deep technical knowledge.

It uses directional ablation and a parameter optimizer to suppress refusals while preserving model quality, and has already been used to publish over 1,000 uncensored model variants on Hugging Face.
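
For readers unfamiliar with the term, directional ablation broadly means projecting one chosen direction out of a vector so that no component along it remains. The toy sketch below shows only that projection step on plain Python lists. It is not Heretic's code: the function name `ablate_direction` and the example vectors are hypothetical, and nothing here covers how a direction is chosen or how the optimizer works.

```python
import math


def ablate_direction(vector, direction):
    """Return `vector` with its component along `direction` removed.

    Hypothetical helper for illustration only; it performs a plain
    orthogonal projection and is not taken from any library.
    """
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0:
        raise ValueError("direction must be non-zero")
    # Normalize so the projection coefficient is a simple dot product.
    unit = [d / norm for d in direction]
    coeff = sum(v * u for v, u in zip(vector, unit))
    # Subtract the projection onto the unit direction.
    return [v - coeff * u for v, u in zip(vector, unit)]


# Made-up 3-dimensional example: remove the first axis from a vector.
activation = [3.0, 4.0, 0.0]
direction = [1.0, 0.0, 0.0]
ablated = ablate_direction(activation, direction)
print(ablated)
```

After the projection, the result has no remaining component along `direction`, while the parts of the vector orthogonal to it are unchanged.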

Safety alignment, the thing AI companies spend enormous resources building and defending, can now be undone by anyone who can run a terminal.

