Heretic automatically strips refusal behavior from any transformer-based language model, with no ML expertise needed. Decensoring Llama-3.1-8B-Instruct takes about 45 minutes on a modern graphics card.
- Heretic is an open-source tool that removes safety alignment from transformer-based language models automatically, without retraining or deep technical knowledge.
- Safety alignment, the thing AI companies spend enormous resources building and defending, can now be undone by anyone who can run a terminal.
- It uses directional ablation and a parameter optimizer to suppress refusals while preserving model quality, and has already been used to publish over 1,000 uncensored model variants on Hugging Face.
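
For readers who want a sense of what "directional ablation" means mathematically, here is a toy sketch in plain Python. It is not Heretic's code: the function names, the difference-of-means estimate of the direction, and the `strength` knob are all illustrative assumptions, and the matrices are random numbers rather than real model weights. The idea shown is only the core linear algebra: pick a direction in activation space, then subtract each weight matrix's component along that direction so the layer can no longer write to it.

```python
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def unit(v):
    """Scale a vector to unit length."""
    norm = dot(v, v) ** 0.5
    return [x / norm for x in v]


def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def matvec(weight, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [dot(row, x) for row in weight]


def difference_of_means(acts_a, acts_b):
    """Assumed way to estimate a direction: the gap between two sets of
    activations (e.g. prompts that trigger a behavior vs. prompts that do not)."""
    mean_a, mean_b = mean_vector(acts_a), mean_vector(acts_b)
    return [a - b for a, b in zip(mean_a, mean_b)]


def ablate_direction(weight, direction, strength=1.0):
    """Return W' = W - strength * d (d^T W), where d is the unit direction.

    `weight` has one row per output dimension. With strength=1.0 the output
    of W' has no component along d. `strength` is a hypothetical knob of the
    kind an optimizer could tune to trade off suppression against quality.
    """
    d = unit(direction)
    n_cols = len(weight[0])
    # d^T W: how strongly each input column writes along the direction.
    proj = [sum(d[i] * weight[i][j] for i in range(len(d))) for j in range(n_cols)]
    return [
        [weight[i][j] - strength * d[i] * proj[j] for j in range(n_cols)]
        for i in range(len(d))
    ]


if __name__ == "__main__":
    random.seed(0)
    # Toy "activations" for two groups of prompts, 4 dimensions each.
    group_a = [[random.gauss(1.0, 1.0) for _ in range(4)] for _ in range(8)]
    group_b = [[random.gauss(0.0, 1.0) for _ in range(4)] for _ in range(8)]
    direction = difference_of_means(group_a, group_b)

    # Toy 4x3 weight matrix standing in for one layer's output projection.
    weight = [[random.gauss(0.0, 1.0) for _ in range(3)] for _ in range(4)]
    ablated = ablate_direction(weight, direction)

    x = [0.5, -1.0, 2.0]
    print("component along direction before:", dot(unit(direction), matvec(weight, x)))
    print("component along direction after: ", dot(unit(direction), matvec(ablated, x)))
```

With `strength=1.0` the second printed value is zero up to floating-point error, whatever input is used. Smaller values leave part of the component in place, which is the kind of trade-off between suppression and model quality that a parameter optimizer would be searching over.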