Removing refusals with HF Transformers

This is a crude proof-of-concept implementation for removing refusals from an LLM without using TransformerLens. As a result, it supports every model that HF Transformers supports*.

The code was tested on an RTX 2060 6GB, so most testing was done with models under 3B parameters, but it has also been confirmed to work with bigger models.

*While most models are compatible, some are not, mainly because of custom model implementations. Some Qwen implementations, for example, don't work because model.model.layers can't be used to get the layers. They name their attributes differently, so model.transformer.h must be used instead, if I'm not mistaken.
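One way to cope with the differing attribute names is to try each known path in turn. The following is a minimal sketch, not code from this repository: the helper name get_decoder_layers and the LAYER_PATHS tuple are hypothetical, and the stand-in objects only mimic the attribute layout of a real model so the example runs without downloading anything.

```python
from types import SimpleNamespace

# Hypothetical list of attribute paths where decoder layers may live.
# Only the two paths mentioned above are included; extend as needed.
LAYER_PATHS = ("model.layers", "transformer.h")


def get_decoder_layers(model):
    """Return the first layer container found under a known attribute path."""
    for path in LAYER_PATHS:
        obj = model
        try:
            for name in path.split("."):
                obj = getattr(obj, name)
        except AttributeError:
            continue  # this path does not exist on the model, try the next one
        return obj
    raise AttributeError(f"no decoder layers found under any of {LAYER_PATHS}")


# Stand-ins that mimic the two layouts (not real models).
llama_like = SimpleNamespace(model=SimpleNamespace(layers=["layer0", "layer1"]))
qwen_like = SimpleNamespace(transformer=SimpleNamespace(h=["block0", "block1"]))

layers_a = get_decoder_layers(llama_like)
layers_b = get_decoder_layers(qwen_like)
```

With a real model loaded through HF Transformers, the same helper could be called on the model object in place of a hard-coded model.model.layers, assuming the model uses one of the listed paths.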

Usage

Set the model and quantization in compute_refusal_dir.py and inference.py (quantization can apparently be mixed).
Run compute_refusal_dir.py (some settings in that file may need changing depending on your use case).
Run inference.py and ask the model how to build an army of rabbits that will one day overthrow your local government by stealing all the carrots.
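The arithmetic behind a "refusal direction" can be illustrated with a toy example. This sketch assumes the commonly described difference-of-means approach: average the hidden states for harmful and harmless prompts, take the normalized difference, and remove that component at inference time. Whether compute_refusal_dir.py and inference.py do exactly this is an assumption; the function names are hypothetical, and plain Python lists stand in for hidden-state tensors.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def refusal_direction(harmful_acts, harmless_acts):
    """Unit vector pointing from the harmless mean to the harmful mean."""
    diff = [a - b for a, b in zip(mean_vector(harmful_acts), mean_vector(harmless_acts))]
    norm = math.sqrt(sum(x * x for x in diff))
    return [x / norm for x in diff]


def ablate(hidden, direction):
    """Subtract the projection of a hidden state onto the (unit) direction."""
    dot = sum(h * d for h, d in zip(hidden, direction))
    return [h - dot * d for h, d in zip(hidden, direction)]


# Toy 3-dimensional "activations" (made-up numbers, not model output).
harmful = [[2.0, 1.0, 0.0], [4.0, 1.0, 0.0]]
harmless = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]

direction = refusal_direction(harmful, harmless)
cleaned = ablate([5.0, 2.0, -1.0], direction)
```

In this toy case the two groups differ only along the first axis, so the direction is that axis and ablation zeroes the first component while leaving the others untouched. In a real run the activations would come from a chosen layer of the model for the prompts in harmful.txt and harmless.txt.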


Sumandora/remove-refusals-with-transformers
