```bash
pip install "bentoml[unsloth]"
```
See `train.py` for an example fine-tuning script.
To use this integration, call `bentoml.unsloth.build_bento`:

```python
bentoml.unsloth.build_bento(model, tokenizer)
```
If your model was continued from a fine-tuned checkpoint, then `model_name` must be passed as well:

```python
bentoml.unsloth.build_bento(model, tokenizer, model_name="llama-3-continued-from-checkpoint")
```
> [!IMPORTANT]
> Make sure to save the chat template to the tokenizer instance, so that generation uses the same format you set up in your data pipeline. See the example and documentation for more information.
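To illustrate why the chat template matters, here is a minimal sketch of what a tokenizer's chat template does at generation time. The turn format below (`<|role|>…<|end|>`) is purely hypothetical, not the template of any particular model; real templates are Jinja strings stored on `tokenizer.chat_template`, and `tokenizer.apply_chat_template` renders them.

```python
def apply_chat_template(messages):
    """Minimal stand-in for tokenizer.apply_chat_template.

    Renders each turn as "<|role|>content<|end|>" — a hypothetical
    format chosen for illustration; real models define their own
    template string.
    """
    return "".join(f"<|{m['role']}|>{m['content']}<|end|>\n" for m in messages)

prompt = apply_chat_template([
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi there"},
])
print(prompt)
# <|user|>Hello<|end|>
# <|assistant|>Hi there<|end|>
```

If the template used at serving time differs from the one used to format the fine-tuning data, the prompt boundaries no longer line up and generation quality degrades — which is why the template must be persisted on the tokenizer before calling `build_bento`.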