Releases: huggingface/setfit
v1.1.0 - Sentence Transformers as the finetuning backend; tackle deprecations of other dependencies
This release introduces a new backend for finetuning embedding models, based on the Sentence Transformers Trainer, tackles deprecations in dependencies like `transformers`, deprecates Python 3.7 while adding support for new Python versions, and applies some other minor fixes. There shouldn't be any breaking changes.
Install this version with
```
pip install -U setfit
```
Defer the embedding model finetuning phase to Sentence Transformers (#554)
In SetFit v1.0, the old `model.fit` training from Sentence Transformers was replaced by a custom training loop that added some features the former was missing, such as loss logging and useful callbacks. However, Sentence Transformers v3 has since been released, which also adds all of the features that were previously lacking. To simplify the training moving forward, the training is now (once again) deferred to Sentence Transformers.
Because both the old and the new training approaches are inspired by the `transformers` `Trainer`, there should not be any breaking changes. The most notable change is that training now requires `accelerate` (as Sentence Transformers depends on it), and we benefit from some of the Sentence Transformers training features, such as multi-GPU training.
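As an illustration, here is a minimal training sketch based on the documented v1.0 `Trainer` API; the dataset and base model are just examples. Existing code like this should run unchanged on the new backend:

```python
from datasets import load_dataset
from setfit import SetFitModel, Trainer, TrainingArguments

# Any Sentence Transformer can serve as the embedding body
model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")
train_dataset = load_dataset("sst2", split="train").shuffle(seed=42).select(range(32))

# The same Trainer/TrainingArguments API as before; under the hood, the
# embedding finetuning phase now runs through the Sentence Transformers Trainer.
args = TrainingArguments(batch_size=16, num_epochs=1)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    column_mapping={"sentence": "text", "label": "label"},
)
trainer.train()
```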
Solve discrepancies with new versions of dependencies
To ensure compatibility with the latest versions of dependencies, the following issues have been addressed:
- Follow the (soft) deprecation of `evaluation_strategy` to `eval_strategy` (#538). This previously resulted in crashes if your `transformers` version was too new. See the sketch after this list.
- Avoid the now-deprecated `DatasetFilter` (#527). This previously resulted in crashes if your `huggingface-hub` version was too new.
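For illustration, a hedged sketch of the rename (assuming the v1.1.0 `TrainingArguments` signature):

```python
from setfit import TrainingArguments

# `evaluation_strategy` is soft-deprecated in transformers in favor of
# `eval_strategy`; SetFit now follows the same naming.
args = TrainingArguments(
    eval_strategy="steps",  # previously: evaluation_strategy="steps"
    eval_steps=50,
)
```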
Python version support
- Following the Python team's deprecation of Python 3.7, Python 3.7 is now also deprecated by SetFit moving forward. (#506)
- We've added official support for Python 3.11 and 3.12 now that both are included in our test suite. (#550)
Minor changes
- Firm up `max_steps` and `eval_max_steps`: rather than being a rough maximum limit, the limit is now exact. This can be helpful to avoid memory overflow, especially in situations with notable dataset imbalances (see the sketch after this list). (#549)
- Training and validation losses are now nicely logged in notebooks. (#557)
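A hedged sketch of capping training with exact limits, using the argument names from the existing `TrainingArguments`:

```python
from setfit import TrainingArguments

# max_steps and eval_max_steps are now exact caps rather than rough limits,
# which keeps the number of sampled contrastive pairs (and thus memory) bounded.
args = TrainingArguments(
    max_steps=100,      # stop embedding finetuning after exactly 100 steps
    eval_max_steps=50,  # cap evaluation at exactly 50 steps
)
```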
Minor bug fixes
- Fix bug where the `device` parameter in `SetFitHead` is ignored if CUDA is not available. (#518)
All Changes
- [`absa`] Add SetFitABSA notebook on FiQA by @tomaarsen in #471
- Refactor training logs & warmup_proportion by @tomaarsen in #475
- [`feat`] Set labels based on head classes, if possible by @tomaarsen in #476
- `optimum-intel` notebook by @danielkorat in #480
- Optimum-Intel notebook: fix quantization explanation by @danielkorat in #483
- Optimum-Intel Notebook by @danielkorat in #484
- Fix Errors in `setfit-onnx-optimum` Notebook by @danielkorat in #496
- Add files via upload by @MosheWasserb in #497
- Switch to differentiable head, larger input sequence by @danielkorat in #489
- Bugfix: Error in optimum-intel notebook due to missing attributes after `torch.compile()` by @danielkorat in #517
- Renamed `evaluation_strategy` to `eval_strategy` by @sergiopaniego in #538
- [CI] Deprecate Python 3.7 and invalidate cache weekly by @Wauplin in #506
- Don't use deprecated `DatasetFilter` + update deps by @Wauplin in #527
- Fix SetFitModel: not a dataclass, not a PyTorchModelHubMixin by @Wauplin in #505
- [`tests`] Resolve remaining test failures by @tomaarsen in #550
- Train via the Sentence Transformers Trainer from ST v3 by @tomaarsen in #554
- Update absa.mdx with necessary imports by @splevine in #533
- Fix bug where SetFitHead not moved to non-cuda devices on init by @ajmssc in #518
- Fix pandas groupby -> apply warning by @tomaarsen in #555
- Check if max pairs limit reached in `generate_pairs` and `generate_multilabel_pairs` by @OscarRunsCode in #549
- Prevent sampling 2x more than requested when max_steps is set by @tomaarsen in #556
- Create custom NotebookCallback subclass for embedding_loss, etc. by @tomaarsen in #557
New Contributors
- @sergiopaniego made their first contribution in #538
- @Wauplin made their first contribution in #506
- @splevine made their first contribution in #533
- @ajmssc made their first contribution in #518
- @OscarRunsCode made their first contribution in #549
Full Changelog: v1.0.3...v1.1.0
v1.0.3
This is a patch release with two notable fixes and a feature:
- Training logs now correctly list the number of training examples (now called "unique pairs").
- The number of warmup steps is now based on the actual number of training steps rather than on `args.max_steps` when `args.max_steps` exceeds the number of steps. This prevents accidentally staying in warm-up for longer than the desired warmup proportion.
- When training with string labels, the model now tries to automatically set the string labels to `SetFitModel.labels` if this variable hasn't been defined yet (see the sketch after this list).
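A hedged sketch of the new labels behavior (the toy dataset and label strings are illustrative):

```python
from datasets import Dataset
from setfit import SetFitModel, Trainer

model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")
train_dataset = Dataset.from_dict({
    "text": ["loved it!", "terrible acting", "great plot", "so dull"],
    "label": ["positive", "negative", "positive", "negative"],
})

trainer = Trainer(model=model, train_dataset=train_dataset)
trainer.train()

# If SetFitModel.labels wasn't defined beforehand, training with string
# labels now sets it automatically, so predictions return strings:
print(model.labels)  # e.g. ['negative', 'positive']
```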
The PRs:
- Set labels based on head classes, if possible by @tomaarsen in #476
- Refactor training logs & fix warmup_proportion by @tomaarsen in #475
Full Changelog: v1.0.2...v1.0.3
v1.0.2
What's Changed
- Fix: Python-ify evaluation results before writing model card by @tomaarsen in #460
- Resolve crash with predict_proba & multi-output by @tomaarsen in #466
- Remove breaking shuffle DataLoader option by @tomaarsen in #470
- Predict for ABSA models with a gold aspect dataset by @tomaarsen in #469
- Prepare SetFit for upcoming 2.3.0 release of SentenceTransformers by @tomaarsen in #463
Full Changelog: v1.0.1...v1.0.2
v1.0.1
v1.0.0
v1.0.0 Full SetFit Release
This release heavily refactors the SetFit trainer and introduces some much requested features, such as:
- New Trainer, new TrainingArguments with many, many new arguments.
- Configurable logging, automatic logging to Weights & Biases and Tensorboard if installed.
- Evaluation during training and early stopping support to combat overfitting (see the sketch after this list).
- Checkpointing + loading the best model at the end.
- SetFit for Aspect Based Sentiment Analysis in collaboration with Intel Labs.
- Heavily improved automatic model card generation.
- Extensive callbacks support based on transformers.
- Full, extensive documentation: http://hf.co/docs/setfit
- and more!
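A hedged sketch combining several of these features, assuming the argument names from the v1.0 documentation:

```python
from datasets import load_dataset
from setfit import SetFitModel, Trainer, TrainingArguments

dataset = load_dataset("sst2")
model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")

args = TrainingArguments(
    batch_size=16,
    num_epochs=1,
    evaluation_strategy="steps",  # evaluate during training (renamed to eval_strategy in v1.1.0)
    eval_steps=50,
    save_strategy="steps",        # checkpointing
    save_steps=50,
    load_best_model_at_end=True,  # reload the best checkpoint when training finishes
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"].shuffle(seed=42).select(range(64)),
    eval_dataset=dataset["validation"].select(range(64)),
    column_mapping={"sentence": "text", "label": "label"},
)
trainer.train()
metrics = trainer.evaluate()
```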
v1.0.0 Migration Guide
Read the v1.0.0 Migration Guide in the documentation: https://hf.co/docs/setfit/how_to/v1.0.0_migration_guide
v1.0.0 Detailed Release Notes
Read the more detailed release notes in the documentation: https://huggingface.co/docs/setfit/how_to/v1.0.0_migration_guide#v100-changelog
What's Changed
- Preserve dataset features in `sample_dataset` by @grofte in #396
- Allow other datasets in `trainer.evaluate()` by @grofte in #402
- Normalize device to CPU when evaluating by @tomaarsen in #363
- show_progress_bar as parameter on predict and predict_prob by @davidsbatista in #429
- Refactor to introduce `Trainer` & `TrainingArguments`, add SetFit ABSA by @tomaarsen in #265
- fix: make sampling more reproducible by @yahiaelgamal in #441
- Allow setting batch size in SetFitModel.predict by @tomaarsen in #443
- Save differentiable model head on CPU by @tomaarsen in #444
- Allow 'device' on SetFitModel.from_pretrained() by @tomaarsen in #445
- Add notebook to demonstrate how efficiently running SetFit with ONNX by @MosheWasserb in #435
- Add "labels" to SetFitModel, store/load from configuration file by @tomaarsen in #447
- Allow passing strings to model.predict by @tomaarsen in #448
- Allow partial column mappings by @tomaarsen in #449
- Allow normalize_embeddings with a differentiable head by @tomaarsen in #450
- Heavily improve automatic model card generation by @tomaarsen in #452
- Also pass `metric_kwargs` to custom metric callable by @tomaarsen in #456
- Prepare v1.0.0 release - `Trainer`, `TrainingArguments`, SetFitABSA, logging, evaluation during training, callbacks, docs by @tomaarsen in #439
New Contributors
- @rhelmeczi made their first contribution in #362
- @bofenghuang made their first contribution in #366
- @davidberenstein1957 made their first contribution in #384
- @alvarobartt made their first contribution in #397
- @bogedy made their first contribution in #361
- @grofte made their first contribution in #396
- @davidsbatista made their first contribution in #429
- @rtrompier made their first contribution in #433
- @yahiaelgamal made their first contribution in #441
Full Changelog: v0.7.0...v1.0.0
v0.7.0
v0.7.0 Bug Fixes Galore
This release introduces numerous bug fixes, including critical ones for `push_to_hub`, `save_pretrained` and distillation training.
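For context, a hedged sketch of the affected saving and uploading APIs (the repository name is illustrative):

```python
from setfit import SetFitModel

model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")

# Both saving locally and pushing to the Hub received fixes in this release.
model.save_pretrained("my-setfit-model")          # local save
model.push_to_hub("my-username/my-setfit-model")  # upload to the Hub
```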
Bug fixes and improvements
- Add a warning if an unsplit dataset is passed to SetFitTrainer by @jaalu in #299
- Improve dataset pre-processing speeds for large datasets by @logan-markewich in #309
- Add Path support to `_save_pretrained`, resolve `TypeError: unsupported operand type(s) for +: 'PosixPath' and 'str'` by @tomaarsen in #332
- Add Hallmarks of Cancer notebook by @MosheWasserb in #333
- Initialize SetFitModel with `cls` instead by @kobiche in #341
- Allow distillation training with models using differentiable heads by @tomaarsen in #343
- Prevent TypeError on `model.predict` when using string labels by @tomaarsen in #331
- Restrict `pandas` to <2 for compatibility tests by @tomaarsen in #350
- Update `Trainer.push_to_hub` to use `**kwargs` by @tomaarsen in #351
- Add metric keyword arguments, e.g. add "average" strategy to f1 by @tomaarsen in #353
Significant community contributions
The following contributors have made significant changes to the library over the last release:
- @jaalu
- Add a warning if an unsplit dataset is passed to SetFitTrainer (#299)
- @tomaarsen
- Add comparison plotting script (#319)
- Resolve IndexError if there is just one K-shot scenario
- Reintroduce Usage in README until docs are ready
- Add Path support to _save_pretrained (#332)
- Allow distillation training with models using differentiable heads (#343)
- Prevent TypeError on `model.predict` when using string labels (#331)
- Restrict `pandas` to <2 for compatibility tests (#350)
- Update `Trainer.push_to_hub` to use `**kwargs` (#351)
- Add metric keyword arguments, e.g. add "average" strategy to f1 (#353)
- @EdAbati
- @MosheWasserb
- Add Hallmarks of Cancer notebook (#333)
v0.6.0
v0.6.0 OpenVINO exporter, model cards, and various quality of life improvements 🔥
To bring in the new year, this release comes with many bug fixes and quality of life improvements around using SetFit models. It also provides:
- an OpenVINO exporter that you can optimise your models for inference with. Check out the `notebooks` for an example, and the sketch after this list.
- a dedicated model card with metadata and usage instructions. See here for an example output from `push_to_hub()`: https://huggingface.co/lewtun/setfit-new-model-card
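A hedged sketch of the exporter; the module path and signature below follow the `setfit.exporters` package added in #214, but treat them as assumptions:

```python
from setfit import SetFitModel
from setfit.exporters.openvino import export_to_openvino  # assumed module path per #214

model = SetFitModel.from_pretrained("lewtun/my-awesome-setfit-model")

# Export the full model (body + head) to an OpenVINO IR graph
# for optimised CPU inference.
export_to_openvino(model, output_path="model.xml")  # assumed signature
```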
Bug fixes and improvements
- Always install the checked-out setfit by @tomaarsen in #235
- Add SetFitModel.to by @tomaarsen in #229
- Add distillation trainer example by @lewtun in #202
- Prevent overriding the sample size in `sample_dataset` by @tomaarsen in #231
- add related work in readme by @Yongtae723 in #239
- Fix seed in `trainer.py` by @danielkorat in #243
- Always display test coverage; add tests by @tomaarsen in #240
- Add Tom to list of maintainers by @lewtun in #253
- Add proper model card by @lewtun in #252
- Added support of OpenVINO export by @AlexKoff88 in #214
- Add has_differentiable_head property to SetFitModel by @zachschillaci27 in #257
- Resolve numpy.ndarray type error with predict_proba by @jegork in #207
- Refactor model_head initialization in SetFitModel by @zachschillaci27 in #263
- Feature/deprecate binary cross entropy loss by @blakechi in #203
- Fix type hints by @Yongtae723 in #266
- pass auth token to sentence transformer by @ken-myers in #277
- Add multi-target support to SetFitHead by @Yongtae723 and @OskarLiew in #272
- Automatically create summary table after `scripts/setfit/run_fewshot.py` by @tomaarsen in #262
- Fix squared optimization steps bug by @twerkmeister in #280
- Fix squared optimization steps bug in distillation trainer by @tomaarsen in #284
- Dynamic features in datasets based on model input names by @AleksanderObuchowski in #288
- Resolve `SentenceTransformer` resetting devices after moving a `SetFitModel` by @tomaarsen in #283
- add `run_zeroshot.py`; add functionality to `data.get_templated_dataset()` (formerly `add_templated_examples()`) by @danielkorat in #292
- Exclude compatibility versions from dev setup by @tomaarsen in #286
Significant community contributions
The following contributors have made significant changes to the library over the last release:
- @tomaarsen
- Always install the checked-out setfit (#235)
- Add SetFitModel.to (#229) (#236)
- Prevent overriding the sample size in `sample_dataset` (#231)
- Always display test coverage; add tests (#240)
- Automatically create summary table after `scripts/setfit/run_fewshot.py` (#262)
- Fix squared optimization steps bug in distillation trainer (#284)
- Resolve `SentenceTransformer` resetting devices after moving a `SetFitModel` (#283)
- Reformat according to the newest black version
- Remove doubled space in warning message
- Exclude compatibility versions from dev setup (#286)
- @Yongtae723
- @danielkorat
- @AlexKoff88
- Added support of OpenVINO export (#214)
v0.5.0 Knowledge distillation trainer & ONNX exporter
This release comes with two main features:
- A `DistillationSetFitTrainer` class that allows users to use unlabeled data to significantly boost the performance of small models like MiniLM. See this workshop for an end-to-end example, and the sketch below.
- An ONNX exporter that converts `SetFit` model instances into ONNX graphs for downstream inference + optimisation. Check out the `notebooks` folder for an end-to-end example.
Kudos to @orenpereg and @nbertagnolli for implementing both of these features 🔥
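A hedged sketch of the distillation workflow (dataset, sizes, and checkpoints are illustrative; the trainer signature follows the v0.5.0 README):

```python
from datasets import load_dataset
from setfit import SetFitModel, DistillationSetFitTrainer

# Assumes the teacher was already finetuned with SetFitTrainer on labeled data;
# the student is a small, fast model like MiniLM.
teacher_model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")
student_model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-MiniLM-L3-v2")

# Distillation only needs unlabeled text
unlabeled_dataset = (
    load_dataset("sst2", split="train")
    .shuffle(seed=42)
    .select(range(500))
    .rename_column("sentence", "text")
    .remove_columns(["label", "idx"])
)

distillation_trainer = DistillationSetFitTrainer(
    teacher_model=teacher_model,
    train_dataset=unlabeled_dataset,
    student_model=student_model,
)
distillation_trainer.train()
```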
Bug fixes and improvements
- Tidy up Makefile & create notebook table by @lewtun in #163
- Fix by @lewtun in #164
- Fixed typo in model head from predict_prob to predict_proba by @nbertagnolli in #171
- Distill trainer by @orenpereg in #166
- Update evaluate by @lvwerra in #194
- Use scikit-learn rather than sklearn in requirements files by @lesteve in #200
- Bugfix/body and head on different devices by @blakechi in #175
- add option to normalize embeddings by @PhilipMay in #177
- delete duplicated code by @Yongtae723 in #183
- Throw clear ValueError when neglecting to pass train_dataset to DistillationSetFitTrainer by @tomaarsen in #190
- add option to set samples_per_label by @PhilipMay in #196
- Resolve typo: sklean -> sklearn, #220 by @tomaarsen in #221
- Allow setting max length by @blakechi in #176
- add doc for `num_iterations` by @PhilipMay in #215
- Allow training progress bars to be disabled by @tomaarsen in #218
- Added initial onnx export function by @nbertagnolli in #156
- Fix/input type hint by @Yongtae723 in #184
- fixed spell errors in code example by @Gladiator07 in #210
- For `scripts/setfit/run_fewshot.py`, add warning for class imbalance w. accuracy by @tomaarsen in #204
- No longer needlessly deepcopy the original model state by @tomaarsen in #201
- Various cleanups; type hint fixes incl. corresponding to PEP 484 by @tomaarsen in #185
- Expand CI tests using matrix; make dependencies less restrictive; fix ONNX tests by @tomaarsen in #233
- Add SetFitModel.to by @jegork in #229
- Revert "Add SetFitModel.to by @lewtun in #229)"
Significant community contributions
The following contributors have made significant changes to the library over the last release:
- @nbertagnolli
- @orenpereg
- Distill trainer (#166)
v0.4.1 Patch release
Fixes an issue on Google Colab, where the default version of Python 3.7 is incompatible with the `Literal` type. See #162 for more details.
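For background, `typing.Literal` only exists from Python 3.8 onwards, so importing it on 3.7 raises an `ImportError`. A common guard looks like this (an illustrative sketch, not necessarily the exact fix from #162):

```python
import sys

# typing.Literal was added in Python 3.8; Google Colab defaulted to 3.7.
if sys.version_info >= (3, 8):
    from typing import Literal
else:
    from typing_extensions import Literal  # backport for older Pythons
```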
v0.4.0 Differentiable heads & various quality of life improvements
Differentiable heads for SetFitModel
@blakechi has implemented a differentiable head in PyTorch for `SetFitModel` that enables the model to be trained end-to-end. The implementation is backwards compatible with the `scikit-learn` heads and can be activated by setting `use_differentiable_head=True` when loading `SetFitModel`. Here's a full example:
```python
from datasets import load_dataset
from sentence_transformers.losses import CosineSimilarityLoss
from setfit import SetFitModel, SetFitTrainer

# Load a dataset from the Hugging Face Hub
dataset = load_dataset("sst2")

# Simulate the few-shot regime by sampling 8 examples per class
num_classes = 2
train_dataset = dataset["train"].shuffle(seed=42).select(range(8 * num_classes))
eval_dataset = dataset["validation"]

# Load a SetFit model from Hub
model = SetFitModel.from_pretrained(
    "sentence-transformers/paraphrase-mpnet-base-v2",
    use_differentiable_head=True,
    head_params={"out_features": num_classes},
)

# Create trainer
trainer = SetFitTrainer(
    model=model,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss_class=CosineSimilarityLoss,
    metric="accuracy",
    batch_size=16,
    num_iterations=20,  # The number of text pairs to generate for contrastive learning
    num_epochs=1,  # The number of epochs to use for contrastive learning
    column_mapping={"sentence": "text", "label": "label"},  # Map dataset columns to text/label expected by trainer
)

# Train and evaluate
trainer.freeze()  # Freeze the head
trainer.train()  # Train only the body

# Unfreeze the head and freeze the body -> head-only training
trainer.unfreeze(keep_body_frozen=True)
# or
# Unfreeze the head and unfreeze the body -> end-to-end training
trainer.unfreeze(keep_body_frozen=False)

trainer.train(
    num_epochs=25,  # The number of epochs to train the head or the whole model (body and head)
    batch_size=16,
    body_learning_rate=1e-5,  # The body's learning rate
    learning_rate=1e-2,  # The head's learning rate
    l2_weight=0.0,  # Weight decay on **both** the body and head. If `None`, will use 0.01.
)
metrics = trainer.evaluate()

# Push model to the Hub
trainer.push_to_hub("my-awesome-setfit-model")

# Download from Hub and run inference
model = SetFitModel.from_pretrained("lewtun/my-awesome-setfit-model")
# Run inference
preds = model(["i loved the spiderman movie!", "pineapple on pizza is the worst 🤮"])
```
Bug fixes and improvements
- add num_epochs to train_step calculation by @PhilipMay in #139
- Support for the differentiable head by @blakechi in #112
- redirect call to predict by @PhilipMay in #142
- fix: templated examples copy empty vector by @pdhall99 in #148
- Add support to kwargs in `compute()` method called by `trainer.evaluate()` by @mpangrazzi in #125
- Small fix on hyperparameter search by @Mouhanedg56 in #150
- Fix typo: temerature => temperature by @tomaarsen in #155
- Add the usage and relevant info. of the differentiable head to README by @blakechi in #149
- Fix non default `loss_class` issue by @PhilipMay in #154
- Add sampling function & update notebooks by @lewtun in #146
- Fix typos: image(s) -> sentence(s) by @victorjmarin in #160
- Add more loss function options by @PhilipMay in #159
Significant community contributions
The following contributors have made significant changes to the library over the last release:
- @pdhall99
- fix: allow load of pretrained model without head
- fix: templated examples copy empty vector (#148)
- @PhilipMay
- @blakechi
- @mpangrazzi
- Add support to kwargs in `compute()` method called by `trainer.evaluate()` (#125)