Setting normalize_scores default to False and adding some documentation about the parameter
NohTow committed Oct 15, 2024
1 parent ddaf8f8 commit 7a0671f
Show file tree
Hide file tree
Showing 2 changed files with 7 additions and 1 deletion.
6 changes: 6 additions & 0 deletions docs/documentation/training.md
@@ -176,6 +176,12 @@ trainer.train()

Refer to this [documentation](https://sbert.net/docs/sentence_transformer/training/distributed.html) for more information.

Note that the Distillation loss also supports min-max normalization of the output scores, which has been shown to improve results when the teacher scores are also normalized, as in [JaColBERTv2.5](https://arxiv.org/pdf/2407.20750), although the gains are not guaranteed, as shown in [Jina-ColBERT-v2](https://arxiv.org/abs/2408.16672).
To normalize the output scores, simply set the `normalize_scores` parameter when creating the loss object (you still have to normalize the scores in your dataset):
```python
train_loss = losses.Distillation(model=model, normalize_scores=True)
```
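
As a complement, here is a minimal sketch of min-max normalizing the teacher scores in your dataset before training; the example data and column names are illustrative assumptions, not a schema required by PyLate:
```python
def min_max_normalize(scores: list[float]) -> list[float]:
    """Min-max normalize a list of teacher scores to the [0, 1] range."""
    minimum, maximum = min(scores), max(scores)
    if maximum == minimum:
        # Degenerate case: all scores are equal, avoid division by zero.
        return [0.0 for _ in scores]
    return [(score - minimum) / (maximum - minimum) for score in scores]


# Hypothetical example: normalize the teacher scores of each training example
# before building the knowledge distillation dataset.
train_examples = [
    {
        "query": "what is late interaction retrieval?",
        "documents": ["ColBERT computes token-level similarities.", "Paris is in France."],
        "scores": [21.7, 3.4],  # raw teacher scores
    },
]
for example in train_examples:
    example["scores"] = min_max_normalize(example["scores"])
```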

## ColBERT parameters
All the parameters of the ColBERT modeling can be found [here](https://lightonai.github.io/pylate/api/models/ColBERT/#parameters). Important parameters to consider are:

2 changes: 1 addition & 1 deletion pylate/losses/distillation.py
@@ -54,7 +54,7 @@ def __init__(
model: ColBERT,
score_metric: Callable = colbert_kd_scores,
size_average: bool = True,
- normalize_scores: bool = True,
+ normalize_scores: bool = False,
) -> None:
super(Distillation, self).__init__()
self.score_metric = score_metric
