Loading logic rework #52

NohTow · 2024-09-11T14:42:01Z

This PR solves some of the issues related to model loading.

First, it removes the add_pooling_layer entry from model_kwargs. This argument prevented the instantiation of a pooler in BERT models, but it is not common to all encoders (see #51).
I considered removing the pooler after initialization instead, but the saved PyLate model would then have no weights for it, and loading it would yield a warning saying that those weights were not properly loaded from the checkpoint. The warning is harmless, since we are not using the pooler anyway, but it can be misleading. I therefore chose to leave the pooler in place: because we use the sequence_embeddings and not the pooled output, it only adds a small, useless computation, and I did not find a better solution.
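
To illustrate why removing the pooler after initialization leads to the warning described above, here is a minimal sketch that compares the parameter names a freshly built model expects with those present in a saved checkpoint. The key names follow the usual BERT pooler naming (pooler.dense.weight, pooler.dense.bias), the other keys are shortened for illustration, and missing_keys is a hypothetical helper, not part of PyLate or Transformers.

```python
def missing_keys(expected, checkpoint):
    """Return the parameter names the model expects but the checkpoint lacks."""
    return sorted(set(expected) - set(checkpoint))


# Parameters a BERT-style model instantiates by default (pooler included).
# Illustrative subset only; a real model has many more entries.
expected = [
    "embeddings.word_embeddings.weight",
    "encoder.layer.0.attention.self.query.weight",
    "pooler.dense.weight",
    "pooler.dense.bias",
]

# Checkpoint saved after the pooler was removed post-initialization.
saved_without_pooler = [k for k in expected if not k.startswith("pooler.")]

# On reload the pooler is re-created, so its weights are reported as missing.
print(missing_keys(expected, saved_without_pooler))
```

Keeping the pooler in the saved model means the two key sets match, so no such report is produced on reload.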

Second, it adds a function to support loading a model created using the stanford-nlp library. This has two benefits:

Every ColBERT model (with a base model loadable using ST) is now natively compatible with PyLate, without having to convert it manually. This should greatly increase the number of compatible models (Model compatiblity #50).
It also means that we no longer have to add the PyLate files to an existing stanford-nlp repository, as we did for Colbert-small. Besides avoiding duplicated weights, this solves the issue where the Transformer (from ST) folder was in a subfolder rather than at the root, so the model configuration was not properly loaded and the model was therefore not properly loaded in the specified dtype (Add dtype flexibility #49).

I also took the opportunity to add dtype casting to the Dense layer so that it matches the Transformer.

NohTow · 2024-09-12T08:36:04Z

Added a docstring and pinned ST to a version before 3.1, as 3.1 introduces breaking changes (I already have some fixes, but they need more tests and would be better in a dedicated MR).
Also fixed an issue for repositories containing both PyLate and stanford weights, where two dense layers were loaded.
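
The rule of loading a single dense layer can be sketched as follows. This is a simplified illustration, not the code from this PR: choose_dense_source is a hypothetical helper, the markers it checks (a modules.json file for an ST/PyLate layout, a linear.weight tensor for a stanford-nlp checkpoint) are assumptions about how the two layouts can be told apart, and giving priority to the PyLate files is also an assumption.

```python
def choose_dense_source(repo_files, weight_names):
    """Pick exactly one source for the Dense (projection) layer.

    repo_files: file paths found in the model repository.
    weight_names: tensor names found in the base checkpoint.
    """
    has_pylate = "modules.json" in repo_files        # assumed ST/PyLate marker
    has_stanford = "linear.weight" in weight_names   # assumed stanford-nlp marker

    if has_pylate:
        # Assumed priority: use the PyLate files and ignore the stanford
        # layer, so that two Dense layers are never loaded together.
        return "pylate"
    if has_stanford:
        return "stanford"
    # Neither layout found: a new Dense layer would need to be initialized.
    return "new"


# A repository that contains both kinds of weights resolves to one source.
print(choose_dense_source(["modules.json", "model.safetensors"], ["linear.weight"]))
```

Returning a single label makes it impossible for both branches to run, which is the property the fix is after.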

raphaelsty

LGTM

NohTow added 4 commits September 11, 2024 14:00

Adding off-the-shelf Stanford-NLP loading, removing the add_pooling_layer parameters and casting the Dense layer to dtype if set in model_kwargs

554ae17

Adding a docstring

0a50f97

Change to not load both PyLate AND stanford dense when a repository contains both

ba13c2e

Fixing the version of ST to pre 3.1

5daff31

NohTow requested a review from raphaelsty September 12, 2024 08:36

raphaelsty approved these changes Sep 12, 2024

raphaelsty merged commit b647061 into main Sep 12, 2024
2 checks passed

This was referenced Sep 12, 2024

Add dtype flexibility #49

Open

distilbert raises an error #51

Closed

NohTow deleted the loading_logic branch October 13, 2024 18:16
