Rename linear projection #37
Conversation
… static method to create a Dense layer from ST weights

```python
model = Dense(**config)
if os.path.exists(os.path.join(input_path, "model.safetensors")):
    load_safetensors_model(model, os.path.join(input_path, "model.safetensors"))
else:
```
You can remove the `else` and return the model in the `if` right after `load_safetensors_model`. That way the `model_load_state_dict` fallback no longer needs to be nested in an `else` branch.
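A sketch of the suggested early-return shape (the fallback branch is truncated in the diff above, so the `model_load_state_dict` call below is only a guess at its content):

```python
model = Dense(**config)
if os.path.exists(os.path.join(input_path, "model.safetensors")):
    load_safetensors_model(model, os.path.join(input_path, "model.safetensors"))
    return model
# No else needed anymore; the fallback loader is dedented.
model_load_state_dict(model, input_path)  # hypothetical fallback, not shown in the diff
return model
```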
```
@@ -294,6 +316,18 @@ def __init__(

    def load(input_path) -> "ColBERT":
        return ColBERT(model_name_or_path=input_path)

    def num_modules(self) -> int:
```
You can rename this function to

```python
def __len__(self) -> int:
    return len(self._modules)
```

Then you can call `len(self)` to get the actual number of modules.
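For illustration, a minimal stand-in showing the dunder in action (the class and submodules below are hypothetical, not the actual model class):

```python
import torch

class Encoder(torch.nn.Module):  # hypothetical stand-in for the real model
    def __init__(self):
        super().__init__()
        self.transformer = torch.nn.Linear(768, 768)  # placeholder submodules
        self.dense = torch.nn.Linear(768, 128)

    def __len__(self) -> int:
        return len(self._modules)

model = Encoder()
assert len(model) == 2  # replaces model.num_modules()
```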
```python
    and self[1].get_sentence_embedding_dimension() != embedding_size
):
    logger.warning(
        f"The checkpoint contains a dense layer but with incorrect dimension. Replacing it with a Dense layer with output dimensions ({hidden_size}, {embedding_size})"
```
It would be great to also print the actual embedding dimension that was found, `self[1].get_sentence_embedding_dimension()`.
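For example, the message could read (a sketch reusing the variables from the snippet above):

```python
logger.warning(
    f"The checkpoint contains a dense layer with an incorrect output dimension "
    f"({self[1].get_sentence_embedding_dimension()} instead of {embedding_size}). "
    f"Replacing it with a Dense layer with output dimensions ({hidden_size}, {embedding_size})."
)
```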
```python
def num_modules(self) -> int:
    return len(self._modules)


def convert_dense_layer_from_sentence_transformer(
```
Could you add a docstring here?
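Something along these lines, perhaps (the signature is cut off in the diff, so the parameter below is hypothetical, and the wording paraphrases this PR's description):

```python
def convert_dense_layer_from_sentence_transformer(input_path):  # parameter name is hypothetical
    """Create a Dense layer from a Sentence Transformers Dense checkpoint.

    The linear weights are reused as-is; the resulting layer differs from the
    Sentence Transformers one in that it has no activation function and is
    applied to all token embeddings rather than just the pooled one.
    """
```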
```python
    return features

@staticmethod
def from_sentence_transformers(dense_st: DenseSentenceTransformer):
```
Might be worth adding a docstring here to explain that the Sentence Transformers `Dense` layer has an activation function and we don't want one.
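For example (a sketch; the wording is illustrative):

```python
@staticmethod
def from_sentence_transformers(dense_st: DenseSentenceTransformer):
    """Build a Dense layer from a Sentence Transformers Dense layer.

    Sentence Transformers' Dense applies an activation function on top of the
    linear projection; here we want a pure linear layer, so the activation is
    intentionally dropped.
    """
```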
This PR refactors the old LinearProjection module to better extend the existing Sentence Transformers Dense layer. This unifies the names, making it easier to convert a checkpoint from one library to the other (the only differences being the activation function and the forward pass being applied to all the tokens instead of just the pooled one).
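To make that difference concrete, a minimal sketch (shapes and pooling are illustrative; Sentence Transformers' Dense uses Tanh as its default activation):

```python
import torch

hidden_size, embedding_size = 768, 128
linear = torch.nn.Linear(hidden_size, embedding_size, bias=False)
token_embeddings = torch.randn(2, 32, hidden_size)  # (batch, tokens, hidden)

# ColBERT-style Dense: project every token embedding, no activation.
per_token = linear(token_embeddings)                # (2, 32, 128)

# Sentence Transformers Dense: project the pooled embedding, then activate.
pooled = token_embeddings.mean(dim=1)               # (2, 768)
sentence = torch.tanh(linear(pooled))               # (2, 128)
```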
I also slightly changed the behavior of model creation: