Fix local loading of Stanford model #68
Conversation
@@ -121,6 +121,9 @@ def from_stanford_weights(
            token=token,
            use_auth_token=use_auth_token,
        )
+       # If the model is a local folder, load the safetensors file
+       else:
+           model_name_or_path = os.path.join(model_name_or_path, "model.safetensors")
---
My local model folder does not have a "model.safetensors" file. Here are the files I have:
- config.json
- vocab.txt
- tokenizer_config.json
- tokenizer.json
- special_tokens_map.json
- pytorch_model.bin
- artifact.metadata
Can PyLate support this ColBERT model? @NohTow
---
It seemed to me that every stanford-nlp model also had the weights in safetensors form.
I could make the loading work with .bin as well, but I wonder why you do not have the weights in this format too. How did you train the model?
There should be a way to output safetensors (or you can convert your weights into a safetensors file).
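For reference, the conversion is a one-off step; here is a minimal sketch, assuming pytorch_model.bin is a plain PyTorch state dict (file names are illustrative):

```python
import torch
from safetensors.torch import save_file

# Load the legacy checkpoint (assumed to be a plain state dict of tensors)
state_dict = torch.load("pytorch_model.bin", map_location="cpu")

# safetensors rejects shared or non-contiguous tensors; clone to be safe
state_dict = {name: tensor.clone().contiguous() for name, tensor in state_dict.items()}

# Write the same weights in the safetensors format; the original .bin is untouched
save_file(state_dict, "model.safetensors")
```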
---
Hi @NohTow. The model was trained in collaboration with the Stanford folks a few years back. We could technically convert it, but we don't want to make that change right now as we are evaluating other factors. Would it be possible to have .bin supported by PyLate as a workaround (i.e. only if it is a simple change)? We would appreciate it if this could be accommodated.
---
I will be at a conference for a week, so I cannot really look at it for now.
After that, I would need to check whether the stanford-nlp training can still produce a checkpoint without safetensors, and will only add this if it does. It might be that the safetensors are created by a bot when the model is uploaded to HF; in that case, I'll add loading the .bin for local models.
I honestly think it will be easier and faster if you just convert your .bin into a safetensors file until then; it should not change anything about the model (and you can keep the original .bin).
Note that I do not receive notifications for a closed PR, so please open a dedicated issue if you feel we should implement this, but again, it won't happen in the next few days.
---
I think replacing the code with this should work, but I cannot test it for now since I am on the move.
I might consider just loading from pytorch_model.bin anyway, since it will be present no matter what, but loading safetensors is better practice.
Hope it helps.
# This snippet goes inside Dense.from_stanford_weights and assumes the
# imports and parameters already available there (os, torch, cached_file,
# cache_folder, revision, local_files_only, token, use_auth_token).
# Check if the model is locally available
if not os.path.exists(model_name_or_path):
    # If not, download the model / use the cached version
    model_name_or_path = cached_file(
        model_name_or_path,
        filename="pytorch_model.bin",
        cache_dir=cache_folder,
        revision=revision,
        local_files_only=local_files_only,
        token=token,
        use_auth_token=use_auth_token,
    )
# If the model is a local folder, point at the PyTorch checkpoint inside it
else:
    model_name_or_path = os.path.join(model_name_or_path, "pytorch_model.bin")
# Load the state dict with torch.load instead of safe_open and keep only
# the projection weights stored under "linear.weight"
state_dict = {
    "linear.weight": torch.load(model_name_or_path, map_location="cpu")[
        "linear.weight"
    ]
}
# Determine input and output dimensions from the weight matrix
in_features = state_dict["linear.weight"].shape[1]
out_features = state_dict["linear.weight"].shape[0]
# Create the Dense layer and load the extracted weights into it
model = Dense(in_features=in_features, out_features=out_features, bias=False)
model.load_state_dict(state_dict, strict=False)
return model
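If you go the conversion route in the meantime (see the sketch earlier in this thread), you can verify that the converted file carries the key this code expects; a minimal sketch, with an illustrative local path:

```python
from safetensors import safe_open

# Inspect the converted file: the projection weights should sit under
# "linear.weight", the key the loading code above extracts.
with safe_open("local_colbert/model.safetensors", framework="pt", device="cpu") as f:
    print(list(f.keys()))                    # all tensor names in the file
    weight = f.get_tensor("linear.weight")
    print(weight.shape)                      # (out_features, in_features)
```

---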
When writing the test for the new loading logic for Stanford models, I only tested with remote (HF repo) models.
It turns out the code does not work if the model is a local folder.
The fix is simple: correctly set the path of the safetensors file when the model is local.