
Fix local loading of Stanford model #68

Merged: 1 commit into main on Oct 22, 2024
Conversation

@NohTow (Collaborator) commented on Oct 22, 2024

When writing the test for the new loading logic for Stanford models, I only tested with remote (HF repo) models.
It turns out the code does not work if the model is a local folder.
The fix was simple: correctly set the path of the safetensors file when the model is local.

@NohTow merged commit b34bba0 into main on Oct 22, 2024
2 checks passed
@NohTow deleted the fix_local_stanford_loading branch on Oct 22, 2024 at 11:28
@@ -121,6 +121,9 @@ def from_stanford_weights(
token=token,
use_auth_token=use_auth_token,
)
# If the model is a local folder, load the safetensors file
else:
model_name_or_path = os.path.join(model_name_or_path, "model.safetensors")
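
For context, a usage sketch of the path this PR fixes: loading a Stanford ColBERT checkpoint from a local folder. The folder path here is hypothetical, and it assumes PyLate's models.ColBERT entry point routes Stanford checkpoints through from_stanford_weights:

# Sketch: loading a Stanford ColBERT checkpoint from a local folder.
# "./my_stanford_colbert" is a hypothetical path containing model.safetensors.
from pylate import models

model = models.ColBERT(model_name_or_path="./my_stanford_colbert")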
@jessiejuachon commented on Oct 22, 2024

My local model folder does not have "model.safetensors" file. Here are the files that I have:

  • config.json
  • vocab.txt
  • tokenizer_config.json
  • tokenizer.json
  • special_tokens_map.json
  • pytorch_model.bin
  • artifact.metadata

Can PyLate support this ColBERT model? @NohTow

@NohTow (Collaborator, Author) commented

It seemed to me that every stanford-nlp model also had the weights in safetensors form.
I could make the loading work with .bin as well, but I wonder why you do not have the weights in that format too. How did you train the model?
There should be a way to output safetensors (or you can convert your weights into safetensors).
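
As an illustration, a minimal conversion sketch, assuming the .bin holds a plain state dict (the filenames are the conventional ones, not taken from this thread):

# Minimal sketch: convert pytorch_model.bin into model.safetensors.
# Assumes the .bin contains a plain state dict (tensor name -> tensor).
import torch
from safetensors.torch import save_file

state_dict = torch.load("pytorch_model.bin", map_location="cpu")
# safetensors requires contiguous tensors without shared storage
state_dict = {k: v.contiguous() for k, v in state_dict.items()}
save_file(state_dict, "model.safetensors")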

@jessiejuachon commented on Oct 23, 2024

Hi @NohTow. The model was trained in collaboration with the Stanford folks a few years back. We technically could transform it, but we don't want to make that change now, as we are evaluating other factors. Would it be possible to have .bin supported by PyLate as a workaround (i.e., only if it is a simple change)? We would appreciate it if this could be accommodated.

@NohTow (Collaborator, Author) commented on Oct 25, 2024

I will be at a conference for a week, so I cannot really look at this for now.
After that, I would need to check whether the stanford-nlp training can still produce a checkpoint without safetensors, and I will only add this if it does. It might be that the safetensors are created by the bot when the model is uploaded to HF; in that case, I'll add loading the .bin for local models.

I honestly think it will be easier and faster if you just convert your .bin into safetensors until then; it should not change anything about the model (and you can keep the original .bin).

Note that I do not receive notifications for a closed PR, so please open a dedicated issue if you feel we should implement this, but again, this won't happen in the next few days.

@NohTow (Collaborator, Author) commented on Oct 25, 2024

I think replacing the code with the following should work, but I cannot test it for now since I am on the move.
I might consider just loading from pytorch_model.bin anyway, as it will be present no matter what, but it is better practice to load safetensors.
Hope it helps.

# Check if the model is locally available
if not os.path.exists(model_name_or_path):
    # If not, download the model or use the cached version
    model_name_or_path = cached_file(
        model_name_or_path,
        filename="pytorch_model.bin",
        cache_dir=cache_folder,
        revision=revision,
        local_files_only=local_files_only,
        token=token,
        use_auth_token=use_auth_token,
    )
# If the model is a local folder, point at the PyTorch checkpoint
else:
    model_name_or_path = os.path.join(model_name_or_path, "pytorch_model.bin")

# Load the state dict using torch.load instead of safe_open
state_dict = {
    "linear.weight": torch.load(model_name_or_path, map_location="cpu")[
        "linear.weight"
    ]
}

# Determine input and output dimensions
in_features = state_dict["linear.weight"].shape[1]
out_features = state_dict["linear.weight"].shape[0]

# Create Dense layer instance
model = Dense(in_features=in_features, out_features=out_features, bias=False)

model.load_state_dict(state_dict, strict=False)
return model
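
For comparison, a sketch of the equivalent safetensors branch this snippet replaces, loading only the projection weight via safe_open (the "linear.weight" key name is carried over from the .bin variant above):

# Sketch of the safetensors equivalent, for comparison only.
from safetensors import safe_open

with safe_open(model_name_or_path, framework="pt", device="cpu") as f:
    state_dict = {"linear.weight": f.get_tensor("linear.weight")}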
