Adding a proper function to add the prefixes #30

Merged
merged 1 commit into from
Aug 9, 2024


@NohTow (Collaborator) commented Aug 9, 2024

This PR introduces a proper function for adding the query/document prefixes. It is more robust and works with all tokenizers: it no longer relies on ". " being tokenized as a single token, which is not the case for mGTE, for example.

This fixes #11.
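
For illustration, here is a minimal sketch of one tokenizer-agnostic way to add a prefix: insert the marker's token id directly into the already tokenized ids, right after the first token, instead of prepending marker text and hoping it maps to a single token. This is an assumption about the general approach, not the code merged in this PR. The function name `insert_prefix_token`, the plain-list input format, and the token ids are all hypothetical.

```python
from typing import List


def insert_prefix_token(input_ids: List[List[int]], prefix_id: int) -> List[List[int]]:
    """Insert prefix_id after the first token of every sequence in the batch.

    Hypothetical helper: assumes each sequence starts with a special token
    (such as [CLS]) and that the marker already has its own id in the vocabulary.
    """
    return [seq[:1] + [prefix_id] + seq[1:] for seq in input_ids]


# Hypothetical ids: 101 = [CLS], 102 = [SEP], 1 = query marker
batch = [[101, 7592, 2088, 102], [101, 2023, 102]]
with_prefix = insert_prefix_token(batch, prefix_id=1)
print(with_prefix)
```

Because the marker is inserted as an id, the result does not depend on how a given tokenizer splits punctuation or whitespace. In a real pipeline the attention mask would also need one extra position per sequence.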

@NohTow NohTow merged commit 6c3b5c5 into main Aug 9, 2024
2 checks passed
@raphaelsty raphaelsty deleted the fix_prefix branch August 22, 2024 10:31
Development

Successfully merging this pull request may close these issues.

Fix tokenization for query/doc marker