About data upsampling #19

Open
Raion-Shin opened this issue Aug 27, 2024 · 0 comments

First of all, many thanks for your contribution to the community; the datasets are very helpful.

Could you explain what "upsampled" means in https://huggingface.co/datasets/TIGER-Lab/M-BEIR? How were the smaller datasets upsampled? The dataset card describes the file as follows:

mbeir_union_up_train.jsonl: This file is the default training data for in-batch contrastive training specifically designed for UniIR models. It aggregates all the data from the train directory and datasets with relatively smaller sizes have been upsampled to balance the training process.
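
For concreteness, one common reading of "upsampled" is that examples from the smaller datasets are repeated (sampled with replacement) until each dataset reaches some minimum size, and the result is merged with the larger datasets. The sketch below illustrates only that reading; it is a guess, not the procedure actually used to build mbeir_union_up_train.jsonl. The function name `upsample_to_min_size`, the `min_size` threshold, and the example records are all hypothetical.

```python
import random


def upsample_to_min_size(examples_by_dataset, min_size, seed=0):
    """Merge datasets, repeating examples from any dataset smaller than min_size.

    Hypothetical illustration: the real M-BEIR upsampling rule is not
    documented in the dataset card quoted above.
    """
    rng = random.Random(seed)
    merged = []
    for name, examples in examples_by_dataset.items():
        # Always keep every original example once.
        merged.extend(examples)
        shortfall = min_size - len(examples)
        if examples and shortfall > 0:
            # Fill the gap by sampling with replacement from the same dataset.
            merged.extend(rng.choices(examples, k=shortfall))
    # Shuffle so repeated examples are spread across training batches.
    rng.shuffle(merged)
    return merged


# Toy usage with made-up records (not real M-BEIR entries).
toy = {
    "big": [{"id": f"big-{i}"} for i in range(10)],
    "small": [{"id": "small-0"}, {"id": "small-1"}],
}
balanced = upsample_to_min_size(toy, min_size=5)
```

If the actual rule is different (for example a fixed repetition factor per dataset, or temperature-based sampling), it would be helpful to know which one was used and with what parameters.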
