Is it possible to implement dataloader for triplet loss/GE2E #235
Comments
I don't think this can be supported in UIO mode, but it could probably be implemented in dataset_deprecated.py.
Thanks for your response, but what is UIO mode?
See our paper for an introduction to UIO data management. We designed this mode for training on large datasets.
I understand that DistributedSampler is designed for distributing data across different GPUs. But can we distribute according to the spk id instead? For example, the code could be (with the shuffle in processor.py removed):
I just noticed that I set the data type to 'raw', and the code above is not appropriate for 'shard'.
Yeah... It makes sense in 'raw' mode. Hope it works for you! Good luck!
In raw mode, it's much easier to implement what you want (but slow). In shard mode it's also possible, but the implementation takes more effort. Possible approach:
But overall, you need to balance randomness against data processing difficulty.
Hi, is it possible to make a batch containing M speaker and N utterances for each speaker?
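For the 'raw' mode discussed in this thread, the M-speakers-by-N-utterances batching asked about here could be sketched roughly as below. This is only an illustration under stated assumptions, not code from this repository: `SpeakerBatchSampler`, `spk_labels`, `num_spk` and `num_utt` are hypothetical names, and the sketch assumes a map-style dataset where the speaker label of every utterance index is known up front. It only produces lists of indices.

```python
import random
from collections import Counter, defaultdict


class SpeakerBatchSampler:
    """Hypothetical sampler: each batch holds num_spk speakers x num_utt utterances."""

    def __init__(self, spk_labels, num_spk, num_utt, seed=0):
        self.num_spk = num_spk
        self.num_utt = num_utt
        self.rng = random.Random(seed)
        # Map each speaker id to the dataset indices of its utterances.
        self.spk2idx = defaultdict(list)
        for idx, spk in enumerate(spk_labels):
            self.spk2idx[spk].append(idx)
        # Keep only speakers that have at least num_utt utterances.
        self.speakers = [s for s, idxs in self.spk2idx.items()
                         if len(idxs) >= num_utt]

    def __len__(self):
        # Each speaker is used at most once per pass over the sampler.
        return len(self.speakers) // self.num_spk

    def __iter__(self):
        speakers = self.speakers[:]
        self.rng.shuffle(speakers)
        for b in range(len(self)):
            chosen = speakers[b * self.num_spk:(b + 1) * self.num_spk]
            batch = []
            for spk in chosen:
                # Draw num_utt distinct utterances for this speaker.
                batch.extend(self.rng.sample(self.spk2idx[spk], self.num_utt))
            yield batch


# Toy data (an assumption for illustration): 6 speakers, 10 utterances each.
labels = [f"spk{i % 6}" for i in range(60)]
sampler = SpeakerBatchSampler(labels, num_spk=3, num_utt=4)
batches = list(sampler)
per_batch_counts = [Counter(labels[i] for i in batch) for batch in batches]
```

In PyTorch, an object like this can be passed to `torch.utils.data.DataLoader` through its `batch_sampler` argument. Note that in this sketch one pass draws only `num_utt` utterances per speaker rather than covering the whole dataset, and for multi-GPU training each rank would still need to be given a different subset of speakers, which is not handled here.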