
Enhance Dataloader Configuration #60

Open
surajpaib opened this issue Apr 20, 2023 · 4 comments
Labels
enhancement (New feature or request)

Comments

@surajpaib
Collaborator

🚀 Feature Request

Several dataloader arguments are currently exposed as system parameters, e.g. batch_size and drop_last_batch.

It would be good to have a way to set the other DataLoader parameters, such as prefetch_factor and persistent_workers, as well as any arguments PyTorch adds in the future.

🛰 Alternatives

Maybe we could add a partial dataloader to the system config and give it the dataset and sampler later?
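
A minimal sketch of the partial-dataloader idea, assuming PyTorch's torch.utils.data; the name dataloader_partial is hypothetical:

from functools import partial
from torch.utils.data import DataLoader

# Hypothetical: the system config holds a partially-configured DataLoader
# with everything except the dataset and sampler filled in.
dataloader_partial = partial(
    DataLoader,
    batch_size=4,
    num_workers=8,
    prefetch_factor=2,
    persistent_workers=True,
)

# Later, the system supplies the dataset (and optionally a sampler):
# train_loader = dataloader_partial(train_dataset, sampler=train_sampler)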

@surajpaib surajpaib added the enhancement label on Apr 20, 2023
@surajpaib surajpaib changed the title from "Propagate parameters through system" to "Enhance Dataloader Configuration" on Apr 20, 2023
@ibro45
Collaborator

ibro45 commented Mar 27, 2024

Discussed with @john-zielke-snkeos:

  • Get rid of batch_size, num_workers, samplers, and collate_fns
  • Introduce dataloaders. A user shouldn't need to define the whole dataloader; for example, to set the batch size and the number of workers, this should be sufficient:
dataloaders:
    train:
        batch_size: 4
        num_workers: 8

and the rest is set by default. If a user needs a completely different DataLoader, they can go ahead and define _target_: ..., but we have to make sure that the other default args aren't passed to it in that case.
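
A rough sketch of that merge behavior, using plain dicts; DEFAULTS and resolve_dataloader_config are hypothetical names, not existing API:

# Hypothetical defaults, only applied when the user keeps the stock DataLoader.
DEFAULTS = {"batch_size": 1, "num_workers": 0, "pin_memory": True}

def resolve_dataloader_config(user_cfg: dict) -> dict:
    # A custom _target_ means a completely different DataLoader class,
    # so the defaults must not leak into its constructor.
    if "_target_" in user_cfg:
        return dict(user_cfg)
    # Otherwise fill in the defaults and let user keys override them.
    return {"_target_": "torch.utils.data.DataLoader", **DEFAULTS, **user_cfg}

# e.g. resolve_dataloader_config({"batch_size": 4, "num_workers": 8})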

@surajpaib
Collaborator Author

@ibro45 Agree with this.

This would now bring us into the territory of templates where we set some default object for dataloaders.

If we do this for data loaders and there is a default expected behaviour for it that our user can expect, should we not do this for other items in the config as well?

For instance, trainer could default to pytorch_lightning.Trainer with benchmark=True, precision="16-mixed", etc.
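
A minimal sketch of what such a default could look like, assuming pytorch_lightning; TRAINER_DEFAULTS and build_trainer are hypothetical names:

from typing import Optional

import pytorch_lightning as pl

# Hypothetical trainer defaults, overridable from the config
# in the same way as the dataloader defaults above.
TRAINER_DEFAULTS = {"benchmark": True, "precision": "16-mixed"}

def build_trainer(user_cfg: Optional[dict] = None) -> pl.Trainer:
    cfg = {**TRAINER_DEFAULTS, **(user_cfg or {})}
    return pl.Trainer(**cfg)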

@surajpaib
Collaborator Author

We can also extend this templating and have several templates for different workflows.

Say we want a classification workflow: we could set up templates for a few different models and losses, along with a simple CLI that lets the user choose between these templates and spits out a final config in which they only have to set up their data. (Unlike the dataloaders, these templates wouldn't be assigned by default.)

We could use something like Cookiecutter (https://github.com/cookiecutter/cookiecutter) to map the user's CLI choices to pre-set templates.
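
A sketch of how that mapping could look, assuming the templates live in a local templates/ directory; the workflow names and generate_config are made up for illustration:

from cookiecutter.main import cookiecutter

# Hypothetical mapping from a CLI choice to a pre-set template.
TEMPLATES = {
    "classification": "templates/classification",
    "segmentation": "templates/segmentation",
}

def generate_config(workflow: str, output_dir: str = ".") -> None:
    # Render the chosen template; the user then only fills in the
    # data-specific fields of the generated config.
    cookiecutter(TEMPLATES[workflow], no_input=True, output_dir=output_dir)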

This would of course be a separate feature and should go in a separate issue if we agree to do it, but templating could give us a lot of extra functionality without compromising the library's dynamism.

@ibro45
Collaborator

ibro45 commented Apr 4, 2024

It seems like pydantic could be the way to go in this case. I will attempt to refactor it sometime soon, hopefully over the weekend. The combination of pydantic and MONAI Bundle will somewhat resemble Hydra's integration with dataclasses.

Let's discuss the defaults in the future PR.
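
For reference, a minimal pydantic sketch of how the dataloaders section above could be validated; the field defaults are illustrative, not a decision on the actual defaults:

from typing import Optional

from pydantic import BaseModel

class DataloaderConfig(BaseModel):
    # Users only override the fields they care about,
    # e.g. batch_size and num_workers.
    batch_size: int = 1
    num_workers: int = 0
    pin_memory: bool = True
    prefetch_factor: Optional[int] = None
    persistent_workers: bool = False

class DataloadersConfig(BaseModel):
    train: Optional[DataloaderConfig] = None
    val: Optional[DataloaderConfig] = None
    test: Optional[DataloaderConfig] = None

# e.g. DataloadersConfig(train={"batch_size": 4, "num_workers": 8})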
