Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conventions around snakemake #5103

Open
chillenzer opened this issue Sep 11, 2024 · 1 comment
Open

Conventions around snakemake #5103

chillenzer opened this issue Sep 11, 2024 · 1 comment
Labels
component: examples PIConGPU or PMacc examples component: tools scripts, python libs and CMake

Comments

@chillenzer
Copy link
Contributor

In closing this PR, I had a look at the Snakemake workflows here and there are a few things I stumbled across:

  1. There is a best practices page in the Snakemake docs suggesting amongst other things to use

  2. Snakemake provides its own environment management/deployment as well as batch system abstraction (as a plugin). This is functionality that significantly overlaps with tbg and both of these are delicate issues in the fragile and diverse HPC ecosystems we run in.

In the current stage, our usage does not conform to any of these standards. If our strategic decisions imply to professionalise the use of Snakemake, I think there would be value in moving more into that direction. However, my personal experience is that these things are awfully fiddly to setup on a cluster and their adoption would create tensions with tbg. (There are, of course, some low-hanging fruits like the adoption of the directory structure and formatter, e.g., in pre-commit.)

This issue is supposed to spark a discussion and trigger a conscious decision about such things. It is by no means intended as a suggestion to implement the above ideas. Involved in this discussion should probably be at least @psychocoderHPC, @PrometheusPi, @JessicaTiebel.

@chillenzer chillenzer added component: tools scripts, python libs and CMake component: examples PIConGPU or PMacc examples labels Sep 11, 2024
@JessicaTiebel
Copy link
Contributor

Thanks for looking into the Snakemake workflow and giving input.

For 1.:
To be honest I wasn't aware of these things. Having a code quality checker, a formatter and standardised directory structure sounds like a very good idea and on the first glance also doesn't seem to be super much work. Thanks for bringing this up.

For 2.:
I am more aware of the issues you mention here. I agree that using the batch system abstraction has a lot of advantages and makes transition between different clusters much easier. This would be a huge benefit, but also is a lot of work (also the reason I didn't do it initially). I am not sure if it is possible to abstract the tbg in a way that it is also flexible enough to work with different clusters. I am definitely open for discussions regarding this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component: examples PIConGPU or PMacc examples component: tools scripts, python libs and CMake
Projects
None yet
Development

No branches or pull requests

2 participants