Parallelization across regions #279

cjw85 · 2021-03-24T14:09:02Z

Thank you. Would you consider adding that as a feature to have it detect a set number of regions or chromosomes and parallelize itself? Also it does not seem like medaka_haploid_variant is able to take --regions as an input. How should I run it independently on each chromosome?

Originally posted by @jpn2021 in #263 (comment)


cjw85 · 2021-03-24T14:22:38Z

@jpn2021 @Kirk3gaard

It looks like an oversight that medaka_haploid_variant doesn't take a --regions argument like other programs. We can look at adding that.

More generally the medaka programs don't implement parallelization across chromosomes/regions for two reasons:
a) most tasks are trivially parallelizable (so the programs can just be run multiple times)
b) handling hardware resources is subtle, e.g. parallelization in a CPU-only setting requires a different strategy from a single- or multi-GPU setting.

Since medaka is fundamentally a piece of algorithm research, implementing some of these niceties takes a back seat to investigating new methods. We endeavour to stick to a Unix philosophy of creating composable tools that each do one job, so that users can combine them flexibly in a manner that suits their situation.
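
As a rough illustration of point (a), running a program once per chromosome can be scripted outside medaka. The sketch below is not part of medaka: the helper names (`read_contigs`, `build_command`, `run_per_contig`) and the `{region}` placeholder are hypothetical, and it assumes the reference has a `samtools faidx` index whose first tab-separated column is the contig name. It runs an arbitrary command template once per contig with a capped number of concurrent processes.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def read_contigs(fai_path):
    """Return contig names from a samtools faidx index (first tab-separated column)."""
    with open(fai_path) as handle:
        return [line.split("\t")[0] for line in handle if line.strip()]


def build_command(template, contig):
    """Substitute the contig name for '{region}' in every argument of the template."""
    return [arg.format(region=contig) for arg in template]


def run_per_contig(template, contigs, max_workers=2):
    """Run one process per contig, at most max_workers at a time.

    Returns a dict mapping each contig name to its process return code.
    """
    def run_one(contig):
        result = subprocess.run(build_command(template, contig))
        return contig, result.returncode

    # Threads are sufficient here: each worker only waits on a child process.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run_one, contigs))
```

A template for a program that accepts `--regions` might look like `["medaka", "consensus", "calls.bam", "{region}.hdf", "--regions", "{region}"]`; the file names are placeholders and the exact arguments are an assumption that should be checked against the program's `--help`. The value of `max_workers` would need to reflect the hardware considerations in point (b), for example one worker per GPU.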

cjw85 added the "contributions welcome" and "enhancement" labels on Mar 24, 2021
