optimize mark_duplicates_and_sort #51

Open
chrisamiller opened this issue Jun 15, 2022 · 1 comment
Labels
performance (optimizations of run time and/or cost)

Comments

@chrisamiller
Member

The tools/mark_duplicates_and_sort.wdl task is a bottleneck, especially for WGS. It's expensive, partly because it is long-running and gets preempted. Do some local testing on the cluster to explore options for optimizing it:

  • Right now the sort and markdup steps get 8 cores each. Is that the optimal ratio? If one is faster than the other, there will be wasted cycles.
  • If we increase the overall number of cores, how does that affect runtime? (Do we saturate I/O? Does that differ between HDD and SSD?)
  • Can we avoid localizing the input files to save an hour or so?
  • Would giving more RAM to the sort part of that step let it write fewer slow temp files to disk and speed things up?
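
One way to structure the local testing for the first two questions is a small timing harness that sweeps the thread split between the two stages. The following is only a sketch under stated assumptions: the command template is hypothetical (the actual command line in the WDL task is not shown here), and `thread_splits`, `time_command`, and `benchmark` are illustrative names, not part of the repository.

```python
import subprocess
import time


def thread_splits(total_cores, step=2):
    """Return every (markdup_threads, sort_threads) pair that sums to total_cores."""
    return [(m, total_cores - m) for m in range(step, total_cores, step)]


def time_command(command):
    """Run a shell pipeline and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(command, shell=True, check=True)
    return time.perf_counter() - start


def benchmark(template, total_cores, step=2):
    """Time the pipeline once per thread split.

    `template` is a hypothetical command string containing {markdup} and
    {sort} placeholders for the per-stage thread counts; fill in the real
    markdup | sort pipeline from the WDL task before running.
    """
    results = {}
    for markdup, sort in thread_splits(total_cores, step):
        command = template.format(markdup=markdup, sort=sort)
        results[(markdup, sort)] = time_command(command)
    return results


if __name__ == "__main__":
    # The current allocation (8 cores each) is one point in the 16-core sweep.
    print(thread_splits(16, step=4))
```

Repeating the sweep with a larger `total_cores`, and on both HDD and SSD scratch space, would cover the I/O saturation question with the same harness.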
@chrisamiller
Member Author

Breakdown of costs/timing for WGS:

| sample | seconds | cpuCost | memCost | diskCost | diskType | totalCost |
|---|---|---|---|---|---|---|
| HCC1395 normal | 24870.788 | 0.771547112 | 0.42860658 | 0.221073671 | local-disk 576 HDD | 1.421227363 |
| HCC1395 tumor | 46089.762 | 1.429806839 | 1.143282152 | 0.750381156 | local-disk 1055 HDD | 3.323470147 |

Both ran on a custom-16-97280 instance (16 vCPUs, 97,280 MB of memory).
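
To make the breakdown easier to compare across samples, the figures above can be converted to hours and per-component shares. This is a minimal sketch using only the numbers reported in this comment; the currency unit is not stated in the issue, and the `runs` dictionary and `summarize` helper are illustrative names.

```python
# Figures copied from the cost breakdown in this comment.
runs = {
    "HCC1395 normal": {
        "seconds": 24870.788,
        "cpuCost": 0.771547112,
        "memCost": 0.42860658,
        "diskCost": 0.221073671,
        "totalCost": 1.421227363,
    },
    "HCC1395 tumor": {
        "seconds": 46089.762,
        "cpuCost": 1.429806839,
        "memCost": 1.143282152,
        "diskCost": 0.750381156,
        "totalCost": 3.323470147,
    },
}


def summarize(run):
    """Derive runtime in hours, cost per hour, and each component's share of the total."""
    hours = run["seconds"] / 3600
    component_sum = run["cpuCost"] + run["memCost"] + run["diskCost"]
    return {
        "hours": hours,
        "component_sum": component_sum,
        "cost_per_hour": run["totalCost"] / hours,
        "cpu_share": run["cpuCost"] / run["totalCost"],
        "mem_share": run["memCost"] / run["totalCost"],
        "disk_share": run["diskCost"] / run["totalCost"],
    }


if __name__ == "__main__":
    for sample, run in runs.items():
        s = summarize(run)
        print(f"{sample}: {s['hours']:.1f} h, "
              f"cpu {s['cpu_share']:.0%}, mem {s['mem_share']:.0%}, disk {s['disk_share']:.0%}")
```

The three components add up to the reported total for both samples, and the runtimes work out to roughly 6.9 hours for the normal and 12.8 hours for the tumor.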

@malachig added the performance (optimizations of run time and/or cost) label Jul 8, 2022