hpc-python lesson should cover Dask #12

Open
rbavery opened this issue Sep 8, 2019 · 2 comments

Comments

@rbavery

rbavery commented Sep 8, 2019

This repo and the Dask documentation already have some really good examples that could serve as a jumping-off point: https://github.com/sdsc/sdsc-summer-institute-2019/tree/master/hpc0_python_hpc

An overview of the "why" of Dask: https://notamonadtutorial.com/interview-with-dasks-creator-scale-your-python-from-one-computer-to-a-thousand-b4483376f200

@psteinb
Member

psteinb commented Sep 18, 2019

I second this entirely! Dask is very close to HPC, which is also why I included it in hpc-in-a-day. There it serves as an example of more big-data-style APIs. I like it because it puts productivity front and center. Where would you suggest it could go inside this repo?

@rbavery
Author

rbavery commented Oct 1, 2019

I think the introduction to parallel computing section could replace multiprocessing with an introduction to dask.delayed for parallelizing custom workflows and/or Dask Arrays. These HPC lessons could serve as a template: https://github.com/sdsc/sdsc-summer-institute-2019/tree/master/hpc0_python_hpc
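For a sense of what such an introduction might open with, here is a minimal sketch assuming Dask and NumPy are installed (for example `pip install "dask[array]"`). It is not taken from the linked SDSC lessons: `slow_square` is a hypothetical stand-in for a custom task. The first half uses `dask.delayed` to turn plain function calls into a lazy task graph; the second half shows a chunked Dask Array.

```python
from dask import delayed
import dask.array as da


# Hypothetical placeholder for an expensive, user-defined task.
def slow_square(x):
    return x * x


# dask.delayed wraps ordinary calls so they build a task graph
# instead of running immediately.
lazy_results = [delayed(slow_square)(i) for i in range(10)]
lazy_total = delayed(sum)(lazy_results)

# Nothing has run yet; compute() executes the graph
# (dask.delayed uses the threaded scheduler by default).
total = lazy_total.compute()
print(total)  # 285

# Dask Arrays: a NumPy-like array split into chunks that can be
# processed in parallel, one task per chunk.
x = da.ones((1000, 1000), chunks=(250, 250))
col_means = x.mean(axis=0).compute()  # returns a plain NumPy array
grand_total = x.sum().compute()
print(grand_total)  # 1000000.0
```

Unlike a `multiprocessing.Pool.map` version, the same graph can later be handed to a distributed scheduler without rewriting the task functions, which is part of the argument for teaching it here.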

While both multiprocessing and Dask could get the job done, I think Dask's performance dashboard is a huge advantage for profiling big-data workflows.
