WIP: ConstantWorkersPerHostPoolScheduler #217

Draft
wants to merge 1 commit into
base: main
18 changes: 18 additions & 0 deletions charm4py/pool.py
@@ -295,6 +295,24 @@ def taskError(self, worker_id, job_id, exception):
raise job.exception
self.schedule()

# Leaves one PE inactive on each host so that every host has the same number of
# workers, unlike the basic PoolScheduler, which has one fewer worker on the host
# with PE 0. This can be useful on a GPU cluster, for example: running five PEs
# per node on nodes with 4 GPUs gives each worker its own GPU and leaves no GPU idle.
class ConstantWorkersPerHostPoolScheduler(PoolScheduler):

    def __init__(self):
        super().__init__()
        n_pes = charm.numPes()
        n_hosts = charm.numHosts()
        pes_per_host = n_pes // n_hosts

        assert n_pes % n_hosts == 0  # Enforce a constant number of PEs per host
        assert pes_per_host > 1  # One PE on each host is left unused

        # Skip every PE whose index is a multiple of pes_per_host (one per host,
        # assuming PEs are numbered contiguously within each host).
        self.idle_workers = {i for i in range(n_pes) if i % pes_per_host != 0}
        self.num_workers = len(self.idle_workers)


class Worker(Chare):

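To make the worker selection in ConstantWorkersPerHostPoolScheduler easier to check without a running charm4py job, the sketch below reproduces the same index arithmetic as a plain function. The name `worker_pes` is hypothetical and not part of charm4py; the PE and host counts are passed in as arguments rather than read from `charm.numPes()` and `charm.numHosts()`, and the sketch assumes, as the scheduler appears to, that PEs are numbered contiguously within each host.

```python
def worker_pes(n_pes, n_hosts):
    """Return the set of PE indices that act as workers.

    Hypothetical helper mirroring the selection logic in the PR:
    the first PE on each host (index divisible by pes_per_host) is left out.
    """
    if n_hosts <= 0 or n_pes % n_hosts != 0:
        raise ValueError("n_pes must be a positive multiple of n_hosts")
    pes_per_host = n_pes // n_hosts
    if pes_per_host <= 1:
        raise ValueError("need at least two PEs per host, since one is left unused")
    # Keep every PE except the first one on each host.
    return {pe for pe in range(n_pes) if pe % pes_per_host != 0}


if __name__ == "__main__":
    # Two hosts with five PEs each: PEs 0 and 5 stay inactive,
    # leaving four workers per host (one per GPU on a 4-GPU node).
    print(sorted(worker_pes(10, 2)))  # [1, 2, 3, 4, 6, 7, 8, 9]
```

With two hosts and five PEs per host, the result has eight workers, four on each host, which matches the 4-GPU example in the class comment.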