[RFC/WIP] Asynchronous ParallelEvaluator #46
base: master
@alyst what is the status of this branch now? With Julia 1.0 apparently coming soon, can we try to unify the different parallelization branches and ideas and merge with master?
I have a rebased version in my staging branch; I will update this one after #83. I'm using this branch and it works for me, with some caveats:
I don't know how much 0.7 improves the situation with the workers. Maybe we can check this branch with 0.7alpha and merge it after making sure that Ctrl+C doesn't crash Julia so easily.
required for putting fitness to/from a (shared) array
Use N workers to asynchronously calculate fitnesses. Requests for fitness calculation and completion notifications as well as input parameters and output fitness are passed via SharedVector/SharedMatrix to minimize serialization overhead.
Parallelized versions of:
- update_population_fitness!()
- populate_by_mutants!()
- step!()
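To illustrate the idea of passing parameters and fitness values through shared memory rather than serializing them, here is a minimal sketch. It is not the code from this PR: `toy_fitness`, `fill_fitness!`, and the slot layout (one column of parameters per candidate) are assumptions made for the example, and it only relies on the `Distributed` and `SharedArrays` standard libraries with two local workers.

```julia
using Distributed, SharedArrays

addprocs(2)                          # two local worker processes
@everywhere using SharedArrays

# Hypothetical fitness function, defined on every process.
@everywhere toy_fitness(x) = sum(abs2, x)

# Hypothetical worker routine: evaluate every `stride`-th slot, starting at `first_slot`.
# Inputs are read from and results written to shared memory.
@everywhere function fill_fitness!(fitness, params, first_slot, stride)
    for j in first_slot:stride:length(fitness)
        fitness[j] = toy_fitness(view(params, :, j))
    end
    return nothing
end

nparams, nslots = 3, 4
params  = SharedArray{Float64}(nparams, nslots)   # one column per candidate
fitness = SharedArray{Float64}(nslots)            # one fitness value per candidate

for j in 1:nslots
    params[:, j] .= j                # master writes the inputs in place
end

nw = nworkers()
@sync for (k, w) in enumerate(workers())
    # Only a reference to the shared segment is sent to the worker, not the data.
    @async remotecall_wait(fill_fitness!, w, fitness, params, k, nw)
end

println(collect(fitness))            # fitness values computed by the workers

rmprocs(workers())
```

The only thing that crosses process boundaries per call is the slot index and stride; the parameter matrix and fitness vector stay in the shared segment.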
looks like using SharedArrays in parallel_evaluator.jl is not enough
worker2job is replaced with busy_workers
- don't use fitness_slots for communication, add dedicated job_done, job_submitted
- add fitness to the archive outside of job_assignments critical section
- convert `@info` into `@debug`
- improve debug message verbosity
also avoid race when reading worker param status
and output when the worker shuts down
The PR changes the `ParallelEvaluator` to be asynchronous:
- fitness calculation is requested with `async_update_fitness()`, which immediately returns the fitness calculation job id; the job is then submitted to one of the available worker processes.
- the status of a job can be checked with the `isready()` function. Also, candidates with recently calculated fitness can be processed by calling the `process_completed() do job_id, candidate <custom code> end` routine. (Note: if `isready(job_id)` is called and returns `true`, this job will no longer be enumerated by a call to `process_completed()`, and it's up to the caller to take action.)

The old synchronous API (used by NES) is still supported.
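To make the calling pattern concrete, here is a toy, single-process sketch of the submit / poll / process cycle. It is not the `ParallelEvaluator` from this PR: `ToyCandidate`, `ToyAsyncEvaluator`, and `job_isready` are hypothetical names, local tasks stand in for worker processes, and `async_update_fitness`/`process_completed` here are stand-in functions that only mimic the behaviour described above.

```julia
# Toy stand-in for the asynchronous API; all types and functions are hypothetical.

mutable struct ToyCandidate
    params::Vector{Float64}
    fitness::Float64                 # NaN until the job completes
end

mutable struct ToyAsyncEvaluator{F}
    fitness_fn::F
    next_id::Int
    jobs::Dict{Int,Tuple{ToyCandidate,Task}}   # submitted, not yet processed
end

ToyAsyncEvaluator(f) = ToyAsyncEvaluator(f, 1, Dict{Int,Tuple{ToyCandidate,Task}}())

# Submit a candidate and return its job id immediately.
function async_update_fitness(e::ToyAsyncEvaluator, c::ToyCandidate)
    id = e.next_id
    e.next_id += 1
    task = @async begin
        c.fitness = e.fitness_fn(c.params)
    end
    e.jobs[id] = (c, task)
    return id
end

# Returns true once the job is finished. A job reported as ready here is
# dropped from the evaluator, so process_completed() will not enumerate it.
function job_isready(e::ToyAsyncEvaluator, id::Int)
    istaskdone(e.jobs[id][2]) || return false
    delete!(e.jobs, id)
    return true
end

# Call f(job_id, candidate) for every finished job that was not claimed yet.
function process_completed(f, e::ToyAsyncEvaluator)
    done = sort!([id for (id, job) in e.jobs if istaskdone(job[2])])
    for id in done
        c, _ = pop!(e.jobs, id)
        f(id, c)
    end
    return length(done)
end

sphere(x) = sum(abs2, x)
ev = ToyAsyncEvaluator(sphere)
ids = [async_update_fitness(ev, ToyCandidate([float(i), 2.0i], NaN)) for i in 1:3]

# Block until all jobs finish (a real optimizer would keep generating candidates).
foreach(job -> wait(job[2]), values(ev.jobs))

claimed = job_isready(ev, ids[1])    # job 1 is now the caller's responsibility

results = Dict{Int,Float64}()
process_completed(ev) do job_id, candidate
    results[job_id] = candidate.fitness
end
```

After this runs, `results` holds only jobs 2 and 3, because job 1 was already claimed through the readiness check.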
`BorgMOEA` is updated to support the asynchronous `ParallelEvaluator`: the algorithm runs on the master process, a new individual is generated by recombination and sent for parallel evaluation, and further processing of the individual (updating the population and the archive) is postponed until its fitness is evaluated. This should improve performance for problems with computationally intensive fitness functions. I see a speedup for my problems, although it's not linear.

There are several reasons why it's RFC/WIP:
- the `async_update_fitness()`/`isready()`/`process_completed()` API is somewhat confusing, but so far I had no better ideas, given that a `Future{T}` approach would create too much overhead.
- the workers have to be stopped with a `shutdown!()` call. However, the `OptController`/`OptRunController` interface assumes the same optimization method (and its evaluator) can be reused in several `OptRunController` runs. With the `ParallelEvaluator` this is currently not possible, because all the workers are killed by the `shutdown!()` call at the end of the first run. I see two alternatives:
  - add `start!(Evaluator)`/`shutdown!(Evaluator)` methods that need to be called immediately before/after the method iterations (no-ops for the normal evaluators);
  - in `ParallelEvaluator`, detect idle periods, hibernate the workers when idle (using `wait()`), and resume on a new fitness evaluation request. Given the state of parallelism in Julia, it's not so easy to get this working nicely.
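The first alternative could look roughly like the sketch below. It is only an illustration under assumptions: `ToyEvaluator`, `ToySerialEvaluator`, `ToyPooledEvaluator`, `evaluate`, and `toy_run` are hypothetical names, local tasks fed through a `Channel` stand in for worker processes, and the "fitness" is just a square. The point is that the lifecycle hooks are no-ops by default, so the same evaluator object can be started and shut down once per run.

```julia
# Hypothetical evaluator hierarchy; not BlackBoxOptim's actual types.
abstract type ToyEvaluator end

# Default lifecycle hooks are no-ops, so ordinary evaluators need no changes.
start!(::ToyEvaluator) = nothing
shutdown!(::ToyEvaluator) = nothing

struct ToySerialEvaluator <: ToyEvaluator end

mutable struct ToyPooledEvaluator <: ToyEvaluator
    nworkers::Int
    inbox::Channel{Float64}      # inputs waiting for a worker
    outbox::Channel{Float64}     # computed results
    workers::Vector{Task}
end

ToyPooledEvaluator(n::Int) =
    ToyPooledEvaluator(n, Channel{Float64}(Inf), Channel{Float64}(Inf), Task[])

# Each worker blocks on the inbox while idle and exits once it is closed.
function worker_loop(inbox, outbox)
    for x in inbox
        put!(outbox, x^2)
    end
end

function start!(e::ToyPooledEvaluator)
    inbox, outbox = Channel{Float64}(Inf), Channel{Float64}(Inf)
    e.inbox, e.outbox = inbox, outbox
    e.workers = Task[]
    for _ in 1:e.nworkers
        t = @async worker_loop(inbox, outbox)
        push!(e.workers, t)
    end
    return nothing
end

function shutdown!(e::ToyPooledEvaluator)
    close(e.inbox)               # lets the worker loops terminate
    foreach(wait, e.workers)
    empty!(e.workers)
    return nothing
end

evaluate(::ToySerialEvaluator, xs) = sum(x -> x^2, xs)

function evaluate(e::ToyPooledEvaluator, xs)
    foreach(x -> put!(e.inbox, x), xs)
    return sum(take!(e.outbox) for _ in xs)
end

# One "run": the hooks bracket the iterations, whatever the evaluator type.
function toy_run(e::ToyEvaluator, xs)
    start!(e)
    try
        return evaluate(e, xs)
    finally
        shutdown!(e)
    end
end

pool = ToyPooledEvaluator(2)
first_run  = toy_run(pool, [1.0, 2.0, 3.0])
second_run = toy_run(pool, [1.0, 2.0, 3.0])   # same evaluator, restarted
serial_run = toy_run(ToySerialEvaluator(), [1.0, 2.0, 3.0])
```

Note that in this toy the idle workers already block on the inbox channel, which is close in spirit to the second alternative; with real worker processes the hibernation and wake-up logic would be considerably more involved.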