
[RFC/WIP] Asynchronous ParallelEvaluator #46

Open
wants to merge 12 commits into robertfeldt:master from alyst:async_parallel_eval

Conversation

@alyst
Contributor

alyst commented Apr 21, 2016

The PR changes the ParallelEvaluator to be asynchronous:

  1. A candidate is submitted to the evaluation queue via async_update_fitness(), which immediately returns the id of the fitness calculation job; the job is then dispatched to one of the available worker processes.
  2. The job status can be checked with isready(). Candidates whose fitness has recently been calculated can be processed by calling process_completed() do job_id, candidate <custom code> end. (Note: if isready(job_id) is called and returns true, that job is no longer enumerated by subsequent process_completed() calls, and it is up to the caller to take action.) See the usage sketch below.
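A minimal usage sketch of the asynchronous API (here `pe` stands for a ParallelEvaluator instance and `candidate` for a parameter vector; the exact argument lists are my assumption and may differ from the actual code):

```julia
# submit a candidate for asynchronous fitness evaluation;
# returns a job id immediately instead of blocking on the result
job_id = async_update_fitness(pe, candidate)

# ... do other work while a worker process calculates the fitness ...

# optional: check whether this particular job has finished
# (once isready() returns true, the job is no longer reported by process_completed())
if isready(pe, job_id)
    # take care of the evaluated candidate here
end

# handle all jobs whose fitness arrived since the last call
process_completed(pe) do job_id, candidate
    # custom processing of the freshly evaluated candidate
end
```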

The old synchronous API (used by NES) is still supported.

BorgMOEA is updated to support the asynchronous ParallelEvaluator: the algorithm runs on the master, a new individual is generated by recombination and sent for parallel evaluation, and the further processing of that individual (updating the population and the archive) is postponed until its fitness has been evaluated. This should improve performance for problems with computationally expensive fitness functions. I see a speed-up for my problems, although it is not linear.
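Schematically, the asynchronous main loop looks like the sketch below (the helper names `generate_by_recombination`, `update_population_and_archive!`, `n_pending_jobs`, and the stopping check are placeholders, not the actual functions of this PR):

```julia
# keep several fitness jobs in flight; the population/archive update of each
# individual is postponed until its fitness comes back from a worker
while !stop_requested                                    # placeholder termination condition
    if n_pending_jobs(pe) < max_pending                  # placeholder: keep the workers busy
        child = generate_by_recombination(population)    # placeholder recombination step
        async_update_fitness(pe, child)                  # non-blocking submission
    end
    process_completed(pe) do job_id, child
        update_population_and_archive!(population, archive, child)  # placeholder
    end
end
```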

There are several reasons why it's RFC/WIP:

  1. Unfortunately, the current Julia parallelization scheme (remote channels/remote refs implemented as messages sent through pipes) doesn't allow efficient, intensive communication between parallel processes: profiling shows that the overhead of serializing/deserializing the messages and writing to/reading from the UV pipe stream is too high. So, after trying several implementations (this journey can still be tracked, as I haven't squashed the commits yet), I've ended up using shared arrays for the data and status exchange (see the sketch after this list). Using threads (or maybe a better messaging implementation) would be much more efficient, but that is not yet mainlined into Julia.
  2. The async_update_fitness()/isready()/process_completed() logic is somewhat confusing, but so far I have had no better ideas, given that a Future{T} approach would create too much overhead.
  3. As the communication is shmem-based, the workers use a "busy wait" to continuously poll for new jobs. To avoid 100% CPU usage, the workers have to be killed/suspended once the optimization is finished. This is done by the new shutdown!() call. However, the OptController/OptRunController interface assumes that the same optimization method (and its evaluator) can be reused across several OptRunController runs. With ParallelEvaluator this is currently not possible, because all the workers are killed by the shutdown!() call at the end of the first run. I see two alternatives:
    • introduce start!(Evaluator)/shutdown!(Evaluator) methods that need to be called immediately before/after the method iterations (no-ops for the normal evaluators);
    • implement lazy initialization of the workers in ParallelEvaluator, detect idle periods, hibernate the workers when idle (using wait()), and resume on a new fitness evaluation request. Given the state of parallelism in Julia, it's not so easy to get this working nicely.
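To illustrate points 1 and 3, here is a heavily simplified sketch of what a worker does; the actual field and function names in parallel_evaluator.jl differ, this only shows the shared-memory job exchange and why the workers spin at 100% CPU until shutdown!() (or hibernation) kicks in. A real implementation also needs per-worker job assignment so that two workers never grab the same slot:

```julia
using SharedArrays  # stdlib in Julia >= 0.7; SharedArray lives in Base on 0.6

# Each worker polls shared arrays for new jobs instead of waiting on a RemoteChannel,
# which avoids the serialization/pipe overhead but burns CPU while idle.
function worker_loop(params::SharedMatrix{Float64},    # candidate parameters, one column per job slot
                     fitness::SharedVector{Float64},   # calculated fitness per job slot
                     job_submitted::SharedVector{Bool},
                     job_done::SharedVector{Bool},
                     shutdown::SharedVector{Bool},
                     fit_fun)
    while !shutdown[1]
        for slot in eachindex(job_submitted)
            if job_submitted[slot] && !job_done[slot]
                fitness[slot] = fit_fun(view(params, :, slot))
                job_done[slot] = true        # notify the master via shared memory
            end
        end
        # busy wait: nothing blocks here, hence ~100% CPU until shutdown or hibernation
    end
end
```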

@coveralls

Coverage increased (+3.9%) to 69.692% when pulling 24ea2a8 on alyst:async_parallel_eval into fbdd9ee on robertfeldt:master.

@coveralls

Coverage increased (+4.1%) to 69.978% when pulling 17c2770 on alyst:async_parallel_eval into fbdd9ee on robertfeldt:master.

@coveralls

Coverage increased (+4.1%) to 69.951% when pulling 17c2770 on alyst:async_parallel_eval into fbdd9ee on robertfeldt:master.

@alyst force-pushed the async_parallel_eval branch 2 times, most recently from 1334f07 to e55a8f4 on April 22, 2016 09:27
@coveralls

Coverage increased (+4.3%) to 70.156% when pulling e55a8f4 on alyst:async_parallel_eval into bd0745c on robertfeldt:master.

@coveralls

Coverage increased (+4.4%) to 70.253% when pulling 88f9ff1 on alyst:async_parallel_eval into bd0745c on robertfeldt:master.

@coveralls

Coverage increased (+3.6%) to 70.327% when pulling 3fd19d2 on alyst:async_parallel_eval into 4dd9ec2 on robertfeldt:master.

@coveralls

Coverage increased (+3.1%) to 69.808% when pulling 9490577 on alyst:async_parallel_eval into 4dd9ec2 on robertfeldt:master.

@coveralls

coveralls commented Sep 22, 2016

Coverage increased (+3.3%) to 71.416% when pulling b7b714d on alyst:async_parallel_eval into 572e647 on robertfeldt:master.

@coveralls

coveralls commented Jul 24, 2017

Coverage increased (+2.7%) to 71.138% when pulling e9b2a91 on alyst:async_parallel_eval into 91440c1 on robertfeldt:master.

@coveralls

coveralls commented Nov 13, 2017

Coverage decreased (-25.8%) to 69.371% when pulling 8225be4 on alyst:async_parallel_eval into cf37814 on robertfeldt:master.

@robertfeldt
Owner

@alyst what is the status of this branch now? In light of Julia 1.0 apparently coming soon, can we try to unify the different parallelization branches and ideas and merge them with master?

@alyst
Contributor Author

alyst commented May 22, 2018

I have a rebased version in my staging branch; I will update this one after #83.

I'm using this branch and it works for me, with some caveats:

  • Sometimes during initialization I get ReadOnlyMemory() exceptions.
    It's probably related to the arrays being shared between the workers, and the number of workers may affect it. Once the initialization of the workers succeeds, the optimization itself seems stable.
  • If the user interrupts Julia (Ctrl+C) while the evaluator is in lock(), the whole Julia process crashes. It could probably be fixed by wrapping lock() in try/catch, at the expense of some performance loss (see the sketch below).
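Something along these lines (an untested sketch; `locked_do` is a hypothetical helper, not a function from this PR) might keep an InterruptException raised while waiting for the lock from taking down the whole process:

```julia
# Guard the critical section so that Ctrl+C (InterruptException) thrown while
# blocked in lock() does not crash the whole julia session.
function locked_do(f, lk)
    acquired = false
    try
        lock(lk)             # may block; Ctrl+C here raises InterruptException
        acquired = true
        return f()
    catch err
        err isa InterruptException || rethrow()
        @warn "interrupted while waiting for the evaluator lock"
        return nothing
    finally
        acquired && unlock(lk)   # only release the lock if we actually acquired it
    end
end
```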

I don't know how much 0.7 improves the situation with the workers. Maybe we can check this branch with the 0.7 alpha and merge it after making sure that Ctrl+C doesn't crash Julia so easily.

Commit messages from the branch:

  • required for putting fitness to/from a (shared) array
  • Use N workers to asynchronously calculate fitnesses. Requests for fitness calculation and completion notifications, as well as input parameters and output fitness, are passed via SharedVector/SharedMatrix to minimize serialization overhead.
  • Parallelized versions of update_population_fitness!(), populate_by_mutants!(), and step!()
  • looks like using SharedArrays in parallel_evaluator.jl is not enough
  • worker2job is replaced with busy_workers
  • don't use fitness_slots for communication, add dedicated job_done, job_submitted
  • add fitness to the archive outside of job_assignments critical section
  • convert `@info` into `@debug`
  • improve debug message verbosity
  • also avoid race when reading worker param status and output when the worker shuts down