
[RFC/WIP] Asynchronous ParallelEvaluator #46

Open
wants to merge 12 commits into robertfeldt:master from alyst:async_parallel_eval

Conversation

@alyst
Contributor

alyst commented Apr 21, 2016

The PR changes the ParallelEvaluator to be asynchronous:

  1. A candidate is submitted to the evaluation queue via async_update_fitness(), which immediately returns the id of the fitness calculation job; the job is then dispatched to one of the available worker processes.
  2. The job status can be checked with isready(). Candidates whose fitness has recently been calculated can be processed by calling process_completed() do job_id, candidate <custom code> end. (Note: if isready(job_id) is called and returns true, that job is no longer enumerated by subsequent process_completed() calls, and it is up to the caller to take action.) See the usage sketch below.
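A minimal usage sketch of the asynchronous API (here `pe` stands for a ParallelEvaluator instance and `candidate` for a parameter vector; the exact argument lists are my assumption and may differ from the actual code):

```julia
# submit a candidate for asynchronous fitness evaluation;
# returns a job id immediately instead of blocking on the result
job_id = async_update_fitness(pe, candidate)

# ... do other work while a worker process calculates the fitness ...

# optional: check whether this particular job has finished
# (once isready() returns true, the job is no longer reported by process_completed())
if isready(pe, job_id)
    # take care of the evaluated candidate here
end

# handle all jobs whose fitness arrived since the last call
process_completed(pe) do job_id, candidate
    # custom processing of the freshly evaluated candidate
end
```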

The old synchronous API (used by NES) is still supported.

BorgMOEA is updated to support the asynchronous ParallelEvaluator: the algorithm runs on the master, a new individual is generated by recombination and sent for parallel evaluation, and the further processing of that individual (updating the population and the archive) is postponed until its fitness has been evaluated. This should improve performance for problems with computationally expensive fitness functions. I see a speed-up for my problems, although it is not linear.
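Schematically, the asynchronous main loop looks like the sketch below (the helper names `generate_by_recombination`, `update_population_and_archive!`, `n_pending_jobs`, and the stopping check are placeholders, not the actual functions of this PR):

```julia
# keep several fitness jobs in flight; the population/archive update of each
# individual is postponed until its fitness comes back from a worker
while !stop_requested                                    # placeholder termination condition
    if n_pending_jobs(pe) < max_pending                  # placeholder: keep the workers busy
        child = generate_by_recombination(population)    # placeholder recombination step
        async_update_fitness(pe, child)                  # non-blocking submission
    end
    process_completed(pe) do job_id, child
        update_population_and_archive!(population, archive, child)  # placeholder
    end
end
```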

There are several reasons why it's RFC/WIP:

  1. Unfortunately, the current Julia parallelization scheme (remote channels/remote refs implemented as messages sent through pipes) doesn't allow efficient, intensive communication between parallel processes: profiling shows that the overhead of serializing/deserializing the messages and writing to/reading from the UV pipe stream is too high. So, after trying several implementations (this journey can still be tracked, as I haven't squashed the commits yet), I've ended up using shared arrays for the data and status exchange (see the sketch after this list). Using threads (or maybe a better messaging implementation) would be much more efficient, but that is not yet mainlined into Julia.
  2. The async_update_fitness()/isready()/process_completed() logic is somewhat confusing, but so far I have had no better ideas, given that a Future{T} approach would create too much overhead.
  3. As the communication is shmem-based, the workers use a "busy wait" to continuously poll for new jobs. To avoid 100% CPU usage, the workers have to be killed/suspended once the optimization is finished. This is done by the new shutdown!() call. However, the OptController/OptRunController interface assumes that the same optimization method (and its evaluator) can be reused across several OptRunController runs. With ParallelEvaluator this is currently not possible, because all the workers are killed by the shutdown!() call at the end of the first run. I see two alternatives:
    • introduce start!(Evaluator)/shutdown!(Evaluator) methods that need to be called immediately before/after the method iterations (no-ops for the normal evaluators);
    • implement lazy initialization of the workers in ParallelEvaluator, detect idle periods, hibernate the workers when idle (using wait()), and resume on a new fitness evaluation request. Given the state of parallelism in Julia, it's not so easy to get this working nicely.
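To illustrate points 1 and 3, here is a heavily simplified sketch of what a worker does; the actual field and function names in parallel_evaluator.jl differ, this only shows the shared-memory job exchange and why the workers spin at 100% CPU until shutdown!() (or hibernation) kicks in. A real implementation also needs per-worker job assignment so that two workers never grab the same slot:

```julia
using SharedArrays  # stdlib in Julia >= 0.7; SharedArray lives in Base on 0.6

# Each worker polls shared arrays for new jobs instead of waiting on a RemoteChannel,
# which avoids the serialization/pipe overhead but burns CPU while idle.
function worker_loop(params::SharedMatrix{Float64},    # candidate parameters, one column per job slot
                     fitness::SharedVector{Float64},   # calculated fitness per job slot
                     job_submitted::SharedVector{Bool},
                     job_done::SharedVector{Bool},
                     shutdown::SharedVector{Bool},
                     fit_fun)
    while !shutdown[1]
        for slot in eachindex(job_submitted)
            if job_submitted[slot] && !job_done[slot]
                fitness[slot] = fit_fun(view(params, :, slot))
                job_done[slot] = true        # notify the master via shared memory
            end
        end
        # busy wait: nothing blocks here, hence ~100% CPU until shutdown or hibernation
    end
end
```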

@coveralls

Coverage increased (+3.9%) to 69.692% when pulling 24ea2a8 on alyst:async_parallel_eval into fbdd9ee on robertfeldt:master.

@coveralls

Coverage increased (+4.1%) to 69.978% when pulling 17c2770 on alyst:async_parallel_eval into fbdd9ee on robertfeldt:master.

@coveralls

Coverage increased (+4.1%) to 69.951% when pulling 17c2770 on alyst:async_parallel_eval into fbdd9ee on robertfeldt:master.

@alyst force-pushed the async_parallel_eval branch 2 times, most recently from 1334f07 to e55a8f4 on April 22, 2016 09:27
@coveralls

Coverage increased (+4.3%) to 70.156% when pulling e55a8f4 on alyst:async_parallel_eval into bd0745c on robertfeldt:master.

@coveralls

Coverage increased (+4.4%) to 70.253% when pulling 88f9ff1 on alyst:async_parallel_eval into bd0745c on robertfeldt:master.

@coveralls

Coverage increased (+3.6%) to 70.327% when pulling 3fd19d2 on alyst:async_parallel_eval into 4dd9ec2 on robertfeldt:master.

@coveralls

Coverage increased (+3.1%) to 69.808% when pulling 9490577 on alyst:async_parallel_eval into 4dd9ec2 on robertfeldt:master.

@coveralls

coveralls commented Sep 22, 2016

Coverage increased (+3.3%) to 71.416% when pulling b7b714d on alyst:async_parallel_eval into 572e647 on robertfeldt:master.

@coveralls

coveralls commented Jul 24, 2017

Coverage increased (+2.7%) to 71.138% when pulling e9b2a91 on alyst:async_parallel_eval into 91440c1 on robertfeldt:master.

@coveralls

coveralls commented Nov 13, 2017

Coverage decreased (-25.8%) to 69.371% when pulling 8225be4 on alyst:async_parallel_eval into cf37814 on robertfeldt:master.

@robertfeldt
Owner

@alyst what is the status of this branch now? In light of Julia 1.0 apparently coming soon, can we try to unify the different parallelization branches and ideas and merge them with master?

@alyst
Contributor Author

alyst commented May 22, 2018

I have a rebased version in my staging branch; I will update this one after #83.

I'm using this branch and it works for me, with some caveats:

  • Sometimes during initialization I get ReadOnlyMemory() exceptions.
    It's probably related to the arrays being shared between the workers, and the number of workers may affect it. Once the initialization of the workers succeeds, the optimization itself seems stable.
  • If the user interrupts Julia (Ctrl+C) while the evaluator is in lock(), the whole Julia process crashes. It could probably be fixed by wrapping lock() in try/catch, at the expense of some performance loss (see the sketch below).
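Something along these lines (an untested sketch; `locked_do` is a hypothetical helper, not a function from this PR) might keep an InterruptException raised while waiting for the lock from taking down the whole process:

```julia
# Guard the critical section so that Ctrl+C (InterruptException) thrown while
# blocked in lock() does not crash the whole julia session.
function locked_do(f, lk)
    acquired = false
    try
        lock(lk)             # may block; Ctrl+C here raises InterruptException
        acquired = true
        return f()
    catch err
        err isa InterruptException || rethrow()
        @warn "interrupted while waiting for the evaluator lock"
        return nothing
    finally
        acquired && unlock(lk)   # only release the lock if we actually acquired it
    end
end
```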

I don't know how much 0.7 improves the situation with the workers. Maybe we can check this branch with the 0.7 alpha and merge it after making sure that Ctrl+C doesn't crash Julia so easily.

Commit messages from the branch:

  • required for putting fitness to/from a (shared) array
  • Use N workers to asynchronously calculate fitnesses. Requests for fitness calculation and completion notifications, as well as input parameters and output fitness, are passed via SharedVector/SharedMatrix to minimize serialization overhead.
  • Parallelized versions of update_population_fitness!(), populate_by_mutants!(), and step!()
  • looks like using SharedArrays in parallel_evaluator.jl is not enough
  • worker2job is replaced with busy_workers
  • don't use fitness_slots for communication, add dedicated job_done, job_submitted
  • add fitness to the archive outside of job_assignments critical section
  • convert `@info` into `@debug`
  • improve debug message verbosity
  • also avoid race when reading worker param status and output when the worker shuts down