design proposal for top-level ask/tell interface
robertfeldt committed May 20, 2018
1 parent 2d40677 commit 8b941ba
Showing 1 changed file with 68 additions and 0 deletions.
design/design_by_testcode/test_toplevel_ask_tell_interface.jl
@@ -0,0 +1,68 @@
using BlackBoxOptim: fitness, index, tag, rank_by_fitness!, bbsetup, ask, tell!
using BlackBoxOptim: hassolution, setsolution!, solution

# The ask/tell interface aims to allow flexibility in cases where
# the evaluator might not be easy/possible to define up-front,
# or when the optimization needs to be "driven" from the outside
# rather than from BBO itself.
@testset "Top-level ask/tell interface" begin
    @testset "Single float fitness, with intermediate solutions" begin

@robertfeldt (Author, Owner) commented on this line, May 20, 2018:

@alyst please check the design proposal below for a top-level ask/tell interface. This might be useful in cases where the user wants to "drive" the optimization rather than leave it to BBO itself. I have noted some questions below; please consider them, thanks.


        # A typical use case for this is that we need to map parameter sets to
        # intermediate "solutions" before we can evaluate their fitness. We want
        # to avoid the repeated generation of solutions since that might be costly.
        # In this example it is not, but in general it might be, and then we want
        # to "cache" the intermediate solutions with the parameters.
        params2solution(params) = string(sum(params))
        fitn(sol) = length(sol)

        # Since the user will step through the ask/tell cycle themselves, we should
        # not require knowledge of the optimization function / evaluator. For cases
        # where the fitness is not a single Float64, should one state the fitness
        # scheme explicitly, as for the multi-objective case?
        oc = bbsetup(; SearchRange = (-1.0, 1.0), NumDimensions = 5)

        # When we ask for solutions we get either the solutions for params
        # we previously supplied or empty Nullables. The alternative would
        # be to embed the solutions in the Candidate struct and give the user
        # getters/setters for them. It seems unnecessary to add this
        # level of complexity; it is easier to have a general top-level interface
        # where we can get and set solutions/phenotypes associated with each
        # set of parameters. OTOH, we will probably save the solutions in the
        # candidates anyway, so users that require more control can just ask
        # to get the candidates themselves and then return them (sorted).
        # Here we show the simpler, first version. It unpacks the params
        # and the solutions from the (internal) candidates and thus does not
        # require the user to know anything about the Candidate type or the
        # internals of BBO.
        params, solutions = ask(oc; withSolutions = true)

        @test length(params) > 0
        @test length(params) == length(solutions)

        # In this case there are no solutions yet, but in general there might be.
        # Our task is now to sort the params and solutions based on their fitness.
        fs = Float64[]
        newsolutions = String[]
        for i in eachindex(params)
            p = params[i]
            s = solutions[i]
            @test typeof(p) <: Vector{Float64}
            @test typeof(s) <: Nullable
            # The fitness is not yet evaluated (it would be NaN under the default
            # fitness scheme), but this interface does not expose the candidate.
            # Not yet evaluated. If p is one for which we previously supplied a
            # solution, that solution should be returned (in a Nullable) instead.
            @test isnull(s)

            # Don't calculate a new solution if one was already given before.
            ns = isnull(s) ? params2solution(p) : get(s)
            push!(newsolutions, ns)
            push!(fs, fitn(ns))
        end

        # A usability risk here is that the user might sort only one of the
        # returned arrays. Another risk is that the fitnesses might not be comparable
        # on a global scale, i.e. between calls to ask/tell. This is likely to mess up
        # tracing and archives as well as the saving of the "best" individuals. OTOH,
        # since the optimization is "driven" from the outside, maybe the responsibility
        # for keeping history, tracing etc. should be on the user rather than on BBO?
        perm = sortperm(fs)
        tell!(oc, params[perm], fs[perm], newsolutions[perm])
    end
end
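For illustration only, here is a self-contained toy version of the loop in the test above that runs without BlackBoxOptim. `ToyAskTell`, `toy_ask`, `toy_tell!` and `run_toy` are hypothetical names, not BBO API: the toy does random search in [-1, 1]^dims, keeps the `popsize` best params, and caches solutions in a Dict keyed by params. It uses `Union{String, Nothing}` instead of `Nullable`, since Nullable was moved out of Base in Julia 0.7.

```julia
using Random

# Hypothetical stand-in for the proposed top-level ask/tell interface
# (not BlackBoxOptim API). Minimizes; keeps the `popsize` best params.
mutable struct ToyAskTell
    dims::Int
    popsize::Int
    rng::MersenneTwister
    population::Vector{Vector{Float64}}      # current best params, best first
    cache::Dict{Vector{Float64},String}      # params => cached solution
end

ToyAskTell(dims, popsize; seed = 1) =
    ToyAskTell(dims, popsize, MersenneTwister(seed), Vector{Float64}[],
               Dict{Vector{Float64},String}())

# Return the current population plus `n` fresh random points in [-1, 1]^dims,
# together with any previously cached solution (or `nothing`).
function toy_ask(opt::ToyAskTell, n::Int)
    fresh = [2 .* rand(opt.rng, opt.dims) .- 1 for _ in 1:n]
    params = vcat(opt.population, fresh)
    solutions = Union{String,Nothing}[get(opt.cache, p, nothing) for p in params]
    return params, solutions
end

# Accept params, fitnesses and solutions already sorted from best to worst.
function toy_tell!(opt::ToyAskTell, params, fitnesses, solutions)
    @assert issorted(fitnesses)
    for (p, s) in zip(params, solutions)
        opt.cache[p] = s                     # cache every solution we were given
    end
    opt.population = params[1:min(opt.popsize, length(params))]
    return opt
end

params2solution(params) = string(sum(params))
fitn(sol) = length(sol)

# Drive the optimization from the outside; count how many solutions we compute.
function run_toy(niters)
    opt = ToyAskTell(5, 4)
    ncomputed = 0
    for _ in 1:niters
        params, solutions = toy_ask(opt, 4)
        newsolutions = String[]
        for (p, s) in zip(params, solutions)
            s === nothing && (ncomputed += 1)
            push!(newsolutions, s === nothing ? params2solution(p) : s)
        end
        fs = Float64[fitn(s) for s in newsolutions]
        perm = sortperm(fs)                  # one permutation for all three arrays
        toy_tell!(opt, params[perm], fs[perm], newsolutions[perm])
    end
    return opt, ncomputed
end

opt, ncomputed = run_toy(3)   # only the 4 fresh points per iteration need a new solution
```

Applying the single permutation `perm` to all three arrays is what keeps params, fitnesses and solutions aligned; forgetting one of them is the usability risk noted in the test comments.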

3 comments on commit 8b941ba

@alyst (Contributor) commented on 8b941ba May 20, 2018

Interesting! Just to check that I got things right:

  1. Between the "params" (aka "genotype") and the fitness we introduce the "solution" (aka "phenotype"). It's actually the solution to some embedded problem, and therefore needs to be cached alongside each individual genotype in the population/archive.
  2. At the moment, step!() for any AskTellOptimizer calls rank_by_fitness!(evaluator, candidates), which determines which candidates are kept or discarded by tell!(). The idea is to allow users to specify their own ranking function, given that fitness(phenotype/genotype) might be costly or unknown, while there could be a less expensive ordering (total/partial?) defined in the "phenotype" space. In this scheme the fitness function is actually not used, since the ranking of the candidates (for the ask/tell step and for the archive) would be done through that ordering.

It should be quite easy to parameterize AskTellOptimizer with a callback function that does the ranking. To avoid misuse on the user side, maybe it should return the permutation of the candidates instead of applying the permutation itself.

Actually, since in this optimization scheme fitness is not used at all, maybe the least invasive way of adding "phenotype" support would be to use F (the fitness type) as the phenotype type? The real fitness (if needed) could be just an additional optional field of F, managed by a user-defined fitness() function. We would just need to make sure that BBO supports non-bitstype F; e.g. the FitPopulation.fitness::Vector{F} field that caches the fitness of the population would need to become Vector{Union{F, Nothing}} (AFAIK Nullable is deprecated in favor of the Union{T, Nothing} approach, which gets special support from the compiler in 0.7). Evaluators are already taught not to reevaluate the fitness (see Candidate{F}, the envelope for the FitPopulation individuals).
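As a rough illustration of the Union{F, Nothing} approach (not the actual FitPopulation or Evaluator code), a population could store its fitness cache as Vector{Union{F, Nothing}} and evaluate only the entries that are still `nothing`. `SolFit`, `ToyPopulation` and `evaluate_missing!` are hypothetical names; SolFit is deliberately a non-bitstype fitness that carries a solution plus an optional real fitness.

```julia
# Hypothetical non-bitstype fitness: a solution plus an optional "real" fitness.
struct SolFit
    solution::String
    value::Union{Float64,Nothing}
end

# Hypothetical population whose fitness cache uses `nothing` for "not evaluated".
struct ToyPopulation{F}
    params::Vector{Vector{Float64}}
    fitness::Vector{Union{F,Nothing}}
end

# Start with every fitness missing.
ToyPopulation{F}(params::Vector{Vector{Float64}}) where {F} =
    ToyPopulation{F}(params, Vector{Union{F,Nothing}}(nothing, length(params)))

# Evaluate only individuals whose fitness is still missing; return how many were evaluated.
function evaluate_missing!(pop::ToyPopulation{F}, f) where {F}
    n = 0
    for i in eachindex(pop.params)
        if pop.fitness[i] === nothing
            pop.fitness[i] = f(pop.params[i])::F
            n += 1
        end
    end
    return n
end

tosolfit(x) = SolFit(string(sum(x)), nothing)   # solution only, no real fitness yet
pop = ToyPopulation{SolFit}([[0.1, 0.2], [0.3, 0.4]])
n1 = evaluate_missing!(pop, tosolfit)   # evaluates both individuals
n2 = evaluate_missing!(pop, tosolfit)   # nothing left to evaluate
```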

@robertfeldt (Owner, Author) commented on 8b941ba

Thanks @alyst.

Yes, for your point 1 it is exactly the genotype/phenotype distinction I have in mind (from evolutionary algorithms), but I wanted to avoid EA-specific terminology, hence params/solutions. In general, we want to cache the solutions, since the process of going from params to solutions might be costly.

For 2, it is not only that calculating fitness might be costly/unknown; it could also be relative, i.e. changing dynamically. This latter case is actually my use case: each time fitness is evaluated, the whole fitness function changes, and it needs to, since we are searching for "coverage" of a space, so already covered areas now need to get a worse fitness. But yes, we want a general solution that can handle all of the unknown/costly/relative/dynamic fitness function cases with, preferably, a single API.
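A minimal sketch of such a relative fitness, assuming a simple grid-based notion of coverage (`CoverageFitness`, `cell` and the cell size are illustrative, not part of BBO): each evaluation marks the point's cell as visited, so evaluating the same region again yields a worse (higher, under minimization) fitness. This is why a cached fitness value can go stale between ask/tell rounds even when the params have not changed.

```julia
# Hypothetical coverage-based fitness (minimization): the more often a grid cell
# has been visited, the worse the fitness of any point that falls into it.
struct CoverageFitness
    cellsize::Float64
    visits::Dict{Vector{Int},Int}
end
CoverageFitness(cellsize) = CoverageFitness(cellsize, Dict{Vector{Int},Int}())

# Grid cell index of point `x`.
cell(cf::CoverageFitness, x::AbstractVector{<:Real}) = floor.(Int, x ./ cf.cellsize)

# Evaluating mutates the landscape: the visited cell's count goes up.
function (cf::CoverageFitness)(x::AbstractVector{<:Real})
    c = cell(cf, x)
    n = get(cf.visits, c, 0)
    cf.visits[c] = n + 1
    return Float64(n)          # 0.0 for a previously unvisited cell
end

cf = CoverageFitness(0.5)
f1 = cf([0.1, 0.2])   # first visit of cell [0, 0]
f2 = cf([0.1, 0.2])   # same point, now worse
f3 = cf([0.3, 0.4])   # different point, same cell
f4 = cf([0.9, 0.2])   # new cell [1, 0]
```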

I like returning the permutation to tell but this does not allow sending back the solutions (or updated fitness values) for caching. Can you clarify what you mean there? Do you mean tell!(oc, permorder, fitnesses) and embedding the solutions in the type of the fitness values?

I like the general idea of embedding the solutions/phenotypes in the Fitness type, but the downside I see is that the fitness function in general would then take candidate values rather than parameters/genotypes. This makes the API more complex for the more common use case of just optimizing a function from a Float vector to a Float fitness value. I guess we could "probe" the fitness function to determine if it can handle Candidate types, but this would take one evaluation (alternatively, if we can ask for the arity of a function: if it has two args, the first one is the params/genotype and the second one is the fitness value). Maybe it is better to have a parameter that has to be set when the fitness function takes candidates/individuals (of type Candidate{F}, with F potentially being a struct that holds both a traditional fitness value and a solution/phenotype). But with this latter solution, my use case of dynamically changing fitness can actually be handled inside the fitness function. Hmm, so it seems we have multiple solutions for multiple separate issues. I'm trying to summarize the issues and solutions below:
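On the probing question: in Julia 0.7 and later, one can check for a suitable method without spending an evaluation by using Base's `hasmethod`. The sketch assumes, hypothetically, that a fitness function opting into the extended form accepts `(params, oldfitness)`; `call_fitness`, `plain` and `extended` are illustrative names. Note that `hasmethod` inspects signatures only, so an untyped two-argument method matches any argument types.

```julia
# Hypothetical dispatcher: call f(params, old) if f has such a method,
# otherwise fall back to the plain f(params).
function call_fitness(f, params::Vector{Float64}, old)
    if hasmethod(f, Tuple{Vector{Float64},typeof(old)})
        return f(params, old)
    else
        return f(params)
    end
end

plain(x) = sum(abs2, x)                                    # common case: params -> Float64
extended(x, old) = old === nothing ? sum(abs2, x) : old    # may reuse a cached value

r1 = call_fitness(plain, [1.0, 2.0], nothing)     # plain(params)
r2 = call_fitness(extended, [1.0, 2.0], 7.0)      # reuses the "old" value
r3 = call_fitness(extended, [1.0, 2.0], nothing)  # nothing cached, computes
```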

  • I1: Driving optimization from the outside
  • I2: Partial ordering of set of candidates, but no globally consistent/static fitness function
  • I3: Caching of solutions/phenotypes with their params/genotypes, since getting the former from the latter might be costly.

Solutions:

  • S1: Lift step!. "Driving from the outside" can actually already be done by taking step!s on the OptRunController. We can just lift and expose it on the OptController so the user does not also need to know about the OptRunController. Addresses I1 but not I2 and I3.
  • S2: ask/tell interface without exposing solutions. Addresses I1 and I2 but not I3. But it is unclear how to handle StepOptimizers (maybe we should try to turn them into AskTellOptimizers anyway?).
  • S3: ask/tell where one can ask for solutions via a parameter to ask. Addresses all 3 above, but the user might misuse it by forgetting to sort all the args passed to tell!.
  • S4: Like S3 but arg to tell! is the perm order from better to worse fitness. Addresses all 3 and reduces likelihood of user misuse but still unclear if also need to supply fitnesses (should they be ordered according to permorder or to orig order?).
  • S5: Cache solutions/phenotypes in fitness type. User provides a fitness scheme that holds both the fitness value and the solutions. Only calculates (and caches) a new solution if none is already in the "old" fitness supplied to the fitness function called from BBO. Combine with S1 to solve all 3 issues. Dynamic case is handled by re-calculating fitness (but not the solution) if current version is "stale". We can provide a FitnessWithSolution{F, S} struct to capture most common use case of caching a solution/phenotype with the fitness.
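A minimal sketch of S5, under assumptions: `FitnessWithSolution{F, S}` is the struct suggested in S5, while the `version` field, the `landscape_version` argument and `evaluate` are hypothetical additions used to model a "stale" fitness in the dynamic case. The solution is computed at most once per parameter vector; the fitness is recomputed only when the landscape has changed since it was last computed.

```julia
# Fitness value plus cached solution; `version` records which state of a
# (possibly dynamic) fitness landscape the value was computed against.
struct FitnessWithSolution{F,S}
    fitness::F
    solution::S
    version::Int
end

# Hypothetical evaluator-side helper. `tosolution` maps params to a solution,
# `solfitness` maps a solution to a fitness, and `old` is the previous
# FitnessWithSolution for these params (or `nothing`).
function evaluate(params, old, tosolution, solfitness, landscape_version::Int)
    if old !== nothing && old.version == landscape_version
        return old                                             # still fresh
    end
    sol = old === nothing ? tosolution(params) : old.solution  # reuse cached solution
    return FitnessWithSolution(solfitness(sol), sol, landscape_version)
end

nsol = Ref(0)                                   # counts solution computations
tosol(p) = (nsol[] += 1; string(sum(p)))
fit(s) = Float64(length(s))

p = [0.25, 0.5]
a = evaluate(p, nothing, tosol, fit, 1)   # computes solution and fitness
b = evaluate(p, a, tosol, fit, 1)         # fresh: returned as-is
c = evaluate(p, b, tosol, fit, 2)         # stale: fitness recomputed, solution reused
```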

There are probably other solutions, for example providing special Evaluators. What did I forget? What is your preference? I want something that is easy for users, so I don't want to expose internals/types too much unless really necessary. Maybe a combination of S1 and S5 above is best and more in line with the current design? It is a bit strange, though, to supply the params and an "old" fitness to the function being optimized, in case it needs to be updated.

@alyst (Contributor) commented on 8b941ba May 22, 2018

My current vision is something like S5 + user-specified ranking function, see below.

> I like returning the permutation to tell but this does not allow sending back the solutions (or updated fitness values) for caching. Can you clarify what you mean there? Do you mean tell!(oc, permorder, fitnesses) and embedding the solutions in the type of the fitness values?

This is how step!() looks now

function step!(ctrl::OptRunController{<:AskTellOptimizer})
    # The ask()/tell() interface is more general since you can mix and match
    # elements from several optimizers using it. However, in this top-level
    # execution function we do not make use of this flexibility...
    candidates = ask(ctrl.optimizer)
    rank_by_fitness!(ctrl.evaluator, candidates)
    return tell!(ctrl.optimizer, candidates)
end

We can add an AskTellOptimizer.rank! field, where rank! is a function: rank!(rankperm::AbstractVector{Int}, evaluator, candidates::AbstractVector{Candidate}).
It returns the ranking permutation in rankperm (rank!(rankperm, ...) rather than rankperm = rank(...), to avoid an allocation, which may generate overhead for simple problems).
rank!() may also calculate the fitness and cache it in candidates, if required for ranking.
rank! defaults to the current rank_by_fitness!() for the typical use cases.
ask() and tell!() stay as before; that would help to decouple the optimizer algorithm logic from the ranking logic.
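A sketch of how such a rank! hook could plug into step!(), written against toy types rather than BBO's real Candidate/Evaluator. `ToyCandidate`, `ToyOptimizer`, `default_rank!`, `by_first_coord!` and `toy_step!` are all hypothetical; the `ranker` field plays the role of the proposed rank! field, and a plain function stands in for the evaluator. The default fills a preallocated permutation with Base's `sortperm!`, while the custom ranker orders candidates without ever calling the evaluation function.

```julia
# Toy stand-ins (hypothetical, not BBO types).
mutable struct ToyCandidate
    params::Vector{Float64}
    fitness::Float64                 # NaN means "not evaluated yet"
end

struct ToyOptimizer{R}
    ranker::R                        # plays the role of the proposed rank! field
    rankperm::Vector{Int}            # reused between steps to avoid allocations
    best::Vector{ToyCandidate}       # fixed-size population, best first
end

# Default ranking (minimization): evaluate missing fitnesses, then sort by them.
function default_rank!(rankperm::Vector{Int}, evaluate, candidates)
    for c in candidates
        isnan(c.fitness) && (c.fitness = evaluate(c.params))
    end
    resize!(rankperm, length(candidates))
    return sortperm!(rankperm, candidates; by = c -> c.fitness)
end

# A user-defined ordering that never calls `evaluate`.
by_first_coord!(rankperm::Vector{Int}, evaluate, candidates) =
    sortperm!(resize!(rankperm, length(candidates)), candidates; by = c -> c.params[1])

# ask -> rank -> tell, with the ranking step delegated to `opt.ranker`.
function toy_step!(opt::ToyOptimizer, evaluate, newparams)
    candidates = vcat(opt.best, [ToyCandidate(p, NaN) for p in newparams])  # "ask"
    perm = opt.ranker(opt.rankperm, evaluate, candidates)                    # rank
    copyto!(opt.best, candidates[perm[1:length(opt.best)]])                  # "tell"
    return opt
end

f(x) = sum(abs2, x)
newpop() = [ToyCandidate([1.0, 1.0], NaN), ToyCandidate([2.0, 0.0], NaN)]

opt = toy_step!(ToyOptimizer(default_rank!, Int[], newpop()), f, [[0.1, 0.0], [3.0, 3.0]])
opt2 = toy_step!(ToyOptimizer(by_first_coord!, Int[], newpop()), f, [[0.1, 0.0], [3.0, 3.0]])
```

Because the ranker only hands back a permutation, the optimizer alone decides how to apply it, which is the misuse-avoidance argument made earlier in the thread.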

We can also hide the candidates and evaluator behind candidates::CandidateContainer with getindex(ix), getfitness(ix), updatefitness!(ix) methods.
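One possible shape for such a container, sketched with hypothetical types: only the method names getindex, getfitness and updatefitness! come from the comment, while `ToyCandidateContainer`, its fields and `lazy_rank` are assumptions. A custom ranking function sees only these methods and never calls fitness() directly, so the evaluator side stays free to cache or parallelize.

```julia
# Hypothetical container that hides candidates and evaluator from a custom rank!.
struct ToyCandidateContainer{E}
    params::Vector{Vector{Float64}}
    fitness::Vector{Union{Float64,Nothing}}   # `nothing` = not evaluated yet
    evaluate::E                               # stands in for the Evaluator
end

ToyCandidateContainer(params::Vector{Vector{Float64}}, evaluate) =
    ToyCandidateContainer(params, Vector{Union{Float64,Nothing}}(nothing, length(params)), evaluate)

Base.getindex(cc::ToyCandidateContainer, ix::Integer) = cc.params[ix]
Base.length(cc::ToyCandidateContainer) = length(cc.params)

# Cached fitness of candidate `ix`, or `nothing` if it has not been evaluated.
getfitness(cc::ToyCandidateContainer, ix::Integer) = cc.fitness[ix]

# (Re)compute and cache the fitness of candidate `ix`, e.g. after the landscape changed.
function updatefitness!(cc::ToyCandidateContainer, ix::Integer)
    cc.fitness[ix] = cc.evaluate(cc[ix])
    return cc.fitness[ix]
end

# A custom ranking that only uses the container interface and evaluates lazily.
function lazy_rank(cc::ToyCandidateContainer)
    fits = map(1:length(cc)) do ix
        f = getfitness(cc, ix)
        f === nothing ? updatefitness!(cc, ix) : f
    end
    return sortperm(fits)
end

ncalls = Ref(0)
cc = ToyCandidateContainer([[2.0], [0.5], [1.0]], x -> (ncalls[] += 1; abs(x[1])))
cc.fitness[2] = 0.5            # pretend candidate 2 was evaluated in an earlier round
perm = lazy_rank(cc)           # only candidates 1 and 3 are evaluated here
```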

> I like the general idea of embedding the solutions/phenotypes in the Fitness type but the downside I see is that the fitness function in general then would take candidate values rather than parameters/genotypes.

I think we can still leave the fitness method API as fitness(x::AbstractVector, p::OptimizationProblem)::Fitness (fitness caching in the Candidate is managed by Evaluator, so we don't have to expose it).
For typical problems the fitness type will stay as is. But for your use case it could be a struct, e.g.:

struct SolFitness
   solution::String
   fitness::Union{Float64, Nothing}
end

So SolFitness would store the solution, and the real fitness would be optional.
Probably we would have to extend the fitness() method, so that we can provide the cached solution and the current fitness landscape state (for dynamic problems), and specify whether we need to calculate the real fitness:

fitness(x::AbstractVector, p::OptimizationProblem, old::Fitness; kwargs...)

The extended method would only be needed for non-standard rank!() (if we'll have CandidateContainer, it could be updatefitness!(container, ix, oldfitness; kwargs...)).

Evaluator is already handling caching of the fitness in the Candidate, so this is not something that the user would have to manage.
Evaluator also distributes fitness calculation to parallel workers, so we have to avoid calling fitness() directly (e.g. inside rank!()).

> It is a bit strange to supply the params and an "old" fitness to the function being optimized though, in case it needs to be updated.

Agreed. But we can provide a higher-level framework for solution-based problems with an extended API on top of what we already have, e.g.
define SolutionFitness{S, F}. The user would have to define solution(params::AbstractVector, p::OptimizationProblem{<:SolutionFitness}) and fitness(params::AbstractVector, solution::S, p::OptimizationProblem), whereas the lower-level fitness(params, p, oldfitness::SolutionFitness) would be defined in BBO, would call the user-defined methods as needed, and would compose the results into a SolutionFitness result.
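A sketch of that layering with toy types, so it runs on its own. `SolutionFitness{S, F}`, `solution` and the two `fitness` methods follow the names in the comment; `ToyProblem` (a minimal stand-in for OptimizationProblem), the `needfitness` keyword and the use of `nothing` for a not-yet-computed real fitness are assumptions made here for illustration.

```julia
# Solution plus an optional "real" fitness (`nothing` until it is needed).
struct SolutionFitness{S,F}
    solution::S
    fitness::Union{F,Nothing}
end

# Minimal stand-in for an OptimizationProblem whose fitness type is FS.
struct ToyProblem{FS<:SolutionFitness} end

# --- user-defined layer ---
solution(params::AbstractVector, ::ToyProblem) = string(sum(params))
fitness(params::AbstractVector, sol::String, ::ToyProblem) = Float64(length(sol))

# --- library-side layer (hypothetical): reuse the cached solution if present
# and compute the real fitness only when `needfitness` is true.
function fitness(params::AbstractVector, p::ToyProblem{SolutionFitness{S,F}},
                 old::Union{SolutionFitness{S,F},Nothing}; needfitness::Bool = true) where {S,F}
    sol = old === nothing ? solution(params, p) : old.solution
    fit = needfitness ? convert(F, fitness(params, sol, p)) :
          (old === nothing ? nothing : old.fitness)
    return SolutionFitness{S,F}(sol, fit)
end

p = ToyProblem{SolutionFitness{String,Float64}}()
x = [0.25, 0.5]
a = fitness(x, p, nothing; needfitness = false)   # solution only
b = fitness(x, p, a)                                # cached solution, real fitness added
```

Dispatch on the second argument (a solution vs. a problem) keeps the user-facing and library-facing `fitness` methods apart without extra naming.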

> What did I forget?

How will the archive be updated? Would it be fine to just compare the solutions from the archive with the current candidate solutions, or would some recalculation (e.g. of the archive solutions) be needed?
Should we introduce some dynamic fitness landscape state that would have to be passed into fitness()?

Ideally, it would also be nice if the ask/rank!/tell! scheme supported asynchronous evaluation (as I've implemented for Borg in #46),
so that we can request the Evaluator to calculate the fitness of the candidates provided by ask() and, without waiting for the result, rank the candidates
that were already done (probably asked at previous iterations).
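A rough sketch of the asynchronous idea using plain Julia tasks; it does not reproduce the Borg implementation from #46. `Pending`, `submit!` and `rank_finished!` are hypothetical names: each candidate's evaluation is started with `@async`, and `rank_finished!` ranks only the evaluations that have completed, leaving the rest pending for a later iteration.

```julia
# Pending evaluations: (params, task computing the fitness).
const Pending = Vector{Tuple{Vector{Float64},Task}}

# Start evaluating `xs` without waiting for the results.
function submit!(pending::Pending, f, xs)
    for x in xs
        push!(pending, (x, @async f(x)))
    end
    return pending
end

# Rank (best first, minimization) only the evaluations that have finished;
# unfinished ones stay in `pending` for a later call.
function rank_finished!(pending::Pending)
    isdone = [istaskdone(t) for (_, t) in pending]
    done = pending[isdone]
    deleteat!(pending, findall(isdone))
    fits = Float64[fetch(t) for (_, t) in done]
    perm = sortperm(fits)
    return [done[i][1] for i in perm], fits[perm]
end

f(x) = sum(abs2, x)
pending = submit!(Pending(), f, [[1.0, 2.0], [0.5, 0.5], [3.0, 0.0]])
foreach(t -> wait(last(t)), pending)   # only for this demo; a real loop would not block
ranked, fits = rank_finished!(pending)
```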
