Add to docs and rename fields
Vaibhavdixit02 committed Jul 11, 2023
1 parent ea45267 commit cd410a1
Showing 3 changed files with 86 additions and 62 deletions.
110 changes: 67 additions & 43 deletions docs/src/optimization_packages/optimisers.md
@@ -1,6 +1,6 @@
# [Optimisers.jl](@id optimisers)

-## Installation: OptimizationFlux.jl
+## Installation: OptimizationOptimisers.jl

To use this package, install the OptimizationOptimisers package:

@@ -9,142 +9,166 @@ import Pkg;
Pkg.add("OptimizationOptimisers");
```

In addition to the optimisation algorithms provided by the Optimisers.jl package, this subpackage also provides the Sophia optimisation algorithm.


## Local Unconstrained Optimizers

- Sophia: Based on the recent paper https://arxiv.org/abs/2305.14342. It incorporates second-order information in the form of the diagonal of the Hessian matrix, thus avoiding the need to compute the full Hessian. It has been shown to converge faster than first-order methods such as Adam and SGD. A usage sketch follows the list of defaults below.

+ `solve(problem, Sophia(; η, βs, ϵ, λ, k, ρ))`

+ `η` is the learning rate
+ `βs` are the decay of momentums
+ `ϵ` is the epsilon value
+ `λ` is the weight decay parameter
+ `k` is the interval, in iterations, at which the diagonal of the Hessian is re-estimated
+ `ρ` is the clipping threshold for the update
+ Defaults:

* `η = 0.001`
* `βs = (0.9, 0.999)`
* `ϵ = 1e-8`
* `λ = 0.1`
* `k = 10`
* `ρ = 0.04`
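
A minimal usage sketch through the common `solve` interface (the Rosenbrock objective, step size, and iteration count here are illustrative choices rather than documented defaults; Sophia uses Hessian-vector products, so an AD backend such as `AutoZygote` must be supplied):

```julia
using Optimization, OptimizationOptimisers, Zygote

# Illustrative objective: the Rosenbrock function
rosenbrock(u, p) = (p[1] - u[1])^2 + p[2] * (u[2] - u[1]^2)^2
u0 = zeros(2)
p = [1.0, 100.0]

# Sophia uses gradients and Hessian-vector products, supplied here by AutoZygote
optf = OptimizationFunction(rosenbrock, Optimization.AutoZygote())
prob = OptimizationProblem(optf, u0, p)

sol = Optimization.solve(prob, OptimizationOptimisers.Sophia(; η = 0.01),
    maxiters = 1000)
```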

- [`Optimisers.Descent`](https://fluxml.ai/Optimisers.jl/dev/api/#Optimisers.Descent): **Classic gradient descent optimizer with learning rate**

+ `solve(problem, Descent(η))`

+ `η` is the learning rate
+ Defaults:

* `η = 0.1`

- [`Optimisers.Momentum`](https://fluxml.ai/Optimisers.jl/dev/api/#Optimisers.Momentum): **Classic gradient descent optimizer with learning rate and momentum**

+ `solve(problem, Momentum(η, ρ))`

+ `η` is the learning rate
+ `ρ` is the momentum
+ Defaults:

* `η = 0.01`
* `ρ = 0.9`
- [`Optimisers.Nesterov`](https://fluxml.ai/Optimisers.jl/dev/api/#Optimisers.Nesterov): **Gradient descent optimizer with learning rate and Nesterov momentum**

+ `solve(problem, Nesterov(η, ρ))`

+ `η` is the learning rate
+ `ρ` is the Nesterov momentum
+ Defaults:

* `η = 0.01`
* `ρ = 0.9`
- [`Optimisers.RMSProp`](https://fluxml.ai/Optimisers.jl/dev/api/#Optimisers.RMSProp): **RMSProp optimizer**

+ `solve(problem, RMSProp(η, ρ))`

+ `η` is the learning rate
+ `ρ` is the momentum
+ Defaults:

* `η = 0.001`
* `ρ = 0.9`
- [`Optimisers.Adam`](https://fluxml.ai/Optimisers.jl/dev/api/#Optimisers.Adam): **Adam optimizer**

+ `solve(problem, Adam(η, β::Tuple))`

+ `η` is the learning rate
+ `β::Tuple` is the decay of momentums
+ Defaults:

* `η = 0.001`
* `β::Tuple = (0.9, 0.999)`
- [`Optimisers.RAdam`](https://fluxml.ai/Optimisers.jl/dev/api/#Optimisers.RAdam): **Rectified Adam optimizer**

+ `solve(problem, RAdam(η, β::Tuple))`

+ `η` is the learning rate
+ `β::Tuple` is the decay of momentums
+ Defaults:

* `η = 0.001`
* `β::Tuple = (0.9, 0.999)`
- [`Optimisers.OAdam`](https://fluxml.ai/Optimisers.jl/dev/api/#Optimisers.OAdam): **Optimistic Adam optimizer**

+ `solve(problem, OAdam(η, β::Tuple))`

+ `η` is the learning rate
+ `β::Tuple` is the decay of momentums
+ Defaults:

* `η = 0.001`
* `β::Tuple = (0.5, 0.999)`
- [`Optimisers.AdaMax`](https://fluxml.ai/Optimisers.jl/dev/api/#Optimisers.AdaMax): **AdaMax optimizer**

+ `solve(problem, AdaMax(η, β::Tuple))`

+ `η` is the learning rate
+ `β::Tuple` is the decay of momentums
+ Defaults:

* `η = 0.001`
* `β::Tuple = (0.9, 0.999)`
- [`Optimisers.ADAGrad`](https://fluxml.ai/Optimisers.jl/dev/api/#Optimisers.ADAGrad): **ADAGrad optimizer**

+ `solve(problem, ADAGrad(η))`

+ `η` is the learning rate
+ Defaults:

* `η = 0.1`
- [`Optimisers.ADADelta`](https://fluxml.ai/Optimisers.jl/dev/api/#Optimisers.ADADelta): **ADADelta optimizer**

+ `solve(problem, ADADelta(ρ))`

+ `ρ` is the gradient decay factor
+ Defaults:

* `ρ = 0.9`
- [`Optimisers.AMSGrad`](https://fluxml.ai/Optimisers.jl/dev/api/#Optimisers.AMSGrad): **AMSGrad optimizer**

+ `solve(problem, AMSGrad(η, β::Tuple))`

+ `η` is the learning rate
+ `β::Tuple` is the decay of momentums
+ Defaults:

* `η = 0.001`
* `β::Tuple = (0.9, 0.999)`
- [`Optimisers.NAdam`](https://fluxml.ai/Optimisers.jl/dev/api/#Optimisers.NAdam): **Nesterov variant of the Adam optimizer**

+ `solve(problem, NAdam(η, β::Tuple))`

+ `η` is the learning rate
+ `β::Tuple` is the decay of momentums
+ Defaults:

* `η = 0.001`
* `β::Tuple = (0.9, 0.999)`
- [`Optimisers.AdamW`](https://fluxml.ai/Optimisers.jl/dev/api/#Optimisers.AdamW): **AdamW optimizer**

+ `solve(problem, AdamW(η, β::Tuple, decay))`

+ `η` is the learning rate
+ `β::Tuple` is the decay of momentums
+ `decay` is the decay to weights
+ Defaults:

* `η = 0.001`
* `β::Tuple = (0.9, 0.999)`
* `decay = 0`
- [`Optimisers.ADABelief`](https://fluxml.ai/Optimisers.jl/dev/api/#Optimisers.ADABelief): **ADABelief variant of Adam**

+ `solve(problem, ADABelief(η, β::Tuple))`

+ `η` is the learning rate
+ `β::Tuple` is the decay of momentums
+ Defaults:

* `η = 0.001`
* `β::Tuple = (0.9, 0.999)`
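
All of the rules above follow the same calling pattern. A minimal sketch with `Adam` (the quadratic objective, step size, and iteration count are illustrative; `Adam` is assumed here to be re-exported by OptimizationOptimisers, otherwise qualify it as `Optimisers.Adam`):

```julia
using Optimization, OptimizationOptimisers, Zygote

# Illustrative objective: squared distance from u to the parameter vector p
loss(u, p) = sum(abs2, u .- p)

optf = OptimizationFunction(loss, Optimization.AutoZygote())
prob = OptimizationProblem(optf, zeros(3), [1.0, 2.0, 3.0])

# These rules are iterative/stochastic, so bound the run with `maxiters`
sol = Optimization.solve(prob, Adam(0.05), maxiters = 500)
```
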
34 changes: 17 additions & 17 deletions lib/OptimizationOptimisers/src/sophia.jl
@@ -1,19 +1,19 @@
using Optimization.LinearAlgebra

struct Sophia
-lr::Float64
-betas::Tuple{Float64, Float64}
-eps::Float64
-weight_decay::Float64
+η::Float64
+βs::Tuple{Float64, Float64}
+ϵ::Float64
+λ::Float64
k::Integer
-rho::Float64
+ρ::Float64
end

SciMLBase.supports_opt_cache_interface(opt::Sophia) = true

-function Sophia(; lr = 1e-3, betas = (0.9, 0.999), eps = 1e-8, weight_decay = 1e-1, k = 10,
-rho = 0.04)
-Sophia(lr, betas, eps, weight_decay, k, rho)
+function Sophia(; η = 1e-3, βs = (0.9, 0.999), ϵ = 1e-8, λ = 1e-1, k = 10,
+ρ = 0.04)
+Sophia(η, βs, ϵ, λ, k, ρ)
end

clip(z, ρ) = max(min(z, ρ), -ρ)

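The hunk above only renames the `Sophia` struct fields and constructor keywords; a construction using the new names (with the default values shown in the diff) looks like:

```julia
using OptimizationOptimisers

# Keyword names after this commit: η, βs, ϵ, λ, k, ρ
# (previously lr, betas, eps, weight_decay, k, rho); values below are the defaults.
opt = OptimizationOptimisers.Sophia(; η = 1e-3, βs = (0.9, 0.999), ϵ = 1e-8, λ = 1e-1,
    k = 10, ρ = 0.04)
```
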
@@ -54,11 +54,11 @@ function SciMLBase.__solve(cache::OptimizationCache{
}
local x, cur, state
uType = eltype(cache.u0)
-lr = uType(cache.opt.lr)
-betas = uType.(cache.opt.betas)
-eps = uType(cache.opt.eps)
-weight_decay = uType(cache.opt.weight_decay)
-rho = uType(cache.opt.rho)
+η = uType(cache.opt.η)
+βs = uType.(cache.opt.βs)
+ϵ = uType(cache.opt.ϵ)
+λ = uType(cache.opt.λ)
+ρ = uType(cache.opt.ρ)

if cache.data != Optimization.DEFAULT_DATA
maxiters = length(cache.data)
@@ -97,17 +97,17 @@ function SciMLBase.__solve(cache::OptimizationCache{
elseif cb_call
break
end
-mₜ = betas[1] .* mₜ + (1 - betas[1]) .* gₜ
+mₜ = βs[1] .* mₜ + (1 - βs[1]) .* gₜ

if i % cache.opt.k == 1
hₜ₋₁ = copy(hₜ)
u = randn(uType, length(θ))
f.hv(hₜ, θ, u, d...)
-hₜ = betas[2] .* hₜ₋₁ + (1 - betas[2]) .* (u .* hₜ)
+hₜ = βs[2] .* hₜ₋₁ + (1 - βs[2]) .* (u .* hₜ)
end
-θ = θ .- lr * weight_decay .* θ
+θ = θ .- η * λ .* θ
θ = θ .-
-lr .* clip.(mₜ ./ max.(hₜ, Ref(eps)), Ref(rho))
+η .* clip.(mₜ ./ max.(hₜ, Ref(ϵ)), Ref(ρ))
end

return SciMLBase.build_solution(cache, cache.opt,

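Read as a whole, the update implemented in the hunk above corresponds to the following (a reading of the code, with `g_t` the gradient at step `t`, `u` a random probe vector for the Hutchinson-style diagonal estimate, `β₁, β₂` the components of `βs`, and `⊙` elementwise multiplication; `h_t` is only refreshed every `k` iterations):

```math
\begin{aligned}
m_t &= \beta_1\, m_{t-1} + (1-\beta_1)\, g_t \\
h_t &= \beta_2\, h_{t-1} + (1-\beta_2)\,\bigl(u \odot (\nabla^2 f(\theta)\, u)\bigr) \\
\theta &\leftarrow \theta - \eta\,\lambda\,\theta \\
\theta &\leftarrow \theta - \eta\,\operatorname{clip}\!\left(\frac{m_t}{\max(h_t,\ \epsilon)},\ \rho\right)
\end{aligned}
```

where `clip(z, ρ) = max(min(z, ρ), -ρ)` is applied elementwise, matching the helper defined earlier in the file.
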
4 changes: 2 additions & 2 deletions lib/OptimizationOptimisers/test/runtests.jl
@@ -13,8 +13,8 @@ using Zygote
prob = OptimizationProblem(optprob, x0, _p)

sol = Optimization.solve(prob,
-OptimizationOptimisers.Sophia(; lr = 0.5,
-weight_decay = 0.0),
+OptimizationOptimisers.Sophia(; η = 0.5,
+λ = 0.0),
maxiters = 1000)
@test 10 * sol.objective < l1
