diff --git a/previews/PR302/.documenter-siteinfo.json b/previews/PR302/.documenter-siteinfo.json
index 77aa7857f..1f761fdaa 100644
--- a/previews/PR302/.documenter-siteinfo.json
+++ b/previews/PR302/.documenter-siteinfo.json
@@ -1 +1 @@
-{"documenter":{"julia_version":"1.11.1","generation_timestamp":"2024-10-21T22:19:18","documenter_version":"1.7.0"}}
\ No newline at end of file
+{"documenter":{"julia_version":"1.11.1","generation_timestamp":"2024-10-22T05:37:47","documenter_version":"1.7.0"}}
\ No newline at end of file
diff --git a/previews/PR302/algorithmic_differentiation/index.html b/previews/PR302/algorithmic_differentiation/index.html
index 35b2ef4bb..56e36d121 100644
--- a/previews/PR302/algorithmic_differentiation/index.html
+++ b/previews/PR302/algorithmic_differentiation/index.html
@@ -19,4 +19,4 @@
 D f [x] (\dot{x}) &= [(D \mathcal{l} [g(x)]) \circ (D g [x])](\dot{x}) \nonumber \\
 &= \langle \bar{y}, D g [x] (\dot{x}) \rangle \nonumber \\
 &= \langle D g [x]^\ast (\bar{y}), \dot{x} \rangle, \nonumber
-\end{align}\]
from which we conclude that $D g [x]^\ast (\bar{y})$ is the gradient of the composition $\mathcal{l} \circ g$ at $x$.
The consequence is that we can always view the computation performed by reverse-mode AD as computing the gradient of the composition of the function in question and an inner product with the argument to the adjoint.
The above shows that if $\mathcal{Y} = \RR$ and $g$ is the function we wish to compute the gradient of, we can simply set $\bar{y} = 1$ and compute $D g [x]^\ast (\bar{y})$ to obtain the gradient of $g$ at $x$.
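As a concrete illustration, consider a minimal hand-rolled sketch (not Mooncake's actual implementation; `g` and `adjoint_Dg` are hypothetical names introduced here). For $g(x) = \sum_i x_i^2$ the adjoint of the derivative is $D g [x]^\ast (\bar{y}) = 2 \bar{y} x$, and setting $\bar{y} = 1$ recovers the gradient $2x$:

```julia
# g maps a vector to a scalar: g(x) = sum_i x_i^2.
g(x) = sum(abs2, x)

# Adjoint of the derivative of g at x: D g [x]^* (ȳ) = 2 ȳ x,
# since ⟨ȳ, D g [x](ẋ)⟩ = ȳ ⋅ 2 x'ẋ = ⟨2 ȳ x, ẋ⟩.
adjoint_Dg(x, ȳ) = 2 .* ȳ .* x

x = [1.0, -2.0, 3.0]

# Setting ȳ = 1 yields the gradient of g at x.
grad = adjoint_Dg(x, 1.0)   # equals [2.0, -4.0, 6.0], i.e. 2x
```

This mirrors the derivation above: the adjoint applied to $\bar{y} = 1$ is exactly the vector whose inner product with $\dot{x}$ gives the directional derivative.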
This document explains the core mathematical foundations of AD. It explains separately what it does, and how it goes about it. Some basic examples are given which show how these mathematical foundations can be applied to differentiate functions of matrices, and Julia functions.
Subsequent sections will build on these foundations to provide a more general explanation of what AD looks like for a Julia programme.
Forwards-mode AD achieves this by breaking down $f$ into the composition $f = f_N \circ \dots \circ f_1$, where each $f_n$ is a simple function whose derivative (function) $D f_n [x_n]$ we know for any given $x_n$. By the chain rule, we have that
\[D f [x] (\dot{x}) = D f_N [x_N] \circ \dots \circ D f_1 [x_1] (\dot{x})\]
which suggests the following algorithm:
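As a hedged sketch of this procedure (the decomposition $f = f_2 \circ f_1$ below is a hypothetical example, not the library's internals): propagate the pair $(x_n, \dot{x}_n)$ forwards, applying $D f_n [x_n]$ to the tangent at every step.

```julia
# Hypothetical decomposition f = f2 ∘ f1, with hand-written derivatives.
f1(x) = sin(x);  Df1(x, ẋ) = cos(x) * ẋ
f2(x) = x^2;     Df2(x, ẋ) = 2x * ẋ

function forwards_mode(x, ẋ)
    x1, ẋ1 = x, ẋ
    x2, ẋ2 = f1(x1), Df1(x1, ẋ1)   # push value and tangent through f1
    x3, ẋ3 = f2(x2), Df2(x2, ẋ2)   # ... then through f2
    return x3, ẋ3                   # f(x) and D f [x](ẋ)
end

y, ẏ = forwards_mode(1.0, 1.0)
# ẏ == 2 * sin(1.0) * cos(1.0), the derivative of sin(x)^2 at x = 1.
```

Note that values and tangents move through the composition together, in the same order as the primal computation.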
When each function $f_n$ maps between Euclidean spaces, the applications of derivatives $D f_n [x_n] (\dot{x}_n)$ are given by $J_n \dot{x}_n$ where $J_n$ is the Jacobian of $f_n$ at $x_n$.
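For example (a minimal sketch with a hypothetical function, not one drawn from the surrounding text): for $f(x) = (x_1^2, \, x_1 x_2)$ the Jacobian at $x$ has rows $(2 x_1, 0)$ and $(x_2, x_1)$, and applying the derivative to a tangent is just the matrix-vector product $J \dot{x}$.

```julia
f(x) = [x[1]^2, x[1] * x[2]]

# Jacobian of f at x, derived by hand for this example.
J(x) = [2x[1]  0.0;
        x[2]   x[1]]

x = [3.0, 4.0]
ẋ = [1.0, -1.0]

# D f [x](ẋ) is the Jacobian-vector product J(x) * ẋ.
Jẋ = J(x) * ẋ    # equals [6.0, 1.0]
```

Forwards-mode AD computes this product without ever materialising $J$, which is what makes it cheap when the input dimension is small.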
This document was generated with Documenter.jl version 1.7.0 on Monday 21 October 2024. Using Julia version 1.11.1.