Add initial support for diff of ref return types in rev mode #425

parth-07 · 2022-04-10T16:17:18Z

This PR adds initial support for correctly differentiating function calls with reference return types. The main motivation is that many class operator overloads, such as =, +=, -= and *=, naturally return a reference to the class object.

The work in this PR lays a foundation that can be extended to support other functionality as well, most notably differentiating function calls with pointer return types and maintaining the correct stack of pointer derivatives when pointers are passed to a call expression.

What does this PR solve?

C++ has the functionality to declare a variable as a reference (an alias) to an already existing variable, object or function.

double a = 11;
double& a_ref = a;

From a mathematical point of view, a reference declaration is a no-op: we are just defining a new name for an already existing variable. The key point is that, unlike a normal variable declaration, a reference declaration should not have any corresponding reverse pass statements. That said, a reference declaration does affect how subsequent operations are differentiated.

For example:

someVar += a;      // (1)

someVar += a_ref;  // (2)

Statements (1) and (2) are mathematically identical, so they should produce derived statements with the same behaviour. In Clad, we achieve this by using the same derivative variable for both the original variable and the reference variable. More concretely, consider the following code snippet:

double& a_ref = a;

This statement produces the following statements in the derived function:

// derivative declarations
double _d_a = 0;
double& _d_a_ref = _d_a;

// forward pass
double& a_ref = a;

Please note here that the derivative of a_ref is a reference variable pointing to the derivative of a.
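The effect of binding `_d_a_ref` to `_d_a` can be checked with a small hand-written sketch (not actual Clad output) of the derived code for statements (1) and (2). The function name `adjoint_of_a` and the choice to seed the adjoint of `someVar` through a parameter are illustrative assumptions.

```cpp
#include <cassert>

// Hand-written sketch (not Clad output) of the derived code for
//   double& a_ref = a;
//   someVar += a;      // (1)
//   someVar += a_ref;  // (2)
// Returns the adjoint of `a`, given the adjoint `_d_someVar` of the result.
double adjoint_of_a(double a, double _d_someVar) {
  // derivative declarations
  double _d_a = 0;
  double& _d_a_ref = _d_a;  // the alias's adjoint aliases _d_a

  // forward pass
  double someVar = 0;
  double& a_ref = a;
  someVar += a;
  someVar += a_ref;
  (void)someVar;

  // reverse pass, in reverse statement order
  _d_a_ref += _d_someVar;  // adjoint of (2): lands in _d_a
  _d_a += _d_someVar;      // adjoint of (1)
  return _d_a;
}
```

Because both updates land in the same storage, `adjoint_of_a(11.0, 1.0)` yields 2, exactly what `someVar += a; someVar += a;` would give.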

In this example, we can easily point _d_a_ref to _d_a because the variable that a_ref refers to, and hence its derivative, is known at compile time. This is not always the case; consider, for example, the following code:

double& someFn(double& i, double& j, double& k) { ... }

double fn(double i, double j, double k) {
  double& ref = someFn(i, j, k);
}

We cannot determine at compile time which variable ref refers to, so we also cannot determine which derivative _d_ref should refer to.

This PR provides functionality to correctly point _d_ref to the derivative of the variable that ref refers to, even when that variable is not known at compile time.

How does this PR solve this problem?

When a reference variable is initialised with the result of a call expression, this PR sets the derivative of the reference variable correctly by modifying the primal call so that it returns both the primal value and the corresponding adjoint information.

When Clad differentiates a call to a function someFn that returns a reference, it generates a new function someFn_forw by transforming someFn so that it takes adjoint information as additional input parameters and returns both the primal value and the adjoint information. For the rest of this discussion, I will call this transformation mode Reverse Mode Forward Pass mode. Please suggest a better and more intuitive name for it. For example, consider the following function:

double& someFn(double& i, double& j) {
  double& k = i;
  double& l = j;
  if (...)
    return k;
  else
    return l;
}

The corresponding someFn_forw will be as follows:

clad::ValueAndAdjoint<double&, double&>
someFn_forw(double& i, double& j, clad::array_ref<double> _d_i,
            clad::array_ref<double> _d_j) {
  double* _d_k = nullptr;
  double* _d_l = nullptr;

  // forward pass
  _d_k = &*_d_i;
  double& k = i;

  _d_l = &*_d_j;
  double& l = j;

  if (...)
    return {k, *_d_k};
  else
    return {l, *_d_l};
}
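A compilable sketch of this transformation is shown below. It rests on a few assumptions: the `if (...)` condition is replaced by `i > j` so the returned reference depends on runtime values, a local `ValueAndAdjoint` template stands in for `clad::ValueAndAdjoint` (only the member names `value` and `adjoint` come from this PR), and raw pointers replace `clad::array_ref<double>`.

```cpp
#include <cassert>

// Hypothetical stand-in for clad::ValueAndAdjoint<T, U>.
template <typename T, typename U>
struct ValueAndAdjoint {
  T value;
  U adjoint;
};

// Concrete version of someFn; `i > j` replaces the elided condition.
double& someFn(double& i, double& j) {
  double& k = i;
  double& l = j;
  if (i > j)
    return k;
  else
    return l;
}

// Forward-pass transform of someFn (raw pointers instead of clad::array_ref).
ValueAndAdjoint<double&, double&> someFn_forw(double& i, double& j,
                                              double* _d_i, double* _d_j) {
  double* _d_k = nullptr;
  double* _d_l = nullptr;

  _d_k = _d_i;  // k aliases i, so _d_k points at _d_i
  double& k = i;
  _d_l = _d_j;  // l aliases j, so _d_l points at _d_j
  double& l = j;

  if (i > j)
    return {k, *_d_k};
  else
    return {l, *_d_l};
}
```

Which adjoint comes back is decided at run time, together with the primal reference: for `a = 1, b = 2` the result aliases `b` and its adjoint aliases `_d_b`.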

Therefore, the following statement:

double& ref = someFn(i, j);

will produce the following statements in the derived function:

// derivative declarations
double* _d_ref = nullptr;

// forward pass
double t0 = i;
double t1 = j;
clad::ValueAndAdjoint<double&, double&> t = someFn_forw(i, j, &_d_i, &_d_j);
_d_ref = &t.adjoint;
double& ref = t.value;

// reverse pass
someFn_pullback(t0, t1, /*pullback=*/double(), &_d_i, &_d_j);

Please note that the pullback value (dfdx()) passed to the pullback function should be the zero tangent when the return type is a reference type. This is discussed further in the section on pullback function signatures below.

Problems and design decisions

`ReverseModeForwPassVisitor`

Currently, I have created a new visitor class, ReverseModeForwPassVisitor, that inherits from ReverseModeVisitor and is responsible for creating forward pass functions (_forw functions). Generating a forward pass function only requires a small subset of Reverse Mode functionality. To be precise, it requires a correctly initialised derivative variable declaration for each local variable, and a forward pass that effectively traces the primal function.

Benefits of inheriting from `ReverseModeVisitor`:

The forward pass of a Reverse Mode derived function already includes all the functionality we require in a Reverse Mode Forward Pass transformed function (sorry for the confusing terminology here). Thus, if we reuse ReverseModeVisitor, discard the reverse pass, and only implement the Visit* functions whose behaviour differs in Reverse Mode Forward Pass mode, such as VisitReturnStmt, we avoid a lot of code duplication. If we do not inherit from ReverseModeVisitor, we would need to implement every Visit* function ourselves, and most of them would simply clone the AST nodes.
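The inheritance trade-off can be illustrated with a toy model. The class names `ToyReverseVisitor` and `ToyForwPassVisitor` are hypothetical and the "AST" is just strings; this is not Clad's real visitor API. It only shows the shape of the design: the derived visitor inherits every cloning `Visit*` method and overrides the one whose output must differ.

```cpp
#include <cassert>
#include <string>

// Toy model (hypothetical names, not Clad's real classes).
struct ToyReverseVisitor {
  virtual ~ToyReverseVisitor() = default;
  // Most statements are simply cloned into the forward pass.
  virtual std::string VisitDeclStmt(const std::string& stmt) { return stmt; }
  // Reverse mode turns `return val;` into an adjoint update in the reverse pass.
  virtual std::string VisitReturnStmt(const std::string& val) {
    return "_d_" + val + " += _d_y;";
  }
};

struct ToyForwPassVisitor : ToyReverseVisitor {
  // Only the return statement differs: return the value and its adjoint.
  std::string VisitReturnStmt(const std::string& val) override {
    return "return {" + val + ", *_d_" + val + "};";
  }
};
```

The flip side, listed in the disadvantages below, is that anything the base visitor cannot handle is also unavailable to the derived one.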

Disadvantages of inheriting from `ReverseModeVisitor`:

- If we reuse the functionality of ReverseModeVisitor, we cannot generate Reverse Mode Forward Pass transformed functions for any function that cannot be differentiated in Reverse Mode, for example because some C++ construct is not yet supported. This is problematic because generating a Reverse Mode Forward Pass transformed function is much easier than performing complete Reverse Mode differentiation: most C++ features and constructs simply need to be cloned in Reverse Mode Forward Pass.
- Reusing ReverseModeVisitor implicitly assumes that the forward pass of a Reverse Mode derived function behaves effectively the same as the primal function. If we add active-variable and data-flow analysis to Reverse Mode in the future, which we plan to do soon, this assumption will no longer hold.
- As mentioned earlier, we will also need _forw functions to support other functionality in the future. Many of these features may require transformations that diverge from what the Reverse Mode forward pass does, so we may need to provide separate implementations of more and more Visit* functions.

Derivative expressions to be used in Reverse Mode forward pass

Until now, we only needed derivative variables in the reverse pass. With the introduction of _forw functions, we need to use derivative variables in the forward pass too, because _forw functions also take adjoint information as input.
The problem is that the Visit function is designed to return a StmtDiff consisting of a clone of the original expression and the corresponding derivative, if one exists. Sometimes the clone contains a clad::push(...) expression and the derivative contains a clad::pop(...) expression. In these cases, the derivative is designed to be used only in the reverse pass; a notable example is array expressions in loops. Therefore, we need a routine that lets us conveniently build derivative expressions that can be used in the forward pass. Building such a routine will be non-trivial because the Visit* functions are tightly coupled with how derivative expressions are obtained.
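The push/pop pattern can be sketched by hand for an array access inside a loop. Here `push` and `pop` are local vector-backed stand-ins for `clad::push`/`clad::pop` (assumed semantics: `push` returns the pushed value, `pop` returns the most recent one), and `gather` is a hypothetical primal.

```cpp
#include <cassert>
#include <vector>

// Minimal vector-backed stand-ins for clad::push / clad::pop.
template <typename T>
T push(std::vector<T>& tape, T v) {
  tape.push_back(v);
  return v;
}
template <typename T>
T pop(std::vector<T>& tape) {
  T v = tape.back();
  tape.pop_back();
  return v;
}

// Hand-written gradient of the hypothetical primal
//   double gather(const double* arr, const int* idx, int n) {
//     double s = 0;
//     for (int i = 0; i < n; ++i) s += arr[idx[i]];
//     return s;
//   }
// The clone arr[push(tape, idx[i])] belongs to the forward pass; the matching
// derivative _d_arr[pop(tape)] only makes sense in the reverse pass.
void gather_grad(const double* arr, const int* idx, int n, double* _d_arr) {
  std::vector<int> tape;
  double s = 0;
  for (int i = 0; i < n; ++i)
    s += arr[push(tape, idx[i])];  // forward pass: record the index used
  (void)s;
  double _d_s = 1;                  // seed
  for (int i = n - 1; i >= 0; --i)
    _d_arr[pop(tape)] += _d_s;      // reverse pass: replay indices backwards
}
```

Evaluating `_d_arr[pop(tape)]` inside the forward loop would consume the index just recorded and leave the reverse pass with a corrupted tape. A forward-pass use of the adjoint needs a different expression, such as `_d_arr[idx[i]]`.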

Pullback function signature of functions with reference return types

For a function fn:

double fn(double i, double j);

The pullback function should have the following signature:

void fn_pullback(double i, double j, double _d_y, clad::array_ref<double> _d_i, clad::array_ref<double> _d_j);
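As a concrete instance, here is a pullback with this signature shape for a hypothetical body `return i * j;`. Raw pointers stand in for `clad::array_ref<double>`. The incoming `_d_y` scales every contribution, and adjoints are accumulated rather than overwritten.

```cpp
#include <cassert>

// Hypothetical primal: double fn(double i, double j) { return i * j; }
void fn_pullback(double i, double j, double _d_y, double* _d_i, double* _d_j) {
  *_d_i += j * _d_y;  // d(i * j)/di = j
  *_d_j += i * _d_y;  // d(i * j)/dj = i
}
```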

For functions with reference return types, the situation is slightly more complicated.

_d_y, the pullback value, is used to set the derivative of the return value. For example:

return val;

Inside a pullback function, this statement is differentiated to:

// reverse pass
_d_val += _d_y;

Intuitively, this behaviour can be reasoned as follows:

double someFn(double& i, double& j) {
  ... statement 1 ...
  ... statement 2 ...
  ... and so on ... 
  return val;
}
double fn(double i, double j) {
  ...
  ...
  double y = someFn(i, j);
  ...
  ...
}

If the return value of someFn is val, then the statement double y = someFn(i, j) can effectively be visualised as follows:

double fn(double i, double j) {
  ...
  ...
  ... statement 1 ...
  ... statement 2 ...
  ... and so on ... 
  double y = val;
  ...
  ...
}

Now, if y is a reference variable, then double& y = val becomes a no-op and should have no corresponding reverse mode derived statement. Therefore, ideally, the pullback functions of functions with reference return types should not take any pullback value. Again, sorry for the confusing terminology. We have two ways to proceed from here:

Make the pullback function signature of functions with reference return types the same as that of functions with the corresponding non-reference return types. In this case, the pullback function of a function with a reference return type would need to be called with a dummy pullback (_d_y) value, which should be the zero tangent of the return type.

Or

Make the pullback function signature of functions with reference return types the same as that of functions with void return types. This way, we would not need to pass any dummy value when calling pullback functions. On the downside, this makes the rules for pullback function signatures slightly more complicated.
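Why the pullback value must be the zero tangent can be seen in a small hand-written example (not Clad output). Assume a hypothetical `double& scale(double& x) { x *= 2; return x; }` called as `double& r = scale(x); return r;`, so f(x) = 2x. `RefAndAdjoint` is a local stand-in for `clad::ValueAndAdjoint<double&, double&>`, and raw pointers replace `clad::array_ref`. The adjoint of `r` already reaches `_d_x` through the aliased reference, so a non-zero pullback counts it twice.

```cpp
#include <cassert>

// Stand-in for clad::ValueAndAdjoint<double&, double&> (assumed layout).
struct RefAndAdjoint {
  double& value;
  double& adjoint;
};

// Forward-pass transform of scale: runs the primal body and returns
// references to the result and to its adjoint.
RefAndAdjoint scale_forw(double& x, double* _d_x) {
  x *= 2;
  return {x, *_d_x};
}

// Pullback of scale; _d_y is the pullback of the returned reference.
void scale_pullback(double x, double _d_y, double* _d_x) {
  (void)x;
  *_d_x += _d_y;  // adjoint of `return x;`
  *_d_x *= 2;     // adjoint of `x *= 2;`
}

// Hand-written gradient of f(x) { double& r = scale(x); return r; }.
// refPullback is the value passed to scale_pullback as _d_y.
double f_grad(double x, double refPullback) {
  double _d_x = 0;
  double* _d_r = nullptr;
  // forward pass
  double x0 = x;                          // saved for the pullback call
  RefAndAdjoint t = scale_forw(x, &_d_x);
  _d_r = &t.adjoint;                      // _d_r now aliases _d_x
  double& r = t.value;
  (void)r;
  // reverse pass, seeded with d(result) = 1
  *_d_r += 1;                             // adjoint of `return r;`
  scale_pullback(x0, refPullback, &_d_x);
  return _d_x;
}
```

With `refPullback = 0.0` the result is 2, the correct derivative of 2x. With `refPullback = 1.0` it is 4, because the seed reaches `_d_x` once through `_d_r` and once more through the pullback argument. Both options above avoid this. Option 1 does it by passing zero, option 2 by dropping the parameter.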

Please give your reviews and suggestions on the approach used, the problems and the design decisions discussed here.

codecov · 2022-04-10T16:21:02Z

Codecov Report

Merging #425 (f2d496e) into master (fac880f) will increase coverage by 0.12%.
The diff coverage is 86.50%.

@@            Coverage Diff             @@
##           master     #425      +/-   ##
==========================================
+ Coverage   91.74%   91.86%   +0.12%     
==========================================
  Files          35       37       +2     
  Lines        5038     5422     +384     
==========================================
+ Hits         4622     4981     +359     
- Misses        416      441      +25

Impacted Files	Coverage Δ
include/clad/Differentiator/DerivativeBuilder.h	`100.00% <ø> (ø)`
include/clad/Differentiator/ReverseModeVisitor.h	`98.68% <ø> (ø)`
lib/Differentiator/ReverseModeForwPassVisitor.cpp	`82.31% <82.31%> (ø)`
lib/Differentiator/ReverseModeVisitor.cpp	`95.30% <93.97%> (-0.20%)`	⬇️
...e/clad/Differentiator/ReverseModeForwPassVisitor.h	`100.00% <100.00%> (ø)`
lib/Differentiator/DerivativeBuilder.cpp	`100.00% <100.00%> (ø)`
lib/Differentiator/CladUtils.cpp	`97.68% <0.00%> (-1.02%)`	⬇️
lib/Differentiator/EstimationModel.cpp	`100.00% <0.00%> (ø)`
include/clad/Differentiator/VisitorBase.h	`100.00% <0.00%> (ø)`
... and 7 more

parth-07 · 2023-11-12T17:20:08Z

The work in this PR was rebased on top of master and merged as part of #601

This commit adds support for custom (user-provided) `_forw` functions. A `_forw` function, if available, is called in place of the actual function. For example, if the primal code contains:

```cpp
someFn(u, v, w);
```

and the user has defined a custom `_reverse_forw` function for `someFn` as follows:

```cpp
namespace clad {
namespace custom_derivatives {
void someFn_reverse_forw(double u, double v, double w, double *d_u,
                         double *d_v, double *d_w) {
  // ...
  // ...
}
}
}
```

then Clad will generate the derivative function as follows:

```cpp
// forward-pass
clad::custom_derivatives::someFn_reverse_forw(u, v, w, d_u, d_v, d_w);
// ...

// reverse-pass; no change in reverse-pass
someFn_pullback(u, v, w, d_u, d_v, d_w);
// ...
```

But more importantly, why do we need such functionality? Two reasons:

- Supporting reference/pointer return types in the reverse mode. This has been discussed at great length in #425.
- Supporting types whose elements grow dynamically, such as `std::vector` and `std::map`. When a function call updates the size or properties of a primal variable, we need to correctly update the size or properties of the corresponding adjoint variable as well. For example, a call to `vec.push_back(...)` should update the size of `_d_vec` too. However, the actual function call does not modify the adjoint variable in any way. This is where `_forw` functions come to the rescue: they make it possible to adjust the adjoint variable's size/properties while executing the actual function call.

Please note that the `_reverse_forw` function signature takes adjoint variables as arguments and returns `clad::ValueAndAdjoint<U, V>` to support reference/pointer return types.
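The `std::vector` case can be sketched as follows. The function `push_back_reverse_forw` is hypothetical: it is written as a free function taking the container by pointer, and Clad's real naming and signature conventions for member functions may differ. It runs the primal call and grows the adjoint vector in step, so `_d_vec[i]` keeps lining up with `vec[i]`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical custom forward-pass function for std::vector<double>::push_back.
void push_back_reverse_forw(std::vector<double>* vec, double val,
                            std::vector<double>* _d_vec, double _d_val) {
  (void)_d_val;            // the pushed value's adjoint is not needed here
  vec->push_back(val);     // execute the primal call
  _d_vec->push_back(0.0);  // new element starts with a zero adjoint
}
```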

Add initial support for diff of ref return types in rev mode

9468911

parth-07 added 4 commits April 20, 2022 23:56

Kinda workign

d53b0b6

In progress

f266be5

Yayy! working

8f6aa67

Fix tests

74904d8

parth-07 force-pushed the rev-ref-returns branch from cc098b2 to 74904d8 April 27, 2022 14:34

parth-07 added 2 commits April 29, 2022 08:51

base diff

1d9d1db

add basic test

f2d496e

parth-07 mentioned this pull request Mar 10, 2023

Error during compilation of a code trying to calculate the jacobian of a function with many function calls #533

Open

This was referenced Jul 20, 2023

Rev ref returns #601

Merged

test failures in current master branch #606

Closed

parth-07 closed this Nov 12, 2023

parth-07 mentioned this pull request Aug 3, 2024

Extend custom derivatives functionality to facilitate custom _forw functions. #1024

Closed

infinite-void-16 mentioned this pull request Aug 10, 2024

Add support of custom _forw functions #1037

Merged
