Add initial support for diff of ref return types in rev mode #425

Closed
wants to merge 7 commits into from

Conversation

parth-07
Collaborator

@parth-07 parth-07 commented Apr 10, 2022

This PR adds initial support for correctly differentiating function calls with reference return types. The main motivation is that many class operator overloads, such as =, +=, -= and *=, naturally return a reference to the class object.

The work done in this PR lays a foundation that can be extended to support other functionality as well -- most notably, differentiation of function calls with pointer return types and maintaining the correct stack of pointer derivatives when pointers are passed to a call expression.

What does this PR solve?

C++ has the functionality to declare a variable as a reference (an alias) to an already existing variable, object or function.

double a = 11;
double& a_ref = a;

From a mathematical point of view, a reference declaration is a no-op: it only introduces a new name for an already existing variable. Unlike a normal variable declaration, a reference declaration should therefore have no corresponding statements in the reverse pass. It does, however, affect how later operations are differentiated.

For example:

someVar += a;      // (1)
someVar += a_ref;  // (2)

Statements described in (1) and (2) are mathematically identical. Thus they should produce derived statements that have the same behaviour as well. In Clad, we solve this by effectively using the same derivative variable for both the original variable and the reference variable. To put things more concretely, please consider the following code snippet:

double& a_ref = a;

This statement produces the following statements in the derived function:

// derivative declarations
double _d_a = 0;
double& _d_a_ref = _d_a;

// forward pass
double& a_ref = a;

Please note here that the derivative of a_ref is a reference variable pointing to the derivative of a.
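
To make the aliasing concrete, here is a minimal hand-written sketch (not actual Clad output) of a derived function for a primal that adds both a and a_ref to an accumulator. The function name f_grad_sketch and the accumulator s are assumptions made for illustration; only the _d_ naming follows the convention above.

```cpp
#include <cassert>

// Hand-written stand-in for the gradient of:
//   double f(double a) {
//     double& a_ref = a;
//     double s = 0;
//     s += a;      // (1)
//     s += a_ref;  // (2)
//     return s;
//   }
// Returns df/da.
double f_grad_sketch(double a) {
  // derivative declarations
  double _d_a = 0;
  double& _d_a_ref = _d_a;  // adjoint of the alias aliases the adjoint of a
  double _d_s = 0;

  // forward pass
  double& a_ref = a;
  double s = 0;
  s += a;      // (1)
  s += a_ref;  // (2)
  (void)s;     // the primal result is not needed in this sketch

  // reverse pass
  _d_s += 1;         // seed for `return s;`
  _d_a_ref += _d_s;  // adjoint of (2), lands in _d_a through the reference
  _d_a += _d_s;      // adjoint of (1)
  return _d_a;
}
```

Because _d_a_ref is a reference to _d_a, both statements accumulate into the same adjoint, so f_grad_sketch returns 2 for any input.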

In the example that we just discussed, we can easily point _d_a_ref to _d_a because the variable that a_ref refers to, and hence its derivative, is known at compile time. This is not always the case. For example, consider the following code:

double& someFn(double& i, double& j, double& k) { ... }

double fn(double i, double j, double k) {
  double& ref = someFn(i, j, k);
}

We cannot determine at compile time which variable ref refers to. Thus, we also cannot determine which derivative _d_ref should refer to.

This PR provides functionality to correctly point _d_ref to the derivative of the variable to which ref refers when this variable is not known at compile time.

How does this PR solve this problem?

This PR handles the case where a reference variable is initialised with the result of a call expression. It modifies the primal call expression so that it returns both the primal result and the adjoint information, which is then used to set the derivative of the reference variable correctly.

When Clad differentiates a call to a function someFn that returns a reference, it generates a new function someFn_forw by transforming someFn so that it takes adjoint information as additional input parameters and returns both the primal value and the adjoint information. For the remainder of this discussion, I will refer to this transformation mode as Reverse Mode Forward Pass mode. Please suggest a better and more intuitive name for this transformation mode. For example, consider the following function:

double& someFn(double& i, double& j) {
  double& k = i;
  double& l = j;
  if (...)
    return k;
  else
    return l;
}

The corresponding someFn_forw will be as follows:

clad::ValueAndAdjoint<double&, double&>
someFn_forw(double& i, double& j, clad::array_ref<double> _d_i,
            clad::array_ref<double> _d_j) {
  double* _d_k = nullptr;
  double* _d_l = nullptr;

  // forward pass
  _d_k = &*_d_i;
  double& k = i;

  _d_l = &*_d_j;
  double& l = j;

  if (...)
    return {k, *_d_k};
  else
    return {l, *_d_l};
} 
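
The following is a self-contained sketch of the same idea that can be compiled without Clad. ValueAndAdjoint here is a local stand-in for clad::ValueAndAdjoint, raw pointers stand in for clad::array_ref<double>, and the function larger with its concrete condition is an assumption used only to replace the elided if (...).

```cpp
#include <cassert>

// Local stand-in for clad::ValueAndAdjoint (assumption: a plain pair of
// primal value and adjoint, both of which may be references).
template <typename T, typename U>
struct ValueAndAdjoint {
  T value;
  U adjoint;
};

// Hypothetical primal: returns a reference to the larger argument.
double& larger(double& i, double& j) { return (i > j) ? i : j; }

// Hand-written forward-pass companion: traces the primal and, alongside
// every reference, tracks which adjoint that reference corresponds to.
ValueAndAdjoint<double&, double&> larger_forw(double& i, double& j,
                                              double* _d_i, double* _d_j) {
  double* _d_k = _d_i;
  double& k = i;

  double* _d_l = _d_j;
  double& l = j;

  if (k > l)
    return {k, *_d_k};
  return {l, *_d_l};
}
```

The returned adjoint reference is selected by the same runtime branch that selects the primal reference, so the caller can bind _d_ref without knowing the target at compile time.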

Therefore, the following statement:

double& ref = someFn(i, j);

will produce the following statements in the derived function:

// derivative declarations
double* _d_ref = nullptr;

// forward pass
double t0 = i;
double t1 = j;
clad::ValueAndAdjoint<double&, double&> t = someFn_forw(i, j, &_d_i, &_d_j);
_d_ref = &t.adjoint;
double& ref = t.value;

// reverse pass
someFn_pullback(t0, t1, /*pullback=*/double(), &_d_i, &_d_j);

Please note that the pullback, or dfdx(), should be a zero tangent when the return type is a reference. This is discussed in the section on pullback function signatures below.
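
As a rough end-to-end illustration of the call-site pattern above, here is a hand-written sketch for a function shaped like an operator+= overload, which is the motivating case for this PR. All names (add_assign, add_assign_forw, add_assign_pullback, fn_grad_sketch) and the local ValueAndAdjoint struct are assumptions for illustration, and raw pointers stand in for clad::array_ref<double>; this is not what Clad literally emits.

```cpp
#include <cassert>

// Local stand-in for clad::ValueAndAdjoint.
template <typename T, typename U>
struct ValueAndAdjoint {
  T value;
  U adjoint;
};

// Hypothetical primal, shaped like operator+=:
//   double& add_assign(double& lhs, double rhs) { lhs += rhs; return lhs; }

// Forward-pass companion: runs the primal and returns the adjoint that
// belongs to the returned reference.
ValueAndAdjoint<double&, double&> add_assign_forw(double& lhs, double rhs,
                                                  double* _d_lhs,
                                                  double* /*_d_rhs*/) {
  lhs += rhs;
  return {lhs, *_d_lhs};
}

// Pullback: called with a zero pullback because the result is a reference.
void add_assign_pullback(double /*lhs*/, double /*rhs*/, double _d_y,
                         double* _d_lhs, double* _d_rhs) {
  *_d_lhs += _d_y;     // `return lhs;` (no-op when _d_y is zero)
  *_d_rhs += *_d_lhs;  // reverse of `lhs += rhs;`
}

// Gradient sketch of:
//   double fn(double i, double j) {
//     double& ref = add_assign(i, j);
//     return ref * ref;
//   }
void fn_grad_sketch(double i, double j, double* _d_i, double* _d_j) {
  double* _d_ref = nullptr;

  // forward pass
  double t0 = i;
  double t1 = j;
  ValueAndAdjoint<double&, double&> t = add_assign_forw(i, j, _d_i, _d_j);
  _d_ref = &t.adjoint;
  double& ref = t.value;

  // reverse pass
  *_d_ref += 2 * ref;  // `return ref * ref;` with seed 1
  add_assign_pullback(t0, t1, /*pullback=*/0.0, _d_i, _d_j);
}
```

Since fn computes (i + j)^2, both partial derivatives equal 2 * (i + j). The adjoint of the return statement reaches _d_i only through _d_ref, which is why the pullback itself receives a zero value.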

Problems and design decisions

ReverseModeForwPassVisitor

Currently, I have created a new visitor class, ReverseModeForwPassVisitor, that inherits from ReverseModeVisitor and is responsible for creating forward pass functions (_forw functions). Please note that generating a forward pass function requires only a small subset of Reverse Mode functionality. To be precise, the required pieces are: a correctly initialised derivative variable declaration for each local variable, and a forward pass that effectively traces the primal function.

Benefits of inheriting from ReverseModeVisitor:

  • The forward pass of Reverse Mode already includes all the functionality that we require in a Reverse Mode Forward Pass transformed function. Sorry for the confusing terminology here. Thus, if we reuse ReverseModeVisitor, discard the reverse pass, and only provide implementations of the Visit* functions that differ in Reverse Mode Forward Pass mode, such as VisitReturnStmt, we avoid a lot of code duplication. If we do not inherit from ReverseModeVisitor, we would need to implement all of the Visit* functions ourselves, and in most of them we would simply be cloning the AST nodes.

Disadvantages of inheriting from ReverseModeVisitor:

  • If we reuse ReverseModeVisitor, we cannot generate Reverse Mode Forward Pass transformed functions for any function that cannot be differentiated in Reverse Mode, for example, due to missing support for some C++ construct. This is problematic because it is a lot easier to generate a Reverse Mode Forward Pass transformed function than it is to perform complete Reverse Mode differentiation, as most C++ features and constructs simply need to be cloned in Reverse Mode Forward Pass.
  • Reusing ReverseModeVisitor implicitly assumes that the forward pass of a Reverse Mode derived function has effectively the same behaviour as the primal function. If we add active variable and data flow analysis to Reverse Mode in the future, and we plan to add it soon, this assumption will no longer hold.
  • As mentioned earlier, we will need _forw functions to support other functionality in the future as well. Much of that functionality may require transformations that diverge from what the Reverse Mode forward pass does, and thus we may need to provide separate implementations of more and more Visit* functions.

Derivative expressions to be used in Reverse Mode forward pass

Until now, we only needed to use derivative variables in the reverse pass. With the introduction of _forw functions, we need to use derivative variables in the forward pass too, since a _forw function takes adjoint information as well.
The problem is that the Visit function is designed to return a StmtDiff that consists of a clone of the original expression and the corresponding derivative, if it exists. Sometimes the clone contains a clad::push(...) expression and the derivative contains a clad::pop(...) expression. In these cases, the derivative is designed to be used in the reverse pass only -- a notable example being array expressions in loops. Therefore, we need a routine that allows us to conveniently build derivative expressions that can be used in the forward pass. Building such a routine will be non-trivial because the Visit* functions are tightly coupled with how derivative expressions are obtained.
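
The push/pop coupling can be illustrated with a small hand-written sketch. The Tape type below is a simplified stand-in for Clad's tape and its clad::push / clad::pop helpers, and the primal function is an assumption chosen so that the array index depends on the loop iteration; none of this is actual Clad output.

```cpp
#include <cassert>
#include <vector>

// Simplified stand-in for a tape with push/pop (assumption).
template <typename T>
struct Tape {
  std::vector<T> data;
  void push(T v) { data.push_back(v); }
  T pop() {
    T v = data.back();
    data.pop_back();
    return v;
  }
};

// Gradient sketch of:
//   double f(const double* x, int n) {
//     double s = 0;
//     for (int i = 0; i < n; ++i) {
//       int k = (i * 2) % n;
//       s += x[k] * x[k];
//     }
//     return s;
//   }
void f_grad_sketch(const double* x, int n, double* _d_x) {
  Tape<int> idx;
  double s = 0;

  // forward pass: the "clone" of x[k] records the index on the tape
  for (int i = 0; i < n; ++i) {
    int k = (i * 2) % n;
    idx.push(k);
    s += x[k] * x[k];
  }
  (void)s;

  // reverse pass: the "derivative" _d_x[k] recovers the index by popping,
  // so this expression is only meaningful here, in reverse order
  double _d_s = 1;
  for (int i = n - 1; i >= 0; --i) {
    int k = idx.pop();
    _d_x[k] += 2 * x[k] * _d_s;
  }
}
```

The derivative expression for x[k] depends on a value that is only available after the forward pass has finished pushing, which is why it cannot simply be reused inside the forward pass of a _forw function.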

Pullback function signature of functions with reference return types

For a function fn:

double fn(double i, double j);

The pullback function should have the signature as follows:

void fn_pullback(double i, double j, double _d_y, clad::array_ref<double> _d_i, clad::array_ref<double> _d_j);

For functions with reference return types, the situation is slightly more complicated.

_d_y or the pullback is used to initialise/correctly set the return value derivative. For example:

return val;

Inside a pullback function, this return statement is differentiated to:

// reverse pass
_d_val += _d_y;
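
For a value-returning function, the role of _d_y can be seen in a small hand-written pullback. The function mul and the name mul_pullback_sketch are assumptions for illustration, and raw pointers stand in for clad::array_ref<double>.

```cpp
#include <cassert>

// Pullback sketch of:
//   double mul(double i, double j) {
//     double val = i * j;
//     return val;
//   }
void mul_pullback_sketch(double i, double j, double _d_y, double* _d_i,
                         double* _d_j) {
  double _d_val = 0;

  // forward pass
  double val = i * j;
  (void)val;

  // reverse pass
  _d_val += _d_y;       // `return val;`
  *_d_i += _d_val * j;  // `val = i * j;`
  *_d_j += _d_val * i;
}
```

Here _d_y is the only way the caller's adjoint of the result enters the pullback; with _d_y equal to 1, the function accumulates j into *_d_i and i into *_d_j.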

Intuitively, this behaviour can be reasoned as follows:

double someFn(double& i, double& j) {
  ... statement 1 ...
  ... statement 2 ...
  ... and so on ... 
  return val;
}
double fn(double i, double j) {
  ...
  ...
  double y = someFn(i, j);
  ...
  ...
}

If the return value of the function someFn is val, then the statement y = someFn(i, j) can effectively be visualised as follows:

double fn(double i, double j) {
  ...
  ...
  ... statement 1 ...
  ... statement 2 ...
  ... and so on ... 
  double y = val;
  ...
  ...
}

Now, if y is a reference variable, then double& y = val becomes a no-op and should have no corresponding reverse mode derived statement. Therefore, ideally, the pullback function of a function with a reference return type should not take any pullback value (_d_y). Again, sorry for the confusing terminology. We have two ways to proceed from here:

  • The pullback function signature of a function with a reference return type is the same as that of a function with the corresponding non-reference return type. In this case, the pullback function of a function with a reference return type has to be called with a dummy pullback (_d_y) value. The dummy value should be equal to the zero tangent vector of the return type.

Or

  • The pullback function signature of a function with a reference return type is the same as that of a function with a void return type. This way, we would not need to pass any dummy value when calling pullback functions. On the downside, this makes the rules for pullback function signatures slightly more complicated.
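
The two options can be compared side by side in a hand-written sketch. The primal scale and both pullback names are hypothetical, and raw pointers stand in for clad::array_ref<double>; the point is only that the two signatures behave identically once the dummy value is zero.

```cpp
#include <cassert>

// Hypothetical primal with a reference return type:
//   double& scale(double& x, double c) { x *= c; return x; }
// In both pullbacks, x is the value before the call and *_d_x holds the
// adjoint that the caller accumulated through the returned reference.

// Option 1: same signature as for a non-reference return type;
// the caller passes a dummy zero for _d_y.
void scale_pullback_with_dummy(double x, double c, double _d_y, double* _d_x,
                               double* _d_c) {
  *_d_x += _d_y;  // `return x;` (no-op for a zero _d_y)
  double r = *_d_x;
  *_d_c += r * x;  // reverse of `x *= c;`
  *_d_x = r * c;
}

// Option 2: same signature as for a void return type; no _d_y at all.
void scale_pullback_no_dummy(double x, double c, double* _d_x, double* _d_c) {
  double r = *_d_x;
  *_d_c += r * x;  // reverse of `x *= c;`
  *_d_x = r * c;
}
```

Option 1 keeps a single signature rule at the cost of an argument that must always be zero, while option 2 drops the argument at the cost of a second rule.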

Please give your reviews and suggestions on the approach used, the problems and the design decisions discussed here.

@codecov

codecov bot commented Apr 10, 2022

Codecov Report

Merging #425 (f2d496e) into master (fac880f) will increase coverage by 0.12%.
The diff coverage is 86.50%.

@@            Coverage Diff             @@
##           master     #425      +/-   ##
==========================================
+ Coverage   91.74%   91.86%   +0.12%     
==========================================
  Files          35       37       +2     
  Lines        5038     5422     +384     
==========================================
+ Hits         4622     4981     +359     
- Misses        416      441      +25     
| Impacted Files | Coverage Δ |
|---|---|
| include/clad/Differentiator/DerivativeBuilder.h | 100.00% <ø> (ø) |
| include/clad/Differentiator/ReverseModeVisitor.h | 98.68% <ø> (ø) |
| lib/Differentiator/ReverseModeForwPassVisitor.cpp | 82.31% <82.31%> (ø) |
| lib/Differentiator/ReverseModeVisitor.cpp | 95.30% <93.97%> (-0.20%) ⬇️ |
| ...e/clad/Differentiator/ReverseModeForwPassVisitor.h | 100.00% <100.00%> (ø) |
| lib/Differentiator/DerivativeBuilder.cpp | 100.00% <100.00%> (ø) |
| lib/Differentiator/CladUtils.cpp | 97.68% <0.00%> (-1.02%) ⬇️ |
| lib/Differentiator/EstimationModel.cpp | 100.00% <0.00%> (ø) |
| include/clad/Differentiator/VisitorBase.h | 100.00% <0.00%> (ø) |

... and 7 more

@parth-07
Collaborator Author

The work in this PR was rebased on top of master and merged as part of #601

@parth-07 parth-07 closed this Nov 12, 2023
infinite-void-16 added a commit to infinite-void-16/clad that referenced this pull request Aug 10, 2024
This commit adds support for custom (user-provided) `_forw` functions.
A `_forw` function, if available, is called in place of the actual
function.

For example, if the primal code contains:

```cpp
someFn(u, v, w);
```

and user has defined a custom `_forw` function for `someFn` as follows:

```cpp
namespace clad {
  namespace custom_derivatives {
    void someFn_forw(double u, double v, double w, double *d_u,
      double *d_v, double *d_w) {
      // ...
      // ...
    }
  }
}
```

Then clad will generate the derivative function as follows:

```cpp
// forward-pass
clad::custom_derivatives::someFn_forw(u, v, w, d_u, d_v, d_w);
// ...

// reverse-pass; no change in reverse-pass
someFn_pullback(u, v, w, d_u, d_v, d_w);
// ...
```

But more importantly, why do we need such a functionality? Two reasons:

- Supporting reference/pointer return types in the reverse-mode. This
  has been discussed at great length here:
  vgvassilev#425

- Supporting types whose elements grow dynamically, such as
  `std::vector` and `std::map`. The issue is that we need to correctly
  update the size/property of the adjoint variable when a
  function call updates the size/property of the corresponding primal
  variable.  However, the actual function call does not modify the adjoint
  variable. Here `_forw` functions come to the rescue. `_forw`
  functions make it possible to adjust the adjoint variable
  size/properties along with executing the code of the actual function
  call.
infinite-void-16 added a commit to infinite-void-16/clad that referenced this pull request Aug 10, 2024
This commit adds support for custom (user-provided) `_forw` functions.
A `_forw` function, if available, is called in place of the actual
function.

For example, if the primal code contains:

```cpp
someFn(u, v, w);
```

and user has defined a custom `_forw` function for `someFn` as follows:

```cpp
namespace clad {
  namespace custom_derivatives {
    void someFn_forw(double u, double v, double w, double *d_u,
      double *d_v, double *d_w) {
      // ...
      // ...
    }
  }
}
```

Then clad will generate the derivative function as follows:

```cpp
// forward-pass
clad::custom_derivatives::someFn_forw(u, v, w, d_u, d_v, d_w);
// ...

// reverse-pass; no change in reverse-pass
someFn_pullback(u, v, w, d_u, d_v, d_w);
// ...
```

But more importantly, why do we need such a functionality? Two reasons:

- Supporting reference/pointer return types in the reverse-mode. This
  has been discussed at great length here:
  vgvassilev#425

- Supporting types whose elements grow dynamically, such as
  `std::vector` and `std::map`. The issue is that we need to correctly
  update the size/property of the adjoint variable when a
  function call updates the size/property of the corresponding primal
  variable. For example: a call to `vec.push_back(...)` should update
  the size of `_d_vec` as well. However, the actual function call does
  not modify the adjoint variable in any way. Here `_forw` functions come
  to the rescue. `_forw` functions make it possible to adjust the adjoint
  variable size/properties along with executing the actual function call.

Please note that the `_forw` function signature takes adjoint variables as
arguments and returns `clad::ValueAndAdjoint<U, V>` to support the
reference/pointer return type.
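
As a rough illustration of the `push_back` case described in this commit message, a user-written forward-pass companion might look like the sketch below. The name `push_back_forw_sketch` and its parameter list are hypothetical and do not reflect the signature Clad actually expects for custom `_forw` functions; the sketch only shows the adjoint container being resized together with the primal one.

```cpp
#include <cassert>
#include <vector>

// Hypothetical forward-pass companion for std::vector<double>::push_back.
// It performs the primal operation and keeps the adjoint vector the same
// size, with a zero-initialised adjoint for the new element.
void push_back_forw_sketch(std::vector<double>& vec, double val,
                           std::vector<double>& _d_vec, double /*_d_val*/) {
  vec.push_back(val);
  _d_vec.push_back(0.0);
}
```

Without the second `push_back`, a later reverse-pass access to the adjoint of the new element would index past the end of `_d_vec`.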
infinite-void-16 added a commit to infinite-void-16/clad that referenced this pull request Aug 13, 2024
This commit adds support for custom (user-provided) `_forw` functions.
A `_forw` function, if available, is called in place of the actual
function.

For example, if the primal code contains:

```cpp
someFn(u, v, w);
```

and user has defined a custom `_reverse_forw` function for `someFn` as follows:

```cpp
namespace clad {
  namespace custom_derivatives {
    void someFn_reverse_forw(double u, double v, double w, double *d_u,
      double *d_v, double *d_w) {
      // ...
      // ...
    }
  }
}
```

Then clad will generate the derivative function as follows:

```cpp
// forward-pass
clad::custom_derivatives::someFn_reverse_forw(u, v, w, d_u, d_v, d_w);
// ...

// reverse-pass; no change in reverse-pass
someFn_pullback(u, v, w, d_u, d_v, d_w);
// ...
```

But more importantly, why do we need such a functionality? Two reasons:

- Supporting reference/pointer return types in the reverse-mode. This
  has been discussed at great length here:
  vgvassilev#425

- Supporting types whose elements grow dynamically, such as
  `std::vector` and `std::map`. The issue is that we need to correctly
  update the size/property of the adjoint variable when a
  function call updates the size/property of the corresponding primal
  variable. For example: a call to `vec.push_back(...)` should update
  the size of `_d_vec` as well. However, the actual function call does
  not modify the adjoint variable in any way. Here `_forw` functions come
  to the rescue. `_forw` functions make it possible to adjust the adjoint
  variable size/properties along with executing the actual function call.

Please note that the `_reverse_forw` function signature takes adjoint variables as
arguments and returns `clad::ValueAndAdjoint<U, V>` to support the
reference/pointer return type.
infinite-void-16 added a commit to infinite-void-16/clad that referenced this pull request Aug 19, 2024
vgvassilev pushed a commit that referenced this pull request Aug 20, 2024