Accel/expval #481

vincentmr · 2023-08-23T15:20:28Z

Before submitting

Please complete the following checklist when submitting a PR:

All new features must include a unit test.
If you've fixed a bug or added code that should be tested, add a test to the
tests directory!
All new functions and code must be clearly commented and documented.
If you do make documentation changes, make sure that the docs build and
render correctly by running make docs.
Ensure that the test suite passes, by running make test.
Add a new entry to the .github/CHANGELOG.md file, summarizing the
change, and including a link back to the PR.
Ensure that code is properly formatted by running make format.

When all the above are checked, delete everything above the dashed
line and fill in the pull request template.

Context:
In the LKokkos backend, there are two ways to implement expectation values:

1. Copy the statevector, apply the observable to the copy, and compute the expectation value with a BLAS-like inner product.
2. Accumulate the expectation value on-the-fly, applying the observable to a portion of the statevector at a time.

The first method is currently in use, but it is wasteful in a couple of ways:

The statevector copy roughly doubles the memory needed, so simulations run out of memory one or two qubits earlier, depending on the system.
Expectation values are asymptotically memory-bound, and reduce operations can achieve roughly twice the bandwidth of for-loop operations (on Nvidia devices, for example), which favors on-the-fly implementations.
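
The difference between the two approaches can be illustrated with a minimal pure-Python sketch for a PauliZ observable. This is not the LKokkos implementation (which uses Kokkos kernels in C++); the function names and the convention that wire 0 is the most significant bit of the amplitude index are assumptions made for illustration only.

```python
def apply_z(state, wire, n_wires):
    """Return a NEW list holding Z|psi> (assumed convention: wire 0 = most significant bit)."""
    shift = n_wires - 1 - wire
    return [-a if (k >> shift) & 1 else a for k, a in enumerate(state)]


def expval_z_inner_product(state, wire, n_wires):
    """Method 1: copy the state, apply Z to the copy, then take <psi|Z psi>."""
    z_psi = apply_z(state, wire, n_wires)  # full-size temporary copy
    return sum((a.conjugate() * b).real for a, b in zip(state, z_psi))


def expval_z_on_the_fly(state, wire, n_wires):
    """Method 2: a single reduction over the amplitudes, with no temporary copy."""
    shift = n_wires - 1 - wire
    total = 0.0
    for k, a in enumerate(state):
        # Z contributes +|a|^2 when the target bit is 0 and -|a|^2 when it is 1.
        sign = -1.0 if (k >> shift) & 1 else 1.0
        total += sign * (a.real ** 2 + a.imag ** 2)
    return total


# Usage: both methods agree on a small 3-qubit example state.
state = [complex(k + 1, -k) for k in range(8)]
norm = sum(abs(a) ** 2 for a in state) ** 0.5
state = [a / norm for a in state]
print(expval_z_inner_product(state, 1, 3), expval_z_on_the_fly(state, 1, 3))
```

The second function reads each amplitude once and never allocates a second state-sized buffer, which is the property the on-the-fly approach relies on.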

A simple benchmarking script like the one below yields, for 29 qubits on one of Perlmutter's A100 GPUs, the following times per call (before and after this change):

0.330 sec.
0.012 sec.

i.e. a 27.5x speed-up.

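import time

import pennylane as qml

n_wires = 29    # number of qubits in the statevector
n_repeat = 100  # number of timed executions to average over

dev = qml.device("lightning.kokkos", wires=n_wires)

@qml.qnode(dev)
def circuit():
    # No gates are applied: measure <Z> on the middle wire of the initial state.
    return qml.expval(qml.PauliZ(n_wires // 2))

# Average wall-clock time per circuit execution.
t0 = time.time()
for _ in range(n_repeat):
    circuit()
dt = (time.time() - t0) / n_repeat
print(f"{dt:.3f} sec.")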

Description of the Change:

Benefits:

Possible Drawbacks:

Related GitHub Issues:

codecov · 2023-08-23T20:21:46Z

Codecov Report

No coverage uploaded for pull request base (bugfix/cuda12@6dc7883).
The diff coverage is n/a.

@@               Coverage Diff                @@
##             bugfix/cuda12     #481   +/-   ##
================================================
  Coverage                 ?   96.84%           
================================================
  Files                    ?      142           
  Lines                    ?    16275           
  Branches                 ?        0           
================================================
  Hits                     ?    15761           
  Misses                   ?      514           
  Partials                 ?        0

AmintorDusko

Amazing work. Do we know if these changes will allow us to reach a larger number of qubits?
If yes, what are the numbers we have?

vincentmr · 2023-08-24T15:28:19Z

Amazing work. Do we know if these changes will allow us to reach a larger number of qubits? If yes, what are the numbers we have?

From what I could see on Perlmutter, we can reach 30 qubits instead of 29 on an A100 card. The main thing is that 1- and 2-qubit expvals are much faster.
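
For a sense of scale, here is a back-of-the-envelope sketch of the statevector memory involved. It assumes double-precision complex amplitudes (16 bytes each), which is an assumption and not something stated above, and it ignores every other allocation on the device, so it does not by itself predict the qubit limit of a given card.

```python
def statevector_gib(n_qubits, bytes_per_amplitude=16):
    """Size in GiB of a dense statevector (16 bytes = one complex128 amplitude, assumed)."""
    return (2 ** n_qubits) * bytes_per_amplitude / 2 ** 30


for n in (29, 30):
    single = statevector_gib(n)
    # A full copy of the statevector doubles the footprint.
    print(f"{n} qubits: {single:.0f} GiB, {2 * single:.0f} GiB with a full copy")
```

Under that assumption, each extra qubit doubles the footprint, and so does keeping a full copy of the state.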

vincentmr · 2023-08-24T18:47:53Z

In support of this PR, I made a gist that benchmarks expval on a CPU (OMP_NUM_THREADS=64) and a GPU (A100). The new implementation is generally faster, especially for 1- and 2-qubit operators above 20 qubits. In the following figures, inner and team stand for the inner-product-based and TeamPolicy (this PR) algorithms, respectively; first stands for targeting low-index qubits (only first is shown, since timings are similar across first, mid and last targets); and 1, 2, 3 stand for the number of wires targeted by the Hermitian unitary. The timings for 1-qubit Hermitian unitaries are similar to those of named gates (e.g. PauliZ) with an equal number of wires.
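
As a rough illustration of what an on-the-fly kernel for a 1-qubit Hermitian computes, the following pure-Python sketch accumulates <psi|M|psi> for an arbitrary 2x2 matrix M by visiting pairs of amplitudes that differ only in the target bit. It is not the TeamPolicy code from this PR; `expval_1q` and the wire ordering (wire 0 as the most significant bit) are hypothetical choices made for the example.

```python
def expval_1q(state, matrix, wire, n_wires):
    """On-the-fly <psi|M|psi> for a 2x2 matrix M acting on `wire` (hypothetical helper)."""
    shift = n_wires - 1 - wire      # assumed convention: wire 0 = most significant bit
    low_mask = (1 << shift) - 1
    (m00, m01), (m10, m11) = matrix
    total = 0.0
    for k in range(len(state) // 2):
        # Insert a 0 bit at position `shift` to get the index whose target bit is 0.
        i0 = ((k >> shift) << (shift + 1)) | (k & low_mask)
        i1 = i0 | (1 << shift)      # same index with the target bit set to 1
        a0, a1 = state[i0], state[i1]
        # Apply M to the pair (a0, a1) and accumulate its overlap with the pair.
        total += (a0.conjugate() * (m00 * a0 + m01 * a1)
                  + a1.conjugate() * (m10 * a0 + m11 * a1)).real
    return total


# Usage: <X> on the uniform superposition of 3 qubits.
pauli_x = ((0, 1), (1, 0))
plus = [complex(0.5 ** 1.5)] * 8    # every amplitude equals 1/sqrt(8)
print(expval_1q(plus, pauli_x, 0, 3))
```

Each iteration touches only two amplitudes, so the loop is a plain reduction over half the statevector and needs no state-sized temporary.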

vincentmr · 2023-08-24T21:15:00Z

Bonus with LKokkos benchmarks run on LUMI's AMD cards

AmintorDusko

Nothing more to ask. You did a great job here!

mlxd

💯

.

mlxd

Happy with the revision. Thanks @vincentmr

* M pennylane_lightning/core/src/bindings/Bindings.hpp; hack `JacobianData` to work with devices.
  M pennylane_lightning/core/src/simulators/lightning_kokkos/StateVectorKokkos.hpp; `applyMatrix` bugfix: use intermediate hostview to copy matrix data; same bugfix for `getDataVector`.
  M pennylane_lightning/core/src/simulators/lightning_kokkos/algorithms/AdjointJacobianKokkos.hpp; use copy constructor.
  M pennylane_lightning/core/src/simulators/lightning_kokkos/measurements/MeasurementsKokkos.hpp; use copy constructor.
  M pennylane_lightning/core/src/simulators/lightning_kokkos/observables/ObservablesKokkos.hpp; use copy constructor.
  M requirements-dev.txt; add clang-format-14.
* Auto update version
* Update changelog.
* Auto update version
* Auto update version
* Add an argument to adjointJacobian to avoid syncing and copying state vector data in adjoint-diff.
* Reformat
* trigger CI
* [skip ci] Update changelog.
* Auto update version
* Auto update version
* Accel/expval (#481)
* Introduce std::unordered_map<std::string, ExpValFunc> expval_funcs_.
* Introduce applyExpectationValueFunctor.
* Add binding to LKokkos expval(matrix, wires). Combine expval functor calls into two templated methods. Call specialized expval methods when possible. Remove obsolete 'Apply directly' tests.
* Update changelog.
* Add test for arbitrary expval(Hermitian).
* Add getExpectationValueMultiQubitOpFunctor.
* Add typename hint for macos.
* Add typename macos.
* Use Kokkos::ThreadVectorRange policy for innerloop in getExpectationValueMultiQubitOpFunctor.
* Couple fix for HIP.
* Use inner product scheme instead of getExpectationValueMultiQubitOpFunctor to compute multi-qubit expval.

---------

Co-authored-by: Dev version update bot <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Amintor Dusko <[email protected]>

vincentmr added 3 commits August 23, 2023 10:45

Introduce std::unordered_map<std::string, ExpValFunc> expval_funcs_.

c45cd23

Introduce applyExpectationValueFunctor.

33ff620

Add binding to LKokkos expval(matrix, wires). Combine expval functor calls into two templated methods. Call specialized expval methods when possible. Remove obsolete 'Apply directly' tests.

e0d3212

vincentmr changed the base branch from master to bugfix/cuda12 August 23, 2023 19:24

vincentmr added 2 commits August 23, 2023 12:28

Update changelog.

4305edc

Add test for arbitrary expval(Hermitian).

5595e3c

vincentmr added 3 commits August 23, 2023 14:40

Add getExpectationValueMultiQubitOpFunctor.

22c47f4

Add typename hint for macos.

1e1565d

Add typename macos.

614e4de

vincentmr marked this pull request as ready for review August 24, 2023 12:42

vincentmr requested a review from mlxd August 24, 2023 12:42

Use Kokkos::ThreadVectorRange policy for innerloop in getExpectationValueMultiQubitOpFunctor.

b1afba8

vincentmr requested a review from AmintorDusko August 24, 2023 13:21

AmintorDusko reviewed Aug 24, 2023

Merge branch 'bugfix/cuda12' into accel/expval

7b22095

Couple fix for HIP.

53b48d2

Merge branch 'bugfix/cuda12' into accel/expval

cb43f40

AmintorDusko approved these changes Aug 25, 2023

mlxd previously approved these changes Aug 25, 2023

Use inner product scheme instead of getExpectationValueMultiQubitOpFunctor to compute multi-qubit expval.

d1384d1

mlxd approved these changes Aug 25, 2023

vincentmr merged commit fb82dea into bugfix/cuda12 Aug 25, 2023
59 checks passed

vincentmr deleted the accel/expval branch August 25, 2023 14:48

vincentmr mentioned this pull request Aug 28, 2023

Template/expval #489

Merged

5 tasks
