Optimize LM controlled kernels [sc-73461] (#882)
### Before submitting

Please complete the following checklist when submitting a PR:

- [x] All new features must include a unit test.
If you've fixed a bug or added code that should be tested, add a test to
the
      [`tests`](../tests) directory!

- [x] All new functions and code must be clearly commented and
documented.
If you do make documentation changes, make sure that the docs build and
      render correctly by running `make docs`.

- [x] Ensure that the test suite passes, by running `make test`.

- [x] Add a new entry to the `.github/CHANGELOG.md` file, summarizing
the
      change, and including a link back to the PR.

- [x] Ensure that code is properly formatted by running `make format`. 

When all the above are checked, delete everything above the dashed
line and fill in the pull request template.


------------------------------------------------------------------------------------------------------------

**Context:**
As a first step toward adding controls in Lightning Kokkos kernels, the
controlled Lightning Qubit kernels are simplified.

**Description of the Change:**
- Introduce `controlBitPatterns`, the controlled version of
`generateBitPatterns`, and remove obsolete `parity2indices`
implementations.
- Avoid temporary arrays/vectors as much as possible.
- Change `core_function` signature from coefficients & indices to
indices & offset.
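
The effect of `controlBitPatterns` on a set of indices can be sketched on a small register. The snippet below is a minimal stand-alone illustration, not the committed implementation: `setBit` and `withControls` are hypothetical names, and the input patterns are simply assumed rather than produced by `generateBitPatterns`. It assumes, as in the diff, that wire `w` maps to bit `(num_qubits - 1) - w`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: force bit `rev_wire` of `index` to `value`,
// leaving every other bit untouched.
inline std::size_t setBit(std::size_t index, std::size_t rev_wire,
                          bool value) {
    constexpr std::size_t one{1U};
    return (index & ~(one << rev_wire)) |
           (static_cast<std::size_t>(value) << rev_wire);
}

// Hypothetical stand-alone variant of the control step: returns a copy of
// `indices` in which each control wire's bit is set to its control value.
std::vector<std::size_t>
withControls(std::vector<std::size_t> indices, std::size_t num_qubits,
             const std::vector<std::size_t> &controlled_wires,
             const std::vector<bool> &controlled_values) {
    assert(controlled_wires.size() == controlled_values.size());
    for (std::size_t &i : indices) {
        for (std::size_t k = 0; k < controlled_wires.size(); k++) {
            // Wire 0 is the most significant bit of the basis index.
            const std::size_t rev_wire =
                (num_qubits - 1) - controlled_wires[k];
            i = setBit(i, rev_wire, controlled_values[k]);
        }
    }
    return indices;
}
```

For a 3-qubit register with assumed input patterns `{0, 1}`, a single control on wire 0 with value `true` forces bit 2 on, so the call returns `{4, 5}`.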

**Benefits:**
- Combine and eliminate a few branches and loops.
- `indices` are now precomputed (thereby saving time) and only the offset
needs to be updated on the fly.
- All `omp parallel for` loops are now free of private arguments. 
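
The indices-plus-offset idea can be sketched with a toy controlled PauliX. This is only an illustration under stated assumptions, not the kernel code from this PR: `freeOffsets`, `applyControlledX` and `demoControlledX` are hypothetical names, the precomputed `indices` are written out by hand, and the offsets are enumerated by brute force for readability rather than speed.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using Complex = std::complex<double>;

// Hypothetical helper: list every basis index whose bits are zero on all
// wires the gate touches (controls and targets). These are the offsets.
std::vector<std::size_t>
freeOffsets(std::size_t num_qubits,
            const std::vector<std::size_t> &touched_wires) {
    constexpr std::size_t one{1U};
    std::size_t mask = 0;
    for (const std::size_t w : touched_wires) {
        mask |= one << ((num_qubits - 1) - w);
    }
    std::vector<std::size_t> offsets;
    for (std::size_t i = 0; i < (one << num_qubits); i++) {
        if ((i & mask) == 0) {
            offsets.push_back(i);
        }
    }
    return offsets;
}

// Toy controlled PauliX. `indices` holds the two target patterns with the
// control bits already set; the loop body only adds the current offset.
void applyControlledX(std::vector<Complex> &state,
                      const std::vector<std::size_t> &indices,
                      const std::vector<std::size_t> &offsets) {
    assert(indices.size() == 2);
    for (const std::size_t offset : offsets) {
        std::swap(state[indices[0] + offset], state[indices[1] + offset]);
    }
}

// 3 qubits, control on wire 0 (value true), target on wire 2.
std::vector<Complex> demoControlledX() {
    const std::size_t num_qubits = 3;
    std::vector<Complex> state(8);
    for (std::size_t i = 0; i < state.size(); i++) {
        state[i] = Complex(static_cast<double>(i), 0.0);
    }
    const std::vector<std::size_t> indices{4, 5}; // computed once, by hand
    const auto offsets = freeOffsets(num_qubits, {0, 2});
    applyControlledX(state, indices, offsets);
    return state;
}
```

Because the offset bits never overlap the bits fixed in `indices`, the addition acts as a bitwise OR, and nothing inside the loop has to be recomputed per iteration apart from the offset itself.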

We illustrate the performance improvement by running the XAS workflow from
the benchmark suite. This workflow has a circuit with the following
specs:
```
{'resources': Resources(num_wires=11, num_gates=17382, gate_types=defaultdict(<class 'int'>, {'StatePrep': 1, 'Hadamard': 1, 'PhaseShift': 4380, 'SingleExcitation': 4000, 'C(MultiRZ)': 9000}), gate_sizes=defaultdict(<class 'int'>, {10: 1, 1: 4381, 2: 4000, 3: 9000}), depth=9117, shots=Shots(total_shots=None, shot_vector=())), 'errors': {}, 'num_observables': 2, 'num_diagonalizing_gates': 4, 'num_trainable_params': 17381, 'num_device_wires': 11, 'num_tape_wires': 11, 'device_name': 'lightning.qubit', 'level': 'device', 'gradient_options': {}, 'interface': 'auto', 'diff_method': 'best', 'gradient_fn': 'adjoint'}
```
and a bottleneck is applying the 9000 `C(MultiRZ)` gates. With v0.38.0
we get (zooming in on `simulate_and_jacobian` with SnakeViz)
![Screenshot from 2024-09-11 16-08-09](https://github.com/user-attachments/assets/2a2624ae-4276-4f55-8757-3fc58abb0260)
and for the current PR
![Screenshot from 2024-09-11 16-08-18](https://github.com/user-attachments/assets/08c8ce17-31a1-4419-aa1b-14dd27db4c80)
We get a 6.6x speed-up on the `C(MultiRZ)` gates.

**Possible Drawbacks:**

**Related GitHub Issues:**

---------

Co-authored-by: ringo-but-quantum <[email protected]>
Co-authored-by: Luis Alfredo Nuñez Meneses <[email protected]>
3 people authored Sep 11, 2024
1 parent bbb3eb4 commit ef3a8cc
Showing 5 changed files with 179 additions and 260 deletions.
3 changes: 3 additions & 0 deletions .github/CHANGELOG.md
@@ -45,6 +45,9 @@
* Update GitHub actions in response to a high-severity vulnerability.
[(#887)](https://github.com/PennyLaneAI/pennylane-lightning/pull/887)

* Optimize and simplify controlled kernels in Lightning-Qubit.
[(#882)](https://github.com/PennyLaneAI/pennylane-lightning/pull/882)

* Optimize gate cache recording for `lightning.tensor` C++ layer.
[(#879)](https://github.com/PennyLaneAI/pennylane-lightning/pull/879)

2 changes: 1 addition & 1 deletion pennylane_lightning/core/_version.py
@@ -16,4 +16,4 @@
Version number (major.minor.patch[-label])
"""

-__version__ = "0.39.0-dev19"
+__version__ = "0.39.0-dev20"
@@ -47,4 +47,37 @@ auto generateBitPatterns(const std::vector<std::size_t> &qubitIndices,
    }
    return indices;
}

/**
 * @brief Introduce quantum controls in indices generated by
 * generateBitPatterns.
 *
 * @param indices Indices for the operation.
 * @param num_qubits Number of qubits in register.
 * @param controlled_wires Control wires.
 * @param controlled_values Control values (false or true).
 */
void controlBitPatterns(std::vector<std::size_t> &indices,
                        const std::size_t num_qubits,
                        const std::vector<std::size_t> &controlled_wires,
                        const std::vector<bool> &controlled_values) {
    constexpr std::size_t one{1U};
    if (controlled_wires.empty()) {
        return;
    }
    std::vector<std::size_t> controlled_values_i(controlled_values.size());
    std::transform(controlled_values.begin(), controlled_values.end(),
                   controlled_values_i.begin(),
                   [](const bool v) { return static_cast<std::size_t>(v); });
    std::for_each(
        indices.begin(), indices.end(),
        [num_qubits, &controlled_wires, &controlled_values_i](std::size_t &i) {
            for (std::size_t k = 0; k < controlled_wires.size(); k++) {
                const std::size_t rev_wire =
                    (num_qubits - 1) - controlled_wires[k];
                const std::size_t value = controlled_values_i[k];
                i = (i & ~(one << rev_wire)) | (value << rev_wire);
            }
        });
}
} // namespace Pennylane::LightningQubit::Gates
@@ -56,6 +56,19 @@ auto getIndicesAfterExclusion(const std::vector<std::size_t> &indicesToExclude,
auto generateBitPatterns(const std::vector<std::size_t> &qubitIndices,
                         std::size_t num_qubits) -> std::vector<std::size_t>;

/**
 * @brief Introduce quantum controls in indices generated by
 * generateBitPatterns.
 *
 * @param indices Indices for the operation.
 * @param num_qubits Number of qubits in register.
 * @param controlled_wires Control wires.
 * @param controlled_values Control values (false or true).
 */
void controlBitPatterns(std::vector<std::size_t> &indices,
                        std::size_t num_qubits,
                        const std::vector<std::size_t> &controlled_wires,
                        const std::vector<bool> &controlled_values);
/**
 * @brief Internal utility struct to track data indices of application for
 * operations.
