Commit
### Before submitting

Please complete the following checklist when submitting a PR:

- [x] All new features must include a unit test. If you've fixed a bug or added code that should be tested, add a test to the [`tests`](../tests) directory!
- [x] All new functions and code must be clearly commented and documented. If you do make documentation changes, make sure that the docs build and render correctly by running `make docs`.
- [x] Ensure that the test suite passes, by running `make test`.
- [x] Add a new entry to the `.github/CHANGELOG.md` file, summarizing the change, and including a link back to the PR.
- [x] Ensure that code is properly formatted by running `make format`.

When all the above are checked, delete everything above the dashed line and fill in the pull request template.

------------------------------------------------------------------------------------------------------------

**Context:** As a first step toward adding controls in Lightning Kokkos kernels, the controlled Lightning Qubit kernels are simplified.

**Description of the Change:**

- Introduce `controlBitPatterns`, the controlled version of `generateBitPatterns`, and remove obsolete `parity2indices` implementations.
- Avoid temporary arrays/vectors as much as possible.
- Change the `core_function` signature from coefficients & indices to indices & offset.

**Benefits:**

- Combine and eliminate a few branches and loops.
- `indices` are now precomputed (thereby saving time) and only the offset needs to be updated on the fly.
- All `omp parallel for` loops are now free of private arguments.

We illustrate the performance improvement by running the XAS workflow from the benchmark suite. This workflow has a circuit with the following specs

```
{'resources': Resources(num_wires=11, num_gates=17382,
    gate_types=defaultdict(<class 'int'>, {'StatePrep': 1, 'Hadamard': 1, 'PhaseShift': 4380, 'SingleExcitation': 4000, 'C(MultiRZ)': 9000}),
    gate_sizes=defaultdict(<class 'int'>, {10: 1, 1: 4381, 2: 4000, 3: 9000}),
    depth=9117, shots=Shots(total_shots=None, shot_vector=())),
 'errors': {},
 'num_observables': 2,
 'num_diagonalizing_gates': 4,
 'num_trainable_params': 17381,
 'num_device_wires': 11,
 'num_tape_wires': 11,
 'device_name': 'lightning.qubit',
 'level': 'device',
 'gradient_options': {},
 'interface': 'auto',
 'diff_method': 'best',
 'gradient_fn': 'adjoint'}
```

and the bottleneck is applying the 9000 `C(MultiRZ)` gates. With v0.38.0 we get (zooming in on `simulate_and_jacobian` with SnakeViz)

![Screenshot from 2024-09-11 16-08-09](https://github.com/user-attachments/assets/2a2624ae-4276-4f55-8757-3fc58abb0260)

and for the current PR

![Screenshot from 2024-09-11 16-08-18](https://github.com/user-attachments/assets/08c8ce17-31a1-4419-aa1b-14dd27db4c80)

We get a 6.6x speed-up on the `C(MultiRZ)` gates.

**Possible Drawbacks:**

**Related GitHub Issues:**

---------

Co-authored-by: ringo-but-quantum <[email protected]>
Co-authored-by: Luis Alfredo Nuñez Meneses <[email protected]>
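As an illustration of the "precomputed indices plus on-the-fly offset" idea listed under Benefits, here is a minimal C++ sketch. It is not the PR's code: the helper names (`controlledIndices`, `expandOffset`, `applyControlled`, `demoControlledX`) are hypothetical, it works on raw bit positions rather than Lightning wire labels, and it assumes every control is conditioned on the value 1.

```cpp
#include <algorithm>
#include <cassert>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using Complex = std::complex<double>;

// Hypothetical stand-in for a controlled bit-pattern generator: returns the
// 2^nt state-vector indices spanned by the target bits, with every control
// bit set to 1. The first listed target is the most significant pattern bit.
std::vector<std::size_t>
controlledIndices(const std::vector<std::size_t> &control_bits,
                  const std::vector<std::size_t> &target_bits) {
    std::size_t base = 0;
    for (const auto c : control_bits) {
        base |= std::size_t{1} << c;
    }
    const std::size_t nt = target_bits.size();
    std::vector<std::size_t> indices(std::size_t{1} << nt, base);
    for (std::size_t k = 0; k < indices.size(); ++k) {
        for (std::size_t j = 0; j < nt; ++j) {
            if ((k >> j) & 1U) {
                indices[k] |= std::size_t{1} << target_bits[nt - 1 - j];
            }
        }
    }
    return indices;
}

// Spread the loop counter k over the bits NOT touched by the gate by
// inserting a zero at each gate bit position (positions sorted ascending).
std::size_t expandOffset(std::size_t k,
                         const std::vector<std::size_t> &sorted_gate_bits) {
    for (const auto b : sorted_gate_bits) {
        const std::size_t low = k & ((std::size_t{1} << b) - 1);
        k = ((k >> b) << (b + 1)) | low;
    }
    return k;
}

// Indices are computed once; only the offset changes inside the loop, so the
// loop body needs no per-iteration temporary arrays.
template <class Core>
void applyControlled(std::vector<Complex> &state, std::size_t num_qubits,
                     const std::vector<std::size_t> &control_bits,
                     const std::vector<std::size_t> &target_bits, Core core) {
    const auto indices = controlledIndices(control_bits, target_bits);
    std::vector<std::size_t> gate_bits(control_bits);
    gate_bits.insert(gate_bits.end(), target_bits.begin(), target_bits.end());
    std::sort(gate_bits.begin(), gate_bits.end());
    assert(gate_bits.size() <= num_qubits);
    const std::size_t n_iter = std::size_t{1} << (num_qubits - gate_bits.size());
    for (std::size_t k = 0; k < n_iter; ++k) {
        const std::size_t offset = expandOffset(k, gate_bits);
        core(state.data(), indices, offset);
    }
}

// Demo: a controlled-X on 3 qubits (control bit 2, target bit 0), applied to
// the basis state with index 4 (binary 100).
std::vector<Complex> demoControlledX() {
    std::vector<Complex> state(8, Complex{0.0, 0.0});
    state[4] = Complex{1.0, 0.0};
    applyControlled(state, 3, {2}, {0},
                    [](Complex *arr, const std::vector<std::size_t> &idx,
                       std::size_t offset) {
                        std::swap(arr[offset + idx[0]], arr[offset + idx[1]]);
                    });
    return state;
}
```

In this sketch the offset always has zeros at the gate's bit positions, so `offset + idx[i]` never carries into another bit. Calling `demoControlledX()` moves the unit amplitude from index 4 (binary 100) to index 5 (binary 101), since the control bit is set.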