2023_qrack_raid_0 blog post #428

Closed
wants to merge 2 commits into from

WrathfulSpatula
Contributor

This is the blog post about Qrack's frontier experiments in recreating something equivalent to 2019 Sycamore (or universal random circuit sampling in general) requested by Will.

(Merry Christmas, and Happy Holidays! This is low priority, but I stole a few minutes for it during our family's celebrations. See you in 2024!)

vercel bot commented Dec 26, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

| Name | Status | Preview | Updated (UTC) |
| --- | --- | --- | --- |
| unitary-fund-prel | ✅ Ready (Inspect) | Visit Preview | Dec 26, 2023 3:27pm |

Member

@nathanshammah nathanshammah left a comment

Thanks @WrathfulSpatula, and sorry this has taken so long. I updated the date: can you please also update the title of the file to 2024?

I made some initial comments. My main feedback is that summarizing or cutting some text, where possible, would help, together with some introductory subtitles along the way.

Comment on lines +4 to +6
day: 25
month: 12
year: 2023
Suggested change
day: 5
month: 2
year: 2024


Believe it or not, for about $2,500, you can build your “dream” rig for gate-based quantum computer simulation up to 40 qubits and beyond (albeit significantly slower than a GPU cluster, but close to the processing throughput rate of a large CPU)! In this blog post, we’ll describe the construction of such a device and introduce you to how Unitary Fund’s Qrack simulator can use “novel” and approximate simulation techniques to push about 16 TB of high-speed swap disk from 40 qubits of state vector simulation into 54-qubit territory, potentially rivaling the 2019 Sycamore “quantum supremacy” experiments for fidelity and cost!

IBM was likely the first to point out, after the 2019 Sycamore universal random circuit sampling experiments, that disk storage could be employed to increase effective RAM limits of quantum computer simulation far past what is attainable with GPU and “DIMMs” (RAM boards inserted into your computer motherboard). Roughly, the largest supercomputers in the world, even arrayed together, might only have DRAM capacity for about 49 or 50 qubits of state vector simulation, aside from other techniques like tensor networks or any approximate simulation techniques. (Just as with your personal computer hardware, adding 1 qubit _doubles_ the necessary memory footprint, and the capacity of the largest supercomputers in the world, today, does not double in negligible time, effort, or cost!) However, disk storage can be _nearly_ as good as “DIMMs,” but drastically cheaper and far easier to scale to terabyte-range and past!

Suggested change
IBM was likely the first to [point out](https://research.ibm.com/blog/on-quantum-supremacy), after the 2019 Sycamore universal random circuit sampling experiments, that disk storage could be employed to increase effective RAM limits of quantum computer simulation far past what is attainable with GPU and “DIMMs” (RAM boards inserted into your computer motherboard). Roughly, the largest supercomputers in the world, even arrayed together, might only have DRAM capacity for about 49 or 50 qubits of state vector simulation, aside from other techniques like tensor networks or any approximate simulation techniques. (Just as with your personal computer hardware, adding 1 qubit _doubles_ the necessary memory footprint, and the capacity of the largest supercomputers in the world, today, does not double in negligible time, effort, or cost!) However, disk storage can be _nearly_ as good as “DIMMs,” but drastically cheaper and far easier to scale to terabyte-range and past!


Also, can you spell out the acronym "DIMM" in full once?


The obvious problem is that data transfer to and from disk is _far_ slower than “DIMMs.” Ultimately, “swap disk” as (random-access) memory needs to somehow be made fast enough to keep up with the processing throughput capacity of whatever CPU(s) to which it’s attached. However, Unitary Fund and the Qrack simulator development team have demonstrated proof-of-concept for creating swap disk resources that are _nearly_ fast enough to keep up with a 64-hyper-thread AMD Epyc processor: this prototype machine uses “RAID 0” configuration on 8 (“consumer-grade”) NVMe solid-state drives, achieving 8 times the throughput of a single drive! By “striping” the (single) logical swap disk partition across 8 hardware drives, interspersing small segments of memory in a pattern of “stripes” that are likely to be accessed at once across physical drives, the overall transfer speed nets to the _sum_ of the transfer speeds of each respective disk, achieving speeds close to sufficient to keep the CPU barely fully utilized on large state vector simulations (particularly with OS-level swap disk compression enabled, such as by manually enabling “zswap” with “zstd” compressor in Linux)!
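As a rough illustration of the striping idea described above, the following sketch maps a logical byte offset on the single swap partition to a physical drive and an offset on that drive, and sums per-drive transfer rates for the idealized aggregate. The function names, the 64 KiB stripe size, and the per-drive rate are hypothetical values chosen for the example, not measurements or settings from the prototype, and real RAID 0 throughput depends on access patterns and controller overhead.

```python
def locate_stripe(logical_offset, stripe_size, n_drives):
    """Map a logical byte offset to (drive index, byte offset on that drive).

    Consecutive stripes are assigned to drives round-robin, which is the
    basic RAID 0 layout. All names here are illustrative.
    """
    stripe_index, within_stripe = divmod(logical_offset, stripe_size)
    drive = stripe_index % n_drives
    stripe_on_drive = stripe_index // n_drives
    return drive, stripe_on_drive * stripe_size + within_stripe


def ideal_throughput(per_drive_rates):
    """Idealized aggregate rate: the sum of the individual drive rates."""
    return sum(per_drive_rates)


if __name__ == "__main__":
    stripe = 64 * 1024  # hypothetical stripe size in bytes
    for offset in (0, stripe, 8 * stripe + 5):
        print(offset, locate_stripe(offset, stripe, n_drives=8))
    # Eight drives at an assumed 3.0 GB/s each.
    print(ideal_throughput([3.0] * 8), "GB/s (idealized)")
```

A large sequential access touches all 8 drives in turn, which is why the idealized rate is the sum across drives.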

Add a link? Suggestions: the arXiv report and/or the previous blog post on the Qrack white paper.

year: 2023
---

At 32-bit floating point precision (with 64-bit complex numbers), 32 *giga*bytes of memory will fit 32 qubits' worth of state vector simulation. For each additional qubit desired, you need to _double_ the memory (compounded per additional qubit). 40 qubits, depending on precision, therefore requires 8 or 16 *tera*bytes (_256 or 512 times_ as much RAM). Unfortunately, typical RAM DIMMs might come in 16 gigabyte increments (give or take a couple of factors of 2), and maybe you’ll fit 8 of them in a server motherboard, for 128 gigabytes. Alternatively, NVIDIA A100 GPUs might carry 80 gigabytes apiece; you might fit about 64 gigabytes' worth of quantum state vector simulation in a single GPU, and then it will take 128 of these (with high-speed interconnects) to reach 8 terabytes, at a cost of thousands or tens of thousands of dollars per hour to rent. What is the hobbyist quantum computing researcher to do?
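The arithmetic in this paragraph can be checked with a minimal sketch, assuming "gigabyte" and "terabyte" are read as binary units (GiB and TiB). The helper name `state_vector_bytes` is introduced here only for illustration.

```python
GIB = 1 << 30
TIB = 1 << 40


def state_vector_bytes(qubits, bytes_per_amplitude=8):
    """Bytes needed for a full state vector: 2**qubits complex amplitudes.

    8 bytes per amplitude is a 64-bit complex number (two 32-bit floats);
    16 bytes is a 128-bit complex number (two 64-bit doubles).
    """
    return (1 << qubits) * bytes_per_amplitude


if __name__ == "__main__":
    print(state_vector_bytes(32) // GIB, "GiB for 32 qubits, float precision")
    print(state_vector_bytes(40) // TIB, "TiB for 40 qubits, float precision")
    print(state_vector_bytes(40, 16) // TIB, "TiB for 40 qubits, double precision")
    # GPUs needed if each holds about 64 GiB of state vector.
    print(state_vector_bytes(40) // (64 * GIB), "GPUs for 40 qubits")
```

Each added qubit doubles the result, so going from 32 to 40 qubits multiplies the footprint by 2^8 = 256 at the same precision.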

The start of this post is already very technical. Is there a way to make it a bit gentler for non-experts?

One suggestion would be to repeat the question in the title, with a bit more context. For example, something like:

```
The qubit counts of experiments on quantum computers are growing at a fast pace. At the same time, there is a growing literature on classical simulation (add links) of such experiments, often involving approximate methods and high-performance computing (HPC) resources. One might thus wonder:

What is the limit of what can be achieved on a personal computer or with moderate cloud resources?

In this blog post, we provide a possible answer, showing how to build a ~40-qubit simulator for less than $3,000.
```

Still, even with this preamble, I feel like the current opening text is a bit technical. Could you maybe provide a bit of context on how to do the resource estimation, without immediately introducing technical jargon? That could help a lot!


NVMes (based on NAND flash) are relatively cheap to manufacture and purchase by now. Your motherboard needs to support 4-way splitting of PCIe slots, if you use 4-way splitters, and your motherboard slots should probably have at least 16 PCIe lanes apiece. As we said, you might want a chassis with built-in mechanical fan cooling, commonly marketed as for those building gaming PCs, and you’ll likely also want a CPU cooler and some form of cooling specifically for the NVMe splitters, for which we used a vertical slot-mounted fan at a right angle to our two splitter boards. (Your chassis must provide such a vertical slot over the NVMe splitter PCIe slots, if you intend to follow our example.) No GPU is necessary! That’s about $2,000 for the key components of NVMes, motherboard, CPU, and DIMMs; budget up to $3,000 for the build and expect to walk away with about $500 left over, but $2,500 is about what we paid for our prototype, and a slightly better version of it could still be built for about $2,500, a year later.

When using the Qrack gate-based quantum computer simulator library, we officially suggest 64-bit “double” precision (with 128-bit complex numbers) when state vector simulation will exceed about 32 qubits: 16 TB of swap disk will realistically support a single state vector simulation up to 8 TB in width, giving you 39 qubits. However, we want to do something even more ambitious: we want our $2,500 machine to compete with 2019 Sycamore! In this case, we don’t need near-perfect fidelity of simulation, and we can approximate a circuit comparable to that Sycamore experiment with 32-bit “float” precision (with 64-bit complex numbers) and recourse to Qrack’s novel “hybrid” algorithms for simulation, including “quantum binary decision diagrams” (QBDD), based on the work of Robert Wille and team at Jülich Supercomputing Centre!
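The qubit counts quoted here follow from inverting the memory formula. Below is a minimal sketch, assuming "TB" is read as TiB and that the whole budget is available to one state vector; `max_qubits` is a hypothetical helper and is not part of Qrack's API.

```python
TIB = 1 << 40


def max_qubits(budget_bytes, bytes_per_amplitude):
    """Largest qubit count whose full state vector fits in budget_bytes."""
    amplitudes = budget_bytes // bytes_per_amplitude
    if amplitudes < 1:
        raise ValueError("budget is smaller than a single amplitude")
    # Floor of log2(amplitudes), computed exactly on integers.
    return amplitudes.bit_length() - 1


if __name__ == "__main__":
    # 8 TiB usable at double precision (128-bit complex, 16 bytes).
    print(max_qubits(8 * TIB, 16))
    # The same 8 TiB at float precision (64-bit complex, 8 bytes).
    print(max_qubits(8 * TIB, 8))
```

Halving the bytes per amplitude buys exactly one more qubit of plain state vector simulation; reaching further relies on the approximate and hybrid techniques the post goes on to describe.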

Please add a link to Qrack.


You can experiment with your own Qrack benchmarks, but, if you’re looking for a ready-made case of universal random circuit sampling similar to 2019 Sycamore, check out the case “test_noisy_fidelity_2qb_nn_estimate” in Qrack’s C++ benchmark suite: it runs Qrack’s “Schmidt-decomposition rounding parameter” (SDRP) approximation technique at successively lower levels of “rounding,” and therefore successively higher overall fidelity, until failure due to “out-of-memory.” While the gate set used is not exactly that of the 2019 Sycamore experiment, this set is designed to (likely) be more expressive in less circuit depth while being easier to simulate, on average. It intersperses single-qubit gate layers with maximally densely packed 2-qubit “nearest-neighbor” couplers across all qubits on a (close to) “square chip” similar to the 2019 Sycamore experiment topology. Each single-qubit gate entails a random Pauli basis transformation across all Pauli bases, composed with a uniformly random variational “RZ” gate (Z-axis rotation), such that it could take as few as 3 layers of single-qubit gates to achieve a completely general 3-parameter unitary (“U”) gate. Each 2-qubit gate is a coupler on a nearest neighbor topology, with a quarter of all gate options being the 3 controlled Pauli gates, a quarter being 3 “anti-controlled” Pauli gates (being gated by “0” control state instead of “1” control state), and half of all gates being variants of the “swap” gate with 6 physically distinct choices of phase effects in addition to the “swap” component, with a 50/50 “coin-flip” chance of choosing either direction between control and target qubits for all 2-qubit gates.
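To make the gate distribution described above concrete, here is a small sampling sketch in plain Python. It only draws gate labels with the stated probabilities and does not call Qrack or reproduce the benchmark's code. The label strings, the function names, and the equal weighting among the 6 swap variants are assumptions made for illustration.

```python
import math
import random

PAULIS = ("X", "Y", "Z")


def sample_single_qubit_gate(rng):
    """A random Pauli basis choice plus a uniformly random RZ angle."""
    basis = rng.choice(PAULIS)
    angle = rng.random() * 2.0 * math.pi
    return basis, angle


def sample_two_qubit_gate(a, b, rng):
    """Draw one coupler label for neighboring qubits a and b.

    1/4 controlled Pauli, 1/4 anti-controlled Pauli, 1/2 swap variant,
    then a coin flip for which qubit is the control.
    """
    r = rng.random()
    if r < 0.25:
        kind = "C" + rng.choice(PAULIS)
    elif r < 0.5:
        kind = "anti-C" + rng.choice(PAULIS)
    else:
        # Placeholder labels; the 6 phase variants are not spelled out here.
        kind = "swap-variant-%d" % rng.randrange(6)
    control, target = (a, b) if rng.random() < 0.5 else (b, a)
    return kind, control, target


if __name__ == "__main__":
    rng = random.Random(2024)
    print(sample_single_qubit_gate(rng))
    for _ in range(4):
        print(sample_two_qubit_gate(0, 1, rng))
```

A full layer would apply `sample_single_qubit_gate` to every qubit and then `sample_two_qubit_gate` to each selected nearest-neighbor pair on the grid.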

Please add a relevant link.

Contributor

KallieFerguson commented Feb 10, 2024

Happy to see there's movement here! If this gets published next week could you send a note to Frances so she can share on socials?

I see it also wishes people happy holidays at the end; we might want to change that now that it's February! @nathanshammah @WrathfulSpatula
