GPU implementation for multiexp and fft #35
base: master
returning some very useful functions
gpu accerate
with some test
ifft_using_bitreversed_ntt using bit_rev_best_ct_ntt_2_best_fft_gpu
bitreversed_lde_using_bitreversed_ntt by using bit_rev_best_ct_ntt_2_best_fft_gpu
Hey @chenhuan14 Before I even start to review it, can you tell me how you got 6 hours of proving time for merging two proofs? If you mean recursive aggregation, it should take around 4M gates to aggregate two proofs, and that is provable in minutes on a laptop with 6 physical cores. I hope you didn't run the prover in a debug build? That can easily result in 6 hours.
Thanks for your reply. I'm a beginner in Rust and ran the prover in debug mode. In release mode, the GPU acceleration gains only 70% compared to the CPU implementation.
I actually made the same mistake myself in the beginning. What you can expect is well below 30 minutes of proving time for any PLONK circuit over BN254 (the Ethereum curve) if you use a machine with 16 physical cores.
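For anyone hitting the same debug-build pitfall, here is a minimal sketch of a guard that could be called before timing a proof. It relies on `cfg!(debug_assertions)`, which is on by default for `cargo build` / `cargo run` and off by default under `--release`. The helper names `build_profile` and `profile_warning` are hypothetical and not part of this PR or the library.

```rust
/// Returns "debug" when debug assertions are compiled in (the default for
/// `cargo build` and `cargo run`), and "release" otherwise.
/// Hypothetical helper, not part of the PR.
pub fn build_profile() -> &'static str {
    if cfg!(debug_assertions) {
        "debug"
    } else {
        "release"
    }
}

/// Returns a warning to print before a long proving run, or `None` when the
/// binary was built without debug assertions.
/// Assumption: debug assertions track the unoptimized profile, which holds
/// unless the Cargo profile settings were overridden.
pub fn profile_warning() -> Option<&'static str> {
    if cfg!(debug_assertions) {
        Some("unoptimized build detected; rebuild with `cargo run --release` before timing the prover")
    } else {
        None
    }
}
```

A caller could do `if let Some(msg) = profile_warning() { eprintln!("{msg}"); }` at the start of a benchmark so that debug-build timings are never reported by accident.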
I'm not a GPU specialist, but I have started reviewing the FFT part and will have some comments. Multiexp is even more challenging, so most likely I'll get to it sometime during the holidays.
Ok, here are my comments so far:
Kind of a separate concern: you use lockfiles to ensure exclusive access to the device. I'm not sure about the implementation, so what would happen if:
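To make the lockfile concern concrete, here is a minimal sketch of one common pattern, assuming a lock file created atomically with `OpenOptions::create_new` and removed by a `Drop` guard. This is not the PR's implementation; `DeviceLock` and `try_acquire` are hypothetical names used only for illustration.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Hypothetical guard that holds an exclusive lock file for one GPU device.
pub struct DeviceLock {
    path: PathBuf,
}

impl DeviceLock {
    /// Tries to take the lock without blocking.
    /// Returns `Ok(None)` if another holder has already created the file.
    pub fn try_acquire(path: &Path) -> io::Result<Option<DeviceLock>> {
        // `create_new(true)` fails with `AlreadyExists` if the file is present,
        // so the existence check and the creation are a single atomic step.
        match OpenOptions::new().write(true).create_new(true).open(path) {
            Ok(mut file) => {
                // Build the guard first so the file is removed if the write fails.
                let guard = DeviceLock { path: path.to_path_buf() };
                // Record the owner's PID so a stale lock can be diagnosed by hand.
                writeln!(file, "{}", std::process::id())?;
                Ok(Some(guard))
            }
            Err(e) if e.kind() == io::ErrorKind::AlreadyExists => Ok(None),
            Err(e) => Err(e),
        }
    }
}

impl Drop for DeviceLock {
    fn drop(&mut self) {
        // Runs on normal scope exit and during panic unwinding, but not when
        // the process is killed or aborts, in which case the file stays behind.
        let _ = fs::remove_file(&self.path);
    }
}
```

With this pattern a process that is killed while holding the lock leaves a stale file, and every later prover sees the device as busy until someone deletes it. An OS-level advisory lock on an open file handle avoids that, because the kernel releases it when the process exits.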
Thanks for your good advice. I will try to optimize this work in the near future.
I have implemented GPU acceleration for multiexp and FFT, motivated by the Filecoin implementation, which can greatly improve prover efficiency. I use this implementation to accelerate the recursive SNARKs of PLONK: the native implementation of recursive_aggregation_circuit needs nearly 6 hours to generate a merged proof for 2 proofs, while our GPU implementation takes only 10 minutes.