CSCI596_Option-Pricing

Option pricing using cutting-edge computational methods. This repository leverages high-performance computing techniques (MPI, CUDA, and OpenMP) for fast and accurate option valuations.

How to run:

Single CPU, single GPU (multiple CUDA threads, one CPU thread):

foo@bar:~$ make main
foo@bar:~$ ./main option_type -B barrier_price -K strike_price -N number_of_paths

Multiple CPUs, multiple GPUs (multiple CUDA threads, multiple CPU threads):

foo@bar:~$ make main_omp
foo@bar:~$ ./main_omp option_type -B barrier_price -K strike_price -N number_of_paths -threads thread_count

Multiple nodes, multiple CPUs, multiple GPUs (multiple CUDA threads, multiple CPU threads):

foo@bar:~$ make main_mpi
foo@bar:~$ mpirun -bind-to none -n number_nodes ./main_mpi option_type -B barrier_price -K strike_price -N number_of_paths -threads thread_count
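
For example, a down-and-in put ("daip") could be priced across two processes with four OpenMP threads each (the numeric values below are purely illustrative, not recommendations):

foo@bar:~$ mpirun -bind-to none -n 2 ./main_mpi daip -B 90 -K 100 -N 1000000 -threads 4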

Brief background

This project simulates barrier options, whose payoff depends not only on the underlying asset's price at maturity but also on whether the underlying hits a predetermined price known as the barrier.

We have implemented the following types of barrier options, selectable via a command-line option:

  • "daoc" - Down and Out Call Options
  • "uaop" - Up and Out Put Options
  • "uaic" - Up and In Call Options
  • "daip" - Down and In Put Options

We have set the rebate to 0; an option to specify a different rebate could be added in the future.
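
As a rough illustration of the four payoff rules with the rebate fixed at 0 (a minimal C++ sketch; the helper name and the `hit` flag are our illustrative choices, not the repository's actual code):

#include <algorithm>
#include <cstring>

// Illustrative payoff for one simulated path, rebate fixed at 0.
// S_T: terminal price, K: strike.
// hit: whether the path ever crossed the barrier (dropped to/below it
// for "down" types, rose to/above it for "up" types).
double barrier_payoff(const char* type, double S_T, double K, bool hit)
{
    double call = std::max(S_T - K, 0.0);
    double put  = std::max(K - S_T, 0.0);
    if (std::strcmp(type, "daoc") == 0) return hit ? 0.0 : call; // down-and-out call: knocked out
    if (std::strcmp(type, "uaop") == 0) return hit ? 0.0 : put;  // up-and-out put: knocked out
    if (std::strcmp(type, "uaic") == 0) return hit ? call : 0.0; // up-and-in call: knocked in
    if (std::strcmp(type, "daip") == 0) return hit ? put : 0.0;  // down-and-in put: knocked in
    return 0.0; // unknown option type
}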

Presentation

Link

Methodology

Architecture of the system we run this on:

We run our program on the CARC High-Performance Computing cluster. The architecture looks roughly like this:

[Figure: cluster architecture diagram]

The computation is divided between nodes, and each node runs a process. Nodes communicate with one another using the Message Passing Interface (MPI).

Each node has multiple CPU cores, and each node's process can run multiple threads across these cores. These threads are created and coordinated with OpenMP.

Each node also has GPU accelerators attached, each of which can run many CUDA threads.
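
In code, the three levels might be wired together like this (a sketch only; the stubbed helper simulate_paths_on_gpu and the one-thread-per-GPU mapping are our illustrative assumptions, not necessarily the repository's exact structure):

#include <mpi.h>
#include <omp.h>
#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical helper: would launch the pricing kernel on the current
// GPU and return the sum of simulated payoffs. Stubbed here.
static double simulate_paths_on_gpu(long long paths) { (void)paths; return 0.0; }

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);                  // one MPI process per node
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int num_gpus = 0;
    cudaGetDeviceCount(&num_gpus);           // GPUs visible on this node
    if (num_gpus < 1) num_gpus = 1;          // fall back to one worker

    double local_sum = 0.0;
    #pragma omp parallel num_threads(num_gpus) reduction(+:local_sum)
    {
        cudaSetDevice(omp_get_thread_num()); // pin each CPU thread to one GPU
        local_sum += simulate_paths_on_gpu(1000000 / num_gpus);
    }

    double global_sum = 0.0;                 // combine partial sums across nodes
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        std::printf("total payoff sum: %f\n", global_sum);
    MPI_Finalize();
    return 0;
}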

How the code works:

We use geometric Brownian motion to simulate the underlying price, and the discretized Euler (Euler–Maruyama) update comes down to this:

$$ S_t = S_{t-1} + \mu S_{t-1}\,\Delta t + \sigma S_{t-1}\,\Delta W_t $$

$S_t$: the price of the underlying at time $t$
$\mu$: the expected return
$\sigma$: the expected volatility
$\Delta t$: the time step between iterations
$\Delta W_t$: a random draw from a normal distribution with mean 0 and variance $\Delta t$ (the Brownian motion increment)
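
As a minimal CUDA sketch of this update rule (one thread per path, using cuRAND for the increments; the kernel and parameter names are ours, not the repository's):

#include <curand_kernel.h>

// One GBM path per thread: S += mu*S*dt + sigma*S*dW, with dW ~ N(0, dt).
// Records the terminal price and whether a "down" barrier was ever hit.
__global__ void gbm_paths(int n_paths, int n_steps,
                          double S0, double mu, double sigma, double dt,
                          double barrier, unsigned long long seed,
                          double* S_T, int* hit)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n_paths) return;

    curandState state;
    curand_init(seed, i, 0, &state);         // independent stream per path

    double S = S0;
    int crossed = 0;
    for (int t = 0; t < n_steps; ++t) {
        double dW = sqrt(dt) * curand_normal_double(&state); // variance dt
        S += mu * S * dt + sigma * S * dW;                   // Euler step
        if (S <= barrier) crossed = 1;                       // down-barrier check
    }
    S_T[i] = S;
    hit[i] = crossed;
}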

The code generates an array of random increments $\Delta W_t$ and simulates the price path accordingly. We have four versions of the code:

  1. Simple single-threaded version

  2. Single CPU, single GPU (multiple CUDA threads, one CPU thread)

  3. Multiple CPUs, multiple GPUs (multiple CUDA threads, multiple CPU threads)
    Each CPU runs a single thread; we allocate one GPU to every CPU, and that CPU thread reduces the results of the multiple CUDA threads running on its GPU.

  4. Multiple nodes, multiple CPUs, multiple GPUs (multiple CUDA threads, multiple CPU threads)
    Each node runs its own copy of version 3 above.
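
In every version, the simulated payoffs are ultimately averaged into one Monte Carlo estimate. Assuming standard risk-neutral discounting with a risk-free rate $r$ and maturity $T$ (not spelled out above), the price estimate is

$$ \hat{V} = e^{-rT}\,\frac{1}{N}\sum_{i=1}^{N} \mathrm{payoff}\!\left(S^{(i)}\right) $$

where the sum runs over all $N$ simulated paths: partial sums are computed by the CUDA threads, combined per GPU by the owning CPU thread, and reduced across nodes with MPI.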

Charts:

We perform all the tests on down-and-in put options, but the other option types are implemented as well.

Weak Scaling:

[Figure: weak scaling on the number of MPI nodes, threads per node = 1]
[Figure: weak scaling on the number of MPI nodes, threads per node = 2]
[Figure: weak scaling on the number of MPI nodes, threads per node = 4]
[Figure: weak scaling on the number of threads, nodes = 1]

Strong Scaling:

[Figure: strong scaling on the number of MPI nodes, threads per node = 1]
[Figure: strong scaling on the number of MPI nodes, threads per node = 2]
[Figure: strong scaling on the number of MPI nodes, threads per node = 4]
[Figure: strong scaling on the number of threads, nodes = 1]
[Figure: strong scaling on the number of threads, nodes = 2]
[Figure: strong scaling on the number of threads, nodes = 4]

Speed of Single-Threaded, Single-Node CUDA:

[Figure: CUDA runtime scaling with respect to input size]

For Larger Inputs

We also tested the program on larger inputs; beyond these, we ran into memory constraints. Although we have an idea of how to get around them, it was not possible to do so before the submission deadline, since those versions would also have to be tested. Below are some runtimes on larger inputs that we did not plot, to keep this README brief.

[Figure: runtimes on larger inputs]

Conclusion

As the charts above show, scaling depends on the configuration. We believe that if we work around the memory constraints, we would see much better scaling at larger input sizes. At present, the program runs fastest when the input fits entirely in GPU memory, and adding nodes and threads mostly adds communication overhead. But once the input is much larger than a single GPU's memory capacity, parallel nodes and threads improve the program's scaling.

References

  1. "Monte Carlo Simulations In CUDA - Barrier Option Pricing", QuantStart, Link
