CSCI596_Option-Pricing

Option pricing using cutting-edge computational methods. This repository leverages high-performance computing techniques (MPI, CUDA, and OpenMP) for fast and accurate option valuations.

How to run:

Single CPU, single GPU (multiple CUDA threads, one CPU thread):

foo@bar:~$ make main
foo@bar:~$ ./main option_type -B barrier_price -K strike_price -N number_of_paths

Multiple CPUs, multiple GPUs (multiple CUDA threads, multiple CPU threads):

foo@bar:~$ make main_omp
foo@bar:~$ ./main_omp option_type -B barrier_price -K strike_price -N number_of_paths -threads thread_count

Multiple nodes, multiple CPUs, multiple GPUs (multiple CUDA threads, multiple CPU threads):

foo@bar:~$ make main_mpi
foo@bar:~$ mpirun -bind-to none -n number_nodes ./main_mpi option_type -B barrier_price -K strike_price -N number_of_paths -threads thread_count
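
For example, a down-and-in put ("daip") could be priced across two processes with four OpenMP threads each (the numeric values below are purely illustrative, not recommendations):

foo@bar:~$ mpirun -bind-to none -n 2 ./main_mpi daip -B 90 -K 100 -N 1000000 -threads 4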

Brief background

This project simulates barrier options, whose payoff depends not only on the underlying asset's price at maturity but also on whether the underlying hits a predetermined price known as the barrier.

We have implemented the following types of barrier options, selectable via a command-line option:

  • "daoc" - Down and Out Call Options
  • "uaop" - Up and Out Put Options
  • "uaic" - Up and In Call Options
  • "daip" - Down and In Put Options

We have set the rebate to 0; an option to specify a different rebate could be added in the future.
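
As a rough illustration of the four payoff rules with the rebate fixed at 0 (a minimal C++ sketch; the helper name and the `hit` flag are our illustrative choices, not the repository's actual code):

#include <algorithm>
#include <cstring>

// Illustrative payoff for one simulated path, rebate fixed at 0.
// S_T: terminal price, K: strike.
// hit: whether the path ever crossed the barrier (dropped to/below it
// for "down" types, rose to/above it for "up" types).
double barrier_payoff(const char* type, double S_T, double K, bool hit)
{
    double call = std::max(S_T - K, 0.0);
    double put  = std::max(K - S_T, 0.0);
    if (std::strcmp(type, "daoc") == 0) return hit ? 0.0 : call; // down-and-out call: knocked out
    if (std::strcmp(type, "uaop") == 0) return hit ? 0.0 : put;  // up-and-out put: knocked out
    if (std::strcmp(type, "uaic") == 0) return hit ? call : 0.0; // up-and-in call: knocked in
    if (std::strcmp(type, "daip") == 0) return hit ? put : 0.0;  // down-and-in put: knocked in
    return 0.0; // unknown option type
}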

Presentation

Link

Methodology

Architecture of the system we run this on:

We run our program on the CARC High-Performance Computing cluster. The architecture looks roughly like this:

[Figure: cluster architecture diagram]

The computation is divided between nodes, and each node runs a process. Nodes communicate with one another using the Message Passing Interface (MPI).

Each node has multiple CPU cores, and each node's process can run multiple threads across these cores. These threads are created and coordinated with OpenMP.

Each node also has GPU accelerators attached, each of which can run many CUDA threads.
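
In code, the three levels might be wired together like this (a sketch only; the stubbed helper simulate_paths_on_gpu and the one-thread-per-GPU mapping are our illustrative assumptions, not necessarily the repository's exact structure):

#include <mpi.h>
#include <omp.h>
#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical helper: would launch the pricing kernel on the current
// GPU and return the sum of simulated payoffs. Stubbed here.
static double simulate_paths_on_gpu(long long paths) { (void)paths; return 0.0; }

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);                  // one MPI process per node
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int num_gpus = 0;
    cudaGetDeviceCount(&num_gpus);           // GPUs visible on this node
    if (num_gpus < 1) num_gpus = 1;          // fall back to one worker

    double local_sum = 0.0;
    #pragma omp parallel num_threads(num_gpus) reduction(+:local_sum)
    {
        cudaSetDevice(omp_get_thread_num()); // pin each CPU thread to one GPU
        local_sum += simulate_paths_on_gpu(1000000 / num_gpus);
    }

    double global_sum = 0.0;                 // combine partial sums across nodes
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        std::printf("total payoff sum: %f\n", global_sum);
    MPI_Finalize();
    return 0;
}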

How the code works:

We use geometric Brownian motion to simulate the underlying price, and the discretized Euler (Euler–Maruyama) update comes down to this:

$$ S_t = S_{t-1} + \mu S_{t-1}\,\Delta t + \sigma S_{t-1}\,\Delta W_t $$

$S_t$: the price of the underlying at time $t$
$\mu$: the expected return
$\sigma$: the expected volatility
$\Delta t$: the time step between iterations
$\Delta W_t$: a random draw from a normal distribution with mean 0 and variance $\Delta t$ (the Brownian motion increment)
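
As a minimal CUDA sketch of this update rule (one thread per path, using cuRAND for the increments; the kernel and parameter names are ours, not the repository's):

#include <curand_kernel.h>

// One GBM path per thread: S += mu*S*dt + sigma*S*dW, with dW ~ N(0, dt).
// Records the terminal price and whether a "down" barrier was ever hit.
__global__ void gbm_paths(int n_paths, int n_steps,
                          double S0, double mu, double sigma, double dt,
                          double barrier, unsigned long long seed,
                          double* S_T, int* hit)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n_paths) return;

    curandState state;
    curand_init(seed, i, 0, &state);         // independent stream per path

    double S = S0;
    int crossed = 0;
    for (int t = 0; t < n_steps; ++t) {
        double dW = sqrt(dt) * curand_normal_double(&state); // variance dt
        S += mu * S * dt + sigma * S * dW;                   // Euler step
        if (S <= barrier) crossed = 1;                       // down-barrier check
    }
    S_T[i] = S;
    hit[i] = crossed;
}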

The code generates an array of random increments $\Delta W_t$ and simulates the price path accordingly. We have four versions of the code:

  1. Simple single-threaded version

  2. Single CPU, single GPU (multiple CUDA threads, one CPU thread)

  3. Multiple CPUs, multiple GPUs (multiple CUDA threads, multiple CPU threads)
    Each CPU runs a single thread; we allocate one GPU to every CPU, and that CPU thread reduces the results of the multiple CUDA threads running on its GPU.

  4. Multiple nodes, multiple CPUs, multiple GPUs (multiple CUDA threads, multiple CPU threads)
    Each node runs its own copy of version 3 above.
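
In every version, the simulated payoffs are ultimately averaged into one Monte Carlo estimate. Assuming standard risk-neutral discounting with a risk-free rate $r$ and maturity $T$ (not spelled out above), the price estimate is

$$ \hat{V} = e^{-rT}\,\frac{1}{N}\sum_{i=1}^{N} \mathrm{payoff}\!\left(S^{(i)}\right) $$

where the sum runs over all $N$ simulated paths: partial sums are computed by the CUDA threads, combined per GPU by the owning CPU thread, and reduced across nodes with MPI.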

Charts:

We perform all the tests on down-and-in put options, but the other option types are implemented as well.

Weak Scaling:

[Figure: weak scaling on the number of MPI nodes, threads per node = 1]
[Figure: weak scaling on the number of MPI nodes, threads per node = 2]
[Figure: weak scaling on the number of MPI nodes, threads per node = 4]
[Figure: weak scaling on the number of threads, nodes = 1]

Strong Scaling:

[Figure: strong scaling on the number of MPI nodes, threads per node = 1]
[Figure: strong scaling on the number of MPI nodes, threads per node = 2]
[Figure: strong scaling on the number of MPI nodes, threads per node = 4]
[Figure: strong scaling on the number of threads, nodes = 1]
[Figure: strong scaling on the number of threads, nodes = 2]
[Figure: strong scaling on the number of threads, nodes = 4]

Speed of Single-Threaded, Single-Node CUDA:

[Figure: CUDA runtime scaling with respect to input size]

For Larger Inputs

We also tested the program on larger inputs; beyond these, we ran into memory constraints. Although we have an idea of how to get around them, it was not possible to do so before the submission deadline, since those versions would also have to be tested. Below are some runtimes on larger inputs that we did not plot, to keep this README brief.

[Figure: runtimes on larger inputs]

Conclusion

As the charts above show, scaling depends on the configuration. We believe that if we work around the memory constraints, we would see much better scaling at larger input sizes. At present, the program runs fastest when the input fits entirely in GPU memory, and adding nodes and threads mostly adds communication overhead. But once the input is much larger than a single GPU's memory capacity, parallel nodes and threads improve the program's scaling.

References

  1. "Monte Carlo Simulations In CUDA - Barrier Option Pricing", QuantStart, Link
