Optimizing-LSTMs-on-GPU

Implementation of the paper "Optimizing Performance of Recurrent Neural Networks on GPUs" in CUDA and OpenMP.

The .cpp files are named after the optimizations described in this NVIDIA blog. For example, LSTM_opti_4.cpp corresponds to Optimization 4: Pre-Transposing the Weight Matrix.
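
To illustrate the idea behind pre-transposing, here is a minimal CPU-only sketch in plain C++. It is not the code from this repository. The `Matrix` type and the function names `transpose`, `matvec_strided`, and `matvec_contiguous` are hypothetical and chosen only for this example. It assumes the same weight matrix is reused at every time step, so the transpose can be paid for once up front rather than on every multiplication.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major matrix stored in one flat buffer.
struct Matrix {
    std::size_t rows, cols;
    std::vector<float> data;
    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0f) {}
    float& at(std::size_t r, std::size_t c) { return data[r * cols + c]; }
    float at(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// Build the transpose once, before the loop over time steps.
Matrix transpose(const Matrix& m) {
    Matrix t(m.cols, m.rows);
    for (std::size_t r = 0; r < m.rows; ++r)
        for (std::size_t c = 0; c < m.cols; ++c)
            t.at(c, r) = m.at(r, c);
    return t;
}

// y = W^T * x computed directly from W.
// Each output element walks down a column of W, a strided access pattern.
std::vector<float> matvec_strided(const Matrix& w, const std::vector<float>& x) {
    assert(x.size() == w.rows);
    std::vector<float> y(w.cols, 0.0f);
    for (std::size_t j = 0; j < w.cols; ++j)
        for (std::size_t i = 0; i < w.rows; ++i)
            y[j] += w.at(i, j) * x[i];
    return y;
}

// y = Wt * x where Wt is the pre-transposed matrix.
// Each output element reads one contiguous row of Wt.
std::vector<float> matvec_contiguous(const Matrix& wt, const std::vector<float>& x) {
    assert(x.size() == wt.cols);
    std::vector<float> y(wt.rows, 0.0f);
    for (std::size_t i = 0; i < wt.rows; ++i)
        for (std::size_t j = 0; j < wt.cols; ++j)
            y[i] += wt.at(i, j) * x[j];
    return y;
}
```

Both functions return the same vector. The difference is only in how memory is traversed, which is why the transpose is worth doing once when the matrix is multiplied many times. On a GPU the same reasoning would apply to the transpose flags of the GEMM call rather than to hand-written loops.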

Example implementation:

With the default parameters, the naive implementation takes a long time to run. Start with smaller dimensions so that it finishes quickly, for example:

./naive-LSTM 5 1 64 8 1

This will give the following results:


Time for the run number NAIVE 0 :  49.17900000 ms 

Average Runtime for LSTM NAIVE is 49.17900000 ms 

Time for the run number NAIVE EFFICIENT 0 :  28.01100000 ms 

Average Runtime for LSTM NAIVE EFFICIENT is 28.01100000 ms

In this run, the NAIVE EFFICIENT version takes 28.01 ms against 49.18 ms for the NAIVE version, so it is roughly 1.75 times faster.
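
Per-run and average timings like the ones printed above could be collected with a small harness such as the following. This is a sketch under assumptions, not the timing code used in this repository: the function name `average_runtime_ms` and its parameters are hypothetical, and it measures host wall-clock time with `std::chrono`. For GPU work, the callable would need to wait for the device to finish (for example with `cudaDeviceSynchronize()`) before returning, otherwise only the kernel launch is timed.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <functional>

// Runs `work` `runs` times, prints the time of each run, and returns the
// average runtime in milliseconds (0.0 when runs is not positive).
double average_runtime_ms(const char* label, int runs,
                          const std::function<void()>& work) {
    assert(static_cast<bool>(work));  // the callable must not be empty
    double total_ms = 0.0;
    for (int i = 0; i < runs; ++i) {
        auto start = std::chrono::steady_clock::now();
        work();
        auto stop = std::chrono::steady_clock::now();
        double ms =
            std::chrono::duration<double, std::milli>(stop - start).count();
        std::printf("Time for the run number %s %d : %.8f ms\n", label, i, ms);
        total_ms += ms;
    }
    double avg = runs > 0 ? total_ms / runs : 0.0;
    std::printf("Average Runtime for LSTM %s is %.8f ms\n", label, avg);
    return avg;
}
```

Calling it once per variant, for example `average_runtime_ms("NAIVE", 1, run_naive)` with a hypothetical `run_naive` callable, gives two averages whose ratio is the speedup.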

Comprehensive results for every optimization can be seen in our report.

Shubodh/Optimizing-LSTMs-on-GPU
