Video caption generation with Encoder-Decoder model

In this project, we developed a basic Encoder-Decoder model and an S2VT model to generate video captions. In addition, we applied attention mechanisms to improve performance.
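At a high level, the baseline follows the usual seq2seq pattern: an encoder RNN summarizes the pre-extracted per-frame video features, and a decoder RNN emits caption words one at a time. The sketch below only illustrates that pattern; the layer sizes, names, and vocabulary handling are assumptions, not the repository's actual code.

```python
import torch
import torch.nn as nn

class BaselineCaptioner(nn.Module):
    """Minimal encoder-decoder captioner: LSTM over frame features, LSTM over words."""
    def __init__(self, feat_dim=4096, hidden=256, vocab_size=3000, embed_dim=256):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.decoder = nn.LSTM(embed_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, frame_feats, captions):
        # frame_feats: (batch, n_frames, feat_dim)  pre-extracted CNN features per frame
        # captions:    (batch, n_words)             ground-truth word ids (teacher forcing)
        _, (h, c) = self.encoder(frame_feats)        # video summary -> initial decoder state
        dec_in = self.embed(captions)                # (batch, n_words, embed_dim)
        dec_out, _ = self.decoder(dec_in, (h, c))    # (batch, n_words, hidden)
        return self.out(dec_out)                     # word logits at each step
```

During training, the ground-truth caption is fed to the decoder (teacher forcing); at inference the decoder's own previous prediction is fed back instead, a mismatch that scheduled sampling [3] is designed to reduce.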

Getting Started

The following instructions will get a copy of the project up and running on your local machine for testing purposes.

Prerequisite & Toolkits

The following toolkits, at the listed versions, need to be installed to run this project.

In addition, a GPU is required to run this project.
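Before training, it is worth confirming that PyTorch can see a CUDA device; the snippet below is an illustrative check, not part of the repository.

```python
import torch

# Select the GPU if one is visible to PyTorch, otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Running on: {device}")
```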

Model Structures

The following model structures were implemented from scratch in PyTorch:

  • Baseline Model (structure diagram in the repository)
  • S2VT Model (structure diagram in the repository)

To improve performance, we also implemented Bahdanau Attention and Luong Attention (diagram in the repository); a rough sketch of the additive attention step follows this list.
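As an illustration of how Bahdanau (additive) attention scores each encoder time step against the current decoder state, a minimal PyTorch sketch might look like the following; the class name, tensor shapes, and hidden size are assumptions, not the repository's actual implementation.

```python
import torch
import torch.nn as nn

class BahdanauAttention(nn.Module):
    """Additive attention: score(h_t, s_i) = v^T tanh(W_dec h_t + W_enc s_i)."""
    def __init__(self, hidden_size):
        super().__init__()
        self.W_dec = nn.Linear(hidden_size, hidden_size, bias=False)
        self.W_enc = nn.Linear(hidden_size, hidden_size, bias=False)
        self.v = nn.Linear(hidden_size, 1, bias=False)

    def forward(self, decoder_hidden, encoder_outputs):
        # decoder_hidden:  (batch, hidden)           current decoder state
        # encoder_outputs: (batch, seq_len, hidden)  per-frame encoder states
        scores = self.v(torch.tanh(
            self.W_dec(decoder_hidden).unsqueeze(1) + self.W_enc(encoder_outputs)
        ))                                                  # (batch, seq_len, 1)
        weights = torch.softmax(scores, dim=1)              # attention over frames
        context = (weights * encoder_outputs).sum(dim=1)    # (batch, hidden)
        return context, weights.squeeze(-1)
```

At each decoding step, the resulting context vector is combined with the decoder input or state before predicting the next word; Luong attention differs mainly in using a multiplicative score and applying attention after the decoder RNN step.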

Reference

[1] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural Machine Translation by Jointly Learning to Align and Translate. arXiv:1409.0473.
[2] Minh-Thang Luong, Hieu Pham, and Christopher D. Manning. 2015. Effective Approaches to Attention-based Neural Machine Translation.
[3] Samy Bengio, Oriol Vinyals, Navdeep Jaitly, and Noam Shazeer. 2015. Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks.
[4] Natsuda Laokulrat, Sang Phan, and Noriki Nishida. 2016. Generating Video Description Using Sequence-to-sequence Model with Temporal Attention.
