Image Caption Generator using CNN and LSTM
Trained on the Flickr30k dataset (~31.8k images)
- PyTorch
- Python
- spaCy
- Different pre-trained CNN modules (ResNet-50, VGG, etc.) were tried as the feature extractor (encoder).
- An LSTM acts as the decoder, turning the feature vector produced by the CNN module into the corresponding caption words.
- The last layer of the CNN module is removed, and a fully connected layer is added that produces a feature vector of a chosen size (e.g., 256). With batch_size=8, the output of the CNN module has shape (8, 256).
- Each target word is mapped to a 256-dimensional embedding by the embedding layer. Captions are padded/truncated to max_length=40, so the embedding layer's output is (8, 40, 256) for batch_size=8.
- The feature vector from the CNN module is concatenated (as an extra first timestep) with the output of the embedding layer, giving (8, 41, 256). This is fed to the LSTM, which produces a 256-dimensional hidden state and cell state at each step. An fc layer then maps each 256-dimensional hidden state to the vocabulary (vocab_size around 7500+). Dropping the final timestep so that step t predicts word t+1, the output has shape (8, 40, 7500+). See the encoder/decoder sketches after this list.
- The above runs over all 40 time steps, since the LSTM processes the sequence word by word.
- The training happens end-to-end.
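A minimal sketch of the encoder described above, assuming a ResNet-50 backbone and embed_size=256 (class and variable names here are illustrative, not the repo's actual code):

```python
import torch
import torch.nn as nn
import torchvision.models as models

class EncoderCNN(nn.Module):
    """Pre-trained CNN with its last (classification) layer removed,
    plus a trainable fc layer that yields the 256-d feature vector."""
    def __init__(self, embed_size=256):
        super().__init__()
        resnet = models.resnet50(weights="IMAGENET1K_V1")  # pretrained=True on older torchvision
        # keep everything up to (and including) the global average pool
        self.backbone = nn.Sequential(*list(resnet.children())[:-1])
        self.fc = nn.Linear(resnet.fc.in_features, embed_size)

    def forward(self, images):            # (B, 3, 224, 224)
        feats = self.backbone(images)     # (B, 2048, 1, 1)
        feats = feats.flatten(1)          # (B, 2048)
        return self.fc(feats)             # (B, 256); (8, 256) for batch_size=8
```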
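And a matching decoder sketch, assuming vocab_size=7500 and max_length=40; it prepends the image feature as the first timestep and drops the last output step so the shapes match the walkthrough above:

```python
class DecoderRNN(nn.Module):
    """Embeds the 40 target words, prepends the image feature as an
    extra first timestep, and maps hidden states to vocabulary logits."""
    def __init__(self, embed_size=256, hidden_size=256, vocab_size=7500):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, features, captions):                    # (B, 256), (B, 40)
        emb = self.embed(captions)                             # (B, 40, 256)
        inputs = torch.cat([features.unsqueeze(1), emb], dim=1)  # (B, 41, 256)
        hiddens, _ = self.lstm(inputs)                         # (B, 41, 256)
        # drop the last step so that step t predicts word t+1
        return self.fc(hiddens[:, :-1, :])                     # (B, 40, vocab_size)
```

End-to-end training then reduces to cross-entropy over the flattened logits, e.g. `loss = nn.CrossEntropyLoss()(logits.reshape(-1, 7500), captions.reshape(-1))`.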
Caption generation (inference):
- The image passes through the CNN module to generate a feature vector of size 256.
- This feature vector is passed to the LSTM cell.
- The LSTM outputs a probability distribution over the vocab_size words.
- This loops for up to 40 (max_length) steps, until the `<end>` token is produced.
  - The embedding of the output word is then passed back as the next input to the LSTM cell (see the sketch below).
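A sketch of that greedy decoding loop, reusing the `EncoderCNN`/`DecoderRNN` classes above (the `vocab.itos` id-to-word lookup is a torchtext-style assumption, not necessarily the repo's vocabulary class):

```python
def generate_caption(encoder, decoder, image, vocab, max_length=40):
    """Greedy decoding: the image feature is the first LSTM input; each
    predicted word's embedding is fed back in until <end> (or max_length)."""
    encoder.eval()
    decoder.eval()
    words = []
    with torch.no_grad():
        inputs = encoder(image.unsqueeze(0)).unsqueeze(1)   # (1, 1, 256)
        states = None                                       # (h, c) start at zeros
        for _ in range(max_length):
            hiddens, states = decoder.lstm(inputs, states)  # (1, 1, 256)
            logits = decoder.fc(hiddens.squeeze(1))         # (1, vocab_size)
            predicted = logits.argmax(dim=1)                # greedy word pick
            word = vocab.itos[predicted.item()]
            if word == "<end>":
                break
            words.append(word)
            inputs = decoder.embed(predicted).unsqueeze(1)  # (1, 1, 256)
    return " ".join(words)
```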