
Commit

Added arxiv link and bibtex of captioning to website (#559)
lorisbaz authored and tdomhan committed Oct 16, 2018
1 parent 4e4951d commit 7fd7f15
Showing 3 changed files with 28 additions and 7 deletions.
22 changes: 15 additions & 7 deletions docs/image_captioning.md
@@ -3,9 +3,19 @@ layout: default
---
# Image Captioning

This module extends Sockeye to perform image captioning. It follows the same logic as sequence-to-sequence frameworks, which consist of encoder-decoder models.
Sockeye also provides a module to perform image captioning.
It follows the same logic as sequence-to-sequence frameworks, which consist of encoder-decoder models.
In this case the encoder takes an image instead of a sentence and encodes it into a feature representation.
This representation is decoded, optionally with attention, using exactly the same models as Sockeye (RNNs, transformers, or CNNs).
This tutorial explains how to train image captioning models.


## Citation

For technical information about the image captioning module, see our paper on the arXiv ([BibTeX](sockeye_captioning.bib)):

> Loris Bazzani, Tobias Domhan, and Felix Hieber. 2018.
> [Image Captioning as Neural Machine Translation Task in SOCKEYE](https://arxiv.org/abs/1810.04101). ArXiv e-prints.

## Installation
@@ -22,9 +22,7 @@ Optionally you can also install matplotlib for visualization:
```


## First Steps

### Train
## Train

In order to train your first image captioning model, you will need two sets of parallel files: one for training
and one for validation. The latter will be used to compute various metrics during training.
@@ -91,7 +99,7 @@ There is an initial overhead to load the features (training does not start immediately).

You can add the options `--decode-and-evaluate 200 --max-output-length 60` to caption a subset of the validation set (200 samples in this case) during training.
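Putting the pieces above together, a full training invocation might look like the following sketch. This is not the exact command from the docs: the flag names are assumptions modeled on Sockeye's standard `sockeye.train` CLI, and the file names are placeholders; check `--help` on your installed version.

```shell
# Hypothetical sketch: train a captioning model on pre-extracted image features.
# Flag names are assumptions modeled on sockeye.train; file names are placeholders.
python3 -m sockeye.image_captioning.train \
    --source train.images \
    --target train.captions \
    --validation-source val.images \
    --validation-target val.captions \
    --decode-and-evaluate 200 \
    --max-output-length 60 \
    --output caption_model
```

Here `--decode-and-evaluate 200` triggers captioning of 200 validation samples during training, as described above.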

### Image to Text
## Image to Text

Assuming that features were pre-extracted, you can do image captioning as follows:
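As a rough sketch of such an invocation (the full snippet is not shown in this hunk, so the flag names below are assumptions based on Sockeye's standard `translate` CLI, and the file names are placeholders):

```shell
# Hypothetical sketch: caption images from pre-extracted features.
# Flag names are assumptions; consult the captioner's --help for the actual CLI.
python3 -m sockeye.image_captioning.captioner \
    --models caption_model \
    --input test.images \
    --output test.captions
```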

@@ -126,7 +134,7 @@ You can also caption directly from images with the option `--extract-image-features`:
```


#### Using Lexical Constraints
### Using Lexical Constraints

It is also possible to use lexical constraints during inference as described [here](inference.html#lexical-constraints).
The input JSON object needs to have the following form, with the image path in the `text` field, and constraints specified as usual:
@@ -139,7 +147,7 @@
You can use the `sockeye.lexical_constraints` module to generate this (for usage, run `python3 -m sockeye.lexical_constraints`).
Once the file is generated, the CLI option `--json-input` needs to be passed to `sockeye.image_captioning.captioner`.
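As a minimal sketch of such an input file, the following writes one JSON object per line; the image path and constraint phrase are made-up placeholders, while the `text` and `constraints` fields follow the JSON input format described above:

```shell
# Hypothetical sketch: write a one-line JSON input for constrained captioning.
# The image path and the constraint phrase are placeholder values.
cat > constrained_input.json <<'EOF'
{"text": "path/to/image.jpg", "constraints": ["a brown dog"]}
EOF
```

This file can then be fed to `sockeye.image_captioning.captioner` together with the `--json-input` option.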

### Visualization
## Visualization

You can now visualize the results in a nice format as follows:
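The visualization command itself is not shown in this hunk; a sketch of the sort of invocation this refers to might look like the following, where the module path, flag names, and file names are all assumptions:

```shell
# Hypothetical sketch: render generated captions alongside the source images.
# Module path and flags are assumptions; check the repository for the actual CLI.
python3 -m sockeye.image_captioning.visualize \
    --image-root images/ \
    --captions test.captions
```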

1 change: 1 addition & 0 deletions docs/index.md
@@ -18,6 +18,7 @@ It implements state-of-the-art encoder-decoder architectures, such as
- Fully convolutional sequence-to-sequence models [[Gehring et al, '17](https://arxiv.org/abs/1705.03122)]

In addition, this framework provides an experimental [image-to-description module](https://github.com/awslabs/sockeye/tree/master/sockeye/image_captioning) that can be used for [image captioning](image_captioning.html).

Recent developments and changes are tracked in our [CHANGELOG](https://github.com/awslabs/sockeye/blob/master/CHANGELOG.md).

If you are interested in collaborating or have any questions, please submit a pull request or [issue](https://github.com/awslabs/sockeye/issues/new).
12 changes: 12 additions & 0 deletions docs/sockeye_captioning.bib
@@ -0,0 +1,12 @@
@article{SockeyeCaptioning:18,
author = {Bazzani, Loris and Domhan, Tobias and Hieber, Felix},
title = "{Image Captioning as Neural Machine Translation Task in SOCKEYE}",
journal = {arXiv preprint arXiv:1810.04101},
archivePrefix = "arXiv",
eprint = {1810.04101},
primaryClass = "cs.CV",
keywords = {Computer Science - Computer Vision and Pattern Recognition},
year = 2018,
month = oct,
url = {https://arxiv.org/abs/1810.04101}
}
