Is An Image Worth Five Sentences? A New Look into Semantics for Image-Text Matching

Welcome! In this repo, you can find the code for our paper published in WACV 2022.

The main idea of this metric is to use the classic image captioning metrics CIDEr or SPICE to better evaluate retrieval models.

The code for using our metric as an adaptive margin can be found at https://github.com/andrespmd/semantic_adaptive_margin. Roughly, we want to take into account the effect of the NON-GROUND-TRUTH items among the top-k retrieved items to better evaluate what our models do.
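To make the intuition concrete, here is a toy sketch. It is NOT the NCS formula from the paper, just an illustration under our own assumptions. The function names and the relevance numbers are hypothetical. The relevance values stand in for captioning-metric scores (e.g. CIDEr or SPICE) between each candidate sentence and the query's ground-truth captions. Two models with the same Recall@2 get different scores once the non-ground-truth items in the top-k are counted.

```python
def top_k_indices(scores, k):
    """Indices of the k highest-scoring candidates, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def topk_semantic_score(sim_row, relevance_row, k):
    """Toy score (an assumption, not the paper's definition): relevance
    collected by the top-k retrieved items, divided by the best relevance
    any k items could collect."""
    retrieved = sum(relevance_row[i] for i in top_k_indices(sim_row, k))
    ideal = sum(sorted(relevance_row, reverse=True)[:k])
    return retrieved / ideal if ideal > 0 else 0.0


# One image query, five candidate sentences; index 0 is the ground truth.
gt_index = 0
relevance = [1.0, 0.6, 0.0, 0.1, 0.5]  # hypothetical captioning-metric scores

sim_a = [0.9, 0.8, 0.1, 0.7, 0.2]  # model A ranks a related sentence second
sim_b = [0.9, 0.1, 0.8, 0.2, 0.3]  # model B ranks an unrelated sentence second

# Both models retrieve the ground truth in the top 2, so Recall@2 is identical.
recall_a = int(gt_index in top_k_indices(sim_a, 2))
recall_b = int(gt_index in top_k_indices(sim_b, 2))

# The toy score separates them because it also looks at the second item.
score_a = topk_semantic_score(sim_a, relevance, k=2)
score_b = topk_semantic_score(sim_b, relevance, k=2)

print(f"Recall@2: A={recall_a}, B={recall_b}")
print(f"Toy semantic score@2: A={score_a:.3f}, B={score_b:.3f}")
```

Recall treats both models as equally good here, while the toy score rewards model A for retrieving a semantically related sentence next to the ground truth.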

Now, this repo is divided into two main sections. The first is for the curious, while the second is for the pragmatists!

To those who are curious! (How did we do it?)

First off, we had to change the code of SPICE to save all the pairwise distances.

If you would like to compile from scratch or would like to see the changes we made to SPICE, please check the submodule!

Here is the link to download the compiled version: SPICE.zip. After downloading, unzip the file, run python get_stanford models, and then run

java -Xmx8G -jar spice-1.0.jar ./example.json

to see if it works. This should result in a file called spice_pairwise.csv.

Now, to obtain the pairwise distances of captions with CIDEr, we run:

python custom_cider.py --dataset [coco/f30k]

To obtain these distances we used MSCOCO and Flickr30k; here they are for you to download. We run these commands to precompute all the pairwise distances, which reduces the time it takes to run the NCS metric.

To those who are impatient! (I just wanna use the metric and nothing more!)

You are a pragmatist and just wanna use the code (I feel you!). Download the precomputed pairwise distances here.

As input, we expect a similarity matrix saved as JSON, where each row is an image and each column is a sentence. For example, for Flickr30k, the matrix would have dimensions of 1000x5000: 1000 images and 5000 sentences. The choice of distance metric doesn't matter; you can use anything. As an example of the format, we provide the similarity matrices of some models.
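As a rough sketch of producing such a file, the snippet below writes a small random images-by-sentences matrix to JSON and reads it back. The nested-list layout (one list per image), the file name, and the toy sizes are our assumptions for illustration. Please check eval.py and the provided example matrices for the exact structure the script expects.

```python
import json
import os
import random
import tempfile

# Toy sizes for illustration; for Flickr30k this would be 1000 x 5000.
num_images, num_sentences = 4, 20

# Stand-in for real model scores: row i holds the similarity of image i
# to every sentence.
random.seed(0)
sim_matrix = [
    [random.random() for _ in range(num_sentences)]
    for _ in range(num_images)
]

# Hypothetical output path; assumed layout is a JSON list of rows.
path = os.path.join(tempfile.gettempdir(), "toy_similarity_matrix.json")
with open(path, "w") as f:
    json.dump(sim_matrix, f)

# Read the file back and confirm the shape is images x sentences.
with open(path) as f:
    loaded = json.load(f)

print(len(loaded), "images x", len(loaded[0]), "sentences")
```

If your model produces a NumPy array or a tensor, converting it to nested Python lists first (for example with `.tolist()`) makes it serializable with `json.dump`.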

Finally, to get the results, just run:

python eval.py --dataset [coco/f30k] --metric_name [spice/cider] --model_path [ThePathToSimilarityMatrix]

There are more options available; you can read about them in the code.

Conclusion

To err is human.
