This repository contains the PyTorch implementation of our PromptCC model in the paper: "A Decoupling Paradigm with Prompt Learning for Remote Sensing Image Change Captioning".
For more information, please see our published paper at [IEEE] (accepted by IEEE TGRS 2023).
- Considering the specificity of the RSICC task, PromptCC employs a novel decoupling paradigm and deeply integrates prompt learning and pre-trained large language models.
- This repository provides all aspects of our code, including training, inference, computation of evaluation metrics, as well as the tokenization and word mapping used in our work.
git clone https://github.com/Chen-Yang-Liu/PromptCC.git
cd PromptCC
conda create -n PromptCC_env python=3.9
conda activate PromptCC_env
pip install -r requirements.txt
Firstly, download the image pairs of the LEVIR-CC dataset from the [Repository]. Extract the image pairs and put them in ./data/LEVIR_CC/
as follows:
./data/LEVIR_CC:
├─LevirCCcaptions_v1.json (a new JSON file with a changeflag field; it differs from the old version available at the download link above)
├─images
│  ├─train
│  │  ├─A
│  │  ├─B
│  ├─val
│  │  ├─A
│  │  ├─B
│  ├─test
│  │  ├─A
│  │  ├─B
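A quick sanity check of this layout (a minimal sketch; it only assumes the directory tree shown above):

```python
import os

# Check that every split has both the "before" (A) and "after" (B) image folders.
root = "./data/LEVIR_CC/images"
for split in ("train", "val", "test"):
    for side in ("A", "B"):
        path = os.path.join(root, split, side)
        if os.path.isdir(path):
            print(f"{path}: {len(os.listdir(path))} files")
        else:
            print(f"{path}: MISSING")
```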
Then preprocess the dataset as follows:
python create_input_files.py
After that, you can find the resulting .pkl files in ./data/LEVIR_CC/.
Alternatively, you can directly use the .pkl files we provide on [Hugging face].
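To inspect the preprocessing output, you can load the generated files (a minimal sketch; the exact file names are not listed here, so it simply globs the output directory):

```python
import glob
import pickle

# List the preprocessed .pkl files produced by create_input_files.py
# (or downloaded from Hugging Face) and print a short summary of each.
for path in sorted(glob.glob("./data/LEVIR_CC/*.pkl")):
    with open(path, "rb") as f:
        obj = pickle.load(f)
    length = len(obj) if hasattr(obj, "__len__") else "n/a"
    print(f"{path}: {type(obj).__name__}, len={length}")
```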
Note: you need to modify the source code of the 'CLIP' package. Specifically, change CLIP.model.VisionTransformer.forward() like [this].
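For orientation only: the stock `VisionTransformer.forward()` keeps just the class token (`x[:, 0, :]`), and a typical modification for feature extraction is to return the full token sequence. The sketch below illustrates that idea under this assumption; the authoritative change is the one linked above.

```python
# Sketch of the kind of edit made to clip/model.py (VisionTransformer.forward),
# where torch is already imported. This is an illustration, not the verbatim
# patch from the linked snippet.
def forward(self, x: torch.Tensor):
    x = self.conv1(x)                          # [B, width, grid, grid]
    x = x.reshape(x.shape[0], x.shape[1], -1)  # [B, width, grid**2]
    x = x.permute(0, 2, 1)                     # [B, grid**2, width]
    cls = self.class_embedding.to(x.dtype) + torch.zeros(
        x.shape[0], 1, x.shape[-1], dtype=x.dtype, device=x.device)
    x = torch.cat([cls, x], dim=1)             # prepend class token
    x = x + self.positional_embedding.to(x.dtype)
    x = self.ln_pre(x)
    x = x.permute(1, 0, 2)                     # NLD -> LND
    x = self.transformer(x)
    x = x.permute(1, 0, 2)                     # LND -> NLD
    x = self.ln_post(x)                        # keep ALL tokens instead of x[:, 0, :]
    if self.proj is not None:
        x = x @ self.proj
    return x
```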
You can download our pretrained model here: [Hugging face]
After downloading the model, put cls_model.pth.tar in ./checkpoints/classification_model/ and put BEST_checkpoint_ViT-B_32.pth.tar in ./checkpoints/cap_model/.
Then, run a demo to get started as follows:
python caption_beams.py
Make sure you have completed the data preparation above. Then start training as follows:
python train.py
After training, evaluate the model as follows:
python eval2.py
We recommend training 5 times to get an average score.
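If you want to script the repeated runs, a minimal sketch (it assumes nothing beyond the two commands above; the per-run scores still have to be read from each run's output):

```python
import subprocess

# Run the train/eval cycle five times, as recommended above.
for run in range(1, 6):
    print(f"=== run {run} ===")
    subprocess.run(["python", "train.py"], check=True)
    subprocess.run(["python", "eval2.py"], check=True)
```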
Note:
- Before model training and evaluation, each sentence must be tokenized and its words mapped to indices. For instance, GPT's subword tokenizer splits the word "difference" into ['diff', 'erence'] and maps these subwords to [26069, 1945] using its word mapping (see the tokenizer sketch after these notes). Different tokenization and word mapping schemes influence the evaluation metric scores, so to ensure a fair performance comparison, the same tokenization and word mapping must be used when computing evaluation metrics for all comparison methods.
- For all comparison methods, we retrained the models and evaluated their performance using the publicly available GPT tokenizer and word mapping, which are comprehensive and widely acknowledged. We recommend that future researchers follow this practice.
- Comparison with SOTA:
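As an illustration of the note above, the "difference" example can be reproduced with the Hugging Face GPT-2 tokenizer (an assumption made here for illustration; the repository may load its tokenizer differently):

```python
from transformers import GPT2Tokenizer

# GPT's subword tokenization splits "difference" into two subwords and
# maps them to vocabulary indices (values as quoted in the note above).
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokens = tokenizer.tokenize("difference")      # ['diff', 'erence']
ids = tokenizer.convert_tokens_to_ids(tokens)  # [26069, 1945]
print(tokens, ids)
```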
If you find this paper useful in your research, please consider citing:
@ARTICLE{10271701,
author={Liu, Chenyang and Zhao, Rui and Chen, Jianqi and Qi, Zipeng and Zou, Zhengxia and Shi, Zhenwei},
journal={IEEE Transactions on Geoscience and Remote Sensing},
title={A Decoupling Paradigm With Prompt Learning for Remote Sensing Image Change Captioning},
year={2023},
volume={61},
number={},
pages={1-18},
doi={10.1109/TGRS.2023.3321752}}