# Inference

## Preparation

Create the conda environment from the provided config file, then activate it before running inference (the environment name is defined in videosalmonn.yml):

```bash
conda env create -f videosalmonn.yml
```

Create the directories that will store the checkpoints (if you modify this structure or rename the directories, update the config files and model files accordingly):

```bash
mkdir -p ckpt/MultiResQFormer
mkdir -p ckpt/pretrained_ckpt
```

Then download the following model checkpoints (an optional sanity check is sketched after the list):

1. Main video-SALMONN model checkpoint, and put it under `ckpt/MultiResQFormer`
2. InstructBLIP checkpoint for the Vicuna-13B model, and put it under `ckpt/pretrained_ckpt`
3. EVA_VIT model checkpoint for InstructBLIP, and put it under `ckpt/pretrained_ckpt`
4. BEATs encoder checkpoint, and put it under `ckpt/pretrained_ckpt`

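Once all four checkpoints are in place, it can be worth confirming that the directories are actually populated before launching inference. The snippet below is a minimal, optional sketch (not part of the repository) that only checks whether the two checkpoint directories exist and contain files:

```python
# check_ckpt.py -- optional helper, not part of the repository.
# Confirms that the checkpoint directories created above contain files
# before inference is launched.
from pathlib import Path

REQUIRED_DIRS = ["ckpt/MultiResQFormer", "ckpt/pretrained_ckpt"]

for name in REQUIRED_DIRS:
    path = Path(name)
    files = [p for p in path.iterdir() if p.is_file()] if path.is_dir() else []
    status = f"{len(files)} file(s) found" if files else "MISSING OR EMPTY"
    print(f"{name}: {status}")
```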
## Run inference

```bash
python inference.py --cfg-path config/test.yaml
```

## Check the result

The result is saved to `./ckpt/MultiResQFormer/<DateTime>/eval_result.json`, where `<DateTime>` is the timestamp of the inference run.

The expected result is as follows:

```json
[
    {
        "id": "./dummy/4405327307.mp4_Describe the video and audio in detail",
        "conversation": [
            {
                "from": "human",
                "value": "Describe the video and audio in detail"
            },
            {
                "from": "gpt",
                "value": "None"
            }
        ],
        "task": "audiovisual_video_input",
        "ref_answer": "None",
        "gen_answer": "The video shows a group of musicians performing on stage, with a man singing into a microphone and playing the piano. There is also a drum set and a saxophone on stage. The audience is not visible in the video. The music is upbeat and energetic, and the performers seem to be enjoying themselves.</s>"
    }
]
```
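
The generated answers can also be inspected programmatically with the standard `json` module. A minimal sketch, assuming the output path shown above (substitute the actual `<DateTime>` directory created by the run):

```python
# Optional: print the id and generated answer for each item in eval_result.json.
import json

# Substitute the actual <DateTime> directory created by the inference run.
result_path = "./ckpt/MultiResQFormer/<DateTime>/eval_result.json"

with open(result_path) as f:
    results = json.load(f)

for item in results:
    print(item["id"])
    print(item["gen_answer"])
```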