Koichi Namekata1 · Sherwin Bahmani1,2 · Ziyi Wu1,2 · Yash Kant1,2 · Igor Gilitschenski1,2 · David B. Lindell1,2
1University of Toronto · 2Vector Institute
Given a set of bounding boxes with associated trajectories, our framework enables object and camera motion control in image-to-video generation by leveraging the knowledge present in a pre-trained image-to-video diffusion model. Our method is self-guided, offering zero-shot trajectory control without fine-tuning or relying on external knowledge.
The code has been tested on:
- Ubuntu 22.04.5 LTS, Python 3.12.4, CUDA 12.4, NVIDIA RTX A6000 48GB
# clone the github repo
git clone https://github.com/Kmcode1/SG-I2V.git
cd SG-I2V
Create a conda environment and install PyTorch:
conda create -n sgi2v python=3.12.4
conda activate sgi2v
conda install pytorch=2.3.1 torchvision=0.18.1 pytorch-cuda=11.8 -c pytorch -c nvidia
Install packages:
pip install -r requirements.txt
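To quickly confirm that PyTorch and CUDA are set up correctly, a short sanity check like the following can be run inside the activated environment (this snippet is ours, not part of the repository):

```python
# Sanity check (not part of the repository): verify the PyTorch install and CUDA visibility.
import torch

print("torch version:", torch.__version__)          # expected: 2.3.1
print("CUDA available:", torch.cuda.is_available())
print("device:", torch.cuda.get_device_name(0) if torch.cuda.is_available() else "cpu")
```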
You can run demo.ipynb, which contains the complete implementation of our pipeline along with brief explanations.
Alternatively, you can generate the example videos shown on the project website by running:
python inference.py --input_dir <input_path> --output_dir <output_path>
An example command that reproduces the notebook's result is CUDA_VISIBLE_DEVICES=0 python inference.py --input_dir ./examples/111 --output_dir ./output. For convenience, we also provide a shell script that generates all the examples; run it with sh ./inference.sh.
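For reference, a batch run over every example can also be scripted directly in Python. The sketch below is hypothetical (it is not part of the repository) and assumes the example folders live under ./examples/ and that inference.py accepts the flags shown above:

```python
# Hypothetical batch driver (a sketch, not part of the repository): runs inference.py
# on every example folder under ./examples/, mirroring what ./inference.sh does.
import subprocess
from pathlib import Path

examples_dir = Path("./examples")   # assumed location of the example folders
output_root = Path("./output")

for example in sorted(p for p in examples_dir.iterdir() if p.is_dir()):
    out_dir = output_root / example.name
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["python", "inference.py", "--input_dir", str(example), "--output_dir", str(out_dir)],
        check=True,
    )
```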
For details on the input format of the examples, please refer to read_condition(input_dir, config) in inference.py. Briefly, each example folder contains the first-frame image (img.png) and trajectory conditions (traj.npy); the trajectory conditions encode the top-left/bottom-right coordinates of each bounding box together with the positions of its center across frames.
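To inspect an example's trajectory conditions before writing your own, a minimal sketch like the one below can be useful. The exact array layout is defined by read_condition in inference.py, so this snippet (not part of the repository) only loads the raw file and reports its shape:

```python
# Sketch (not part of the repository): inspect the trajectory conditions of one example.
# The authoritative parsing logic is read_condition(input_dir, config) in inference.py.
import numpy as np

traj = np.load("./examples/111/traj.npy", allow_pickle=True)  # allow_pickle in case it is stored as an object array
print("type:", type(traj), "shape:", getattr(traj, "shape", None))
```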
We are currently working on releasing the evaluation code.
Our implementation is partially inspired by DragAnything and FreeTraj. We thank the authors for their open-source contributions.
If you find our paper and code useful, please cite us:
@article{namekata2024sgi2v,
author = {Namekata, Koichi and Bahmani, Sherwin and Wu, Ziyi and Kant, Yash and Gilitschenski, Igor and Lindell, David B.},
title = {SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation},
journal = {arXiv preprint arXiv:2411.04989},
year = {2024},
}