We have successfully pre-trained and fine-tuned our VideoMAE on Kinetics400, Something-Something-V2, UCF101 and HMDB51 with this codebase.
- The pre-processing of Something-Something-V2 can be summarized into 3 steps:

  1. Download the dataset from the official website.

  2. Preprocess the dataset by changing the video extension from `webm` to `.mp4`, keeping the original height of 240px (a conversion sketch is given after these steps).

  3. Generate the annotations needed by the dataloader ("<path_to_video> <video_class>" per line). The annotations usually include `train.csv`, `val.csv` and `test.csv` (here `test.csv` is the same as `val.csv`). We share our annotation files (train.csv, val.csv, test.csv) via Google Drive. The format of the `*.csv` files is:

     ```
     dataset_root/video_1.mp4 label_1
     dataset_root/video_2.mp4 label_2
     dataset_root/video_3.mp4 label_3
     ...
     dataset_root/video_N.mp4 label_N
     ```
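For step 2, the following is a minimal sketch of the conversion using `ffmpeg` called from Python. It assumes `ffmpeg` is installed, and the `src_dir`/`dst_dir` layout is hypothetical; this is not necessarily the exact command used to prepare the released models.

```python
import subprocess
from pathlib import Path

src_dir = Path("ssv2/webm")  # hypothetical folder with the downloaded .webm files
dst_dir = Path("ssv2/mp4")   # hypothetical output folder for the .mp4 files
dst_dir.mkdir(parents=True, exist_ok=True)

for webm in sorted(src_dir.glob("*.webm")):
    mp4 = dst_dir / (webm.stem + ".mp4")
    # Re-encode to H.264 in an .mp4 container at a height of 240px (the original
    # SSv2 height); "-2" keeps the aspect ratio and rounds the width to an even number.
    subprocess.run(
        ["ffmpeg", "-y", "-i", str(webm),
         "-vf", "scale=-2:240",
         "-c:v", "libx264",
         str(mp4)],
        check=True,
    )
```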
- The pre-processing of Kinetics400 can be summarized into 3 steps:

  1. Download the dataset from the official website.

  2. Preprocess the dataset by resizing the short edge of each video to 320px. You can refer to the MMAction2 Data Benchmark for TSN and SlowOnly.

     Recommend: OpenDataLab provides a copy of the Kinetics400 dataset; you can download the version with a short edge of 320px from there.

  3. Generate the annotations needed by the dataloader ("<path_to_video> <video_class>" per line). The annotations usually include `train.csv`, `val.csv` and `test.csv` (here `test.csv` is the same as `val.csv`); a sketch for generating such files is given after these steps. The format of the `*.csv` files is:

     ```
     dataset_root/video_1.mp4 label_1
     dataset_root/video_2.mp4 label_2
     dataset_root/video_3.mp4 label_3
     ...
     dataset_root/video_N.mp4 label_N
     ```
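For the annotation step of either dataset, the sketch below writes files in the "<path_to_video> <video_class>" format. The directory layout (`dataset_root/<class_id>/<video>.mp4`) and the integer folder names are assumptions for illustration; adapt the label lookup to your own class mapping.

```python
from pathlib import Path

def write_annotation(csv_path, samples):
    """Write one '<path_to_video> <video_class>' pair per line."""
    with open(csv_path, "w") as f:
        for video_path, label in samples:
            f.write(f"{video_path} {label}\n")

# Hypothetical layout: videos stored under dataset_root/<class_id>/<video>.mp4,
# where <class_id> is already an integer label.
dataset_root = Path("kinetics400/train_320px")
samples = []
for class_dir in sorted(p for p in dataset_root.iterdir() if p.is_dir()):
    label = int(class_dir.name)
    for video in sorted(class_dir.glob("*.mp4")):
        samples.append((str(video), label))

write_annotation("train.csv", samples)  # repeat for val.csv / test.csv
```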
- We use decord to decode the videos on the fly during both pre-training and fine-tuning phases (a minimal decoding example is sketched below).
- All experiments on Kinetics-400 in VideoMAE are based on this 320px short-edge version of the dataset.
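For reference, a minimal sketch of on-the-fly decoding with decord is shown below. The file path and the choice of 16 uniformly sampled frames are arbitrary for this example and do not reflect the repository's actual sampling logic.

```python
import numpy as np
from decord import VideoReader, cpu

vr = VideoReader("dataset_root/video_1.mp4", ctx=cpu(0))  # decode on CPU
num_frames = len(vr)

# Uniformly sample 16 frame indices across the whole clip (arbitrary choice here).
indices = np.linspace(0, num_frames - 1, num=16).astype(int).tolist()
frames = vr.get_batch(indices).asnumpy()  # (16, H, W, 3) uint8 array
print(frames.shape)
```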