This is the official implementation of the paper SimOn: A Simple Framework for Online Temporal Action Localization. We provide training code for SimOn without the context generation module (P), since that module is part of our future research. The released code covers the On-TAL and ODAS tasks on the THUMOS14 dataset.
-
Online Temporal Action Localization (On-TAL) aims to immediately provide action instances from untrimmed streaming videos. The model is not allowed to utilize future frames or any post-processing techniques that modify past predictions, which makes On-TAL much more challenging.
-
Inference result of our proposed method (demo video: `video_test_0000622_out_compress.mp4`).
-
In this paper, we propose a simple yet effective framework, termed SimOn, that learns to predict action instances using the popular Transformer architecture in an end-to-end manner. Specifically, the model takes the current frame feature as a query and a set of past context information as keys and values of the Transformer.
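The following is a minimal PyTorch sketch of this query/key-value layout, not the released implementation; the module name, feature dimension, and head count are illustrative assumptions.

```python
import torch
import torch.nn as nn

class OnlineCrossAttentionBlock(nn.Module):
    """Sketch of the online attention layout: the current frame feature is the
    query; past context vectors serve as keys and values."""

    def __init__(self, dim=2048, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, current_feat, past_context):
        # current_feat: (B, 1, D) -- the frame arriving at the current time step
        # past_context: (B, T, D) -- context accumulated from earlier time steps
        attended, _ = self.attn(query=current_feat, key=past_context, value=past_context)
        x = self.norm1(current_feat + attended)
        return self.norm2(x + self.ffn(x))
```

Because only the current frame is used as the query at each step, no future frames are ever touched, matching the online constraint described above.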
-
Different from prior work that uses a set of past model outputs as the context, we leverage the past visual context and a learnable context embedding for the current query.
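As a rough illustration of that idea (again not the released code), the context can be kept in a bounded FIFO seeded with a learnable embedding, so the very first query still has something to attend to; the buffer length and dimension below are placeholder values.

```python
import torch
import torch.nn as nn

class PastContextBuffer(nn.Module):
    """Sketch of a bounded past-context buffer seeded with a learnable embedding."""

    def __init__(self, dim=2048, max_len=16):
        super().__init__()
        self.max_len = max_len
        # Learnable context embedding used before any past visual context exists.
        self.init_context = nn.Parameter(torch.zeros(1, 1, dim))

    def reset(self, batch_size):
        # Start of a new video: the learnable embedding is the only context.
        return self.init_context.expand(batch_size, 1, -1)

    def update(self, context, new_visual_feat):
        # new_visual_feat: (B, 1, D) visual context produced at the current step.
        context = torch.cat([context, new_visual_feat], dim=1)
        return context[:, -self.max_len:, :]  # keep only the most recent entries
```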
-
Experimental results on the THUMOS14 and ActivityNet1.3 datasets show that our model remarkably outperforms the previous methods, achieving a new state-of-the-art On-TAL performance. In addition, the evaluation for Online Detection of Action Start (ODAS) demonstrates the effectiveness and robustness of our method in the online setting.
# create anaconda env
conda create -n SimOn python=3.7 -y
conda activate SimOn
# install pytorch
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
# install others
pip install future tqdm ipdb mmcv opencv-python pandas
python setup.py develop
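After installation, a quick optional check that PyTorch sees the GPU can save time before training; this snippet is not part of the repository.

```python
import torch
import torchvision

print("PyTorch:", torch.__version__)
print("Torchvision:", torchvision.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```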
- We provide the THUMOS14 features reported in the paper. Download the dataset from this link. We adopt the extracted features as well as some pieces of code from OadTR.
- Unzip the file and put it under `root/data`. The folder hierarchy should be organized as follows:
/data/
├── thumos14_anet # extracted feature from pre-trained ActivityNet
│ ├── thumos_all_feature_test_tsn_v2.pickle
│ ├── thumos_all_feature_val_tsn_v2.pickle
│ └── thumos_val_anno.pickle
├── thumos14_feat # extracted feature from pre-trained TSN
│ ├── ambiguous_dict.pkl
│ ├── fps_info.pkl
│ ├── thumos14.json
│ ├── thumos_all_feature_test_V3.pickle
│ ├── thumos_all_feature_val_V3.pickle
│ ├── thumos_from_oad_multiclasses.pkl
│ ├── thumos_test_anno_multiclass.pickle
│ └── thumos_val_anno_multiclass.pickle
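If you want to confirm the download before training, the snippet below peeks into one of the feature pickles; it assumes an OadTR-style layout (a dict keyed by video name), which may differ from the actual files.

```python
import pickle

# Path taken from the folder hierarchy above.
with open("data/thumos14_feat/thumos_all_feature_val_V3.pickle", "rb") as f:
    feats = pickle.load(f)

print(type(feats), "with", len(feats), "entries")
first_key = next(iter(feats))
value = feats[first_key]
print("example entry:", first_key, getattr(value, "shape", type(value)))
```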
- To train the On-TAL model on THUMOS14, refer to the following instructions:
- Training is fast, taking around 30 minutes on a single GeForce GTX 1080 Ti GPU.
- To reproduce the results on THUMOS14 in On-TAL, please use the following command:
./thumos14_kinetics_run_scripts/train.sh
- To train the model for the ODAS task, please check out the `ODAS` branch.
- Test the model in the On-TAL task
- To test the model on THUMOS14 in the On-TAL task, use the following commands:
./thumos14_kinetics_run_scripts/test.sh
./thumos14_kinetics_run_scripts/eval_tal_with_pred_oad.sh
- The model should give around 36.0 mAP; the result may fluctuate slightly (a brief sketch of the temporal IoU behind this metric follows the list below).
- Test the model in the ODAS task
- Please check out the `ODAS` branch to test the model.
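For reference, the mAP above is computed by matching predicted segments to ground-truth segments at temporal IoU thresholds; the snippet below is only a conceptual sketch of that overlap measure, not the released evaluation script.

```python
def temporal_iou(pred, gt):
    """pred, gt: (start, end) pairs in seconds (or frames)."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

# Example: prediction [12.0, 20.0] vs. ground truth [10.0, 18.0] -> tIoU = 0.6
print(temporal_iou((12.0, 20.0), (10.0, 18.0)))
```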
We provide qualitative results under various challenging conditions, including overlapping multiple action instances, long-lasting actions, and promptly started actions. Our model produces confident predictions for overlapping action instances, as shown in Fig. 4-(a), showing that different actions occurring in the same time step can be detected separately. In addition, our model can successfully predict relatively long-lasting actions, demonstrating the effectiveness of the context modeling, as shown in Fig. 4-(b). The learnable initial parameter enables the model to detect prompt actions at the beginning of the video, as shown in Fig. 4-(c).
Please cite the paper in your publications if it helps your research:
@article{tang2022simon,
  title={SimOn: A Simple Framework for Online Temporal Action Localization},
  author={Tang, Tuan N and Park, Jungin and Kim, Kwonyoung and Sohn, Kwanghoon},
  journal={arXiv preprint arXiv:2211.04905},
  year={2022}
}