🧠 An Unsupervised Reinforcement Learning Pipeline for Video Frame Classification

Project-Agni/Detection

🔥 Detection 🔥

🚧 This is a Proof of Concept Project 🚧
🚧 Authors are not Responsible for Damages to Life and Property if Deployed 🚧
STDIM
Fig. 1 - Spatio-Temporal Deep InfoMax (STDIM). It maximizes the mutual information (MI) between temporally adjacent frames across the spatial locations of the local feature maps. This is achieved through both local-local and global-local InfoMax estimates. We also incorporate a spatial prior to incentivize the encoder to focus on all forms of variation.

GLCT

Motivation 🚀

The algorithm we use is inspired by the work of Anand et al. (2019) at Mila and Microsoft Research. We repurpose it for video frame classification.

  • Inspired by human learning, which is largely unsupervised, a state representation learning algorithm learns high-level features from an image frame without labels or explicit rewards, and without modelling the pixels directly.
  • Because we work with the frames of a video, our data is temporally consistent. Local consistency is also observed, since some objects don't move drastically over time. We exploit both structures to learn the representations directly.
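One way to picture how temporal consistency supplies a training signal: consecutive frames form positive pairs, and frames from elsewhere in the video act as negatives. The sketch below is illustrative only and is not taken from this repository; the function name `sample_temporal_pairs` and its parameters are hypothetical, and the negative-sampling rule is an assumption.

```python
import random


def sample_temporal_pairs(num_frames, batch_size, seed=None):
    """Sample (anchor, positive, negative) frame indices from one video.

    Assumed scheme (hypothetical, for illustration):
      - the positive is the frame immediately after the anchor;
      - the negative is any other frame of the same video.
    """
    if num_frames < 3:
        raise ValueError("need at least 3 frames to draw a negative")
    rng = random.Random(seed)
    triples = []
    for _ in range(batch_size):
        # The anchor cannot be the last frame, since it needs a successor.
        anchor = rng.randrange(num_frames - 1)
        positive = anchor + 1
        # Any frame that is neither the anchor nor its successor.
        candidates = [i for i in range(num_frames) if i not in (anchor, positive)]
        negative = rng.choice(candidates)
        triples.append((anchor, positive, negative))
    return triples


if __name__ == "__main__":
    for anchor, positive, negative in sample_temporal_pairs(100, 4, seed=0):
        print(anchor, positive, negative)
```

An encoder trained to tell the positive from the negative has to pick up what persists between adjacent frames, which is the structure the bullets above describe.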

Further Explanation 🧐

Fig. 2 - (right) The contrastive task used to train the final discriminator. A bilinear model computes the score function from the output of the representation encoder below. The discriminator's objective assigns large values to positive examples and small values to negative examples by maximizing the bound given in the top equation.

This translates into maximizing the true positives while minimizing mispredictions and false alarms.
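The contrastive objective with a bilinear score can be sketched in a few lines. This is a minimal pure-Python illustration of an InfoNCE-style loss, assuming in-batch negatives; it is not the code in this repository, and the names `bilinear_score` and `info_nce_loss` are hypothetical. Minimizing the loss corresponds to maximizing the bound.

```python
import math


def bilinear_score(x, W, y):
    """Bilinear score x^T W y for plain Python lists."""
    Wy = [sum(w * y_j for w, y_j in zip(row, y)) for row in W]
    return sum(x_i * v for x_i, v in zip(x, Wy))


def info_nce_loss(anchors, positives, W):
    """Mean cross-entropy of picking positives[i] for anchors[i].

    Every other entry of `positives` serves as a negative for anchors[i]
    (in-batch negatives, an assumption of this sketch).
    """
    total = 0.0
    for i, x in enumerate(anchors):
        scores = [bilinear_score(x, W, y) for y in positives]
        # Numerically stable log-sum-exp over all candidates.
        m = max(scores)
        log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_norm - scores[i]
    return total / len(anchors)


if __name__ == "__main__":
    W = [[5.0, 0.0], [0.0, 5.0]]
    feats = [[1.0, 0.0], [0.0, 1.0]]
    print(info_nce_loss(feats, feats, W))        # matched pairs: low loss
    print(info_nce_loss(feats, feats[::-1], W))  # mismatched pairs: high loss
```

When the true pair receives the highest score the loss is small. When a negative outscores it (a false alarm) the loss grows, which is the behaviour described above.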

Usage 👨‍💻

Get the dataset from here and place it under datasets.

python runner.py --arch [cnn, dqn, usrl]

The trained weights will be stored in the same directory as the runner script.

Inference

python test.py

Todo 📜

  • CNN
  • RL - DQN
  • RL - USRL
  • Live cam test script

References 📑

@article{anand2019unsupervised,
  title={Unsupervised State Representation Learning in Atari},
  author={Anand, Ankesh and Racah, Evan and Ozair, Sherjil and Bengio, Yoshua and C{\^o}t{\'e}, Marc-Alexandre and Hjelm, R Devon},
  journal={arXiv preprint arXiv:1906.08226},
  year={2019}
}
