This is an implementation of the paper "A Closer Look at Weak Label Learning for Audio Events". In this paper, we attempt to understand the challenges of large-scale Audio Event Detection (AED) using weakly labeled data through a CNN-based framework. Our network architecture handles variable-length recordings, and its design provides a way to control the segment size of the adjustable secondary outputs; together, these features eliminate the need for additional preprocessing steps. We look into how label density and label corruption affect performance, and we further compare mined web data against manually labeled data from AudioSet as training data. We believe our work provides an approach to understanding the challenges of weakly labeled learning, and that future AED work would benefit from our exploration.
We provide the AudioSet data (the list of files used in our experiments) for reproducibility.
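
To make the idea of segment-level secondary outputs concrete, here is a minimal sketch in plain Python. It is not the WALNet code: the names `count_segments` and `pool_segment_scores`, the explicit `segment_frames` and `hop_frames` parameters, and the choice of mean pooling are all assumptions for illustration. In the actual network the segment size follows from the layer design rather than from explicit parameters.

```python
from typing import List


def count_segments(num_frames: int, segment_frames: int, hop_frames: int) -> int:
    """Number of segment-level outputs for a recording of num_frames frames.

    Hypothetical helper: a recording shorter than one segment still
    yields a single output, so any length can be processed.
    """
    if segment_frames <= 0 or hop_frames <= 0:
        raise ValueError("segment_frames and hop_frames must be positive")
    if num_frames <= segment_frames:
        return 1
    return 1 + (num_frames - segment_frames) // hop_frames


def pool_segment_scores(segment_scores: List[List[float]]) -> List[float]:
    """Mean-pool per-segment class scores into one score per class.

    segment_scores[s][c] is the score of class c for segment s.
    The result is a recording-level prediction that can be compared
    against a weak (recording-level) label.
    """
    if not segment_scores:
        raise ValueError("need at least one segment")
    num_classes = len(segment_scores[0])
    totals = [0.0] * num_classes
    for scores in segment_scores:
        for c, score in enumerate(scores):
            totals[c] += score
    return [total / len(segment_scores) for total in totals]


# Two segments, two classes: the recording-level score is the per-class mean.
recording_scores = pool_segment_scores([[0.25, 0.75], [0.5, 0.5]])
```

Because pooling runs over however many segments a recording produces, recordings of 10, 30, or 60 seconds can all be mapped to one prediction vector without cropping or padding.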
128 dimensional MelSpectrogram Features - Balanced Set - 10 second
128 dimensional MelSpectrogram Features - Validation Set - 10 second
128 dimensional MelSpectrogram Features - Testing Set - 10 second
128 dimensional MelSpectrogram Features - Balanced Set - 30 second
128 dimensional MelSpectrogram Features - Balanced Set - 60 second
All Embedding level features for AudioSet experiments
If you use our repository (WALNet: weak label analysis) or our feature representations in your research, please cite our paper:
@article{shah2018closer,
title={A Closer Look at Weak Label Learning for Audio Events},
author={Shah, Ankit and Kumar, Anurag and Hauptmann, Alexander G and Raj, Bhiksha},
journal={arXiv preprint arXiv:1804.09288},
year={2018}
}
Training Set | MAP on Testing |
---|---|
AudioSet - 10 | 22.87 |
AudioSetAt30 | 22.42 |
AudioSetAt60 | 22.42 |
Model | MAP |
---|---|
ConvNet (mean pooling) | 20.3 |
ResNet (mean pooling) | 21.8 |
ResNet-ATT [Xu et al., 2017a] | 22.0 |
ResNet-SPDA [Zhang et al., 2016] | 21.9 |
Mmnet [Chou et al., 2018] | 22.6 |
WALNet [Shah et al., 2018] | 22.9 |
ESC-50 dataset | MAP |
---|---|
SoundNet | 74.2 |
WALNet | 83.5 |
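
The tables above report MAP (mean average precision). As a reference for how such a number is commonly computed, here is a minimal sketch in plain Python using the standard non-interpolated definition. The evaluation script behind the reported numbers is not shown here, so the function names and the handling of classes without positives are assumptions, not the authors' exact procedure.

```python
from typing import List, Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Non-interpolated average precision for one class.

    labels[i] is 1 if recording i contains the event, else 0;
    scores[i] is the predicted score for recording i.
    """
    # Rank recordings from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx]:
            hits += 1
            precision_sum += hits / rank  # precision at this positive
    if hits == 0:
        raise ValueError("class has no positive examples")
    return precision_sum / hits


def mean_average_precision(
    label_matrix: List[List[int]], score_matrix: List[List[float]]
) -> float:
    """Mean of per-class AP; rows are recordings, columns are classes.

    Assumption: classes with no positive recordings are skipped.
    """
    num_classes = len(label_matrix[0])
    per_class = []
    for c in range(num_classes):
        class_labels = [row[c] for row in label_matrix]
        class_scores = [row[c] for row in score_matrix]
        if any(class_labels):
            per_class.append(average_precision(class_labels, class_scores))
    if not per_class:
        raise ValueError("no class has positive examples")
    return sum(per_class) / len(per_class)


# Positives ranked 1st and 3rd: AP = (1/1 + 2/3) / 2
ap = average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
```

This returns a value in [0, 1]; the numbers in the tables appear to be scaled to percent.
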
Contact Ankit Shah ([email protected]) or Anurag Kumar ([email protected])