
Long Short-Term Memory Neural Networks trained and tested on the TIMIT Acoustic-Phonetic Continuous Speech Corpus.


gevangelopoulos/timit-lstm


Long Short-Term Memory Neural Networks for Automatic Speech Recognition on the TIMIT dataset

This repo contains Torch scripts and models I created while working on my master's thesis, Efficient Hardware Mapping of LSTM Neural Networks for Speech Recognition (alternate link), in the ESAT-MICAS lab of KU Leuven, Belgium, from February to July 2016.

The code here is by no means perfect; it's a collection of patchy scripts that aim to do their job correctly and quickly. The collection includes:

  • Scripts that import the TIMIT acoustic data from HDF5 files into Torch tensors
  • Scripts that set up a Bidirectional LSTM architecture using modules from torch-rnn
  • Scripts that use TIMIT's training set to train the LSTM architecture using backpropagation through time
  • Scripts that evaluate the performance of the trained models on TIMIT's validation and test set.
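The architecture described above can be sketched with the rnn package roughly as follows. This is a minimal illustration, not the exact thesis configuration: the layer sizes and the 39-class target set are assumptions.

```lua
-- Sketch of a bidirectional LSTM classifier using the Element-Research
-- 'rnn' package. Sizes are illustrative assumptions, not the thesis setup.
require 'rnn'

local inputSize  = 39   -- e.g. MFCCs plus deltas per frame (assumed)
local hiddenSize = 100  -- LSTM cells per direction (assumed)
local nClasses   = 39   -- folded TIMIT phoneme set (assumed)

-- BiSequencer runs one LSTM forward and a clone backward over the input
-- sequence and concatenates the two hidden states at every timestep.
local model = nn.Sequential()
  :add(nn.BiSequencer(nn.FastLSTM(inputSize, hiddenSize)))
  :add(nn.Sequencer(nn.Linear(2 * hiddenSize, nClasses)))
  :add(nn.Sequencer(nn.LogSoftMax()))

-- Framewise negative log-likelihood summed over the sequence;
-- backpropagation through time is handled by the Sequencer decorators.
local criterion = nn.SequencerCriterion(nn.ClassNLLCriterion())
```

The model consumes a table of per-timestep feature tensors and emits per-frame log-probabilities, which is what a framewise classification error metric evaluates.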

Some more functionality:

  • Logging scripts that report the loss function during training and the framewise classification error during testing
  • Scripts that export a timestamped snapshot of the model every epoch.

Pretrained models with snapshots over epochs are also included, since an analysis of LSTM parameters over the course of training could be useful. The included models were trained for a large number of epochs, but only the first 40 epochs are included here to save space. In any case, almost all models reached their peak performance of about 70% accuracy (a 30% framewise classification error) within 10-15 epochs, so if you want to use a model in your research, check the logs to see which snapshot you should grab.

Having a basic understanding of the Lua programming language and the Torch framework is advantageous, and tweaking the code to use different LSTM sizes or architectures shouldn't be too difficult. With a little more machine learning background, you could expand this model into a larger, customized application such as a deep LSTM, a ConvNet-LSTM combination, or a GRU-based variant.

Dependencies

This project uses parts of many other projects. While the dependencies are not pinned down precisely, the following list should cover pretty much everything you'll need to reproduce these results.

  • The TIMIT Dataset - The acoustic-phonetic speech corpus used for this research, licensed from the Linguistic Data Consortium.
  • sph2pipe - A tool that can convert sphere files from the LDC corpus into wav files.
  • Torch - The Torch machine learning framework.
  • rnn - Efficient reusable RNNs and LSTMs for Torch
  • h5py - An HDF5 python library
  • torch-hdf5 - A Torch HDF5 library
  • HTK MFCC MATLAB - A MATLAB library for computing MFCCs.
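As an example of how the h5py and torch-hdf5 pieces fit together, a dataset exported to HDF5 on the Python side can be read back into Torch tensors like this. The file name and dataset paths here are hypothetical, not the repo's actual layout:

```lua
-- Read features and labels from an HDF5 file using torch-hdf5.
-- 'timit.h5' and the dataset paths are hypothetical examples.
require 'hdf5'

local f = hdf5.open('timit.h5', 'r')
local features = f:read('/train/features'):all()  -- frames x coefficients
local labels   = f:read('/train/labels'):all()    -- one phoneme id per frame
f:close()
```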

More interesting, useful things

  • good-enough-lstm - How do LSTM networks perform when their parameters are butchered in various ways? This repository was also created during my master's thesis and uses MATLAB scripts to perform arithmetic, mathematical, and hardware manipulations on trained LSTMs.
  • char-rnn - Useful and fun character-level generative language models in Torch. Give it a text in a style, and it will learn to generate similar texts!
  • Andrej Karpathy's blog - A cool blog from a cool guy working on deep learning.
  • Christopher Olah's blog - Another cool blog talking about deep learning, RNNs, LSTMs, data visualization and more.
  • LDC - How to use data from the Linguistic Data Consortium
  • matio - A .mat file i/o library.

Notes - Disclaimer - Citing
