
Classification of emotions based on speech prosody (intonation, rhythm, stress) in Portuguese


gustavo-fardo/speech-emotion-ptbr


Speech Emotion in Portuguese - BR

This repository contains the development of classification models that recognize emotions in speech based on prosody (intonation, rhythm, stress, etc.). The data comes from the open emoUERJ dataset. The Jupyter Notebooks detail several feature extraction methods, tested on a Support Vector Machine classifier and on a custom deep learning model built with Keras.

Requirements

Installation

Repository installation

git clone https://github.com/gustavo-fardo/speech-emotion-ptbr
cd ./speech-emotion-ptbr

Create a Python virtual environment (recommended):

sudo apt install python3.9
sudo apt install virtualenv
python3.9 -m virtualenv .venv --python=$(which python3.9)

Note: every time you open a new terminal, activate the virtual environment with:

source .venv/bin/activate

Deactivate it with:

deactivate

To reproduce the results, download the emoUERJ dataset and place it inside the /datasets folder.

Data Augmentation

The following data augmentation methods, implemented with the Audiomentations library, were used to triple the size of the dataset:

  • Gaussian Noise with a random amplitude between 0.001 and 0.01
  • Time Stretch with a random rate between 0.8 and 1.24
  • Pitch Shift by a random amount between -2 and +2 semitones

The data augmentation process is detailed in feat_extract.ipynb and feat_extract2.ipynb.
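A minimal sketch of the Gaussian noise step, written with NumPy only so it runs without extra dependencies. It assumes mono floating-point audio and mirrors the amplitude range listed above; in Audiomentations the corresponding transform is AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.01, p=1.0), while the other two steps would use its TimeStretch(min_rate=..., max_rate=...) and PitchShift(min_semitones=..., max_semitones=...) transforms. The helper name add_gaussian_noise is hypothetical and not taken from the notebooks.

```python
import numpy as np


def add_gaussian_noise(samples, rng, min_amplitude=0.001, max_amplitude=0.01):
    """Return a noisy copy of `samples` (hypothetical helper, NumPy-only sketch).

    The noise standard deviation is drawn uniformly from
    [min_amplitude, max_amplitude] once per call, matching the range above.
    """
    amplitude = rng.uniform(min_amplitude, max_amplitude)
    noise = rng.standard_normal(samples.shape).astype(samples.dtype) * amplitude
    return samples + noise


rng = np.random.default_rng(seed=0)
sr = 16000
t = np.arange(sr) / sr
clean = (0.5 * np.sin(2 * np.pi * 220.0 * t)).astype(np.float32)  # 1 s synthetic tone
noisy = add_gaussian_noise(clean, rng)
residual_std = float(np.std(noisy - clean))  # should fall inside the amplitude range
print(noisy.dtype, noisy.shape, round(residual_std, 4))
```

Each call produces one augmented copy of a clip; how the copies were combined to reach the tripled dataset size is documented in the notebooks, not here.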

Feature Extraction

The feature extraction methods tested on emoUERJ are documented in feat_extract.ipynb and feat_extract2.ipynb.
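Purely as an illustration of prosody-style features (an assumption, not the notebooks' exact method), the NumPy sketch below computes frame-level pitch (F0, via a simple autocorrelation peak search), RMS energy and zero-crossing rate, then pools them into a fixed-length vector of means and standard deviations that a classifier such as an SVM can consume. All function names and parameter values (frame length, hop, the 75-400 Hz pitch search range, the energy floor) are hypothetical.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (drops the incomplete tail)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def autocorr_f0(frame, sr, fmin=75.0, fmax=400.0):
    """Estimate F0 of one frame by picking the autocorrelation peak in [fmin, fmax]."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]  # lags 0..N-1
    lag_min, lag_max = int(sr / fmax), int(sr / fmin)
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return sr / lag


def prosodic_features(x, sr, frame_len=1024, hop=256, energy_floor=1e-3):
    """Pool frame-level F0, RMS energy and zero-crossing rate into mean/std stats."""
    frames = frame_signal(x, frame_len, hop)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    voiced = rms > energy_floor  # crude voicing decision: skip near-silent frames
    f0 = np.array([autocorr_f0(f, sr) for f in frames[voiced]])
    if f0.size == 0:
        f0 = np.zeros(1)  # no voiced frames: report zero pitch
    return np.array([f0.mean(), f0.std(), rms.mean(), rms.std(), zcr.mean(), zcr.std()])


sr = 16000
t = np.arange(sr) / sr
tone = 0.3 * np.sin(2 * np.pi * 200.0 * t)  # 1 s synthetic 200 Hz "voice"
feats = prosodic_features(tone, sr)
print(np.round(feats, 3))
```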

Models

  • SVM: a support vector machine with a linear kernel and C=1.0
  • Neural Network: a custom deep learning model built with Keras
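A minimal scikit-learn sketch of the SVM configuration listed above (linear kernel, C=1.0), assuming features have already been extracted into one fixed-length vector per utterance. Synthetic data from make_blobs stands in for emoUERJ features, with 4 classes to match the 4 emotions mentioned under Implementations; the standardization step is an added assumption and may not match the notebooks.

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for utterance-level feature vectors: 4 classes, 6 features.
X, y = make_blobs(n_samples=400, centers=4, n_features=6, cluster_std=1.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Linear-kernel SVM with C=1.0 as described above; the scaler is an assumption.
clf = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(f"test accuracy: {accuracy:.2f}")
```

Wrapping the scaler and the SVC in one pipeline keeps the scaling statistics fitted on the training split only, so no information from the test split leaks into training.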

Results

Implementations

The requirements for the implementations can be installed with:

pip install -r requirements.txt
  • embedded_classifier.py: captures audio from a microphone and classifies its emotion in near real time
  • realtime_emotion_subtitle.py: given an audio file in .wav format, outputs the proportion of each of the 4 emotions in near real time
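Reporting a proportion per emotion can be illustrated with a sliding-window sketch: split the signal into fixed-length, overlapping windows, classify each one, and report the share of windows assigned to each label. This is an assumption about how such proportions could be computed, not a description of realtime_emotion_subtitle.py; the function name, window length, hop and the toy energy-based classifier below are hypothetical placeholders for the trained model.

```python
from collections import Counter

import numpy as np


def window_label_proportions(signal, sr, classify, win_s=2.0, hop_s=1.0):
    """Classify fixed-length windows and return the share of each predicted label.

    `classify` is any callable mapping a 1-D window of samples to a label; it
    stands in for the trained model (hypothetical interface).
    """
    win, hop = int(win_s * sr), int(hop_s * sr)
    # At least one window, even when the signal is shorter than `win`.
    starts = range(0, max(len(signal) - win, 0) + 1, hop)
    labels = [classify(signal[s:s + win]) for s in starts]
    counts = Counter(labels)
    return {label: n / len(labels) for label, n in counts.items()}


# Toy usage: 4 s of loud signal followed by 4 s of quiet signal at 1 kHz.
sr = 1000
signal = np.concatenate([np.full(4 * sr, 0.5), np.full(4 * sr, 0.01)])
toy_classifier = lambda w: "high_energy" if np.sqrt(np.mean(w ** 2)) > 0.1 else "low_energy"
props = window_label_proportions(signal, sr, toy_classifier)
print(props)
```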

Authors
