This repository contains the development of classification models for recognizing emotions in speech based on prosody (intonation, rhythm, stress, etc.). The data used is the open emoUERJ dataset. The Jupyter notebooks detail several feature extraction methods, tested on a Support Vector Machine classifier and a custom deep learning model built with Keras.
- Python 3.9
- emoUERJ dataset
Repository installation
git clone https://github.com/gustavo-fardo/speech-emotion-ptbr
cd ./speech-emotion-ptbr
Create a Python virtual environment (recommended):
sudo apt install python3.9
sudo apt install virtualenv
python3.9 -m virtualenv .venv --python=$(which python3.9)
Note: every time you open a new terminal, activate the virtual environment with the command:
source .venv/bin/activate
Deactivate it with:
deactivate
To reproduce the results, download the emoUERJ dataset and place it inside the /datasets folder.
Data augmentation with the Audiomentations library was used to triple the size of the dataset, applying the following transforms (a code sketch is shown after this list):
- Gaussian Noise with a random amplitude between 0.001 and 0.01
- Time Stretch with a rate between 0.8 and 1.24
- Pitch Shift with a random semitone variation from -2 to 2
The data augmentation process is detailed in feat_extract.ipynb and feat_extract2.ipynb.
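A minimal sketch of such an augmentation pipeline, assuming audio loaded with librosa at 16 kHz; the file path, sample rate, and per-transform probabilities are placeholders and may differ from those used in the notebooks:

```python
import librosa
from audiomentations import Compose, AddGaussianNoise, TimeStretch, PitchShift

# Transform chain mirroring the ranges listed above; p=1.0 applies every transform.
augment = Compose([
    AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.01, p=1.0),
    TimeStretch(min_rate=0.8, max_rate=1.24, p=1.0),
    PitchShift(min_semitones=-2, max_semitones=2, p=1.0),
])

# Load one recording (placeholder path) and create an augmented copy of it.
samples, sr = librosa.load("datasets/emoUERJ/example.wav", sr=16000)
augmented = augment(samples=samples, sample_rate=sr)
```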
The feature extraction methods tested on emoUERJ are documented in feat_extract.ipynb and feat_extract2.ipynb, and include the following (a librosa-based sketch follows the list):
- TSFEL library (too slow, not tested with the models)
- Praat and Parselmouth, based on https://github.com/uzaymacar/simple-speech-features
- pyAudioAnalysis library
- Mel-Frequency Cepstrum Coefficients (MFCC) with Librosa library
- Mel Spectrogram with Librosa library
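As an illustration of the librosa-based features, a minimal sketch is given below; the number of coefficients and the fixed-length summary are assumptions for illustration, not necessarily the settings used in the notebooks:

```python
import numpy as np
import librosa

def extract_features(path, sr=16000, n_mfcc=13):
    """Return an MFCC summary vector and a log mel spectrogram for one audio file."""
    y, sr = librosa.load(path, sr=sr)

    # Mel-Frequency Cepstral Coefficients: shape (n_mfcc, frames).
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)

    # Mel spectrogram converted to dB: shape (n_mels, frames).
    mel = librosa.feature.melspectrogram(y=y, sr=sr)
    log_mel = librosa.power_to_db(mel, ref=np.max)

    # Summarize each coefficient over time to get a fixed-length vector for the SVM.
    mfcc_vector = np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])
    return mfcc_vector, log_mel
```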
Two classification models were evaluated (minimal sketches are shown below):
- SVM: a simple support vector machine with a linear kernel and C=1.0
- Neural network: a deep learning model built with Keras
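Minimal sketches of both models, assuming fixed-length feature vectors X and integer labels y for the four emotion classes; the data here is random placeholder data, and the actual network architecture is defined in the notebooks and may differ:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from tensorflow import keras

# Placeholder data: 300 samples of 26-dimensional feature vectors, 4 emotion classes.
X = np.random.rand(300, 26).astype("float32")
y = np.random.randint(0, 4, size=300)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# SVM with a linear kernel and C=1.0, as described above.
svm = SVC(kernel="linear", C=1.0)
svm.fit(X_train, y_train)
print("SVM accuracy:", svm.score(X_test, y_test))

# A small fully connected Keras network (this architecture is only illustrative).
model = keras.Sequential([
    keras.layers.Input(shape=(X.shape[1],)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dropout(0.3),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(4, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(X_train, y_train, epochs=20, batch_size=16, validation_data=(X_test, y_test), verbose=0)
print("Keras accuracy:", model.evaluate(X_test, y_test, verbose=0)[1])
```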
The requirements for the implementations can be installed with:
pip install -r requirements.txt
The repository also includes two standalone scripts:
- embedded_classifier.py: captures audio from a microphone and classifies its emotion in near real-time
- realtime_emotion_subtitle.py: given a .wav audio file, outputs the proportion of each of the four emotions in near real-time
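A hedged sketch of the kind of capture-and-classify loop the microphone script implements, assuming the sounddevice library for recording, a previously saved Keras model, and the MFCC summary features from the sketch above; the model path, label order, and window length are illustrative assumptions, not the script's actual code:

```python
import numpy as np
import sounddevice as sd
import librosa
from tensorflow import keras

EMOTIONS = ["happiness", "anger", "sadness", "neutral"]  # assumed label order
SR = 16000
WINDOW_SECONDS = 3

model = keras.models.load_model("emotion_model.h5")  # hypothetical saved model

while True:
    # Record a short window from the default microphone.
    audio = sd.rec(int(WINDOW_SECONDS * SR), samplerate=SR, channels=1, dtype="float32")
    sd.wait()
    y = audio.flatten()

    # Same fixed-length MFCC summary used for training (illustrative).
    mfcc = librosa.feature.mfcc(y=y, sr=SR, n_mfcc=13)
    features = np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])[np.newaxis, :]

    # Print the predicted probability of each emotion for the current window.
    probs = model.predict(features, verbose=0)[0]
    print({emo: round(float(p), 2) for emo, p in zip(EMOTIONS, probs)})
```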