Multimodal Inductive Transfer Learning for Detection of Alzheimer’s Dementia and its Severity
Utkarsh Sarawgi*, Wazeer Zulfikar*, Nouran Soliman, Pattie Maes
To appear in INTERSPEECH 2020
This work was also submitted to the Alzheimer's Dementia Recognition through Spontaneous Speech (ADReSS) challenge.
Alzheimer's disease is estimated to affect around 50 million people worldwide, a number that is rising rapidly, with a global economic burden of nearly a trillion dollars. This calls for scalable, cost-effective, and robust methods for the detection of Alzheimer's dementia (AD). We present a novel architecture that leverages acoustic, cognitive, and linguistic features to form a multimodal ensemble system. It uses specialized artificial neural networks with temporal characteristics to detect AD and its severity, which is reflected through Mini-Mental State Exam (MMSE) scores. We first evaluate it on the ADReSS challenge dataset, a subject-independent dataset balanced and matched for age and gender to mitigate biases, available through DementiaBank. Our system achieves state-of-the-art test accuracy, precision, recall, and F1-score of 83.3% each for AD classification, and a state-of-the-art test root mean squared error (RMSE) of 4.60 for MMSE score regression. To the best of our knowledge, the system further achieves state-of-the-art AD classification accuracy of 88.0% when evaluated on the full benchmark DementiaBank Pitt database. Our work highlights the applicability and transferability of spontaneous speech to produce a robust inductive transfer learning model, and demonstrates generalizability through a task-agnostic feature space.
- **Simple**: Fast and elegant models which, when ensembled, produce competitive results across multiple tasks
- **Multimodal**: Uses Disfluency, Acoustic, and Intervention features with voting for a robust model (see the sketch below)
- **Strong**: Our ensemble model achieves 83.3% classification accuracy and an RMSE of 4.60 for MMSE score regression on ADReSS
- **Robust**: The ADReSS dataset is balanced for gender and age, ensuring rigorous testing
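As a rough illustration of the voting step, here is a minimal hard majority-vote sketch in Python; the model names and prediction arrays are hypothetical, not outputs of this repository.

```python
import numpy as np

def majority_vote(per_model_preds):
    """Hard majority vote over binary predictions from several models.

    per_model_preds: list of arrays, one per model, each of shape
    (n_samples,) holding 0 (non-AD) / 1 (AD) labels.
    """
    stacked = np.stack(per_model_preds)      # (n_models, n_samples)
    ad_votes = stacked.sum(axis=0)           # AD votes per sample
    return (ad_votes > stacked.shape[0] / 2).astype(int)

# Hypothetical predictions from the Disfluency, Acoustic, and
# Intervention models on four samples:
disfluency   = np.array([1, 0, 1, 1])
acoustic     = np.array([1, 0, 0, 1])
intervention = np.array([0, 0, 1, 1])
print(majority_vote([disfluency, acoustic, intervention]))   # [1 0 1 1]
```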
**Results on the ADReSS test set**

| Model | Accuracy | Precision | Recall | F1-Score | RMSE (MMSE*) |
|---|---|---|---|---|---|
| Luz et al. | 0.75 | 0.83 | 0.62 | 0.71 | 5.21 |
| Sarawgi et al. (ours) | 0.83 | 0.83 | 0.83 | 0.83 | 4.60 |
**Results on the ADReSS train set (leave-one-out cross-validation)**

| Model | Accuracy | Precision | Recall | F1-Score | RMSE (MMSE*) |
|---|---|---|---|---|---|
| Luz et al. | 0.77 | 0.77 | 0.76 | 0.77 | 4.38 |
| Sarawgi et al. (ours) | 0.99 | 0.99 | 1.0 | 0.99 | 0.82 |

\* Mini-Mental State Exam scores
**Results on the DementiaBank Pitt corpus (10-fold cross-validation)**

| Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| Fraser et al. | 0.82 | - | - | - |
| Masrani | 0.85 | - | - | 0.85 |
| Kong et al. | 0.87 | 0.86 | 0.91 | 0.88 |
| Sarawgi et al. (ours) | 0.88 | 0.92 | 0.82 | 0.88 |
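For reference, the metrics reported in the tables above can be reproduced from predictions with scikit-learn; the arrays below are placeholders for illustration, not results from this repository.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, mean_squared_error)

# Placeholder labels and predictions, for illustration only.
y_true = np.array([1, 0, 1, 1, 0, 1])
y_pred = np.array([1, 0, 0, 1, 0, 1])
print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1-score :", f1_score(y_true, y_pred))

# RMSE for MMSE score regression (MMSE ranges from 0 to 30).
mmse_true = np.array([29.0, 18.0, 25.0])
mmse_pred = np.array([27.5, 22.0, 24.0])
print("RMSE     :", mean_squared_error(mmse_true, mmse_pred) ** 0.5)
```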
Request access to the dataset from DementiaBank.
- Install dependencies: `pip install -r requirements.txt`
- Install and set up OpenSMILE for ComParE feature extraction, following COMPARE.md
- Extract the ComParE features (a rough sketch of this step follows the list)
- Set the config parameters in `config.py` and run `python main.py`
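As a sketch of the feature-extraction step, ComParE features are typically obtained by running OpenSMILE's `SMILExtract` binary once per audio file. The paths, config file, and flags below are assumptions that depend on your OpenSMILE version and the instructions in COMPARE.md; treat this as a sketch rather than the repository's exact pipeline.

```python
import subprocess
from pathlib import Path

# Assumed locations; adjust to your OpenSMILE install and dataset layout.
SMILEXTRACT = "SMILExtract"                          # OpenSMILE binary on PATH
COMPARE_CONF = "opensmile/config/IS13_ComParE.conf"  # ComParE feature config
AUDIO_DIR = Path("data/audio")
OUT_DIR = Path("data/compare_features")
OUT_DIR.mkdir(parents=True, exist_ok=True)

for wav in sorted(AUDIO_DIR.glob("*.wav")):
    out_file = OUT_DIR / (wav.stem + ".csv")
    # -C selects the feature config, -I the input wav, -O the output file.
    subprocess.run(
        [SMILEXTRACT, "-C", COMPARE_CONF, "-I", str(wav), "-O", str(out_file)],
        check=True,
    )
```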
We use an ensemble of (1) Disfluency, (2) Acoustic, and (3) Intervention models for AD classification. A (4) regression module is then added on top of the ensemble for MMSE score regression.
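A minimal sketch of that two-stage design, assuming each unimodal model exposes class probabilities, with stand-in models and random toy data (the layer sizes and interfaces are illustrative, not the paper's exact configuration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Stand-ins for the Disfluency, Acoustic, and Intervention models,
# each trained on its own (random, toy) modality features.
X_mods = [rng.normal(size=(80, 5)) for _ in range(3)]
y_ad = rng.integers(0, 2, size=80)            # AD / non-AD labels
mmse = rng.uniform(0, 30, size=80)            # MMSE scores (0-30)
models = [LogisticRegression().fit(X, y_ad) for X in X_mods]

# Stage 1: stack each model's AD probability into an ensemble feature
# vector, and classify by majority vote over the thresholded probabilities.
ens = np.column_stack(
    [m.predict_proba(X)[:, 1] for m, X in zip(models, X_mods)]
)
ad_pred = ((ens > 0.5).sum(axis=1) >= 2).astype(int)

# Stage 2: a small regression head on top of the ensemble predicts MMSE.
mmse_head = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
mmse_head.fit(ens, mmse)
print(ad_pred[:10], mmse_head.predict(ens[:5]).round(1))
```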
This code is released under the MIT License (refer to the LICENSE for details).
If you find this project useful for your research, please use the following BibTeX entry.
@inproceedings{sarawgi2020multimodal,
  title={Multimodal Inductive Transfer Learning for Detection of Alzheimer's Dementia and its Severity},
  author={Sarawgi, Utkarsh and Zulfikar, Wazeer and Soliman, Nouran and Maes, Pattie},
  booktitle={Proceedings of INTERSPEECH 2020},
  year={2020}
}