INTERSPEECH-2023-24-Papers

License: MIT

INTERSPEECH 2023 Papers: a complete collection of research papers from the INTERSPEECH 2023 conference, covering the latest advances in speech and language processing, with links to open-source code where available. ⭐ the repository to support the advancement of speech technology!

INTERSPEECH 2023

Total Papers Preprint Papers Papers with Open Code

👉 * The open-code count includes repositories on GitHub, GitLab, and Hugging Face, as well as packages distributed on PyPI; it excludes links to web pages and GitHub Pages.


Tip

The PDF version of the INTERSPEECH 2023 Conference Programme lists all accepted full papers, their presentation order, and their scheduled presentation times.


Other collections of the best AI conferences

Important

The conference table is kept up to date.

Conference Year
2023 2024
Computer Vision (CV)
CVPR
ICCV  
ECCV
WACV  
FG
Speech/Signal Processing (SP/SigProc)
ICASSP
INTERSPEECH
ISMIR  
Natural Language Processing (NLP)
EMNLP
Machine Learning (ML)
AAAI
ICLR
ICML
NeurIPS

Contributors



Note

Contributions that improve the completeness of this list are greatly appreciated. If you come across any overlooked papers, please feel free to open a pull request or an issue, or contact me by email. Your participation is crucial to making this repository even better.


Section Papers
Resources for Spoken Language Processing Papers Preprints Open Code
Speech Synthesis: Prosody and Emotion Papers Preprints Open Code
Statistical Machine Translation Papers Preprints Open Code
Self-Supervised Learning in ASR Papers Preprints Open Code
Prosody Papers Preprints Open Code
Speech Production Papers Preprints Open Code
Dysarthric Speech Assessment Papers Preprints Open Code
Speech Coding: Transmission Papers Preprints Open Code
Speech Recognition: Signal Processing, Acoustic Modeling, Robustness, Adaptation Papers Preprints Open Code
Analysis of Speech and Audio Signals Papers Preprints Open Code
Speech Recognition: Architecture, Search, and Linguistic Components Papers Preprints Open Code
Speech Recognition: Technologies and Systems for New Applications Papers Preprints Open Code
Lexical and Language Modeling for ASR Papers Preprints Open Code
Language Identification and Diarization Papers Preprints Open Code
Speech Quality Assessment Papers Preprints Open Code
Feature Modeling for ASR Papers Preprints Open Code
Interfacing Speech Technology and Phonetics Papers Preprints Open Code
Speech Synthesis: Multilinguality Papers Preprints Open Code
Speech Emotion Recognition Papers Preprints Open Code
Spoken Dialog Systems and Conversational Analysis Papers Preprints Open Code
Speech Coding and Enhancement Papers Preprints Open Code
Paralinguistics Papers Preprints Open Code
Speech Enhancement and Denoising Papers Preprints Open Code
Speech Synthesis: Evaluation Papers Preprints Open Code
End-to-End Spoken Dialog Systems Papers Preprints Open Code
Biosignal-enabled Spoken Communication Papers Preprints Open Code
Neural-based Speech and Acoustic Analysis Papers Preprints Open Code
List of sections

DiGo - Dialog for Good: Speech and Language Technology for Social Good

🆔 Title Repo Paper
2194 A Multimodal Investigation of Speech, Text, Cognitive and Facial Video Features for Characterizing Depression with and without Medication ISCA
Pdf
307 Understanding Disrupted Sentences using Underspecified Abstract Meaning Representation GitHub ISCA
Amazon Science
2109 Developing Speech Processing Pipelines for Police Accountability ISCA
arXiv
2086 Prosody-Controllable Gender-Ambiguous Speech Synthesis: A Tool for Investigating Implicit Bias in Speech Perception GitHub ISCA
848 Affective Attributes of French Caregivers' Professional Speech ISCA

Spoken Language Processing: Translation, Information Retrieval, Summarization, Resources, and Evaluation

🆔 Title Repo Paper
180 Pragmatic Pertinence: A Learnable Confidence Metric to Assess the Subjective Quality of LM-Generated Text ISCA
2078 ASR and Emotional Speech: A Word-Level Investigation of the Mutual Impact of Speech and Emotion Recognition ISCA
arXiv
916 BASS: Block-wise Adaptation for Speech Summarization GitHub ISCA
1258 Speaker Tracking using Graph Attention Networks with Varying Duration Utterances in Multi-Channel Naturalistic Data: Fearless Steps Apollo 11 Audio Corpus ISCA
36 Combining Language Corpora in a Japanese Electromagnetic Articulography Database for Acoustic-to-Articulatory Inversion ISCA
523 A Dual Attention-based Modality-Collaborative Fusion Network for Emotion Recognition GitHub ISCA
2174 Large Dataset Generation of Synchronized Music Audio and Lyrics at Scale using Teacher-Student Paradigm ISCA
483 Enc-Dec RNN Acoustic Word Embeddings Learned via Pairwise Prediction GitHub ISCA
864 Query based Acoustic Summarization for Podcasts ISCA
1242 Spot Keywords from Very Noisy and Mixed Speech ISCA
arXiv
891 Knowledge Distillation on Joint Task End-to-End Speech Translation ISCA
Amazon Science
343 Investigating Pre-trained Audio Encoders in the Low-Resource Condition GitHub ISCA
arXiv
1718 Improving Textless Spoken Language Understanding with Discrete Units as Intermediate Target ISCA
arXiv
823 MAVD: The First Open Large-Scale Mandarin Audio-Visual Dataset with Depth Information GitHub ISCA
arXiv
1674 CN-Celeb-AV: A Multi-Genre Audio-Visual Dataset for Person Recognition WEB Page ISCA
arXiv
1762 Improving Zero-Shot Cross-Domain Slot Filling via Transformer-based Slot Semantics Fusion ISCA
619 Rethinking Transfer and Auxiliary Learning for Improving Audio Captioning Transformer ISCA
1468 Boosting Punctuation Restoration with Data Generation and Reinforcement Learning ISCA
695 J-ToneNet: A Transformer-based Encoding Network for Improving Tone Classification in Continuous Speech via F0 Sequences ISCA
1152 Towards Cross-Language Prosody Transfer for Dialog WEB Page
GitHub
ISCA
Pdf
2506 Strategies for Improving Low Resource Speech to Text Translation Relying on Pre-trained ASR Models ISCA
arXiv
1980 ITALIC: An Italian Intent Classification Dataset GitHub
Zenodo
ISCA
arXiv
1778 Perceptual and Task-Oriented Assessment of a Semantic Metric for ASR Evaluation ISCA
1466 How ChatGPT is Robust for Spoken Language Understanding? ISCA
1233 GigaST: A 10,000-hour Pseudo Speech Translation Corpus GitHub Page ISCA
arXiv
1570 Boosting Chinese ASR Error Correction with Dynamic Error Scaling Mechanism ISCA
2473 Crowdsource-based Validation of the Audio Cocktail as a Sound Browsing Tool ISCA
1675 PunCantonese: A Benchmark Corpus for Low-Resource Cantonese Punctuation Restoration from Speech Transcripts GitHub ISCA
1358 Speech-to-Face Conversion using Denoising Diffusion Probabilistic Models ISCA
2255 Inter-Connection: Effective Connection between Pre-trained Encoder and Decoder for Speech Translation ISCA
arXiv
1068 How Does Pretraining Improve Discourse-aware Translation? ISCA
arXiv
1135 PATCorrect: Non-Autoregressive Phoneme-Augmented Transformer for ASR Error Correction ISCA
arXiv
161 Model-assisted Lexical Tone Evaluation of Three-Year-Old Chinese-Speaking Children by also Considering Segment Production ISCA
1392 Sentence Embedder Guided Utterance Encoder (SEGUE) for Spoken Language Understanding GitHub ISCA
arXiv
1582 Joint Time and Frequency Transformer for Chinese Opera Classification ISCA
116 AdaMS: Deep Metric Learning with Adaptive Margin and Adaptive Scale for Acoustic Word Discrimination ISCA
arXiv
2252 Investigating Reproducibility at Interspeech Conferences: A Longitudinal and Comparative Perspective ISCA
arXiv
2250 Combining Heterogeneous Structures for Event Causality Identification ISCA
1208 An Efficient Approach for the Automated Segmentation and Transcription of the People's Speech Corpus ISCA
1425 Diverse Feature Mapping and Fusion via Multitask Learning for Multilingual Speech Emotion Recognition ISCA
903 Take the Hint: Improving Arabic Diacritization with Partially-Diacritized Text GitHub ISCA
arXiv
466 Low-Resource Cross-Lingual Adaptive Training for Nigerian Pidgin GitHub ISCA
arXiv
1878 Efficient Adaptation of Spoken Language Understanding based on End-to-End Automatic Speech Recognition ISCA
597 PhonMatchNet: Phoneme-Guided Zero-Shot Keyword Spotting for User-Defined Keywords GitHub ISCA
69 Mix before Align: Towards Zero-Shot Cross-Lingual Sentiment Analysis via Soft-Mix and Multi-View Learning ISCA
170 AlignAtt: using Attention-based Audio-Translation Alignments as a Guide for Simultaneous Speech Translation ISCA
arXiv
2225 Incremental Blockwise Beam Search for Simultaneous Speech Translation with Controllable Quality-Latency Tradeoff ISCA
1979 Zambezi Voice: A Multilingual Speech Corpus for Zambian Languages GitHub ISCA
arXiv

Speech, Voice, and Hearing Disorders

🆔 Title Repo Paper
2421 Debiased Automatic Speech Recognition for Dysarthric Speech via Sample Reweighting with Sample Affinity Test ISCA
arXiv
2198 Multimodal Locally Enhanced Transformer for Continuous Sign Language Recognition ISCA
1759 Towards Supporting an Early Diagnosis of Multiple Sclerosis using Vocal Features ISCA
1891 Whisper Features for Dysarthric Severity-Level Classification ISCA
2191 A New Benchmark of Aphasia Speech Recognition and Detection based on E-Branchformer and Multi-task Learning GitHub GitHub ISCA
arXiv
222 Dysarthric Speech Recognition, Detection and Classification using Raw Phase and Magnitude Spectra ISCA
2026 A Stutter Seldom Comes Alone - Cross-Corpus Stuttering Detection as a Multi-label Problem ISCA
arXiv
1542 Transfer Learning to Aid Dysarthria Severity Classification for Patients with Amyotrophic Lateral Sclerosis ISCA
2203 DuTa-VC: A Duration-aware Typical-to-Atypical Voice Conversion Approach with Diffusion Probabilistic Model GitHub Page
GitHub
ISCA
arXiv
201 CNVVE: Dataset and Benchmark for Classifying Non-verbal Voice GitHub ISCA
University of Southampton
1541 Arabic Dysarthric Speech Recognition using Adversarial and Signal-based Augmentation GitHub ISCA
arXiv
1887 Weakly-Supervised Forced Alignment of Disfluent Speech using Phoneme-level Modeling GitHub ISCA
arXiv
1998 Glottal Source Analysis of Voice Deficits in Basal Ganglia Dysfunction: Evidence from de novo Parkinson's Disease and Huntington's Disease ISCA
2478 An Analysis of Glottal Features of Chronic Kidney Disease Speech and its Application to CKD Detection ISCA
983 Weakly Supervised Glottis Segmentation in High-Speed Video Endoscopy using Bounding Box Labels ISCA
1669 Investigating the Dynamics of Hand and Lips in French Cued Speech using Attention Mechanisms and CTC-based Decoding ISCA
arXiv
670 Hearing Loss Affects Emotion Perception in Older Adults: Evidence from a Prosody-Semantics Stroop Task ISCA
554 Cochlear-Implant Listeners Listening to Cochlear-Implant Simulated Speech ISCA
2168 Validation of a Task-Independent Cepstral Peak Prominence Measure with Voice Activity Detection ISCA
1679 Score-balanced Loss for Multi-aspect Pronunciation Assessment GitHub ISCA
arXiv
2108 Federated Learning for Secure Development of AI Models for Parkinson's Disease Detection using Speech from Different Languages ISCA
arXiv
652 F0inTFS: A Lightweight Periodicity Enhancement Strategy for Cochlear Implants ISCA
1678 Differentiating Acoustic and Physiological Features in Speech for Hypoxia Detection ISCA
HAL Science
786 Mandarin Electrolaryngeal Speech Voice Conversion using Cross-Domain Features ISCA
arXiv
866 Audio-Visual Mandarin Electrolaryngeal Speech Voice Conversion ISCA
arXiv
1744 Which Aspects of Motor Speech Disorder are Captured by Mel Frequency Cepstral Coefficients? Evidence from the Change in STN-DBS Conditions in Parkinson's Disease ISCA
1096 Detecting Manifest Huntington's Disease using Vocal Data ISCA
1623 Exploring Multi-Task Learning and Data Augmentation in Dementia Detection with Self-Supervised Pre-trained Models ISCA

Spoken Term Detection and Voice Search

🆔 Title Repo Paper
478 Matching Latent Encoding for Audio-Text based Keyword Spotting ISCA
arXiv
1215 Self-Paced Pattern Augmentation for Spoken Term Detection in Zero-Resource ISCA
2362 On-Device Constrained Self-Supervised Speech Representation Learning for Keyword Spotting via Knowledge Distillation ISCA
Amazon Science
90 Online Continual Learning in Keyword Spotting for Low-Resource Devices via Pooling High-Order Temporal Statistics ISCA
arXiv
689 Improving Small Footprint Few-Shot Keyword Spotting with Supervision on Auxiliary Data ISCA
2222 Robust Keyword Spotting for Noisy Environments by Leveraging Speech Enhancement and Speech Presence Probability ISCA

Models for Streaming ASR

🆔 Title Repo Paper
831 Enhancing the Unified Streaming and Non-Streaming Model with Contrastive Learning ISCA
arXiv
1497 ZeroPrompt: Streaming Acoustic Encoders are Zero-Shot Masked LMs ISCA
arXiv
361 Improved Training for End-to-End Streaming Automatic Speech Recognition Model with Punctuation ISCA
arXiv
1129 DCTX-Conformer: Dynamic Context Carry-over for Low Latency Unified Streaming and Non-Streaming Conformer ISCA
arXiv
1121 Knowledge Distillation from Non-Streaming to Streaming ASR Encoder using Auxiliary Non-Streaming Layer ISCA
884 Adaptive Contextual Biasing for Transducer based Streaming Speech Recognition ISCA
arXiv

Source Separation

🆔 Title Repo Paper
1753 Audio-Visual Speech Separation in Noisy Environments with a Lightweight Iterative Model GitHub Page
GitHub
ISCA
arXiv
1389 Remixing-based Unsupervised Source Separation from Scratch ISCA
577 CAPTDURE: Captioned Sound Dataset of Single Sources ISCA
arXiv
488 Recursive Sound Source Separation with Deep Learning-based Beamforming for Unknown Number of Sources ISCA
2537 Multi-Channel Speech Separation with Cross-Attention and Beamforming ISCA
185 Background-Sound Controllable Voice Source Separation ISCA

Speech Perception

🆔 Title Repo Paper
1922 A Neural Architecture for Selective Attention to Speech Features ISCA
1122 Quantifying Informational Masking due to Masker Intelligibility in Same-Talker Speech-in-Speech Perception ISCA
1476 On the Benefits of Self-Supervised Learned Speech Representations for Predicting Human Phonetic Misperceptions ISCA
2154 Predicting Perceptual Centers Located at Vowel Onset in German Speech using Long Short-Term Memory Networks ISCA
63 Exploring the Mutual Intelligibility Breakdown Caused by Sculpting Speech from a Competing Speech Signal ISCA
2103 Perception of Incomplete Voicing Neutralization of Obstruents in Tohoku Japanese ISCA

Phonetics and Phonology: Languages and Varieties

🆔 Title Repo Paper
1879 The Emergence of Obstruent-Intrinsic f0 and VOT as Cues to the Fortis/Lenis Contrast in West Central Bavarian ISCA
431 〈'〉 in Tsimane': A Preliminary Investigation GIN ISCA
2200 Segmental Features of Brazilian (Santa Catarina) Hunsrik ISCA
2337 Opening or Closing? An Electroglottographic Analysis of Voiceless Coda Consonants in Australian English ISCA
295 Increasing Aspiration of Word-Medial Fortis Plosives in Swiss Standard German ISCA
1456 Lexical Stress and Velar Palatalization in Italian: A Spatio-Temporal Interaction ISCA

Speaker and Language Identification

🆔 Title Repo Paper
1989 Vietnam-Celeb: A Large-Scale Dataset for Vietnamese Speaker Recognition GitHub ISCA
2254 What Can an Accent Identifier Learn? Probing Phonetic and Prosodic Information in a Wav2vec2-based Accent Identification Model GitHub Page ISCA
arXiv
241 The 2022 NIST Language Recognition Evaluation ISCA
arXiv
155 Description and Analysis of the KPT system for NIST Language Recognition Evaluation 2022 ISCA
1725 ACA-Net: Towards Lightweight Speaker Verification using Asymmetric Cross Attention GitHub ISCA
arXiv
402 Branch-ECAPA-TDNN: A Parallel Branch Architecture to Capture Local and Global Features for Speaker Verification ISCA
2052 Speaker Verification Across Ages: Investigating Deep Speaker Embedding Sensitivity to Age Mismatch in Enrollment and Test Speech ISCA
arXiv
2569 Wavelet Scattering Transform for Improving Generalization in Low-Resourced Spoken Language Identification ISCA
1407 A Parameter-Efficient Learning Approach to Arabic Dialect Identification with Pre-trained General Purpose Speech Model GitHub ISCA
arXiv
2272 HABLA: A Dataset of Latin American Spanish Accents for Voice Anti-Spoofing Zenodo ISCA
1702 Self-Supervised Learning Representation based Accent Recognition with Persistent Accent Memory ISCA
800 Extremely Low Bit Quantization for Mobile Speaker Verification Systems Under 1MB Memory ISCA
1974 Unsupervised Out-of-Distribution Dialect Detection with Mahalanobis Distance ISCA
arXiv
105 Pyannote.Audio 2.1 Speaker Diarization Pipeline: Principle, Benchmark and Recipe GitHub ISCA
Pdf
1524 Model Compression for DNN-based Speaker Verification using Weight Quantization ISCA
arXiv
1354 Multi-Resolution Approach to Identification of Spoken Languages and to Improve Overall Language Diarization System using Whisper Model ISCA
125 Improving Generalization Ability of Countermeasures for New Mismatch Scenario by Combining Multiple Advanced Regularization Terms ISCA
arXiv
849 Dynamic Fully-Connected Layer for Large-Scale Speaker Verification ISCA
844 Reversible Neural Networks for Memory-Efficient Speaker Verification ISCA
777 ECAPA++: Fine-grained Deep Embedding Learning for TDNN based Speaker Verification ISCA
1206 TO-Rawnet: Improving RawNet with TCN and Orthogonal Regularization for Fake Audio Detection ISCA
arXiv
100 Fooling Speaker Identification Systems with Adversarial Background Music ISCA
1314 Mutual Information-based Embedding Decoupling for Generalizable Speaker Verification ISCA
574 Target Active Speaker Detection with Audio-Visual Cues GitHub ISCA
arXiv
2401 Improving End-to-End Neural Diarization using Conversational Summary Representations ISCA
arXiv
2039 Phase Perturbation Improves Channel Robustness for Speech Spoofing Countermeasures GitHub Page
GitHub
ISCA
arXiv
210 Improving Training Datasets for Resource-constrained Speaker Recognition Neural Networks ISCA
1498 Instance-based Temporal Normalization for Speaker Verification ISCA
881 On the Robustness of Wav2Vec 2.0 based Speaker Recognition Systems ISCA
697 P-Vectors: A Parallel-coupled TDNN/Transformer Network for Speaker Verification GitHub ISCA
arXiv
1249 Group GMM-ResNet for Detection of Synthetic Speech Attacks ISCA
452 Robust Training for Speaker Verification against Noisy Labels GitHub ISCA
arXiv
1404 Self-Distillation into Self-Attention Heads for Improving Transformer-based End-to-End Neural Speaker Diarization ISCA
1217 Build a SRE Challenge System: Lessons from VoxSRC 2022 and CNSRC 2022 WEB Page ISCA
arXiv
1648 Describing the Phonetics in the Underlying Speech Attributes for Deep and Interpretable Speaker Recognition GitHub ISCA
1214 Range-based Equal Error Rate for Spoof Localization ISCA
arXiv
1888 Exploring the English Accent-Independent Features for Speech Emotion Recognition using Filter and Wrapper-based Methods for Feature Selection ISCA
205 Powerset Multi-Class Cross Entropy Loss for Neural Speaker Diarization GitHub ISCA
394 A Method of Audio-Visual Person Verification by Mining Connections between Time Series ISCA
605 One-Step Knowledge Distillation and Fine-Tuning in using Large Pre-trained Self-Supervised Learning Models for Speaker Verification GitHub ISCA
arXiv
409 Defense Against Adversarial Attacks on Audio DeepFake Detection GitHub ISCA
arXiv
1820 A Conformer-based Classifier for Variable-Length Utterance Processing in Anti-Spoofing GitHub ISCA
1557 Conformer-based Language Embedding with Self-Knowledge Distillation for Spoken Language Identification ISCA
2419 CommonAccent: Exploring Large Acoustic Pre-trained Models for Accent Classification based on Common Voice ISCA
ResearchGate
266 From Adaptive Score Normalization to Adaptive Data Normalization for Speaker Verification Systems ISCA
1513 CAM++: A Fast and Efficient Network for Speaker Verification using Context-aware Masking GitHub ISCA
arXiv
1928 North Sámi Dialect Identification with Self-Supervised Speech Models GitHub ISCA
arXiv
2289 Encoder-Decoder Multimodal Speaker Change Detection ISCA
arXiv
1603 Disentangled Representation Learning for Multilingual Speaker Recognition WEB Page ISCA
arXiv
2310 A Compact End-to-End Model with Local and Global Context for Spoken Language Identification GitHub ISCA
arXiv
1005 On the Robustness of Arabic Speech Dialect Identification ISCA
arXiv
927 Adaptive Neural Network Quantization for Lightweight Speaker Verification ISCA
1205 Adversarial Diffusion Probability Model For Cross-Domain Speaker Verification Integrating Contrastive Loss ISCA
1554 Chinese Dialect Recognition based on Transfer Learning ISCA
270 Spoofing Attacker also Benefits from Self-Supervised Pretrained Model ISCA
arXiv
854 Label aware Speech Representation Learning for Language Identification ISCA
arXiv
1761 Exploring the Impact of Back-end Network on Wav2vec 2.0 for Dialect Identification ISCA
453 Improving Speaker Verification with Self-pretrained Transformer Models GitHub ISCA
arXiv
372 Handling the Alignment for Wake Word Detection: A Comparison Between Alignment-based, Alignment-Free and Hybrid Approaches ISCA
arXiv

Speech Synthesis and Voice Conversion

🆔 Title Repo Paper
2336 Mitigating the Exposure Bias in Sentence-Level Grapheme-to-Phoneme (G2P) Transduction ISCA
160 Streaming Parrotron for On-Device Speech-to-Speech Conversion ISCA
arXiv
2407 Exploiting Emotion Information in Speaker Embeddings for Expressive Text-to-Speech GitHub Page ISCA
2518 E2E-S2S-VC: End-to-End Sequence-to-Sequence Voice Conversion GitHub Page ISCA
2403 DC CoMix TTS: An End-to-End Expressive TTS with Discrete Code Collaborated with Mixer GitHub ISCA
arXiv
419 Voice Conversion with Just Nearest Neighbors GitHub Page
GitHub
ISCA
arXiv
1193 CFVC: Conditional Filtering for Controllable Voice Conversion GitHub Page ISCA
1157 DualVC: Dual-mode Voice Conversion using Intra-Model Knowledge Distillation and Hybrid Predictive Coding GitHub Page ISCA
arXiv
39 Attention-based Interactive Disentangling Network for Instance-Level Emotional Voice Conversion GitHub Page ISCA
836 ALO-VC: Any-to-Any Low-Latency One-Shot Voice Conversion GitHub Page ISCA
arXiv
1978 Evaluating and Reducing the Distance between Synthetic and Real Speech Distributions ISCA
arXiv
2202 Decoupling Segmental and Prosodic cues of Non-Native Speech through Vector Quantization GitHub Page ISCA
2383 VC-T: Streaming Voice Conversion based on Neural Transducer GitHub Page ISCA
191 Emo-StarGAN: A Semi-Supervised Any-to-Many Non-Parallel Emotion Preserving Voice Conversion GitHub ISCA
1788 ControlVC: Zero-Shot Voice Conversion with Time-Varying Controls on Pitch and Speed WEB Page GitHub ISCA
arXiv
1356 Reverberation-Controllable Voice Conversion using Reverberation Time Estimator ISCA
2558 Cross-Utterance Conditioned Coherent Speech Editing WEB Page ISCA

Speech and Language in Health: from Remote Monitoring to Medical Conversations

🆔 Title Repo Paper
2287 An Automatic Multimodal Approach to Analyze Linguistic and Acoustic Cues on Parkinson's Disease Patients ISCA
1332 Personalization for Robust Voice Pathology Detection in Sound Waves GitHub ISCA
2249 Integrated and Enhanced Pipeline System to Support Spoken Language Analytics for Screening Neurocognitive Disorders ISCA
1990 Capturing Mismatch between Textual and Acoustic Emotion Expressions for Mood Identification in Bipolar Disorder ISCA
Pdf
296 FTA-Net: A Frequency and Time Attention Network for Speech Depression Detection ISCA
1709 Bayesian Networks for the Robust and Unbiased Prediction of Depression and its Symptoms Utilizing Speech and Multimodal Data ISCA
Pdf
1263 Hyper-Parameter Adaptation of Conformer ASR Systems for Elderly and Dysarthric Speech Recognition ISCA
arXiv
1721 Classifying Depression Symptom Severity: Assessment of Speech Representations in Personalized and Generalized Machine Learning Models ISCA
1946 Active Learning for Abnormal Lung Sound Data Curation and Detection in Asthma ISCA
2079 Automatic Assessment of Alzheimer's across Three Languages using Speech and Language Features ISCA
301 On-the-Fly Feature based Rapid Speaker Adaptation for Dysarthric and Elderly Speech Recognition GitHub ISCA
arXiv
1722 Relationship between LTAS-based Spectral Moments and Acoustic Parameters of Hypokinetic Dysarthria in Parkinson's Disease ISCA
963 Respiratory Distress Estimation in Human-Robot Interaction Scenario ISCA
1771 Prediction of the Gender-based Violence Victim Condition using Speech: What do Machine Learning Models rely on? ISCA
1916 Whisper Encoder features for Infant Cry Classification ISCA
1997 Classifying Dementia in the Presence of Depression: A Cross-Corpus Study ISCA
297 Exploiting Cross-Domain and Cross-Lingual Ultrasound Tongue Imaging Features for Elderly and Dysarthric Speech Recognition ISCA
arXiv
464 Multi-Class Detection of Pathological Speech with Latent Features: How does It Perform on Unseen Data? ISCA
arXiv
2002 Responsiveness, Sensitivity and Clinical Utility of Timing-Related Speech Biomarkers for Remote Monitoring of ALS Disease Progression ISCA
Pdf
322 Use of Speech Impairment Severity for Dysarthric Speech Recognition ISCA
arXiv
721 MMLung: Moving Closer to Practical Lung Health Estimation using Smartphones GitHub ISCA
Pdf
913 Investigating the Utility of Synthetic Data for Doctor-Patient Conversation Summarization ISCA
2101 Non-Uniform Speaker Disentanglement for Depression Detection from Raw Speech Signals GitHub ISCA
arXiv
753 PoCaPNet: A Novel Approach for Surgical Phase Recognition using Speech and X-Ray Images GitHub ISCA
arXiv
2100 Combining Multiple Multimodal Speech Features into an Interpretable Index Score for Capturing Disease Progression in Amyotrophic Lateral Sclerosis ISCA
Pdf
1438 The MASCFLICHT Corpus: Face Mask Type and Coverage Area Recognition from Speech Zenodo ISCA
1435 Towards Reference Speech Characterization for Health Applications GitHub ISCA
2146 Automatic Classification of Hypokinetic and Hyperkinetic Dysarthria based on GMM-Supervectors ISCA
947 Towards Robust Paralinguistic Assessment for Real-World Mobile Health (mHealth) Monitoring: an Initial Study of Reverberation Effects on Speech ISCA
arXiv

Novel Transformer Models for ASR

🆔 Title Repo Paper
2228 Conmer: Streaming Conformer without Self-Attention for Interactive Voice Assistants ISCA
Amazon Science
1255 Intra-Ensemble: A New Method for Combining Intermediate Outputs in Transformer-based Automatic Speech Recognition ISCA
1194 A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks GitHub GitHub ISCA
arXiv
1611 HyperConformer: Multi-head HyperMixer for Efficient Speech Recognition ISCA
arXiv
893 Memory-Augmented Conformer for Improved End-To-End Long-form ASR GitHub ISCA
552 Towards Effective and Compact Contextual Representation for Conformer Transducer Speech Recognition Systems ISCA
arXiv

Speaker Recognition

🆔 Title Repo Paper
1294 An Enhanced Res2Net with Local and Global Feature Fusion for Speaker Verification GitHub ISCA
arXiv
1286 A Study on Visualization of Voiceprint Feature ISCA
1083 VoxTube: A Multilingual Speaker Recognition Dataset GitHub Page
GitHub
ISCA
1298 Visualizing Data Augmentation in Deep Speaker Recognition ISCA
arXiv
1565 Ordered and Binary Speaker Embedding ISCA
arXiv
2031 Self-FiLM: Conditioning GANs with Self-Supervised Representations for Bandwidth Extension based Speaker Recognition ISCA
arXiv
1202 Curriculum Learning for Self-Supervised Speaker Verification ISCA
arXiv
1558 Introducing Self-Supervised Phonetic Information for Text-Independent Speaker Verification ISCA
1379 A Teacher-Student Approach for Extracting Informative Speaker Embeddings from Speech Mixtures ISCA
arXiv
1479 Experimenting with Additive Margins for Contrastive Self-Supervised Speaker Verification ISCA
arXiv

Cross-lingual and Multilingual ASR

🆔 Title Repo Paper
1630 Fast and Efficient Multilingual Self-Supervised Pre-training for Low-Resource Speech Recognition ISCA
1338 UniSplice: Universal Cross-Lingual Data Splicing for Low-Resource ASR ISCA
772 Allophant: Cross-Lingual Phoneme Recognition with Articulatory Attributes GitHub GitHub ISCA
arXiv
97 Phonetic-assisted Multi-Target Units Modeling for Improving Conformer-Transducer ASR System ISCA
arXiv
1061 Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech Pre-training for Adaptation to Unseen Languages ISCA
arXiv
1444 DistilXLSR: A Light Weight Cross-Lingual Speech Representation Model GitHub ISCA
arXiv

Voice Conversion

🆔 Title Repo Paper
251 Emotional Voice Conversion with Semi-Supervised Generative Modeling GitHub Page
GitHub
ISCA
817 Diff-HierVC: Diffusion-based Hierarchical Voice Conversion with Robust Pitch Generation and Masked Prior for Zero-Shot Speaker Adaptation GitHub Page
GitHub
ISCA
215 S2CD-VC: Self-Heuristic Speaker Content Disentanglement for Any-to-Any Voice Conversion GitHub Page ISCA
1508 Flow-VAE VC: End-to-End Flow Framework with Contrastive Loss for Zero-Shot Voice Conversion WEB Page ISCA
1602 Automatic Speech Disentanglement for Voice Conversion using Rank Module and Speech Augmentation GitHub Page ISCA
arXiv
2298 End-to-End Zero-Shot Voice Conversion with Location-Variable Convolutions GitHub Page
GitHub
ISCA
arXiv

Pathological Speech Analysis

🆔 Title Repo Paper
2093 Multimodal Assessment of Bulbar Amyotrophic Lateral Sclerosis (ALS) using a Novel Remote Speech Assessment App ISCA
2181 On the use of High Frequency Information for Voice Pathology Classification ISCA
1784 Do Phonatory Features Display Robustness to Characterize Parkinsonian Speech Across Corpora? ISCA
2531 Severity Classification of Parkinson's Disease from Speech using Single Frequency Filtering-based Features ISCA
1915 Comparison of Acoustic Measures of Dysphonia in Parkinson's Disease and Huntington's Disease: Effect of Sex and Speaking Task ISCA
1734 Alzheimer Disease Classification through ASR-based Transcriptions: Exploring the Impact of Punctuation and Pauses GitHub ISCA
arXiv
1574 A Pipeline to Evaluate the Effects of Noise on Machine Learning Detection of Laryngeal Cancer GitHub ISCA
2474 ReCLR: Reference-Enhanced Contrastive Learning of Audio Representation for Depression Detection ISCA
234 Automated Multiple Sclerosis Screening based on Encoded Speech Representations ISCA
1934 Cross-Lingual Features for Alzheimer's Dementia Detection from Speech ISCA
1653 Careful Whisper - Leveraging Advances in Automatic Speech Recognition for Robust and Interpretable Aphasia Subtype Classification ISCA
1868 Behavioral Analysis of Pathological Speaker Embeddings of Patients During Oncological Treatment of Oral Cancer ISCA

Multimodal Speech Emotion Recognition

Section Papers Preprint Papers Papers with Open Code

🆔 Title Repo Paper
1832 LanSER: Language-Model Supported Speech Emotion Recognition ISCA
463 Fine-tuned RoBERTa Model with a CNN-LSTM Network for Conversational Emotion Recognition ISCA
1591 Emotion Label Encoding using Word Embeddings for Speech Emotion Recognition ISCA
2444 Discrimination of the Different Intents Carried by the Same Text through Integrating Multimodal Information ISCA
510 Meta-Domain Adversarial Contrastive Learning for Alleviating Individual Bias in Self-Sentiment Predictions ISCA
413 SWRR: Feature Map Classifier based on Sliding Window Attention and High-Response Feature Reuse for Multimodal Emotion Recognition ISCA

Phonetics, Phonology, and Prosody

Section Papers Preprint Papers Papers with Open Code

🆔 Title Repo Paper
1443 Effects of Meter, Genre and Experience on Pausing, Lengthening and Prosodic Phrasing in German Poetry Reading ISCA
1142 Comparing First Spectral Moment of Australian English /s/ between Straight and Gay Voices using Three Analysis Window Sizes ISCA
2584 Universal Automatic Phonetic Transcription into the International Phonetic Alphabet GitHub ISCA
2134 Voice Twins: Discovering Extremely Similar-Sounding, Unrelated Speakers ISCA
1042 Filling the Population Statistics Gap: Swiss German Reference Data on F0 and Speech Tempo for Forensic Contexts ISCA
1619 Investigating the Syntax-Discourse Interface in the Phonetic Implementation of Discourse Markers ISCA
2214 Evaluation of a Forensic Automatic Speaker Recognition System with Emotional Speech Recordings ISCA
1052 An Outlier Analysis of Vowel Formants from a Corpus Phonetics Pipeline GitHub ISCA Pdf
340 The Hidden Dance of Phonemes and Visage: Unveiling the Enigmatic Link between Phonemes and Facial Features GitHub ISCA
1880 Beatboxing Kick Drum Kinematics ISCA
536 Effects of Hearing Loss and Amplification on Mandarin Consonant Perception ISCA
2020 An Acoustic Analysis of Fricative Variation in Three Accents of English ISCA
109 Acoustic Cues to Stress Perception in Spanish – a Mismatch Negativity Study ISCA
976 Bulgarian Unstressed Vowel Reduction: Received Views vs Corpus Findings ISCA
1764 An Investigation of Indian Native Language Phonemic Influences on L2 English Pronunciations ISCA arXiv
498 Identifying Stable Sections for Formant Frequency Extraction of French Nasal Vowels based on Difference Thresholds ISCA
1903 Evaluation of Delexicalization Methods for Research on Emotional Speech ISCA
1772 Nonbinary American English Speakers Encode Gender in Vowel Acoustics ISCA
44 Coarticulation of Sibe Vowels and Dorsal Fricatives in Spontaneous Speech: An Acoustic Study ISCA
1013 Using Speech Synthesis to Explain Automatic Speaker Recognition: A New Application of Synthetic Speech ISCA
2534 Same F0, Different Tones: A Multidimensional Investigation of Zhangzhou Tones ISCA
1985 Discovering Phonetic Feature Event Patterns in Transformer Embeddings ISCA
2204 A System for Generating Voice Source Signals that Implements the Transformed LF-Model Parameter Control ISCA
2352 Speaker-Independent Speech Inversion for Estimation of Nasalance ISCA arXiv
1359 Effects of Tonal Coarticulation and Prosodic Positions on Tonal Contours of Low Rising Tones: In the Case of Xiamen Dialect ISCA arXiv
2187 Durational and Non-Durational Correlates of Lexical and Derived Geminates in Arabic ISCA
68 Mapping Phonemes to Acoustic Symbols and Codes using Synchrony in Speech Modulation Vectors Estimated by the Travellingwave Filter Bank ISCA
1480 Rhythmic Characteristics of L2 German Speech by Advanced Chinese Learners ISCA
1538 (Dis)agreement and Preference Structure are Reflected in Matching Along Distinct Acoustic-Prosodic Features ISCA
995 Vowel Reduction by Greek-Speaking Children: The Effect of Stress and Word Length ISCA
1822 Pitch Distributions in a Very Large Corpus of Spontaneous Finnish Speech ISCA
828 Speech Enhancement Patterns in Human-Robot Interaction: A Cross-Linguistic Perspective WEB Page ISCA

Speech Coding: Privacy

Section Papers Preprint Papers Papers with Open Code

🆔 Title Repo Paper
1026 Masking Kernel for Learning Energy-Efficient Representations for Speaker Recognition and Mobile Health GitHub ISCA arXiv
727 eSTImate: A Real-Time Speech Transmission Index Estimator with Speech Enhancement Auxiliary Task using Self-Attention Feature Pyramid Network ISCA
815 Efficient Encoder-Decoder and Dual-Path Conformer for Comprehensive Feature Learning in Speech Enhancement ISCA arXiv
2138 Privacy-Preserving Representation Learning for Speech Understanding ISCA
448 Vocoder Drift in X-Vector–based Speaker Anonymization GitHub ISCA arXiv
703 Malafide: A Novel Adversarial Convolutive Noise Attack Against Deepfake and Spoofing Detection Systems ISCA arXiv

Analysis of Neural Speech Representations

Section Papers Preprint Papers Papers with Open Code

🆔 Title Repo Paper
1087 Speech Self-Supervised Representation Benchmarking: Are We Doing it Right? GitHub Page ISCA arXiv
383 An Extension of Disentanglement Metrics and its Application to Voice ISCA
2131 An Information-Theoretic Analysis of Self-Supervised Discrete Representations of Speech GitHub ISCA arXiv
1823 SpeechGLUE: How Well Can Self-Supervised Speech Models Capture Linguistic Knowledge? GitHub ISCA arXiv
1418 Comparison of GIF- and SSL-based Features in Pathological Voice Detection ISCA
1617 What is Learnt by the LEArnable Front-end (LEAF)? Adapting Per-Channel Energy Normalisation (PCEN) to Noisy Conditions GitHub ISCA

End-to-end ASR

Section Papers Preprint Papers Papers with Open Code

🆔 Title Repo Paper
1640 End-to-End Joint Target and Non-Target Speakers ASR ISCA arXiv
144 Improving Frame-Level Classifier for Word Timings with Non-peaky CTC in End-to-End Automatic Speech Recognition ISCA arXiv
564 Joint Autoregressive Modeling of End-to-End Multi-Talker Overlapped Speech Recognition and Utterance-Level Timestamp Prediction ISCA
101 Dual-Path Style Learning for End-to-End Noise-Robust Speech Recognition GitHub ISCA arXiv
142 Multi-Pass Training and Cross-Information Fusion for Low-Resource End-to-End Accented Speech Recognition ISCA arXiv
906 Text-Only Domain Adaptation for End-to-End ASR using Integrated Text-to-Mel-Spectrogram Generator ISCA arXiv

Spoken Language Understanding, Summarization, and Information Retrieval

Section Papers Preprint Papers Papers with Open Code

🆔 Title Repo Paper
461 Leveraging Pretrained ASR Encoders for Effective and Efficient End-to-End Speech Intent Classification and Slot Filling GitHub Page GitHub ISCA
277 Relation-based Counterfactual Data Augmentation and Contrastive Learning for Robustifying Natural Language Inference Models GitHub ISCA
1307 Transfer Learning from Pre-trained Language Models Improves End-to-End Speech Summarization ISCA arXiv
1136 Audio Retrieval with WavText5K and CLAP Training GitHub ISCA arXiv
242 Sequence-Level Knowledge Distillation for Class-Incremental End-to-End Spoken Language Understanding GitHub ISCA arXiv
1652 Contrastive Disentangled Learning for Memory-Augmented Transformer ISCA

Invariant and Robust Pre-trained Acoustic Models

Section Papers Preprint Papers Papers with Open Code

🆔 Title Repo Paper
438 ProsAudit, a Prosodic Benchmark for Self-Supervised Speech Models ISCA arXiv
871 Self-Supervised Predictive Coding Models Encode Speaker and Phonetic Information in Orthogonal Subspaces ISCA arXiv
1862 Evaluating Context-Invariance in Unsupervised Speech Representations GitHub ISCA arXiv
1390 CoBERT: Self-Supervised Speech Representation Learning through Code Representation Learning GitHub ISCA arXiv
847 Self-Supervised Fine-tuning for Improved Content Representations by Speaker-Invariant Clustering GitHub ISCA arXiv
359 Self-Supervised Acoustic Word Embedding Learning via Correspondence Transformer Encoder ISCA

Speech Synthesis: Representation Learning

Section Papers Preprint Papers Papers with Open Code

🆔 Title Repo Paper
1571 Adversarial Learning of Intermediate Acoustic Feature for End-to-End Lightweight Text-to-Speech GitHub Page ISCA
2313 Adapter-based Extension of Multi-Speaker Text-To-Speech Model for New Speakers GitHub Page ISCA arXiv
2574 SALTTS: Leveraging Self-Supervised Speech Representations for Improved Text-to-Speech Synthesis ISCA
2326 UnitSpeech: Speaker-Adaptive Speech Synthesis with Untranscribed Data GitHub Page GitHub ISCA arXiv
677 LightVoc: an Upsampling-Free GAN Vocoder based on Conformer and Inverse Short-time Fourier Transform GitHub Page ISCA
1095 ChatGPT-EDSS: Empathetic Dialogue Speech Synthesis Trained from ChatGPT-derived Context Word Embeddings GitHub Page ISCA arXiv

Speech Perception, Production, and Acquisition

Section Papers Preprint Papers Papers with Open Code

🆔 Title Repo Paper
1330 Human Transcription Quality Improvement GitHub GitHub ISCA Amazon Science
1604 The Effect of Masking Noise on Listeners' Spectral Tilt Preferences ISCA
1967 The Effect of Whistled Vowels on Whistled Word Categorization for Naive Listeners ISCA
1481 Automatic Deep Neural Network-based Segmental Pronunciation Error Detection of L2 English Speech (L1 Bengali) ISCA
1662 The Effect of Stress on Mandarin Tonal Perception in Continuous Speech for Spanish-Speaking Learners ISCA
1918 Combining Acoustic and Aerodynamic Data Collection: A Perceptual Evaluation of Acoustic Distortions ISCA
953 Estimating Virtual Targets for Lingual Stop Consonants using General Tau Theory ISCA
1931 Using Random Forests to Classify Language as a Function of Syllable Timing in Two Groups: Children with Cochlear Implants and with Normal Hearing ISCA
2256 An Improved End-to-End Audio-Visual Speech Recognition Model ISCA
1954 What Influences the Foreign Accent Strength? Phonological and Grammatical Errors in the Perception of Accentedness WEB Page ISCA
2077 Investigating the Perception Production Link through Perceptual Adaptation and Phonetic Convergence ISCA
1385 Emotion Prompting for Speech Emotion Recognition ISCA
1196 Speech-in-Speech Recognition is Modulated by Familiarity to Dialect ISCA
673 BASEN: Time-Domain Brain-Assisted Speech Enhancement Network with Convolutional Cross Attention in Multi-Talker Conditions GitHub ISCA arXiv
2046 Are Retroflex-to-Dental Sibilant Substitutions in Polish Children's Speech an Example of a Covert Contrast? A Preliminary Acoustic Study ISCA
1123 First Language Effects on Second Language Perception: Evidence from English Low-Vowel Nasal Sequences Perceived by L1 Mandarin Chinese Listeners ISCA
2247 Motor Control Similarity between Speakers Saying "a Souk" using Inverse Atlas Tongue Modeling ISCA
910 Assessing Phrase Break of ESL Speech with Pre-trained Language Models and Large Language Models ISCA arXiv
317 A Relationship between Vocal Fold Vibration and Droplet Production ISCA
803 Audio, Visual and Audiovisual Intelligibility of Vowels Produced in Noise ISCA
172 Optimal Control of Speech with Context-Dependent Articulatory Targets ISCA
593 Computational Modeling of Auditory Brainstem Responses Derived from Modified Speech ISCA
1732 Leveraging Label Information for Multimodal Emotion Recognition GitHub ISCA
1465 Improving End-to-End Modeling for Mandarin-English Code-Switching using Lightweight Switch-Routing Mixture-of-Experts ISCA
1803 Frequency Patterns of Individual Speaker Characteristics at Higher and Lower Spectral Ranges ISCA
1818 Adaptation to Predictive Prosodic Cues in Non-Native Standard Dialect ISCA
1007 Head Movements in Two- and Four-Person Inter-Active Conversational Tasks in Noisy and Moderately Reverberant Conditions ISCA
334 Second Language Identification of Vietnamese Tones by Native Mandarin Learners ISCA
203 Nasal Vowel Production and Grammatical Processing in French-Speaking Children with Cochlear Implants and Normal-Hearing Peers ISCA
412 Emotion Classification with EEG Responses Evoked by Emotional Prosody of Speech ISCA
145 L2-Mandarin Regional Accent Variability During Mandarin Tone-Word Training Facilitates English Listeners' Subsequent Tone Categorizations ISCA
1680 HumanDiffusion: Diffusion Model using Perceptual Gradients ISCA arXiv
2087 Queer Events, Relationships, and Sports: Does Topic Influence Speakers' Acoustic Expression of Sexual Orientation? ISCA

Acoustic Model Adaptation for ASR

Section Papers Preprint Papers Papers with Open Code

🆔 Title Repo Paper
583 Factorised Speaker-Environment Adaptive Training of Conformer Speech Recognition Systems ISCA arXiv
1349 Text Only Domain Adaptation with Phoneme Guided Data Splicing for End-to-End Speech Recognition GitHub ISCA arXiv
327 Cross-Lingual Cross-Age Adaptation for Low-Resource Elderly Speech Emotion Recognition GitHub ISCA arXiv
2215 Modular Domain Adaptation for Conformer-based Streaming ASR ISCA arXiv
2192 Don't Stop Self-Supervision: Accent Adaptation of Speech Representations via Residual Adapters ISCA arXiv
1282 SGEM: Test-Time Adaptation for Automatic Speech Recognition via Sequential-Level Generalized Entropy Minimization GitHub ISCA arXiv

Speech Synthesis: Expressivity

Section Papers Preprint Papers Papers with Open Code

🆔 Title Repo Paper
858 Controllable Generation of Artificial Speaker Embeddings through Discovery of Principal Directions GitHub ISCA
2242 Dual Audio Encoders based Mandarin Prosodic Boundary Prediction by using Multi-Granularity Prosodic Representations ISCA
645 NoreSpeech: Knowledge Distillation based Conditional Diffusion Model for Noise-robust Expressive TTS WEB Page ISCA arXiv
782 MaskedSpeech: Context-aware Speech Synthesis with Masking Strategy GitHub Page ISCA arXiv
2469 Narrator or Character: Voice Modulation in an Expressive Multi-Speaker TTS GitHub ISCA
843 CASEIN: Cascading Explicit and Implicit Control for Fine-grained Emotion Intensity Regulation ISCA arXiv
1405 Semi-Supervised Learning for Continuous Emotional Intensity Controllable Speech Synthesis with Disentangled Representations WEB Page ISCA arXiv
1905 Expresso: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis GitHub Page ISCA
1460 ComedicSpeech: Adaptive Text to Speech For Stand-up Comedy in Low-Resource Scenario GitHub Page ISCA arXiv
1552 Neural Speech Synthesis with Enriched Phrase Boundaries GitHub ISCA
437 Cross-Lingual Prosody Transfer for Expressive Machine Dubbing ISCA arXiv
2178 Synthesis after a couple PINTs: Investigating the Role of Pause-Internal Phonetic Particles in Speech Synthesis and Perception GitHub ISCA
433 Accentor: An Explicit Lexical Stress Model for TTS Systems ISCA Pdf
1032 A Neural TTS System with Parallel Prosody Transfer from Unseen Speakers WEB Page ISCA
715 Diverse and Expressive Speech Prosody Prediction with Denoising Diffusion Probabilistic Model GitHub Page ISCA arXiv
289 Prosody Modeling with 3D Visual Information for Expressive Video Dubbing ISCA
1528 LightClone: Speaker-Guided Parallel Subnet Selection for Few-Shot Voice Cloning GitHub Page ISCA
1671 EE-TTS: Emphatic Expressive TTS with Linguistic Information GitHub Page ISCA arXiv
1673 Stochastic Pitch Prediction Improves the Diversity and Naturalness of Speech in Glow-TTS ISCA arXiv
122 ContextSpeech: Expressive and Efficient Text-to-Speech for Paragraph Reading GitHub Page ISCA arXiv
1779 PromptStyle: Controllable Style Transfer for Text-to-Speech with Natural Language Descriptions GitHub Page ISCA arXiv
1639 Creating Personalized Synthetic Voices from Post-Glossectomy Speech with Guided Diffusion Models GitHub Page ISCA arXiv
2453 A Generative Framework for Conversational Laughter: Its "Language Model" and Laughter Sound Synthesis ISCA arXiv
1754 Towards Spontaneous Style Modeling with Semi-Supervised Pre-training for Conversational Text-to-Speech Synthesis GitHub Page ISCA
2072 Beyond Style: Synthesizing Speech with Pragmatic Functions WEB Page ISCA
965 eCat: An End-to-End Model for Multi-Speaker TTS & Many-to-Many Fine-Grained Prosody Transfer ISCA arXiv

Multi-modal Systems

Section Papers Preprint Papers Papers with Open Code

🆔 Title Repo Paper
1146 BeAts: Bengali Speech Acts Recognition using Multimodal Attention Fusion GitHub Page ISCA arXiv
370 Improving the Gap in Visual Speech Recognition Between Normal and Silent Speech based on Metric Learning ISCA arXiv
989 Whistle-to-Text: Automatic Recognition of the Silbo Gomero Whistled Language ISCA
663 A Novel Interpretable and Generalizable Re-Synchronization Model for Cued Speech based on a Multi-Cuer Corpus GitHub ISCA arXiv
668 Visually Grounded Few-Shot Word Acquisition with Fewer Shots ISCA arXiv
183 JAMFN: Joint Attention Multi-Scale Fusion Network for Depression Detection ISCA

Question Answering from Speech

Section Papers Preprint Papers Papers with Open Code

🆔 Title Repo Paper
1485 Prompt Guided Copy Mechanism for Conversational Question Answering ISCA
1240 Composing Spoken Hints for Follow-on Question Suggestion in Voice Assistants ISCA
1391 On Monotonic Aggregation for Open-Domain QA GitHub ISCA
2240 Question-Context Alignment and Answer-Context Dependencies for Effective Answer Sentence Selection ISCA arXiv
1606 Multi-Scale Attention for Audio Question Answering GitHub ISCA arXiv
539 Enhancing Visual Question Answering via Deconstructing Questions and Explicating Answers ISCA

Multi-talker Methods in Speech Processing

Section Papers Preprint Papers Papers with Open Code

🆔 Title Repo Paper
1749 SEF-Net: Speaker Embedding Free Target Speaker Extraction Network ISCA
1530 Overlap aware Continuous Speech Separation without Permutation Invariant Training Linfeng ISCA
1952 Cascaded Encoders for Fine-Tuning ASR Models on Overlapped Speech ISCA arXiv
2069 TokenSplit: using Discrete Speech Representations for Direct, Refined, and Transcript-Conditioned Speech Separation and Recognition ISCA
1422 Unified Modeling of Multi-Talker Overlapped Speech Recognition and Diarization with a Sidecar Separator ISCA arXiv
2098 Time-Domain Transformer-based Audiovisual Speaker Separation ISCA
628 Multi-Stream Extension of Variational Bayesian HMM Clustering (MS-VBx) for Combined End-to-End and Vector Clustering-based Diarization ISCA arXiv
1502 Unsupervised Adaptation with Quality-aware Masking to Improve Target-Speaker Voice Activity Detection for Speaker Diarization ISCA
1521 BA-SOT: Boundary-aware Serialized Output Training for Multi-Talker ASR ISCA arXiv
1172 Improving Label Assignments Learning by Dynamic Sample Dropout Combined with Layer-wise Optimization in Speech Separation ISCA
975 Joint Compensation of Multi-Talker Noise and Reverberation for Speech Enhancement with Cochlear Implants using One or More Microphones ISCA
494 Speaker Diarization for ASR Output with T-vectors: A Sequence Classification Approach ISCA
42 GPU-accelerated Guided Source Separation for Meeting Transcription GitHub ISCA arXiv
1280 Weakly-Supervised Speech Pre-training: A Case Study on Target Speech Recognition GitHub Page ISCA arXiv
2076 Directional Speech Recognition for Speaker Disambiguation and Cross-talk Suppression ISCA
1815 Mixture Encoder for Joint Speech Separation and Recognition ISCA arXiv

Sociophonetics

Section Papers Preprint Papers Papers with Open Code

🆔 Title Repo Paper
206 Aberystwyth English Pre-Aspiration in Apparent Time ISCA
1154 Speech Entrainment in Chinese Story-Style Talk Shows: The Interaction Between Gender and Role ISCA
1414 Sociodemographic and Attitudinal Effects on Dialect Speakers' Articulation of the Standard Language: Evidence from German-Speaking Switzerland ISCA
1704 Vowel Normalisation in Latent Space for Sociolinguistics ISCA

Speaker and Language Diarization

Section Papers Preprint Papers Papers with Open Code

🆔 Title Repo Paper
1228 Attention-based Encoder-Decoder Network for End-to-End Neural Speaker Diarization with Target Speaker Attractor ISCA arXiv
1447 Robust Self Supervised Speech Embeddings for Child-Adult Classification in Interactions involving Children with Autism ISCA
2367 The DISPLACE Challenge 2023 - DIarization of SPeaker and LAnguage in Conversational Environments GitHub Page ISCA arXiv
1982 Lexical Speaker Error Correction: Leveraging Language Models for Speaker Diarization Error Correction ISCA arXiv
1839 The SpeeD-ZevoTech Submission at DISPLACE 2023 ISCA
656 End-to-End Neural Speaker Diarization with Absolute Speaker Loss ISCA

Anti-Spoofing for Speaker Verification

Section Papers Preprint Papers Papers with Open Code

🆔 Title Repo Paper
1402 Towards Single Integrated Spoofing-aware Speaker Verification Embeddings GitHub ISCA arXiv
1352 Pseudo-Siamese Network based Timbre-Reserved Black-Box Adversarial Attack in Speaker Identification ISCA arXiv
2335 Betray Oneself: A Novel Audio DeepFake Detection Model via Mono-to-Stereo Conversion GitHub ISCA arXiv
1166 Robust Audio Anti-Spoofing Countermeasure with Joint Training of Front-end and Back-end Models ISCA
1537 Improved DeepFake Detection using Whisper Features GitHub ISCA arXiv
371 DoubleDeceiver: Deceiving the Speaker Verification System Protected by Spoofing Countermeasures ISCA

Speech Coding: Intelligibility

Section Papers Preprint Papers Papers with Open Code

🆔 Title Repo Paper
2209 On Training a Neural Residual Acoustic Echo Suppressor for Improved ASR ISCA
1429 Extending DNN-based Multiplicative Masking to Deep Subband Filtering for Improved Dereverberation GitHub Page ISCA arXiv
378 UnSE: Unsupervised Speech Enhancement using Optimal Transport GitHub Page ISCA
1130 MC-SpEx: Towards Effective Speaker Extraction with Multi-Scale Interfusion and Conditional Speaker Modulation GitHub Page ISCA arXiv
2177 Causal Signal-based DCCRN with Overlapped-Frame Prediction for Online Speech Enhancement ISCA
1511 Gesper: A Restoration-Enhancement Framework for General Speech Reconstruction ISCA arXiv

New Computational Strategies for ASR Training and Inference

Section Papers Preprint Papers Papers with Open Code

🆔 Title Repo Paper
2183 A Metric-Driven Approach to Conformer Layer Pruning for Efficient ASR Inference ISCA
1981 Distillation Strategies for Discriminative Speech Recognition Rescoring ISCA arXiv
969 Another Point of View on Visual Speech Recognition ISCA
1062 RASR2: The RWTH ASR Toolkit for Generic Sequence-to-Sequence Speech Recognition GitHub Page ISCA arXiv
486 Streaming Speech-to-Confusion Network Speech Recognition ISCA arXiv
809 Accurate and Structured Pruning for Efficient Automatic Speech Recognition ISCA arXiv

MERLIon CCS Challenge: Multilingual Everyday Recordings - Language Identification On Code-Switched Child-Directed Speech

Section Papers Preprint Papers Papers with Open Code

🆔 Title Repo Paper
1446 MERLIon CCS Challenge: A English-Mandarin Code-Switching Child-directed Speech Corpus for Language Identification and Diarization GitHub ISCA arXiv
1335 Spoken Language Identification System for English-Mandarin Code-Switching Child-Directed Speech GitHub ISCA arXiv
1707 Investigating Model Performance in Language Identification: beyond Simple Error Statistics ISCA arXiv
2533 Improving Wav2vec2-based Spoken Language Identification by Learning Phonological Features ISCA
2047 Language Identification Networks for Multilingual Everyday Recordings ISCA

Health-Related Speech Analysis

Section Papers Preprint Papers Papers with Open Code

🆔 Title Repo Paper
2038 Classification of Vocal Intensity Category from Speech using the Wav2vec2 and Whisper Embeddings ISCA
1668 The Effect of Clinical Intervention on the Speech of Individuals with PTSD: Features and Recognition Performances ISCA
470 Analysis and Automatic Prediction of Exertion from Speech: Contrasting Objective and Subjective Measures Collected while Running ISCA
894 The Androids Corpus: A New Publicly Available Benchmark for Speech based Depression Detection GitHub ISCA
658 Comparing Hand-Crafted Features to Spectrograms for Autism Severity Estimation ISCA
839 Acoustic Characteristics of Depression in Older Adults' Speech: the Role of Covariates ISCA

Automatic Audio Classification and Audio Captioning

Section Papers Preprint Papers Papers with Open Code

🆔 Title Repo Paper
943 Dual Transformer Decoder based Features Fusion Network for Automated Audio Captioning ISCA arXiv
1564 Adapting a ConvNeXt Model to Audio Classification on AudioSet GitHub ISCA arXiv
1610 Few-Shot Class-Incremental Audio Classification using Stochastic Classifier GitHub ISCA arXiv
1614 Enhance Temporal Relations in Audio Captioning with Sound Event Detection ISCA arXiv

Speech Synthesis

Section Papers Preprint Papers Papers with Open Code

🆔 Title Repo Paper
407 Epoch-based Spectrum Estimation for Speech GitHub Page GitHub ISCA
1996 OverFlow: Putting Flows on Top of Neural Transducers for Better TTS GitHub Page GitHub ISCA arXiv
1568 AdapterMix: Exploring the Efficacy of Mixture of Adapters for Low-Resource TTS Adaptation GitHub ISCA arXiv
506 Prior-Free Guided TTS: An Improved and Efficient Diffusion-based Text-Guided Speech Synthesis ISCA
367 UnDiff: Unsupervised Voice Restoration with Unconditional Diffusion Model ISCA arXiv
1301 Pruning Self-Attention for Zero-Shot Multi-Speaker Text-to-Speech GitHub Page ISCA
1151 Interpretable Style Transfer for Text-to-Speech with ControlVAE and Diffusion Bridge GitHub Page ISCA arXiv
879 Towards Robust FastSpeech 2 by Modelling Residual Multimodality GitHub Page ISCA arXiv
1137 Real Time Spectrogram Inversion on Mobile Phone GitHub Page ISCA arXiv
58 Automatic Tuning of Loss Trade-offs without Hyper-Parameter Search in End-to-End Zero-Shot Speech Synthesis GitHub Page GitHub ISCA arXiv
2056 A Low-Resource Pipeline for Text-to-Speech from Found Data With Application to Scottish Gaelic GitHub Page ISCA
2173 Self-Supervised Solution to the Control Problem of Articulatory Synthesis GitHub Page ISCA
1128 Hierarchical Timbre-Cadence Speaker Encoder for Zero-Shot Speech Synthesis GitHub Page ISCA
754 ZET-Speech: Zero-Shot Adaptive Emotion-Controllable Text-to-Speech Synthesis with Diffusion and Style-based Models GitHub Page ISCA arXiv
690 Improving WaveRNN with Heuristic Dynamic Blending for Fast and High-Quality GPU Vocoding GitHub Page ISCA
194 Intelligible Lip-to-Speech Synthesis with Speech Units GitHub Page GitHub ISCA arXiv
1212 Parameter-Efficient Learning for Text-to-Speech Accent Adaptation GitHub Page GitHub GitHub ISCA arXiv
820 Controlling Formant Frequencies with Neural Text-to-Speech for the Manipulation of Perceived Speaker Age GitHub ISCA
2379 FastFit: Towards Real-Time Iterative Neural Vocoder by Replacing U-Net Encoder with Multiple STFTs GitHub Page ISCA arXiv
1726 iSTFTNet2: Faster and more Lightweight iSTFT-based Neural Vocoder using 1D-2D CNN WEB Page ISCA
534 VITS2: Improving Quality and Efficiency of Single Stage Text to Speech with Adversarial Learning and Architecture Design GitHub Page ISCA
1175 Controlling Multi-Class Human Vocalization Generation via a Simple Segment-based Labeling Scheme ISCA

Speech Synthesis: Controllability and Adaptation

Section Papers Preprint Papers Papers with Open Code

🆔 Title Repo Paper
1608 HierVST: Hierarchical Adaptive Zero-Shot Voice Style Transfer GitHub Page ISCA
391 VISinger2: High-Fidelity End-to-End Singing Voice Synthesis Enhanced by Digital Signal Processing Synthesizer GitHub Page GitHub ISCA arXiv
700 EdenTTS: A Simple and Efficient Parallel Text-to-Speech Architecture with Collaborative Duration-Alignment Learning GitHub Page GitHub ISCA
368 Generalizable Zero-Shot Speaker Adaptive Speech Synthesis with Disentangled Representations GitHub Page ISCA
1020 Speech Inpainting: Context-based Speech Synthesis Guided by Video GitHub Page ISCA arXiv
2243 STEN-TTS: Improving Zero-Shot Cross-Lingual Transfer for Multi-Lingual TTS with Style-Enhanced Normalization Diffusion Framework ISCA

Search Methods and Decoding Algorithms for ASR

Section Papers Preprint Papers Papers with Open Code

🆔 Title Repo Paper
933 Average Token Delay: A Latency Metric for Simultaneous Translation ISCA arXiv
1450 Automatic Speech Recognition Transformer with Global Contextual Information Decoder ISCA
1333 Time-Synchronous One-Pass Beam Search for Parallel Online and Offline Transducers with Dynamic Block Training ISCA
2065 Prefix Search Decoding for RNN Transducers ISCA
78 WhisperX: Time-Accurate Speech Transcription of Long-Form Audio GitHub ISCA arXiv
2449 Implementing Contextual Biasing in GPU Decoder for Online ASR GitHub ISCA arXiv

Speech Signal Analysis

Section Papers Preprint Papers Papers with Open Code

🆔 Title Repo Paper
2487 MF-PAM: Accurate Pitch Estimation through Periodicity Analysis and Multi-Level Feature Fusion GitHub ISCA arXiv
2211 Enhancing Speech Articulation Analysis using A Geometric Transformation of the X-ray Microbeam Dataset ISCA arXiv
1729 Matching Acoustic and Perceptual Measures of Phonation Assessment in Disordered Speech - A Case Study ISCA
283 Improved Contextualized Speech Representations for Tonal Analysis ISCA
1738 A Study on the Importance of Formant Transitions for Stop-Consonant Classification in VCV Sequence ISCA idiap
2229 FusedF0: Improving DNN-based F0 Estimation by Fusion of Summary-Correlograms and Raw Waveform Representations of Speech Signals ISCA Pdf

Connecting Speech-science and Speech-technology for Children's Speech

Section Papers Preprint Papers Papers with Open Code

🆔 Title Repo Paper
928 Using Commercial ASR Solutions to Assess Reading Skills in Children: A Case Report ISCA
907 Uncertainty Estimation for Connectionist Temporal Classification based Automatic Speech Recognition ISCA Pdf
2185 Speech Breathing Behavior During Pauses in Children ISCA
926 Exploiting Diversity of Automatic Transcripts from Distinct Speech Recognition Techniques for Children's Speech ISCA Pdf
1924 Acoustic-to-Articulatory Speech Inversion Features for Mispronunciation Detection of /r/ in Child Speech Sound Disorders ISCA arXiv
978 BabySLM: Language-Acquisition-Friendly Benchmark of Self-Supervised Spoken Language Models GitHub ISCA arXiv
702 Data Augmentation for Children ASR and Child-adult Speaker Classification using Voice Conversion Methods GitHub ISCA
2236 Developmental Articulatory and Acoustic Features for Six to Ten Year Old Children ISCA
2251 Automatically Predicting Perceived Conversation Quality in a Pediatric Sample Enriched for Autism ISCA
1257 An Equitable Framework for Automatically Assessing Children's Oral Narrative Language Abilities ISCA
743 An Analysis of Goodness of Pronunciation for Child Speech GitHub ISCA
1569 Measuring Language Development from Child-centered Recordings GitHub ISCA
2057 Speaking Clearly, Understanding Better: Predicting the L2 Narrative Comprehension of Chinese Bilingual Kindergarten Children based on Speech Intelligibility using a Machine Learning Approach ISCA
312 Classifying Rhoticity of /r/ in Speech Sound Disorder using Age-and-Sex Normalized Formants ISCA arXiv
1273 Understanding Spoken Language Development of Children with ASD using Pre-trained Speech Embeddings ISCA arXiv
2099 Measuring Phonological Precision in Children with Cleft Lip and Palate GitHub ISCA
937 A Study on Using Duration and Formant Features in Automatic Detection of Speech Sound Disorder in Children ISCA
1873 Influence of Utterance and Speaker Characteristics on the Classification of Children with Cleft Lip and Palate GitHub Page ISCA
1882 Prospective Validation of Motor-based Intervention with Automated Mispronunciation Detection of Rhotics in Residual Speech Sound Disorders ISCA arXiv

Dialog Management

Section Papers Preprint Papers Papers with Open Code

🆔 Title Repo Paper
2238 Parameter-Efficient Low-Resource Dialogue State Tracking by Prompt Tuning ISCA arXiv
2525 An Autoregressive Conversational Dynamics Model for Dialogue Systems ISCA
1983 Style-Transfer based Speech and Audio-Visual Scene Understanding for Robot Action Sequence Acquisition from Videos ISCA arXiv
1037 Speech aware Dialog System Technology Challenge (DSTC11) WEB Page ISCA arXiv
1397 Knowledge-Retrieval Task-Oriented Dialog Systems with Semi-Supervision GitHub ISCA arXiv
2513 Tracking Must Go On: Dialogue State Tracking with Verified Self-Training ISCA

Speech Activity Detection and Modeling

Section Papers Preprint Papers Papers with Open Code

🆔 Title Repo Paper
558 GL-SSD: Global and Local Speech Style Disentanglement by Vector Quantization for Robust Sentence Boundary Detection in Speech Stream ISCA
598 Semantic VAD: Low-Latency Voice Activity Detection for Speech Interaction ISCA
arXiv
2466 Dynamic Encoder RNN for Online Voice Activity Detection in Adverse Noise Conditions ISCA
996 Point to the Hidden: Exposing Speech Audio Splicing via Signal Pointer Nets ISCA
arXiv
716 Real-Time Causal Spectro-Temporal Voice Activity Detection based on Convolutional Encoding and Residual Decoding ISCA
2413 SVVAD: Personal Voice Activity Detection for Speaker Verification ISCA
arXiv

Multilingual Models for ASR

Section Papers Preprint Papers Papers with Open Code

🆔 Title Repo Paper
1613 Learning Cross-Lingual Mappings for Data Augmentation to Improve Low-Resource Speech Recognition ISCA
arXiv
2122 AfriNames: Most ASR models "butcher" African Names Hugging Face ISCA
arXiv
2528 Towards Dialect-Inclusive Recognition in a Low-Resource Language: are Balanced Corpora the Answer? ISCA
2588 Svarah: Evaluating English ASR Systems on Indian Accents GitHub ISCA
arXiv
1044 N-Shot Benchmarking of Whisper on Diverse Arabic Speech Recognition ISCA
arXiv
1014 The MALACH Corpus: Results with End-to-End Architectures and Pretraining ISCA

Speech Enhancement and Bandwidth Expansion

Section Papers Preprint Papers Papers with Open Code

🆔 Title Repo Paper
232 Unsupervised Speech Enhancement with Deep Dynamical Generative Speech and Noise Models ISCA
arXiv
857 Noise-Robust Bandwidth Expansion for 8K Speech Recordings ISCA
113 mdctGAN: Taming Transformer-based GAN for Speech Super-Resolution with Modified DCT Spectra GitHub ISCA
arXiv
625 Zoneformer: On-Device Neural Beamformer for In-Car Multi-Zone Speech Separation, Enhancement and Echo Cancellation GitHub Page ISCA
634 Low-Complexity Broadband Beampattern Synthesis using Array Response Control ISCA
904 A GAN Speech Inpainting Model for Audio Editing Software GitHub ISCA

Articulation

Section Papers Preprint Papers Papers with Open Code

🆔 Title Repo Paper
2316 Deep Speech Synthesis from MRI-based Articulatory Representations GitHub ISCA
arXiv
562 Learning to Compute the Articulatory Representations of Speech with the MIRRORNET GitHub Page
GitHub
ISCA
arXiv
804 Generating High-Resolution 3D Real-Time MRI of the Vocal Tract GitHub ISCA
1593 Exploring a Classification Approach using Quantised Articulatory Movements for Acoustic to Articulatory Inversion ISCA

Neural Processing of Speech and Language: Encoding and Decoding the Diverse Auditory Brain

Section Papers Preprint Papers Papers with Open Code

🆔 Title Repo Paper
633 Coherence Estimation Tracks Auditory Attention in Listeners with Hearing Impairment ISCA
2378 Enhancing the EEG Speech Match Mismatch Tasks with Word Boundaries GitHub ISCA
arXiv
1347 Similar Hierarchical Representation of Speech and Other Complex Sounds in the Brain and Deep Residual Networks: an MEG Study ISCA
121 Speech Taskonomy: Which Speech Tasks are the most Predictive of fMRI Brain Activity? ISCA
HAL Science
282 MEG Encoding using Word Context Semantics in Listening Stories ISCA
HAL Science
1949 Investigating the Cortical Tracking of Speech and Music with Sung Speech ISCA
414 Exploring Auditory Attention Decoding using Speaker Features ISCA
1776 Effects of Spectral Degradation on the Cortical Tracking of the Speech Envelope ISCA
964 Effects of Spectral and Temporal Modulation Degradation on Intelligibility and Cortical Tracking of Speech Signals ISCA

Perception of Paralinguistics

Section Papers Preprint Papers Papers with Open Code

🆔 Title Repo Paper
2061 Transfer Learning for Personality Perception via Speech Emotion Recognition ISCA
arXiv
1131 A Stimulus-Organism-Response Model of Willingness to Buy from Advertising Speech using Voice Quality WEB Page ISCA
1835 Voice Passing: A Non-Binary Voice Gender Prediction System for evaluating Transgender ISCA
1139 Influence of Personal Traits on Impressions of One's Own Voice ISCA
887 Pardon my Disfluency: The Impact of Disfluency Effects on the Perception of Speaker Competence and Confidence ISCA
711 Cross-Linguistic Emotion Perception in Human and TTS Voices WEB Page ISCA

Technologies for Child Speech Processing

Section Papers Preprint Papers Papers with Open Code

🆔 Title Repo Paper
1302 Joint Learning Feature and Model Adaptation for Unsupervised Acoustic Modelling of Child Speech ISCA
1681 Automatic Assessment of Oral Reading Accuracy for Reading Diagnostics GitHub
GitHub
ISCA
arXiv
2084 An ASR-enabled Reading Tutor: Investigating Feedback to Optimize Interaction for Learning to Read ISCA
Pdf
935 Adaptation of Whisper Models to Child Speech Recognition GitHub
Hugging Face
ISCA

Speech Synthesis: Multilinguality; Evaluation

Section Papers Preprint Papers Papers with Open Code

🆔 Title Repo Paper
2064 Automatic Evaluation of Turn-Taking Cues in Conversational Speech Synthesis GitHub Page ISCA
arXiv
441 Expressive Machine Dubbing through Phrase-Level Cross-Lingual Prosody Transfer ISCA
arXiv
1691 Robust Feature Decoupling in Voice Conversion by using Locality-based Instance Normalization GitHub ISCA
612 Zero-Shot Accent Conversion using Pseudo Siamese Disentanglement Network ISCA
2148 The Effects of Input Type and Pronunciation Dictionary Usage in Transfer Learning for Low-Resource Text-to-Speech GitHub Page ISCA
arXiv
1727 GenerTTS: Pronunciation Disentanglement for Timbre and Style Generalization in Cross-Lingual Text-to-Speech GitHub Page ISCA
arXiv
1285 Analysis of Mean Opinion Scores in Subjective Evaluation of Synthetic Speech based on Tail Probabilities GitHub ISCA
1584 LibriTTS-R: A Restored Multi-Speaker Text-to-Speech Corpus GitHub Page
Openslr
ISCA
arXiv
1067 UniFLG: Unified Facial Landmark Generator from Text or Speech GitHub Page ISCA
arXiv
444 XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech GitHub ISCA
arXiv
2224 ClArTTS: An Open-Source Classical Arabic Text-to-Speech Corpus ClArTTS ISCA
arXiv
154 Diffusion-based Accent Modelling in Speech Synthesis ISCA
249 Multilingual Text-to-Speech Synthesis for Turkic Languages using Transliteration GitHub ISCA
arXiv
553 CVTE-Poly: A New Benchmark for Chinese Polyphone Disambiguation GitHub ISCA
709 Improve Bilingual TTS using Language and Phonology Embedding with Embedding Strength Modulator GitHub Page ISCA
arXiv
2179 High-Quality Automatic Voice Over with Accurate Alignment: Supervision through Self-Supervised Discrete Speech Units GitHub Page
GitHub
ISCA
arXiv
1097 PronScribe: Highly Accurate Multimodal Phonemic Transcription From Speech and Text ISCA
2158 Resource-Efficient Fine-Tuning Strategies for Automatic MOS Prediction in Text-to-Speech for Low-Resource Languages GitHub Page ISCA
arXiv
416 Why We Should Report the Details in Subjective Evaluation of TTS More Rigorously GitHub ISCA
arXiv
1622 Speaker-Independent Neural Formant Synthesis GitHub Page ISCA
arXiv
1098 CALLS: Japanese Empathetic Dialogue Speech Corpus of Complaint Handling and Attentive Listening in Customer Center GitHub Page ISCA
arXiv
430 SASPEECH: A Hebrew Single Speaker Dataset for Text to Speech and Voice Conversion GitHub Page ISCA

Show and Tell: Health Applications and Emotion Recognition

Section Papers Preprint Papers Papers with Open Code

🆔 Title Repo Paper
2618 A Personalised Speech Communication Application for Dysarthric Speakers ISCA
2624 Video Multimodal Emotion Recognition System for Real World Applications ISCA
2626 Promoting Mental Self-Disclosure in a Spoken Dialogue System ISCA
2632 "Select Language, Modality or Put on a Mask!" Experiments with Multimodal Emotion Recognition ISCA
2635 My Vowels Matter: Formant Automation Tools for Diverse Child Speech ISCA
2636 NEMA: An Ecologically Valid Tool for Assessing Hearing Devices, Advanced Algorithms, and Communication in Diverse Listening Environments ISCA
2644 When Words Speak Just as Loudly as Actions: Virtual Agent based Remote Health Assessment Integrating What Patients Say with What They Do ISCA
Pdf
2648 Stuttering Detection Application ISCA
2649 Providing Interpretable Insights for Neurological Speech and Cognitive Disorders from Interactive Serious Games ISCA
2651 Automated Neural Nursing Assistant (ANNA): An Over-the-Phone System for Cognitive Monitoring ISCA
2656 5G-IoT Cloud based Demonstration of Real-Time Audio-Visual Speech Enhancement for Multimodal Hearing-aids WEB Page ISCA
2671 Towards Two-Point Neuron-Inspired Energy-Efficient Multimodal Open Master Hearing Aid ISCA

Show and Tell: Speech Tools, Speech Enhancement, Speech Synthesis

Section Papers Preprint Papers Papers with Open Code

🆔 Title Repo Paper
2614 DeepFilterNet: Perceptually Motivated Real-Time Speech Enhancement GitHub ISCA
arXiv
2615 Nkululeko: Machine Learning Experiments on Speaker Characteristics without Programming GitHub ISCA
2625 Sp1NY: A Quick and Flexible Python Speech Visualization Tool ISCA
2629 Intonation Control for Neural Text-to-Speech Synthesis with Polynomial Models of F0 ISCA
2634 So-to-Speak: an Exploratory Platform for Investigating the Interplay between Style and Prosody in TTS GitHub ISCA
2638 Comparing /b/ and /d/ with a Single Physical Model of the Human Vocal Tract to Visualize Droplets Produced while Speaking ISCA
2640 Show & Tell: Voice Activity Projection and Turn-taking GitHub ISCA
2652 Real-Time Detection of Soft Voice for Speech Enhancement ISCA
2655 Data Augmentation for Diverse Voice Conversion in Noisy Environments ISCA
arXiv
2667 Application for Real-Time Audio-Visual Speech Enhancement ISCA

Show and Tell: Language Learning and Educational Resources

Section Papers Preprint Papers Papers with Open Code

🆔 Title Repo Paper
2623 A Unified Framework to Improve Learners' Skills of Perception and Production based on Speech Shadowing and Overlapping ISCA
2633 Speak & Improve: L2 English Speaking Practice Tool ISCA
2641 Measuring Prosody in Child Speech using SoapBox Fluency API ISCA
2650 Teaching Non-native Sound Contrasts using Visual Biofeedback ISCA
2654 Large-Scale Automatic Audiobook Creation ISCA
2658 QVoice: Arabic Speech Pronunciation Learning Application ISCA
arXiv
2659 Asking Questions: an Innovative Way to Interact with Oral History Archives ISCA
2660 DisfluencyFixer: A Tool to Enhance Language Learning through Speech to Speech Disfluency Correction React ISCA
arXiv
2661 Technology Pipeline for Large Scale Cross-Lingual Dubbing of Lecture Videos into Multiple Indian Languages ISCA
arXiv
2668 MyVoice: Arabic Speech Resource Collaboration Platform ISCA
2669 Personal Primer Prototype 1: Invitation to Make Your Own Embooked Speech-based Educational Artifact GitHub ISCA
ResearchGate

Show and Tell: Media and Commercial Applications

Section Papers Preprint Papers Papers with Open Code

🆔 Title Repo Paper
2621 Let's Give a Voice to Conversational Agents in Virtual Reality GitHub ISCA
2622 FOOCTTS: Generating Arabic Speech with Acoustic Environment for Football Commentator ISCA
arXiv
2637 Video Summarization Leveraging Multimodal Information for Presentations ISCA
2645 What Questions are My Customers Asking?: Towards Actionable Insights from Customer Questions in Contact Center Calls ISCA
2646 COnVoy: A Contact Center Operated Pipeline for Voice of Customer Discovery ISCA
2653 NeMo Forced Aligner and its Application to Word Alignment for Subtitle Generation ISCA
2662 CauSE: Causal Search Engine for Understanding Contact-Center Conversations ISCA
2663 Tailored Real-Time Call Summarization System for Contact Centers ISCA
2647 Federated Learning Toolkit with Voice-based User Verification Demo ISCA
2657 Learning when to Speak: Latency and Quality Trade-offs for Simultaneous Speech-to-Speech Translation with Offline Models GitHub ISCA
arXiv
2628 Fast Enrollable Streaming Keyword Spotting System: Training and Inference using a Web Browser ISCA
2665 Cross-Lingual/Cross-Channel Intent Detection in Contact-Center Conversations ISCA

Key Terms


Star History

Star History Chart