I joined the Department of Software and Computer Engineering at Ajou University as an Assistant Professor (Speech AI Lab., SAIL) in March 2024. Before that, I was a postdoctoral researcher at the AI Research Center, Korea University, Seoul, South Korea. I received my Ph.D. from the Department of Brain and Cognitive Engineering, Korea University, in 2023. In March 2016, I began the integrated M.S. & Ph.D. program in the Pattern Recognition & Machine Learning (PRML) Lab at Korea University, Seoul, Korea, under the supervision of Seong-Whan Lee.
- E-mail: [email protected], [email protected]
- Google Scholar: Link
- Speech AI Lab. (SAIL): Link
- PRML Speech Team (Supervisor: Seong-Whan Lee): Link
- Speech Synthesis (2019~, HierSpeech++, DDDM-VC)
- Neural Vocoder (2021~, PeriodWave, PeriodWave-Turbo, Fre-GAN, Fre-GAN 2)
- Audio Generation (2023~, DDDM-Mixer)
- Singing Voice Synthesis (2022~, MIDI-Voice, HiddenSinger)
- Speech-to-Speech Translation (2023~, TranSentence)
- Brain-Computer Interface (2019~2020, Brain-to-Speech System)
- Reinforcement Learning (2017~2018, AI Curling Robot Curly)
- 2024.09: One paper has been accepted to Neural Networks (HiddenSinger). This project was funded by Netmarble AI Center, Netmarble Corp. in 2022.
- 2024.04: One paper has been accepted to TASLP (DiffProsody)
- 2024.01: One paper has been accepted to ICASSPW 2024 (LIMMITS'24, ICASSP SP Grand Challenges)
- 2023.12: One paper has been accepted to TASLP (Fre-Painter)
- 2023.12: Two papers have been accepted to ICASSP 2024 (TranSentence, MIDI-Voice)
- 2023.12: One paper has been accepted to AAAI 2024 (DDDM-VC)
- 2023.11: We released HierSpeech++, a zero-shot speech synthesis model supporting zero-shot TTS, zero-shot VC, and speech super-resolution. [Demo] [Code] [Gradio]
- Accelerating High-Fidelity Waveform Generation via Adversarial Flow Matching Optimization, S.-H. Lee, H.-Y. Choi, and S.-W. Lee, 2024. (Under Review) [Code]
- PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation, S.-H. Lee, H.-Y. Choi, and S.-W. Lee, 2024. (Under Review) [Code]
- HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation by Hierarchical Variational Inference for Zero-shot Speech Synthesis, S.-H. Lee, H.-Y. Choi, S.-B. Kim, and S.-W. Lee, 2023. (Under Review) [Demo] [Code] [Gradio]
- DurFlex-EVC: Duration-Flexible Emotional Voice Conversion with Parallel Generation, H.-S. Oh, S.-H. Lee, D.-H. Cho, and S.-W. Lee, 2024. (Under Review) [Demo] [Code]
- HiddenSinger: High-Quality Singing Voice Synthesis via Neural Audio Codec and Latent Diffusion Models, J.-S. Hwang, S.-H. Lee, and S.-W. Lee, Neural Networks, 2024 [Demo]
- DiffProsody: Diffusion-based Latent Prosody Generation for Expressive Speech Synthesis with Prosody Conditional Adversarial Training, H.-S. Oh, S.-H. Lee, and S.-W. Lee, TASLP, 2024 [Demo] [Code]
- Cross-lingual Text-to-Speech via Hierarchical Style Transfer, S.-H. Lee, H.-Y. Choi, and S.-W. Lee, ICASSPW, 2024.
- Audio Super-resolution with Robust Speech Representation Learning of Masked Autoencoder, S.-B. Kim, S.-H. Lee, H.-Y. Choi, and S.-W. Lee, TASLP, 2024.
- TranSentence: Speech-to-Speech Translation via Language-agnostic Sentence-level Speech Encoding without Language-parallel Data, S.-B. Kim, S.-H. Lee, and S.-W. Lee, ICASSP, 2024.
- MIDI-Voice: Expressive Zero-shot Singing Voice Synthesis via MIDI-driven Priors, D.-M. Byun, S.-H. Lee, J.-S. Hwang, and S.-W. Lee, ICASSP, 2024.
- DDDM-VC: Decoupled Denoising Diffusion Models with Disentangled Representation and Prior Mixup for Verified Robust Voice Conversion, H.-Y. Choi*, S.-H. Lee*, and S.-W. Lee, AAAI, 2024. [Demo] [Code] [Poster]
- HierVST: Hierarchical Adaptive Zero-shot Voice Style Transfer, S.-H. Lee*, H.-Y. Choi*, H.-S. Oh, and S.-W. Lee, Interspeech, 2023. (Oral) [arXiv] [Demo]
- Diff-HierVC: Diffusion-based Hierarchical Voice Conversion with Robust Pitch Generation and Masked Prior for Zero-shot Speaker Adaptation, H.-Y. Choi, S.-H. Lee, and S.-W. Lee, Interspeech, 2023. (Oral) [Demo] [Code]
- PauseSpeech: Natural Speech Synthesis via Pre-trained Language Model and Pause-based Prosody Modeling, J.-S. Hwang, S.-H. Lee, and S.-W. Lee, ACPR, 2023. [Demo]
- HierSpeech: Bridging the Gap between Text and Speech by Hierarchical Variational Inference using Self-supervised Representations for Speech Synthesis, S.-H. Lee, S.-B. Kim, J.-H. Lee, E. Song, M.-J. Hwang, and S.-W. Lee, NeurIPS, 2022. [OpenReview] [Demo] [Poster]
- Duration Controllable Voice Conversion via Phoneme-based Information Bottleneck, S.-H. Lee, H.-R. Noh, W. Nam, and S.-W. Lee, TASLP, 2022. (2022 JCR IF: 5.4; JIF percentile: top 8.10%)
- StyleVC: Non-Parallel Voice Conversion with Adversarial Style Generalization, I. Hwang, S.-H. Lee, and S.-W. Lee, ICPR, 2022. [Demo] [Code]
- Fre-GAN 2: Fast and Efficient Frequency-consistent Audio Synthesis, S.-H. Lee, J.-H. Kim, G.-E. Lee, and S.-W. Lee, ICASSP, 2022. [Demo] [Code]
- PVAE-TTS: Progressively Style Adaptive Text-to-Speech via Progressive Variational Autoencoder, J.-H. Lee, S.-H. Lee, J.-H. Kim, and S.-W. Lee, ICASSP, 2022. [Demo]
- EmoQ-TTS: Emotion Intensity Quantization for Fine-Grained Controllable Emotional Text-to-Speech, C.-B. Im, S.-H. Lee, and S.-W. Lee, ICASSP, 2022. [Demo]
- VoiceMixer: Adversarial Voice Style Mixup, S.-H. Lee, J.-H. Kim, H. Chung, and S.-W. Lee, NeurIPS, 2021. [Demo]
- Multi-SpectroGAN: High-Diversity and High-Fidelity Spectrogram Generation with Adversarial Style Combination for Speech Synthesis, S.-H. Lee, H.-W. Yoon, H.-R. Noh, J.-H. Kim, and S.-W. Lee, AAAI, 2021. [Demo]
- GC-TTS: Few-shot Speaker Adaptation with Geometric Constraints, J.-H. Kim, S.-H. Lee, J.-H. Lee, H.-G. Jung, and S.-W. Lee, SMC, 2021.
- Fre-GAN: Adversarial Frequency-consistent Audio Synthesis, J.-H. Kim, S.-H. Lee, J.-H. Lee, and S.-W. Lee, Interspeech, 2021.
- Reinforce-Aligner: Reinforcement Alignment Search for Robust End-to-End Text-to-Speech, H. Chung, S.-H. Lee, and S.-W. Lee, Interspeech, 2021.
- Audio Dequantization for High Fidelity Audio Generation in Flow-based Neural Vocoder, H.-W. Yoon, S.-H. Lee, H.-R. Noh, and S.-W. Lee, Interspeech, 2020.
- Learning Machines Can Curl - Adaptive deep reinforcement learning enables the robot Curly to win against human players in an icy world, D.-O. Won, S.-H. Lee, K.-R. Müller, and S.-W. Lee, NeurIPS 2019 Demonstration Track, 2019. [Video] [Poster]
- "METHOD AND SYSTEM FOR SYNTHESIZING SPEECH," 10-2663162, 29, Apr., 2024.
- "METHOD TO TRANSFORM VOICE," 10-2439022, 29, Aug., 2022.
- "METHOD AND APPARTUS FOR VOICE CONVERSION BY USING NEURAL NETWORK," 10-2340486, 14, Dec., 2021.
- "SYSTEM AND METHOD FOR CURLING SWEEPING CONTROL," 10-2257358, 21, May, 2021.
- "APPARATUS AND METHOD FOR RECOMMENDATION OF CURLING GAME STRATEGY USING DEEP LEARNING," 10-2045567, 11, Nov., 2019.
- "APPARATUS AND METHOD FOR DELIVERY AND SWEEPING AT CURLING GAME," 10-1948713, 11, Feb., 2019.
2016.03-2023.02: Integrated M.S. & Ph.D., Dept. of Brain and Cognitive Engineering, Korea University
2012.03-2016.02: B.S., Dept. of Life Science, Dongguk University
Reviewer: NeurIPS, ICLR, ICML, AAAI, ICASSP, IEEE/ACM Transactions on Audio, Speech, and Language Processing
2022.02.25: Paper Award (Multi-SpectroGAN: High-Diversity and High-Fidelity Spectrogram Generation with Adversarial Style Combination for Speech Synthesis), Korea University
2024.06.25: Fake Audio Detection, Ajou University.
2024.06.07: Speech Synthesis, 2nd AI Convergence Workshop, Ajou University.
2024.05.24: Speech Language Model for Generative AI, KSCS 2024
2023.08.18: Towards Unified Speech Synthesis for Text-to-Speech and Voice Conversion, Deepbrain AI
2023.08.11: Towards Unified Speech Synthesis for Text-to-Speech and Voice Conversion, Workshop on Brain and Artificial Intelligence 2023
2023.06.20: HierSpeech: Bridging the Gap between Text and Speech by Hierarchical Variational Inference using Self-supervised Representations for Speech Synthesis, Top Conference Session in KCC 2023
2022.08.19: VoiceMixer: Adversarial Voice Style Mixup, AIGS Symposium 2022
2022.07.01: VoiceMixer: Adversarial Voice Style Mixup, Top Conference Session in KCC 2022
2021.12.02: Voice Conversion, Netmarble
2021.07.29: Speech Synthesis and Voice Conversion, Neosapience