
<html lang="en-US">
  <head>
    <meta charset="UTF-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1">

<!-- Begin Jekyll SEO tag v2.7.1 -->
<title>Neural Text to Speech Synthesis | Tutorial @ IJCAI 2021</title>
<meta name="generator" content="Jekyll v3.9.0" />
<meta property="og:title" content="Neural Text to Speech Synthesis" />
<meta property="og:locale" content="en_US" />
<link rel="canonical" href="https://tts-tutorial.github.io/ijcai2021/" />
<meta property="og:url" content="https://tts-tutorial.github.io/ijcai2021/" />
<meta property="og:site_name" content="ijcai2021" />
<meta name="twitter:card" content="summary" />
<meta property="twitter:title" content="Neural Text to Speech Synthesis" />
<script type="application/ld+json">
{"headline":"Neural Text to Speech Synthesis","url":"https://tts-tutorial.github.io/ijcai2021/","@type":"WebSite","name":"ijcai2021","@context":"https://schema.org"}</script>
<!-- End Jekyll SEO tag -->

  </head>
  <body>
    <div class="container-lg px-3 my-5 markdown-body">
      
      <h1 id="neural-text-to-speech-synthesis">Neural Text to Speech Synthesis</h1>
<p>Tutorial @ <a href="http://ijcai-21.org">IJCAI 2021</a>, August 19-26, 2021</p>

<h2 id="speakers">Speakers</h2>
<p><a href="https://www.microsoft.com/en-us/research/people/xuta/">Xu Tan</a>, Microsoft Research Asia, <a href="mailto:xuta@microsoft.com">xuta@microsoft.com</a> <br />
<a href="https://www.microsoft.com/en-us/research/people/taoqin/">Tao Qin</a>, Microsoft Research Asia, <a href="mailto:taoqin@microsoft.com">taoqin@microsoft.com</a></p>

<h2 id="abstract">Abstract</h2>
<p>Text to speech (TTS), which aims to synthesize natural and intelligible speech from text, has been a hot research topic in the artificial intelligence community and has become an important product service in industry. With the development of deep learning and artificial intelligence, neural network-based TTS has significantly improved the quality of synthesized speech in recent years. In this tutorial, we will give an introduction to neural text to speech, which consists of four parts. In the first part, we will briefly review the history of TTS technology. In the second part, we will introduce the key components of neural TTS, including text analysis, the acoustic model, and the vocoder. In the third part, we will review the works that push the frontier of TTS research and cover practical TTS products, including end-to-end TTS, non-autoregressive and lightweight TTS, robust/expressive/controllable TTS, low-resource TTS, and custom voice adaptation. In the fourth part, at the end of the tutorial, we will describe several challenges of TTS and discuss future research directions.</p>

<h2 id="outline">Outline</h2>

<ol>
  <li>Background <br /></li>
  <li>Key components in TTS<br />
  2.1 Text analysis<br />
  2.2 Acoustic model<br />
  2.3 Vocoder<br />
  2.4 Towards end-to-end TTS<br /></li>
  <li>Advanced topics in TTS <br />
  3.1 Fast TTS<br />
  3.2 Low-resource TTS<br />
  3.3 Robust TTS<br />
  3.4 Expressive TTS<br />
  3.5 Adaptive TTS<br /></li>
  <li>Challenges and future directions<br /></li>
</ol>

<h2 id="materials">Materials</h2>
<p><a href="https://www.microsoft.com/en-us/research/uploads/prod/2023/04/TTS.ijcai21-642be55185047.pdf">Slides</a><br />
<a href="https://www.microsoft.com/en-us/research/project/text-to-speech/">Project page</a><br />
<a href="https://speechresearch.github.io/">Speech demo page</a></p>

<h2 id="other-related-links">Other Related Links</h2>
<p>
<a href="https://www.microsoft.com/en-us/research/uploads/prod/2021/07/ISCSLP2021-TTS-Tutorial-Xu-Tan.pdf">TTS tutorial</a> @ <a href="https://www.iscslp2021.org/program/tutorials/">ISCSLP 2021</a><br /> 
<a href="https://mp.weixin.qq.com/s/qEhsoWwi2MEL5Ude5QvBag">A talk on low-resource TTS</a> @ Jiangmen <br />
<a href="https://resource.gtcevent.cn/gtc2020/pdf/CNS20269.pdf">A talk on FastSpeech</a> @ NVIDIA GTC China 2020 <br />
<a href="https://www.youtube.com/watch?v=MA8PCvmr8B0">A webinar talk on TTS</a> @ Microsoft Research <br />    
<a href="https://www.microsoft.com/en-us/research/uploads/prod/2021/07/Efficient-ML-for-Speech-and-Music-Xu-Tan.pdf">A talk on Towards Efficient Machine Learning for Speech and Music Applications</a> <br />
      
      </p>


    </div>
    <script src="https://cdnjs.cloudflare.com/ajax/libs/anchor-js/4.1.0/anchor.min.js" integrity="sha256-lZaRhKri35AyJSypXXs4o6OPFTbTmUoltBbDCbdzegg=" crossorigin="anonymous"></script>
    <script>anchors.add();</script>
    
  </body>
</html>