An implementation of Monotonic Alignment Search (MAS) from Glow-TTS, packaged for easy reuse in other projects.
pip install monotonic-alignment-search
Wheels are provided for Linux, Mac, and Windows.
MAS can find the most probable alignment between a text sequence t_x and a speech sequence t_y.
from monotonic_alignment_search import maximum_path
# value (torch.Tensor): [batch_size, t_x, t_y]
# mask (torch.Tensor): [batch_size, t_x, t_y]
path = maximum_path(value, mask, implementation="cython")
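The snippet above assumes that value and mask already exist. As a rough sketch of how a mask of shape [batch_size, t_x, t_y] could be built from per-example lengths, the NumPy helper below marks a cell as valid when both its text position and its speech frame fall inside the unpadded lengths. The helper name make_alignment_mask and its arguments are hypothetical and not part of this package. The result would still need converting to a torch.Tensor (for example with torch.from_numpy) before being passed to maximum_path, and the dtype the package expects is an assumption here.

```python
import numpy as np


def make_alignment_mask(x_lengths, y_lengths, t_x, t_y):
    """Hypothetical helper (not part of this package).

    Returns a float32 array of shape [batch_size, t_x, t_y] that is 1.0
    where both the text position and the speech frame are inside the
    unpadded lengths of that example, and 0.0 elsewhere.
    """
    x_lengths = np.asarray(x_lengths)
    y_lengths = np.asarray(y_lengths)
    # [batch_size, t_x]: True for real (non-padding) text positions
    x_mask = np.arange(t_x)[None, :] < x_lengths[:, None]
    # [batch_size, t_y]: True for real (non-padding) speech frames
    y_mask = np.arange(t_y)[None, :] < y_lengths[:, None]
    # Outer product per example gives the 2-D validity grid
    return (x_mask[:, :, None] & y_mask[:, None, :]).astype(np.float32)


# Two examples padded to t_x=3, t_y=5
mask = make_alignment_mask([2, 3], [4, 5], t_x=3, t_y=5)
```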
The implementation argument allows choosing one of the following implementations:
cython (default): Cython-optimised
numpy: pure NumPy
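For intuition about what the search computes, here is a minimal, unbatched sketch of the dynamic program in plain NumPy. It is not the code shipped in this package: the function name mas_reference is hypothetical, it handles a single [t_x, t_y] matrix without a mask, and it assumes t_x <= t_y so that every text position can receive at least one speech frame. Each speech frame is assigned to exactly one text position, and the text index either stays the same or advances by one from one frame to the next.

```python
import numpy as np


def mas_reference(value):
    """Illustrative single-example MAS (hypothetical, not this package's code).

    value: [t_x, t_y] array of log-likelihoods, with t_x <= t_y.
    Returns a 0/1 integer array of the same shape marking the best path.
    """
    t_x, t_y = value.shape
    if t_x > t_y:
        raise ValueError("this sketch assumes t_x <= t_y")

    # q[i, j]: best cumulative score of a monotonic path ending at (i, j)
    q = np.full((t_x, t_y), -np.inf)
    q[0, 0] = value[0, 0]
    for j in range(1, t_y):
        # Text position i cannot be reached before frame i
        for i in range(min(j + 1, t_x)):
            stay = q[i, j - 1]
            advance = q[i - 1, j - 1] if i > 0 else -np.inf
            q[i, j] = max(stay, advance) + value[i, j]

    # Backtrack from the last text position at the last frame
    path = np.zeros((t_x, t_y), dtype=np.int64)
    i = t_x - 1
    for j in range(t_y - 1, -1, -1):
        path[i, j] = 1
        # Step back one text position if we must (i == j) or if it scored higher
        if j > 0 and i > 0 and (i == j or q[i - 1, j - 1] > q[i, j - 1]):
            i -= 1
    return path


# Toy example: 3 text positions, 4 speech frames
value = np.array([
    [0.0, -5.0, -5.0, -5.0],
    [-5.0, 0.0, 0.0, -5.0],
    [-5.0, -5.0, -5.0, 0.0],
])
path = mas_reference(value)
```

In this toy example the second text position is held for two frames, because that is where the high-scoring entries lie.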
This implementation is taken from the original Glow-TTS repository. Consider citing the Glow-TTS paper when using this project:
@inproceedings{kim2020_glowtts,
title={Glow-{TTS}: A Generative Flow for Text-to-Speech via Monotonic Alignment Search},
author={Jaehyeon Kim and Sungwon Kim and Jungil Kong and Sungroh Yoon},
booktitle={Proceedings of Neur{IPS}},
year={2020},
}