LSTM Deep Neural Networks Postfiltering for Enhancing Synthetic Voices

Collaborators:: Ing. Marvin Coto Jiménez, PhD.
Authors:: Marvin Coto-Jiménez and John Goddard-Close
Journal:: International Journal of Pattern Recognition and Artificial Intelligence
Publisher:: World Scientific Publishing Company
URL:: https://www.worldscientific.com/doi/abs/10.1142/S021800141860008X

Abstract

Recent developments in speech synthesis have produced systems that generate speech closely resembling natural speech, and researchers now strive to create models that mimic human voices more accurately. One such development is the incorporation of multiple linguistic styles in various languages and accents. Speech synthesis based on Hidden Markov Models (HMMs) is of great interest to researchers because it can produce sophisticated features with a small footprint. Despite some progress, its quality has not yet reached that of the currently predominant unit-selection approaches, which select and concatenate recordings of real speech, and work continues on improving HMM-based systems. In this paper, we present an application of long short-term memory (LSTM) deep neural networks as a postfiltering step in HMM-based speech synthesis. Our motivation stems from a …