Universidad de Costa Rica

Pre-training Long Short-term Memory Neural Networks for Efficient Regression in Artificial Speech Postfiltering


Contributors:
Ing. Marvin Coto Jiménez, PhD.
Authors:
Marvin Coto-Jiménez
Journal:
N/A
Publisher:
N/A
URL:
https://ieeexplore.ieee.org/abstract/document/8464204/

Abstract:

Several attempts to enhance statistical parametric speech synthesis have contemplated deep-learning-based postfilters, which learn to map the synthetic speech parameters to the natural ones, reducing the gap between them. In this paper, we introduce a new pre-training approach for neural networks, applied in LSTM-based postfilters for speech synthesis, with the objective of enhancing the quality of the synthesized speech in a more efficient manner. Our approach begins with an auto-regressive training of one LSTM network, whose weights are then used as an initialization for postfilters based on a denoising autoencoder architecture. We show the advantages of this initialization on a set of multi-stream postfilters, which encompass a collection of denoising autoencoders for the MFCC and fundamental frequency parameters of the artificial voice. Results show that the initialization succeeds in lowering …
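The two-stage idea in the abstract (auto-regressive pre-training of one recurrent network, then reuse of its weights to initialize a denoising-autoencoder postfilter) can be sketched minimally as follows. This is not the paper's implementation: a plain recurrent layer stands in for the LSTM, the dimensions (`H`, `D`), the class name `TinyRecurrentNet`, and all variables are hypothetical, and the gradient-based training loops are omitted so that only the weight-transfer step is illustrated.

```python
import math
import random

random.seed(0)

H, D = 8, 4  # hidden units; feature dimension (e.g. a small MFCC stream)

def rand_matrix(rows, cols):
    return [[random.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

class TinyRecurrentNet:
    """A plain recurrent layer standing in for the paper's LSTM."""
    def __init__(self):
        self.Wx = rand_matrix(H, D)  # input -> hidden
        self.Wh = rand_matrix(H, H)  # hidden -> hidden
        self.Wo = rand_matrix(D, H)  # hidden -> output

    def forward(self, seq):
        """Run the recurrence over a sequence of D-dimensional frames."""
        h = [0.0] * H
        outputs = []
        for x in seq:
            h = [math.tanh(sum(wx * xj for wx, xj in zip(self.Wx[i], x)) +
                           sum(wh * hj for wh, hj in zip(self.Wh[i], h)))
                 for i in range(H)]
            outputs.append([sum(wo * hj for wo, hj in zip(self.Wo[i], h))
                            for i in range(D)])
        return outputs

# Stage 1 (sketched): auto-regressively pre-train one network to predict
# frame t+1 from frame t on natural speech parameters. The training loop
# is omitted; 'pretrained' simply plays the role of the trained network.
pretrained = TinyRecurrentNet()

# Stage 2: initialize the denoising-autoencoder postfilter by copying the
# pre-trained recurrent weights, then fine-tune it to map synthetic
# parameters to natural ones (fine-tuning also omitted here).
postfilter = TinyRecurrentNet()
postfilter.Wx = [row[:] for row in pretrained.Wx]
postfilter.Wh = [row[:] for row in pretrained.Wh]

# The postfilter now starts from the pre-trained dynamics:
synthetic_frames = [[random.gauss(0.0, 1.0) for _ in range(D)]
                    for _ in range(10)]
enhanced = postfilter.forward(synthetic_frames)
print(len(enhanced), len(enhanced[0]))  # one D-dimensional output per frame
```

In the paper's multi-stream setting, one such postfilter would be instantiated per parameter stream (MFCC, fundamental frequency), each receiving the same pre-trained initialization before fine-tuning.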

© 2020 Escuela de Ingeniería Eléctrica, Universidad de Costa Rica.