Universidad de Costa Rica

Pre-training Long Short-term Memory Neural Networks for Efficient Regression in Artificial Speech Postfiltering


Contributors:
Ing. Marvin Coto Jiménez, PhD.
Authors:
Marvin Coto-Jiménez
Journal:
N/A
Publisher:
N/A
URL:
https://ieeexplore.ieee.org/abstract/document/8464204/

Abstract:

Several attempts to enhance statistical parametric speech synthesis have contemplated deep-learning-based postfilters, which learn to map the synthetic speech parameters to the natural ones, reducing the gap between them. In this paper, we introduce a new pre-training approach for neural networks, applied in LSTM-based postfilters for speech synthesis, with the objective of enhancing the quality of the synthesized speech in a more efficient manner. Our approach begins with an auto-regressive training of one LSTM network, whose weights are then used as an initialization for postfilters based on a denoising autoencoder architecture. We show the advantages of this initialization on a set of multi-stream postfilters, which encompass a collection of denoising autoencoders for the MFCC and fundamental frequency parameters of the artificial voice. Results show that the initialization succeeds in lowering …
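The two-stage idea in the abstract (auto-regressive pre-training of one recurrent network, then reuse of its weights to initialize a denoising-autoencoder postfilter) can be sketched minimally as follows. This is not the paper's implementation: a plain recurrent layer stands in for the LSTM, the dimensions (`H`, `D`), the class name `TinyRecurrentNet`, and all variables are hypothetical, and the gradient-based training loops are omitted so that only the weight-transfer step is illustrated.

```python
import math
import random

random.seed(0)

H, D = 8, 4  # hidden units; feature dimension (e.g. a small MFCC stream)

def rand_matrix(rows, cols):
    return [[random.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

class TinyRecurrentNet:
    """A plain recurrent layer standing in for the paper's LSTM."""
    def __init__(self):
        self.Wx = rand_matrix(H, D)  # input -> hidden
        self.Wh = rand_matrix(H, H)  # hidden -> hidden
        self.Wo = rand_matrix(D, H)  # hidden -> output

    def forward(self, seq):
        """Run the recurrence over a sequence of D-dimensional frames."""
        h = [0.0] * H
        outputs = []
        for x in seq:
            h = [math.tanh(sum(wx * xj for wx, xj in zip(self.Wx[i], x)) +
                           sum(wh * hj for wh, hj in zip(self.Wh[i], h)))
                 for i in range(H)]
            outputs.append([sum(wo * hj for wo, hj in zip(self.Wo[i], h))
                            for i in range(D)])
        return outputs

# Stage 1 (sketched): auto-regressively pre-train one network to predict
# frame t+1 from frame t on natural speech parameters. The training loop
# is omitted; 'pretrained' simply plays the role of the trained network.
pretrained = TinyRecurrentNet()

# Stage 2: initialize the denoising-autoencoder postfilter by copying the
# pre-trained recurrent weights, then fine-tune it to map synthetic
# parameters to natural ones (fine-tuning also omitted here).
postfilter = TinyRecurrentNet()
postfilter.Wx = [row[:] for row in pretrained.Wx]
postfilter.Wh = [row[:] for row in pretrained.Wh]

# The postfilter now starts from the pre-trained dynamics:
synthetic_frames = [[random.gauss(0.0, 1.0) for _ in range(D)]
                    for _ in range(10)]
enhanced = postfilter.forward(synthetic_frames)
print(len(enhanced), len(enhanced[0]))  # one D-dimensional output per frame
```

In the paper's multi-stream setting, one such postfilter would be instantiated per parameter stream (MFCC, fundamental frequency), each receiving the same pre-trained initialization before fine-tuning.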

© 2020 Escuela de Ingeniería Eléctrica, Universidad de Costa Rica.