Universidad de Costa Rica

Acoustic vowel analysis of a Mexican Spanish HMM-based speech synthesis


Collaborators:
Ing. Marvin Coto Jiménez, PhD.
Authors:
John Goddard-Close, Marvin Coto-Jiménez, and Fabiola Martínez-Licona
Journal:
Research in Computing Science
Publisher:
N/A
URL:
https://pdfs.semanticscholar.org/da99/9899506ba77e13e1daa299c751941e4c3e2e.pdf

Abstract:

The synthetic voice produced from an HMM-based system is often reported as sounding muffled when it is compared to natural speech. There are several reasons for this effect: some precise and fine characteristics of the natural speech are removed, minimized or hidden in the modeling phase of the HMM system; the resulting speech-parameter trajectories become oversmoothed versions of the speech waveforms. In order to obtain more natural synthetic voices, different training conditions must be tried in the construction of the HMMs. One of the most important issues related to the obtained synthetic voice is that of quality assessment. There are several ways to address this, from subjective to objective approaches, applied to different parameters. This paper presents a comparative analysis of certain acoustic features derived from synthesized speech which has been obtained using different training configurations. Pitch, jitter and shimmer were extracted from the synthesized versions of three training sets of vowels of a Mexican Spanish speech database: the normal training set and sets with alterations in the context and fundamental frequency F0. The results show that these objective features can be part of an adequate quality assessment of synthetic speech.
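
As a rough illustration of the three features named in the abstract, the sketch below computes mean pitch, local jitter and local shimmer from a list of glottal period durations and per-cycle peak amplitudes. It assumes the common "local" definitions (mean absolute difference between consecutive cycles, divided by the mean); the abstract does not say which variants the authors used, and the function names and sample values here are hypothetical.

```python
from statistics import mean


def _relative_perturbation(values):
    """Mean absolute difference of consecutive values, divided by their mean."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values[:-1], values[1:])]
    return mean(diffs) / mean(values)


def local_jitter(periods_s):
    """Cycle-to-cycle variation of the period (assumed 'local' definition)."""
    return _relative_perturbation(periods_s)


def local_shimmer(peak_amplitudes):
    """Cycle-to-cycle variation of the peak amplitude (assumed 'local' definition)."""
    return _relative_perturbation(peak_amplitudes)


def mean_pitch_hz(periods_s):
    """Mean fundamental frequency, taken as the mean of per-cycle 1/period."""
    return mean(1.0 / t for t in periods_s)


# Hypothetical measurements for one vowel segment (not data from the paper).
periods = [0.008, 0.010, 0.008, 0.010]   # seconds per glottal cycle
amplitudes = [1.0, 0.9, 1.0, 0.9]        # peak amplitude per cycle

jitter = local_jitter(periods)
shimmer = local_shimmer(amplitudes)
pitch = mean_pitch_hz(periods)
```

In practice the period and amplitude sequences would come from a pitch-marking tool run on the natural and synthesized vowels; comparing the three values across training configurations is one way to carry out the kind of objective comparison the abstract describes.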

© 2020 Escuela de Ingeniería Eléctrica, Universidad de Costa Rica.