Modeling Speech Features via a Simulated Annealing Algorithm

Abstract


Mel-frequency cepstral coefficients (MFCCs) are so far the most popular speech features. However, formant center frequencies shift with the length of the speaker's vocal tract, which depends on sex and other physiological parameters such as height and can vary from 13 cm to 18 cm. The shift can be as large as 25%. Such a large difference can cause a previously well-trained model to misrecognize an utterance spoken by a new speaker, so the system becomes speaker-dependent. An alternative is to describe the input utterance with speaker-independent features, such as those obtained using the Auditory Image Model (AIM). In this work we propose AIM-based features computed with a simulated annealing algorithm. Using Monte Carlo schemes, we investigate the statistical properties of maximum likelihood estimates of the Gram-Charlier expansion of the normal density obtained via simulated annealing, and we compare different methods for solving this optimization problem.
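The estimation step described above can be illustrated with a minimal Python sketch. The abstract does not specify the parameterization, the annealing schedule, or the Monte Carlo design, so all of these are assumptions: the density is taken to be the Gram-Charlier A series truncated after the fourth Hermite term, with location μ, scale σ, skewness γ1 and excess kurtosis γ2; simulated annealing uses Gaussian random-walk proposals, the Metropolis acceptance rule and geometric cooling; and the Monte Carlo study refits synthetic normal samples rather than AIM outputs. The function names (`gram_charlier_logpdf`, `neg_log_likelihood`, `simulated_annealing`, `monte_carlo`) are hypothetical.

```python
import math
import random


def gram_charlier_logpdf(x, mu, sigma, g1, g2):
    """Log-density of the Gram-Charlier A series truncated after He4.

    g1 is the skewness and g2 the excess kurtosis. Returns -inf where the
    truncated expansion is not positive (it is not a valid density there).
    """
    z = (x - mu) / sigma
    he3 = z ** 3 - 3.0 * z                  # probabilists' Hermite polynomial He3
    he4 = z ** 4 - 6.0 * z ** 2 + 3.0       # probabilists' Hermite polynomial He4
    bracket = 1.0 + g1 / 6.0 * he3 + g2 / 24.0 * he4
    if bracket <= 0.0:
        return -math.inf
    log_phi = -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)
    return log_phi - math.log(sigma) + math.log(bracket)


def neg_log_likelihood(theta, data):
    """theta = (mu, log_sigma, g1, g2); returns +inf for invalid parameters."""
    mu, log_sigma, g1, g2 = theta
    sigma = math.exp(log_sigma)             # log scale keeps sigma positive
    total = 0.0
    for x in data:
        lp = gram_charlier_logpdf(x, mu, sigma, g1, g2)
        if lp == -math.inf:
            return math.inf
        total -= lp
    return total


def simulated_annealing(data, n_iter=2000, t0=1.0, cooling=0.995, step=0.05, seed=0):
    """Minimize the negative log-likelihood; returns (mu, sigma, g1, g2)."""
    rng = random.Random(seed)
    n = len(data)
    mean = sum(data) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in data) / n)
    current = [mean, math.log(sd), 0.0, 0.0]     # start from the normal fit
    e_cur = neg_log_likelihood(current, data)
    best, e_best = list(current), e_cur
    t = t0
    for _ in range(n_iter):
        # Gaussian random-walk proposal on all four parameters.
        cand = [p + rng.gauss(0.0, step) for p in current]
        e_cand = neg_log_likelihood(cand, data)
        delta = e_cand - e_cur
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if delta <= 0.0 or rng.random() < math.exp(-delta / t):
            current, e_cur = cand, e_cand
            if e_cur < e_best:
                best, e_best = list(current), e_cur
        t *= cooling                              # geometric cooling schedule
    mu, log_sigma, g1, g2 = best
    return mu, math.exp(log_sigma), g1, g2


def monte_carlo(n_rep=10, n_obs=200, mu=1.0, sigma=2.0, seed=1):
    """Refit the model on n_rep synthetic normal samples (n_rep >= 2).

    Returns a list of (mean, std) of the estimates of (mu, sigma, g1, g2).
    """
    rng = random.Random(seed)
    estimates = []
    for r in range(n_rep):
        data = [rng.gauss(mu, sigma) for _ in range(n_obs)]
        estimates.append(simulated_annealing(data, seed=r))
    summary = []
    for j in range(4):
        vals = [e[j] for e in estimates]
        m = sum(vals) / n_rep
        s = math.sqrt(sum((v - m) ** 2 for v in vals) / (n_rep - 1))
        summary.append((m, s))
    return summary


if __name__ == "__main__":
    names = ("mu", "sigma", "gamma1", "gamma2")
    truth = (1.0, 2.0, 0.0, 0.0)                  # normal data: gamma1 = gamma2 = 0
    for name, t, (m, s) in zip(names, truth, monte_carlo()):
        print(f"{name}: true={t:+.2f} mean={m:+.3f} sd={s:.3f} bias={m - t:+.3f}")
```

Because the truncated expansion can become negative, the sketch assigns an infinite negative log-likelihood to such parameter values, so the annealer always rejects them. The Monte Carlo summary reports the mean, standard deviation and bias of each estimate across replications, which is one simple way to examine the statistical properties of the estimator; other optimizers could be compared by swapping out `simulated_annealing`.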

A V Ermilov

National Research University “Higher School of Economics”

Email: alvalerm@mail.ru
Department of Control of System Development

Full text: PDF (Russian)

Copyright (c) 2014 A. V. Ermilov

This work is licensed under a Creative Commons Attribution 4.0 International License.