Modeling Speech Features via Simulated Annealing Algorithm
- Authors: Ermilov A V
- Affiliations: National Research University “Higher School of Economics”
- Issue: No 2 (2014)
- Pages: 354-358
- Section: Articles
- URL: https://journals.rudn.ru/miph/article/view/8391
Abstract
Mel-frequency cepstral coefficients (MFCCs) are so far the most popular speech features. However, the center frequencies of the formants shift with the length of the vocal tract, which depends on the speaker's sex and other physiological parameters such as height and can vary from 13 cm to 18 cm. The shift can be as large as 25%. Such a large difference can cause a previously well-trained model to misrecognize an utterance spoken by a new speaker, so the system becomes speaker-dependent. An alternative is to describe the input utterance with speaker-independent features, such as those obtained using the Auditory Image Model (AIM). In this work we propose AIM-based features that are calculated using a simulated annealing algorithm. Using Monte Carlo schemes, we investigate the statistical properties of maximum likelihood estimates of the Gram-Charlier expansion of the normal density obtained via the simulated annealing algorithm. We also compare different methods for solving this optimization problem.
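The estimation problem named in the abstract can be illustrated with a minimal sketch. The abstract does not give the parameterization, the proposal distribution, or the cooling schedule used in the paper, so everything below is an assumption: a Gram-Charlier type A density truncated after the fourth Hermite term, Gaussian random-walk proposals, Metropolis acceptance, and geometric cooling. All function and parameter names (`gram_charlier_pdf`, `moment_start`, `simulated_annealing`, `t0`, `cooling`, `step`) are hypothetical.

```python
import math
import random


def gram_charlier_pdf(x, mu, sigma, skew, exkurt):
    """Assumed form: Gram-Charlier type A density truncated at 4th order."""
    z = (x - mu) / sigma
    he3 = z ** 3 - 3.0 * z                    # Hermite polynomial He3
    he4 = z ** 4 - 6.0 * z ** 2 + 3.0         # Hermite polynomial He4
    phi = math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
    return phi * (1.0 + skew / 6.0 * he3 + exkurt / 24.0 * he4)


def neg_log_likelihood(params, data):
    """Negative log-likelihood; infinite where the density is invalid."""
    mu, sigma, skew, exkurt = params
    if sigma <= 0.0:
        return math.inf
    total = 0.0
    for x in data:
        p = gram_charlier_pdf(x, mu, sigma, skew, exkurt)
        if p <= 0.0:  # the truncated expansion can become negative
            return math.inf
        total -= math.log(p)
    return total


def moment_start(data):
    """Starting point from the sample mean and standard deviation."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / n
    return [mean, math.sqrt(var), 0.0, 0.0]


def simulated_annealing(data, start, steps=2000, t0=1.0,
                        cooling=0.995, step=0.05, seed=0):
    """Minimize the negative log-likelihood (assumed schedule, see lead-in)."""
    rng = random.Random(seed)
    current = list(start)
    current_cost = neg_log_likelihood(current, data)
    best, best_cost = list(current), current_cost
    temp = t0
    for _ in range(steps):
        # Gaussian random-walk proposal on all four parameters.
        candidate = [p + rng.gauss(0.0, step) for p in current]
        cost = neg_log_likelihood(candidate, data)
        delta = cost - current_cost
        # Metropolis rule: always accept improvements, sometimes accept worse.
        if delta <= 0.0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, cost
            if cost < best_cost:
                best, best_cost = list(candidate), cost
        temp *= cooling  # geometric cooling
    return best, best_cost


if __name__ == "__main__":
    rng = random.Random(1)
    sample = [rng.gauss(2.0, 1.5) for _ in range(200)]  # synthetic data
    estimate, cost = simulated_annealing(sample, moment_start(sample))
    print("mu, sigma, skew, excess kurtosis:", estimate)
    print("negative log-likelihood:", cost)
```

Starting from the sample moments keeps the initial density strictly positive, and returning the best visited point rather than the last one guarantees that the result is never worse than the start. A Monte Carlo study of the estimator, as described in the abstract, would repeat this fit over many independently generated samples and summarize the spread of the estimates.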
About the authors
A V Ermilov
National Research University “Higher School of Economics”
Email: alvalerm@mail.ru
Department of Control of System Development