Research Article

Modeling Speech Features Via Simulated Annealing Algorithm

Ermilov A. V.
Department of Control of System Development
alvalerm@mail.ru
National Research University "Higher School of Economics"

15 February 2014

Mel-frequency cepstral coefficients (MFCCs) are so far the most popular speech features. However, the central formant frequencies shift depending on the length of the speaker's vocal tract, which in turn depends on sex and other physiological parameters such as height and can vary from 13 cm to 18 cm. The shift can be as large as 25%. A difference this large can cause a previously well-trained model to misrecognize an utterance spoken by a new speaker, so the system becomes speaker-dependent. An alternative is to describe the input utterance with speaker-independent features, such as those obtained using the Auditory Image Model (AIM). In this work we propose AIM-based features that are calculated using a simulated annealing algorithm. Using Monte-Carlo schemes, we investigate the statistical properties of maximum likelihood estimates of the Gram-Charlier expansion of the normal density obtained via simulated annealing, and we compare different methods for solving this optimization problem.
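The abstract does not spell out the estimation procedure, so the following is only a minimal Python sketch of its three ingredients: a Gram-Charlier density, maximum likelihood estimation via simulated annealing, and a Monte-Carlo study of the resulting estimates. It assumes a Gram-Charlier Type A density truncated after the skewness and excess-kurtosis terms, a (mu, log sigma, skew, excess kurtosis) parametrization, Gaussian proposals, Metropolis acceptance and a geometric cooling schedule. All function names and tuning constants (t0, alpha, steps, step_size, the density floor, the starting point) are hypothetical choices for illustration, not the author's implementation.

```python
import math
import random


def gc_density(x, mu, sigma, skew, exkurt):
    """Gram-Charlier Type A density truncated after the He3 and He4 terms."""
    z = (x - mu) / sigma
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    he3 = z * z * z - 3.0 * z                # probabilists' Hermite polynomial He3
    he4 = z * z * z * z - 6.0 * z * z + 3.0  # probabilists' Hermite polynomial He4
    return phi / sigma * (1.0 + skew / 6.0 * he3 + exkurt / 24.0 * he4)


def neg_log_likelihood(theta, data, floor=1e-300):
    """Negative log-likelihood for theta = (mu, log_sigma, skew, exkurt).

    The truncated expansion can be negative (or NaN for extreme parameters),
    so densities are floored at `floor`, which heavily penalises invalid
    parameter values (an assumption of this sketch).
    """
    mu, log_sigma, skew, exkurt = theta
    sigma = math.exp(log_sigma)  # log-parametrisation keeps sigma > 0
    total = 0.0
    for x in data:
        d = gc_density(x, mu, sigma, skew, exkurt)
        if not d > floor:  # also catches NaN
            d = floor
        total -= math.log(d)
    return total


def simulated_annealing(f, theta0, t0=1.0, alpha=0.995, steps=2000,
                        step_size=0.1, rng=None):
    """Minimise f with Gaussian proposals, Metropolis acceptance and geometric cooling."""
    rng = rng or random.Random()
    current, f_current = list(theta0), f(theta0)
    best, f_best = list(current), f_current
    t = t0
    for _ in range(steps):
        candidate = [c + rng.gauss(0.0, step_size) for c in current]
        f_candidate = f(candidate)
        delta = f_candidate - f_current
        # Always accept improvements; accept worse moves with probability exp(-delta / t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, f_current = candidate, f_candidate
            if f_current < f_best:
                best, f_best = list(current), f_current
        t *= alpha  # geometric cooling schedule
    return best, f_best


def monte_carlo(n_reps=10, n_obs=100, seed=0):
    """Fit repeatedly to fresh N(0, 1) samples (true theta = (0, 0, 0, 0)) and
    return the sample mean and standard deviation of each estimated parameter."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_reps):
        data = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        theta, _ = simulated_annealing(
            lambda th: neg_log_likelihood(th, data), [0.5, 0.5, 0.0, 0.0], rng=rng)
        estimates.append(theta)
    means = [sum(e[i] for e in estimates) / n_reps for i in range(4)]
    stds = [math.sqrt(sum((e[i] - means[i]) ** 2 for e in estimates) / (n_reps - 1))
            for i in range(4)]
    return means, stds
```

Calling `means, stds = monte_carlo()` returns, for (mu, log sigma, skew, excess kurtosis), the Monte-Carlo mean and spread of the estimates; comparing them with the true values (all zero here) gives an empirical view of bias and variance. To compare optimization methods in the same scheme, one would swap `simulated_annealing` for another minimiser inside `monte_carlo` and keep the rest unchanged.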

speech features
simulated annealing
speech recognition
distribution modeling
numerical methods
