Discrete and Continuous Models and Applied Computational Science

ISSN: 2658-4670, 2658-7149

Peoples' Friendship University of Russia named after Patrice Lumumba (RUDN University)

8391

Articles

Research Article

Modeling Speech Features Via Simulated Annealing Algorithm

A. V. Ermilov (Aleksei Valerievich Ermilov)

Department of Control of System Development

alvalerm@mail.ru

National Research University “Higher School of Economics”

15.02.2014

No 2 (2014)

Pp. 354-358

08.09.2016

2014

Ermilov A.V.

https://creativecommons.org/licenses/by-nc/4.0

https://journals.rudn.ru/miph/article/view/8391

Mel-frequency cepstral coefficients (MFCCs) remain the most popular speech features. However, the frequencies of the central formants shift with the length of the speaker's vocal tract, which depends on sex and other physiological parameters such as height and can vary from 13 cm to 18 cm. The shift can be as large as 25%. Such a large difference can cause a previously well-trained model to misrecognize an utterance spoken by a new speaker, so the system becomes speaker-dependent. An alternative is to describe the input utterance with speaker-independent features, such as those obtained using the Auditory Image Model (AIM). In this work we propose AIM-based features that are calculated using the simulated annealing algorithm. Using Monte Carlo schemes, we investigate the statistical properties of maximum likelihood estimates of the Gram-Charlier expansion of the normal density obtained via simulated annealing, and we compare the accuracy of different methods for solving this optimization problem.
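The estimation problem named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact form of the expansion, the proposal distribution, or the cooling schedule, so everything here is an assumption: the four-parameter Gram-Charlier Type A density (mean, scale, skewness, excess kurtosis), Gaussian random-walk proposals, geometric cooling, and the hypothetical helper names `gc_logpdf`, `neg_loglik` and `anneal`. It is not the author's implementation.

```python
import numpy as np


def gc_logpdf(x, mu, sigma, skew, exkurt):
    """Log-density of a Gram-Charlier Type A expansion of the normal density.

    Assumed form: phi(z)/sigma * [1 + skew/6*He3(z) + exkurt/24*He4(z)].
    Returns -inf wherever the truncated expansion is not a valid density.
    """
    if sigma <= 0:
        return np.full_like(x, -np.inf, dtype=float)
    z = (x - mu) / sigma
    he3 = z**3 - 3.0 * z              # probabilists' Hermite polynomial He3
    he4 = z**4 - 6.0 * z**2 + 3.0     # probabilists' Hermite polynomial He4
    corr = 1.0 + skew / 6.0 * he3 + exkurt / 24.0 * he4
    with np.errstate(invalid="ignore", divide="ignore"):
        log_corr = np.where(corr > 0, np.log(corr), -np.inf)
    return -0.5 * z**2 - 0.5 * np.log(2.0 * np.pi) - np.log(sigma) + log_corr


def neg_loglik(theta, x):
    """Negative log-likelihood; +inf for parameter values that are invalid."""
    return -gc_logpdf(x, *theta).sum()


def anneal(x, theta0, n_iter=5000, t0=1.0, cooling=0.999, step=0.05, seed=0):
    """Simulated annealing with Metropolis acceptance (illustrative settings)."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    cost = neg_loglik(theta, x)
    best, best_cost = theta.copy(), cost
    t = t0
    for _ in range(n_iter):
        cand = theta + step * rng.standard_normal(theta.size)
        cand_cost = neg_loglik(cand, x)
        # Always accept improvements; accept worse moves with prob. exp(-delta/t).
        if cand_cost < cost or rng.random() < np.exp(-(cand_cost - cost) / t):
            theta, cost = cand, cand_cost
            if cost < best_cost:
                best, best_cost = theta.copy(), cost
        t *= cooling  # geometric cooling schedule (an assumption)
    return best, best_cost


# Usage on synthetic data: a normal sample, started from its sample moments.
data_rng = np.random.default_rng(42)
sample = data_rng.normal(1.0, 2.0, size=2000)
start = np.array([sample.mean(), sample.std(), 0.0, 0.0])
start_cost = neg_loglik(start, sample)
best, best_cost = anneal(sample, start)
print(best, best_cost)
```

Starting from the sample moments with zero skewness and excess kurtosis keeps the initial point inside the region where the truncated expansion is a valid density. Candidates that leave this region get an infinite cost and are never accepted.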

speech features; simulated annealing; speech recognition; distribution modeling; numerical methods

Sahidullah M., Saha G. Design, Analysis and Experimental Evaluation of Block Based Transformation in MFCC Computation for Speaker Recognition // Speech Communication. - 2012. - Vol. 54, No 4. - Pp. 543-565.

Munich M. E., Lin Q. Auditory Image Model features for Automatic Speech Recognition // 9th European Conference on Speech Communication and Technology (Interspeech’ 2005 - Eurospeech). - 2005. - Pp. 3037-3040.

Niguez T., Perote J. Forecasting the Density of Asset Returns // STICERD Working Paper. - 2004.

Neal R. M. Slice Sampling // Annals of Statistics. - 2003. - Vol. 31, No 3. - Pp. 705-767.

Convergence Properties of the Nelder-Mead Simplex Method in Low Dimensions / J. C. Lagarias, J. A. Reeds, M. H. Wright, P. E. Wright // SIAM Journal on Optimization. - 1998. - Vol. 9, No 1. - Pp. 112-147.