Discrete and Continuous Models and Applied Computational Science

2658-4670, 2658-7149

Peoples' Friendship University of Russia named after Patrice Lumumba (RUDN University)

24216

10.22363/2658-4670-2020-28-2-105-119

Computer Science

Research Article

Comparative analysis of machine learning methods by the example of the problem of determining muon decay

Gevorkyan

Migran N.

Candidate of Sciences in Physics and Mathematics, Assistant Professor at the Department of Applied Probability and Informatics

gevorkyan-mn@rudn.ru

Demidova

Anastasia V.

Candidate of Sciences in Physics and Mathematics, Assistant Professor at the Department of Applied Probability and Informatics

demidova-av@rudn.ru

Kulyabov

Dmitry S.

Docent, Doctor of Sciences in Physics and Mathematics, Professor at the Department of Applied Probability and Informatics

Department of Applied Probability and Informatics; Laboratory of Information Technologies

kulyabov-ds@rudn.ru

Peoples’ Friendship University of Russia (RUDN University)

Joint Institute for Nuclear Research

15.12.2020

Vol. 28, No. 2 (2020)

pp. 105-119

20.07.2020

Gevorkyan M.N., Demidova A.V., Kulyabov D.S.

http://creativecommons.org/licenses/by/4.0

https://journals.rudn.ru/miph/article/view/24216

Machine learning algorithms have long been used to analyze statistical models, and advances in computer technology have given them a new lease of life. Deep learning is now the mainstream and most popular area of machine learning. However, the authors believe that many researchers try to apply deep learning methods beyond their range of applicability. This happens because software systems implementing deep learning algorithms are widely available and because such research appears simple. This motivated the authors to compare deep learning algorithms with classical machine learning algorithms. An experiment at the Large Hadron Collider was chosen as the test problem because the authors are familiar with this field and because the experimental data are publicly available. The article compares various machine learning algorithms on the problem of recognizing the decay reaction τ⁻ → μ⁻ + μ⁻ + μ⁺ at the Large Hadron Collider. The authors use open source implementations of the algorithms and compare them on the basis of calculated metrics. The study concludes that all the machine learning methods considered are comparable with each other in terms of the selected metrics, while different methods have different areas of applicability.

muon decay, machine learning, neural networks
