<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article>
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ali="http://www.niso.org/schemas/ali/1.0/" article-type="research-article" dtd-version="1.2" xml:lang="en"><front><journal-meta><journal-id journal-id-type="publisher-id">Discrete and Continuous Models and Applied Computational Science</journal-id><journal-title-group><journal-title xml:lang="en">Discrete and Continuous Models and Applied Computational Science</journal-title><trans-title-group xml:lang="ru"><trans-title>Discrete and Continuous Models and Applied Computational Science</trans-title></trans-title-group></journal-title-group><issn publication-format="print">2658-4670</issn><issn publication-format="electronic">2658-7149</issn><publisher><publisher-name xml:lang="en">Peoples' Friendship University of Russia named after Patrice Lumumba (RUDN University)</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="publisher-id">45255</article-id><article-id pub-id-type="doi">10.22363/2658-4670-2025-33-2-172-183</article-id><article-id pub-id-type="edn">MGCVKV</article-id><article-categories><subj-group subj-group-type="toc-heading" xml:lang="en"><subject>Computer Science</subject></subj-group><subj-group subj-group-type="toc-heading" xml:lang="ru"><subject>Информатика и вычислительная техника</subject></subj-group><subj-group subj-group-type="article-type"><subject>Research Article</subject></subj-group></article-categories><title-group><article-title xml:lang="en">Predictive diagnostics of computer systems logs using natural language processing techniques</article-title><trans-title-group xml:lang="ru"><trans-title>Предиктивная диагностика логов компьютерных систем с помощью методов обработки естественного языка</trans-title></trans-title-group></title-group><contrib-group><contrib contrib-type="author"><contrib-id 
contrib-id-type="orcid">https://orcid.org/0009-0002-9692-0225</contrib-id><contrib-id contrib-id-type="scopus">57220041155</contrib-id><name-alternatives><name xml:lang="en"><surname>Kiriachek</surname><given-names>Vladislav A.</given-names></name><name xml:lang="ru"><surname>Кирячёк</surname><given-names>В. А.</given-names></name></name-alternatives><bio xml:lang="en"><p>PhD student at the Department of Computational Mathematics and Artificial Intelligence</p></bio><email>w.a.kiryachok@mail.ru</email><xref ref-type="aff" rid="aff1"/></contrib><contrib contrib-type="author"><contrib-id contrib-id-type="orcid">https://orcid.org/0000-0002-5321-9650</contrib-id><contrib-id contrib-id-type="scopus">57201380251</contrib-id><name-alternatives><name xml:lang="en"><surname>Salpagarov</surname><given-names>Soltan I.</given-names></name><name xml:lang="ru"><surname>Салпагаров</surname><given-names>С. И.</given-names></name></name-alternatives><bio xml:lang="en"><p>Candidate of Physical and Mathematical Sciences, Associate Professor at the Department of Computational Mathematics and Artificial Intelligence</p></bio><email>salpagarov-si@rudn.ru</email><xref ref-type="aff" rid="aff1"/></contrib></contrib-group><aff-alternatives id="aff1"><aff><institution xml:lang="en">RUDN University</institution></aff><aff><institution xml:lang="ru">Российский университет дружбы народов</institution></aff></aff-alternatives><pub-date date-type="pub" iso-8601-date="2025-07-15" publication-format="electronic"><day>15</day><month>07</month><year>2025</year></pub-date><volume>33</volume><issue>2</issue><issue-title xml:lang="en">VOL 33, NO2 (2025)</issue-title><issue-title xml:lang="ru">ТОМ 33, №2 (2025)</issue-title><fpage>172</fpage><lpage>183</lpage><history><date date-type="received" iso-8601-date="2025-07-25"><day>25</day><month>07</month><year>2025</year></date></history><permissions><copyright-statement xml:lang="en">Copyright © 2025, Kiriachek V.A., Salpagarov 
S.I.</copyright-statement><copyright-statement xml:lang="ru">Copyright © 2025, Кирячёк В.А., Салпагаров С.И.</copyright-statement><copyright-year>2025</copyright-year><copyright-holder xml:lang="en">Kiriachek V.A., Salpagarov S.I.</copyright-holder><copyright-holder xml:lang="ru">Кирячёк В.А., Салпагаров С.И.</copyright-holder><ali:free_to_read xmlns:ali="http://www.niso.org/schemas/ali/1.0/"/><license><ali:license_ref xmlns:ali="http://www.niso.org/schemas/ali/1.0/">https://creativecommons.org/licenses/by-nc/4.0</ali:license_ref></license></permissions><self-uri xlink:href="https://journals.rudn.ru/miph/article/view/45255">https://journals.rudn.ru/miph/article/view/45255</self-uri><abstract xml:lang="en"><p>This study aims to develop and validate a method for predictive diagnostics and anomaly detection in computer system logs, using the Vertica database as a case study. The proposed approach is based on semi-supervised learning combined with natural language processing techniques. A specialized parser utilizing a semantic graph was developed for data preprocessing. Vectorization was performed using the fastText NLP library and TF-IDF weighting. Empirical validation was conducted on real Vertica log files from a large IT company, covering both periods of normal operation and anomalies that led to failures. A comparative assessment of various anomaly detection algorithms was performed, including k-nearest neighbors, autoencoders, One-Class SVM, Isolation Forest, Local Outlier Factor, and Elliptic Envelope. Results are visualized as anomaly-score plots that mark the time intervals in which the score exceeds the threshold level. 
The findings demonstrate high efficacy of the proposed approach in identifying anomalies preceding system failures and delineate promising directions for further research.</p></abstract><trans-abstract xml:lang="ru"><p>Данное исследование направлено на разработку и валидацию метода предиктивной диагностики и детекции аномалий в логах компьютерных систем, используя в качестве примера базу данных Vertica. Предложенный подход основан на обучении с частичным привлечением учителя в сочетании с методами обработки естественного языка. Для предварительной обработки данных разработан специализированный парсер, использующий семантический граф. Векторизация осуществлялась с применением NLP-библиотеки fastText и взвешивания TF-IDF. Эмпирическая валидация проводилась на реальных лог-файлах Vertica крупной IT-компании, содержащих как периоды нормального функционирования, так и аномалии, приведшие к сбоям. Проведена сравнительная оценка эффективности различных алгоритмов обнаружения аномалий, включая метод k-ближайших соседей, автоэнкодеры, One Class SVM, Isolation Forest, Local Outlier Factor и Elliptic Envelope. Результаты визуализированы посредством графиков аномальности, отражающих временные интервалы с превышением порогового уровня. Полученные результаты демонстрируют высокую эффективность предложенного подхода в идентификации предшествующих сбоям аномалий и определяют перспективные направления дальнейших исследований.</p></trans-abstract><kwd-group xml:lang="en"><kwd>machine learning</kwd><kwd>natural language processing</kwd><kwd>log analysis</kwd><kwd>anomaly detection</kwd><kwd>predictive diagnostics</kwd></kwd-group><kwd-group xml:lang="ru"><kwd>машинное обучение</kwd><kwd>методы обработки естественного языка</kwd><kwd>анализ логов</kwd><kwd>детекция аномалий</kwd><kwd>предиктивная диагностика</kwd></kwd-group><funding-group/></article-meta></front><body></body><back><ref-list><ref id="B1"><label>1.</label><mixed-citation>He, P., Zhu, J., Zheng, Z. &amp; Lyu, M. R. 
Drain: An online log parsing approach with fixed depth tree. IEEE International Conference on Web Services (ICWS), 33-40. doi:10.1109/ICWS.2017.13 (2017).</mixed-citation></ref><ref id="B2"><label>2.</label><mixed-citation>Du, M. &amp; Li, F. Spell: Streaming parsing of system event logs. 2016 IEEE 16th International Conference on Data Mining (ICDM), 859-864. doi:10.1109/ICDM.2016.0103 (2016).</mixed-citation></ref><ref id="B3"><label>3.</label><mixed-citation>Bojanowski, P., Grave, E., Joulin, A. &amp; Mikolov, T. Enriching Word Vectors with Subword Information. Transactions of the Association for Computational Linguistics 5, 135-146. doi:10.1162/tacl_a_00051 (2017).</mixed-citation></ref><ref id="B4"><label>4.</label><mixed-citation>Zhang, X. et al. Robust log-based anomaly detection on unstable log data. ESEC/FSE, 807-817. doi:10.1145/3338906.3338931 (2019).</mixed-citation></ref><ref id="B5"><label>5.</label><mixed-citation>Lu, S., Wei, X., Li, Y. &amp; Wang, L. Detecting anomaly in big data system logs using convolutional neural network. In 2018 IEEE 16th Intl Conf on Dependable, Autonomic and Secure Computing, 16th Intl Conf on Pervasive Intelligence and Computing, 4th Intl Conf on Big Data Intelligence and Computing and Cyber Science and Technology Congress (DASC/PiCom/DataCom/CyberSciTech), 151. doi:10.1109/DASC/PiCom/DataCom/CyberSciTec.2018.00037 (2018).</mixed-citation></ref><ref id="B6"><label>6.</label><mixed-citation>Du, M., Li, F., Zheng, G. &amp; Srikumar, V. DeepLog: Anomaly detection and diagnosis from system logs through deep learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, 1285-1298. doi:10.1145/3133956.3134015 (2017).</mixed-citation></ref><ref id="B7"><label>7.</label><mixed-citation>Meng, W. et al. LogAnomaly: Unsupervised Detection of Sequential and Quantitative Anomalies in Unstructured Logs. In IJCAI 7, 4739-4745. 
doi:10.24963/ijcai.2019/658 (2019).</mixed-citation></ref><ref id="B8"><label>8.</label><mixed-citation>Guo, H., Yuan, S. &amp; Wu, X. LogBERT: Log Anomaly Detection via BERT. In 2021 international joint conference on neural networks, 1-8. doi:10.48550/arXiv.2103.04475 (Mar. 2021).</mixed-citation></ref><ref id="B9"><label>9.</label><mixed-citation>Yang, L., Chen, J., Wang, Z., Wang, W., Jiang, J., Dong, X. &amp; Zhang, W. Semi-Supervised Log-Based Anomaly Detection via Probabilistic Label Estimation. 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE), 1448-1460. doi:10.1109/ICSE43902.2021.00130 (2021).</mixed-citation></ref><ref id="B10"><label>10.</label><mixed-citation>Nedelkoski, S., Bogatinovski, J., Acke, A., Cardoso, J. &amp; Kao, O. Self-attentive classification-based anomaly detection in unstructured logs. In 2020 IEEE international conference on data mining, 1196-1201. doi:10.1109/ICDM50108.2020.00148 (2020).</mixed-citation></ref><ref id="B11"><label>11.</label><mixed-citation>Farzad, A. &amp; Gulliver, T. A. Unsupervised log message anomaly detection. ICT Express, 229-237. doi:10.1016/j.icte.2020.06.003 (2020).</mixed-citation></ref><ref id="B12"><label>12.</label><mixed-citation>Wang, Q., Zhang, X., Wang, X. &amp; Cao, Z. Log Sequence Anomaly Detection Method Based on Contrastive Adversarial Training and Dual Feature Extraction. Entropy 24, 69. doi:10.3390/e24010069 (Dec. 2021).</mixed-citation></ref><ref id="B13"><label>13.</label><mixed-citation>Wan, Y., Liu, Y., Wang, D. &amp; Wen, Y. GLAD-PAW: Graph-Based Log Anomaly Detection by Position Aware Weighted Graph Attention Network in (May 2021). doi:10.1007/978-3-030-75762-5_6.</mixed-citation></ref><ref id="B14"><label>14.</label><mixed-citation>Catillo, M., Pecchia, A. &amp; Villano, U. AutoLog: Anomaly detection by deep autoencoding of system logs. Expert Systems with Applications 191. 
doi:10.1016/j.eswa.2021.116263 (2022).</mixed-citation></ref><ref id="B15"><label>15.</label><mixed-citation>Schölkopf, B., Platt, J. C., Shawe-Taylor, J., Smola, A. J. &amp; Williamson, R. C. Estimating the support of a high-dimensional distribution. Neural Computation 13(7), 1443-1471. doi:10.1162/089976601750264965 (2001).</mixed-citation></ref><ref id="B16"><label>16.</label><mixed-citation>Liu, F. T., Ting, K. M. &amp; Zhou, Z. H. Isolation Forest. 2008 Eighth IEEE International Conference on Data Mining, Pisa, Italy, 413-422. doi:10.1109/ICDM.2008.17 (2008).</mixed-citation></ref><ref id="B17"><label>17.</label><mixed-citation>Breunig, M., Kröger, P., Ng, R. &amp; Sander, J. LOF: Identifying Density-Based Local Outliers. ACM Sigmod Record 29, 93-104. doi:10.1145/342009.335388 (June 2000).</mixed-citation></ref><ref id="B18"><label>18.</label><mixed-citation>Rousseeuw, P. J. &amp; Van Driessen, K. A fast algorithm for the minimum covariance determinant estimator. Technometrics 41(3), 212. doi:10.1080/00401706.1999.10485670 (1999).</mixed-citation></ref><ref id="B19"><label>19.</label><mixed-citation>Mikolov, T., Chen, K., Corrado, G. &amp; Dean, J. Efficient estimation of word representations in vector space. doi:10.48550/arXiv.1301.3781 (2013).</mixed-citation></ref><ref id="B20"><label>20.</label><mixed-citation>Pennington, J., Socher, R. &amp; Manning, C. D. Glove: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing, 1532-1543. doi:10.3115/v1/D14-1162 (2014).</mixed-citation></ref></ref-list></back></article>
