Statistical and density-based clustering techniques in the context of anomaly detection in network systems: A comparative analysis
- Authors: Baklashov A. S. (1, 2), Kulyabov D. S. (1, 3)
- Affiliations:
  1. RUDN University
  2. V. A. Trapeznikov Institute of Control Sciences of Russian Academy of Sciences
  3. Joint Institute for Nuclear Research
- Issue: Vol 33, No 1 (2025)
- Pages: 27-45
- Section: Computer Science
- URL: https://journals.rudn.ru/miph/article/view/44731
- DOI: https://doi.org/10.22363/2658-4670-2025-33-1-27-45
- EDN: https://elibrary.ru/AFZDUC
- ID: 44731
Abstract
In the modern world, the volume of data stored electronically and transmitted over networks continues to grow rapidly. This trend increases the demand for effective methods of protecting information transmitted as network traffic. Anomaly detection plays a crucial role in ensuring network security and safeguarding data against cyberattacks. This study aims to review statistical and density-based clustering methods used for anomaly detection in network systems and to perform a comparative analysis based on a specific task. To achieve this goal, the authors analyzed existing approaches to anomaly detection using clustering methods and examined various clustering algorithms and techniques applied within network environments. The comparative analysis highlights the high effectiveness of clustering methods in detecting anomalies in network traffic. These findings support the recommendation to integrate such methods into intrusion detection systems to enhance information security. The study identified common features, differences, strengths, and limitations of the different methods. The results offer practical insights for improving intrusion detection systems and strengthening data protection in network infrastructures.
Full Text
1. Introduction
With the growing frequency and complexity of attacks targeting information systems [1], such as DDoS attacks and data breaches, protecting information against such attacks becomes a vital aspect of network design. Anomaly detection serves as a key element in ensuring the security of information systems, as anomalies in network traffic often indicate unauthorized access attempts or other forms of intrusion. Developing effective methods to detect deviations in network traffic behavior therefore remains a crucial challenge.
In [2], the authors provided a comprehensive overview of methods, systems, and tools for anomaly detection in network traffic. That study paid particular attention to classifying the approaches available at the time, including clustering-based methods. However, due to its publication date, the review does not fully reflect recent advances in data processing and modern algorithms. The study in [2] also overlooked key aspects of density-based clustering methods such as DBSCAN and HDBSCAN. An updated, in-depth study of clustering methods in the context of modern network traffic is therefore a highly relevant research direction.
Recent approaches offer new opportunities to improve the accuracy and efficiency of anomaly detection, and researchers now explore a wide range of techniques. For example, some detect anomalies using deep unfolding methods that reconstruct normal and anomalous data flows from sparse and full-dimensional components [3]. Others use approaches such as Isolation Forest and autoencoders [4]. Many researchers focus on neural network-based techniques. In [5], the authors investigate deep learning to address the issue of false positives in anomaly detection. Others combine traditional approaches with machine learning techniques [6].
These methods have demonstrated strong performance in recognizing different data patterns, making them particularly effective for solving cybersecurity challenges.
This study proposes a clustering-based approach for network intrusion detection. The proposed method aims to serve as the first line of defense against network attacks within intrusion detection systems (IDS), which monitor events occurring within information systems or their individual components. The objective of this study is to analyze existing clustering methods for anomaly detection and to perform a comparative assessment. To achieve this, the study analyzes and evaluates several clustering algorithms and summarizes their properties in a comparison table. An experimental section follows, presenting results for each method applied to a specific dataset.
The article includes several sections, each addressing a specific aspect of the research:
The section “Types of intrusion detection systems and anomaly detection methodology” defines IDS, outlines the main IDS types, and introduces clustering methods.
The section “Methods and instruments” presents a detailed review and analysis of clustering techniques. Subsections cover partitioning methods (e.g., k-means, k-medoids), hierarchical clustering, and density-based clustering (DBSCAN, HDBSCAN, OPTICS). A summary table at the end of this section facilitates the comparison of these methods.
The section “Practical application of clustering methods in network anomaly detection systems” presents the comparative analysis results of six clustering algorithms tested on a real dataset. This section includes the results and an interpretation of the metrics obtained experimentally for each clustering method.
The section “Results” presents the metrics obtained by applying the clustering methods to the specific dataset. The data are presented in a summary table, a heatmap, and text.
The section “Discussion” summarizes the experimental findings and justifies the selection of the most suitable clustering method for network anomaly detection.
The section “Conclusion” outlines the main outcomes and discusses directions for future research.
2. Types of intrusion detection systems and anomaly detection methodology
An intrusion detection system (IDS) is software or hardware that analyzes network traffic or computer activity to identify potential unauthorized access attempts, attacks, or intrusions into computer systems or networks [7]. An IDS detects a wide range of threats, including intrusions, viruses, worms, denial-of-service (DoS) attacks, and other anomalous behaviors, and alerts administrators so that they can take timely defensive action [8].
Researchers classify intrusion detection systems into two main types based on detection method and mode of deployment [9, 10]:
1. Network-based intrusion detection systems (NIDS) analyze network traffic for anomalies by intercepting data at the network adapter level or via network devices such as switches and routers. NIDS can detect attacks before they reach the target system.
2. Host-based intrusion detection systems (HIDS) run on individual computers and monitor activity at the operating system level, including file system changes, registry modifications, event logs, and other system parameters. HIDS typically detect attacks targeting a specific host and may offer additional insights about system compromise.
This study focuses on clustering methods as a key tool for identifying anomalies in network-based intrusion detection systems (NIDS). Dividing network traffic into clusters that represent normal and abnormal behavior plays a critical role in designing effective NIDS and ranks among the most successful techniques for detecting network anomalies.
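The idea of dividing traffic into a normal and an abnormal cluster can be illustrated with a minimal sketch. It does not reproduce the authors' experiments: the two flow features, the synthetic data, the farthest-first seeding, and the rule that the smaller cluster is treated as anomalous are all assumptions made for illustration, and the clustering routine is a plain-Python version of Lloyd's k-means rather than any particular library implementation.

```python
import math
import random
from collections import Counter


def farthest_first(points, k):
    """Deterministic seeding (an assumption): start at points[0],
    then repeatedly add the point farthest from the chosen centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments are stable: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
    return centroids, labels


def make_flows(n_normal=200, n_anomalous=20, seed=42):
    """Synthetic flows with two hypothetical, already scaled features
    (for example duration and byte count). Not real traffic."""
    rng = random.Random(seed)
    normal = [(rng.gauss(1.0, 0.2), rng.gauss(1.0, 0.2))
              for _ in range(n_normal)]
    anomalous = [(rng.gauss(5.0, 0.3), rng.gauss(6.0, 0.3))
                 for _ in range(n_anomalous)]
    return normal + anomalous


flows = make_flows()
centroids, labels = kmeans(flows, k=2)

# Assumption: anomalies are rare, so the smaller cluster is the abnormal one.
sizes = Counter(labels)
anomaly_label = min(sizes, key=sizes.get)
flagged = [i for i, lab in enumerate(labels) if lab == anomaly_label]
print(f"cluster sizes: {dict(sizes)}; flagged {len(flagged)} flows")
```

On real traffic the clusters are rarely this well separated, so the "smaller cluster is anomalous" rule would need to be validated against labeled data before being relied on.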
This clustering approach enhances both the accuracy and the efficiency of IDS operation. The section titled “Methods and instruments” presents a comparison of six clustering algorithms: k-means, k-medoids, hierarchical clustering, DBSCAN, HDBSCAN, and OPTICS. This analysis aims to support the selection of the most appropriate method for anomaly detection in NIDS based on performance, accuracy, strengths, and limitations.
3. Methods and instruments
This section presents a comparative analysis of clustering methods applicable to the stated problem (see Fig. 1). The analysis focuses on three main types of clustering methods [11]:
1. Partitioning clustering;
2. Hierarchical clustering;
3. Density-based clustering.
To cluster network traffic into two categories, this study evaluates the following methods:
· Two partitioning clustering methods: k-means and k-medoids;
· A hierarchical clustering method;
· Three density-based clustering methods: DBSCAN, HDBSCAN, and OPTICS.
Figure 1. Clustering methods
3.1. Partitioning clustering methods
3.1.1. The k-means clustering method
The k-means method divides data into a predefined number of clusters
About the authors
Aleksandr S. Baklashov
RUDN University; V. A. Trapeznikov Institute of Control Sciences of Russian Academy of Sciences
Email: 1132239133@pfur.ru
ORCID iD: 0009-0000-9046-3225
ResearcherId: KLZ-4503-2024
Master’s degree student, Department of Probability Theory and Cybersecurity of RUDN University; Mathematician, V. A. Trapeznikov Institute of Control Sciences of Russian Academy of Sciences
6 Miklukho-Maklaya St, Moscow, 117198, Russian Federation; 65 Profsoyuznaya St, Moscow, 117997, Russian Federation
Dmitry S. Kulyabov
RUDN University; Joint Institute for Nuclear Research
Author for correspondence.
Email: kulyabov_ds@pfur.ru
ORCID iD: 0000-0002-0877-7063
Scopus Author ID: 35194130800
ResearcherId: I-3183-2013
Professor, Doctor of Sciences in Physics and Mathematics, Professor of Department of Probability Theory and Cyber Security of RUDN University; Senior Researcher of Laboratory of Information Technologies, Joint Institute for Nuclear Research
6 Miklukho-Maklaya St, Moscow, 117198, Russian Federation; 6 Joliot-Curie St, Dubna, 141980, Russian Federation
References
- Kosmacheva, I., Davidyuk, N., Belov, S., Kuchin, Y. S., Kvyatkovskaya, Y., Rudenko, M. & Lobeyko, V. I. Predicting of cyber attacks on critical information infrastructure. Journal of Physics: Conference Series 2091 (2021).
- Bhuyan, M. H., Bhattacharyya, D. K. & Kalita, J. K. Network Anomaly Detection: Methods, Systems and Tools. IEEE Communications Surveys & Tutorials 16, 303-336 (2014).
- Schynol, L. & Pesavento, M. Deep Unrolling for Anomaly Detection in Network Flows in (Dec. 2023), 61-65. doi: 10.1109/CAMSAP58249.2023.10403513.
- Maheswari, G., Vinith, A., Sathyanarayanan, A. S., Sowmi, S. M. & Sambath, M. An Ensemble Framework for Network Anomaly Detection Using Isolation Forest and Autoencoders. 2024 International Conference on Advances in Data Engineering and Intelligent Computing Systems (ADICS), 1-6 (2024).
- Olateju, O., Okon, S., Igwenagu, U., Salami, A., Oladoyinbo, T. & Olaniyi, O. Combating the Challenges of False Positives in AI-Driven Anomaly Detection Systems and Enhancing Data Security in the Cloud. Asian Journal of Research in Computer Science 17, 264-292. doi: 10.9734/ajrcos/2024/v17i6472 (June 2024).
- Lavanya, A. & Sekar, D. Traditional Methods and Machine Learning for Anomaly Detection in Self-Organizing Networks. International Journal of Scientific Research in Science, Engineering and Technology 10, 352-360. doi: 10.32628/IJSRSET2310662 (Dec. 2023).
- Sheela, S. N., Prasad, E., Srinath, M. V. & Basha, M. S. Intrusion Detection Systems, Tools and Techniques - An Overview. Indian journal of science and technology 8 (2015).
- Al-Ghamdi, M. An Assessment of Intrusion Detection System (IDS) and Data-Set Overview: A Comprehensive Review of Recent Works. Journal of Scientific Research and Development 5, 979-982 (Feb. 2021).
- Rozendaal, K., Mailewa, A. & Dissanayake Mohottalalage, T. Neural Network Assisted IDS/IPS: An Overview of Implementations, Benefits, and Drawbacks. International Journal of Computer Applications 184, 21-28. doi: 10.5120/ijca2022922098 (May 2022).
- Satilmiş, H., Akleylek, S. & Tok, Z. A Systematic Literature Review on Host-Based Intrusion Detection Systems. IEEE Access PP, 1-1. doi: 10.1109/ACCESS.2024.3367004 (Jan. 2024).
- Mahfuz, N. M., Yusoff, M. & Ahmad, Z. Review of single clustering methods. IAES International Journal of Artificial Intelligence 8, 221-227 (2019).
- Burkov, A. Machine learning engineering (True Positive, Sept. 2020).
- Park, H.-S. & Jun, C.-H. A simple and fast algorithm for K-medoids clustering. Expert Systems with Applications 36, 3336-3341. doi: 10.1016/j.eswa.2008.01.039 (2009).
- Campello, R., Kröger, P., Sander, J. & Zimek, A. Density-based clustering. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 10. doi: 10.1002/widm.1343 (Oct. 2019).
- Ankerst, M., Breunig, M. M., Kriegel, H.-P. & Sander, J. OPTICS: ordering points to identify the clustering structure. SIGMOD Rec. 28, 49-60. doi: 10.1145/304181.304187 (June 1999).
- Sahli, Y. Comparison of the NSL-KDD dataset and its predecessor the KDD Cup ’99 dataset. International Journal of Scientific Research and Management 10, 832-839. doi: 10.18535/ijsrm/v10i4.ec05 (Apr. 2022).
- Dhanabal, L. & Shantharajah, D. S. P. A Study on NSL-KDD Dataset for Intrusion Detection System Based on Classification Algorithms in. 4 (June 2015), 446-452.
- Kunhare, N. & Tiwari, R. Study of the Attributes using Four Class Labels on KDD99 and NSL-KDD Datasets with Machine Learning Techniques in (Nov. 2018), 127-131. doi: 10.1109/CSNT.2018.8820244.
- Gorban, A., Kégl, B., Wunsch, D. & Zinovyev, A. Principal Manifolds for Data Visualisation and Dimension Reduction, LNCSE 58 338 pp. (Jan. 2008).