contact@parthenonfrontiers.com

AUGMENTING WAZUH SIEM WITH MACHINE LEARNING FOR ADVANCED CYBER THREAT ANALYTICS

Authors

  • Dr. Ali R. Al-Harthy, Department of Computer Science, Sultan Qaboos University, Oman
  • Dr. Hassan Ben Youssef, Department of Computer Science, University of Tunis El Manar, Tunisia
  • Dr. Noura Al-Mutairi, Department of Cybersecurity, King Abdullah University of Science and Technology (KAUST), Saudi Arabia

Keywords:

Wazuh, SIEM, Machine Learning, Threat Detection

Abstract

The escalating sophistication of cyber threats necessitates a paradigm shift from traditional, signature-based security measures to more dynamic, intelligent defense mechanisms. This article explores the enhancement of Wazuh, a widely adopted open-source Security Information and Event Management (SIEM) solution, through the integration of machine learning techniques. Rule-based systems suffer from high false-positive rates and an inability to detect novel threats, and these limitations pose a significant challenge for modern Security Operations Centers (SOCs). This study proposes and evaluates a hybrid framework that integrates both supervised (K-Nearest Neighbors, Random Forest, Naive Bayes, Logistic Regression, Support Vector Machine) and unsupervised (DBSCAN, K-Means, Isolation Forest) machine learning models into the Wazuh detection pipeline. Using these algorithms, this work demonstrates the potential to significantly improve threat detection rates, reduce false positives, and automate complex security event analysis. The study details a framework for data collection in a simulated enterprise environment, extensive preprocessing and feature engineering, the application of various machine learning models for threat identification, and a comparative analysis of their performance. The findings indicate that the Random Forest classifier achieves a superior accuracy of 97.2%, while the DBSCAN algorithm reaches 91.1% accuracy in anomaly detection, significantly improving the quality of alerts. Furthermore, real-world viability is assessed through latency and scalability testing, which confirms that the proposed system can operate within the stringent time constraints of a real-time SOC.
This combination of machine learning with Wazuh's monitoring capabilities offers a cost-effective and scalable solution for organizations, particularly Small and Medium-sized Enterprises (SMEs), to strengthen their cybersecurity posture against an evolving threat landscape. The article further discusses practical implications, limitations, and future research directions, emphasizing the synergy between automated systems and human expertise within a modern SOC.
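The abstract names DBSCAN as one of the unsupervised detectors added to the Wazuh pipeline. The following is a minimal, standard-library-only sketch of that idea under stated assumptions, not the authors' implementation: it reads Wazuh-style JSON alerts (one object per line), groups them by the `data.srcip` field, builds a hypothetical three-feature vector per source IP (alert count, highest `rule.level`, number of distinct `rule.id` values), min-max scales the features, and reports the points DBSCAN labels as noise as anomalies. The feature set, the helper names, and the `eps`/`min_samples` defaults are illustrative assumptions; a production pipeline would more likely use a library implementation such as scikit-learn's `DBSCAN` and a richer feature set.

```python
import json
import math
from collections import defaultdict


def extract_features(alert_lines):
    """Group Wazuh-style JSON alerts (one object per line) by data.srcip.

    Hypothetical features per source IP: alert count, highest rule.level,
    and number of distinct rule.id values.
    """
    stats = defaultdict(lambda: {"count": 0, "max_level": 0, "rules": set()})
    for line in alert_lines:
        alert = json.loads(line)
        src = alert.get("data", {}).get("srcip")
        if src is None:
            continue  # skip alerts that carry no source IP
        rule = alert.get("rule", {})
        entry = stats[src]
        entry["count"] += 1
        entry["max_level"] = max(entry["max_level"], int(rule.get("level", 0)))
        entry["rules"].add(rule.get("id"))
    keys = sorted(stats)
    vectors = [
        (float(stats[k]["count"]), float(stats[k]["max_level"]), float(len(stats[k]["rules"])))
        for k in keys
    ]
    return keys, vectors


def min_max_scale(vectors):
    """Rescale each feature column to [0, 1] so no single feature dominates the distance."""
    if not vectors:
        return []
    cols = list(zip(*vectors))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(vec, lows, highs))
        for vec in vectors
    ]


def dbscan(points, eps, min_samples):
    """Minimal DBSCAN: returns one cluster label per point, with -1 marking noise."""
    n = len(points)
    labels = [None] * n  # None means "not yet visited"

    def neighbours(i):
        # Region query: every point within eps of point i (including i itself).
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbours(i)
        if len(nbrs) < min_samples:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbours(j)
            if len(j_nbrs) >= min_samples:
                queue.extend(j_nbrs)  # only core points expand the cluster
    return labels


def flag_anomalies(alert_lines, eps=0.5, min_samples=3):
    """Return the source IPs whose feature vectors DBSCAN labels as noise."""
    keys, vectors = extract_features(alert_lines)
    labels = dbscan(min_max_scale(vectors), eps, min_samples)
    return [k for k, label in zip(keys, labels) if label == -1]


if __name__ == "__main__":
    # Synthetic alerts: five hosts with light, low-severity activity and one noisy host.
    sample = []
    for i in range(1, 6):
        for _ in range(2 + i % 2):
            sample.append(json.dumps({"rule": {"id": "100001", "level": 3}, "data": {"srcip": f"10.0.0.{i}"}}))
    for n in range(40):
        sample.append(json.dumps({"rule": {"id": str(100010 + n % 3), "level": 10}, "data": {"srcip": "203.0.113.7"}}))
    print(flag_anomalies(sample))  # ['203.0.113.7']
```

Noise points are used as the anomaly signal because DBSCAN, unlike K-Means, does not force every point into a cluster, so sources whose behavior is unlike any dense group of peers surface naturally. In practice, the anomaly list would be fed back into the SOC workflow alongside Wazuh's rule-based alerts rather than replacing them.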


Published

2024-12-29