European Journal of Emerging Cybersecurity and Information Protection

eISSN: Applied
Publication Frequency: 2 issues per year.

  • Peer Reviewed & International Journal

Open Access

ARTICLE

A STATIC ANALYSIS FRAMEWORK UTILIZING LARGE LANGUAGE MODELS FOR IDENTIFYING MALICIOUS OFFICE OPEN XML FILES

1 School of Computing and Information Systems, University of Melbourne, Australia
2 Faculty of Computational Mathematics and Cybernetics, Lomonosov Moscow State University, Russia

Abstract

Office Open XML (OOXML) documents are a primary vector for malware distribution because they are ubiquitous in enterprise and personal computing. The complexity of the OOXML format makes it easy to conceal malicious payloads that evade traditional security measures. Conventional detection methods, which rely predominantly on signature-based scanning and predefined rules, are frequently outpaced by the rapid evolution of malware, particularly polymorphic code, zero-day exploits, and advanced social engineering tactics. This paper proposes a static analysis framework that uses the contextual understanding and reasoning capabilities of Large Language Models (LLMs) to identify malicious OOXML documents. The methodology systematically deconstructs the OOXML package into a structured, human-readable JSON representation. This representation is then passed to an LLM, which, guided by a role-based prompt, performs a semantic analysis of the document's constituent parts. The model examines VBA macro code, XML relationship files, embedded objects, and metadata for indicators of malicious intent. Because it is not limited to simple pattern matching, the approach supports a holistic assessment of the document's structure and content. The framework shows high potential for accurately identifying malicious documents, including those that employ heavy obfuscation or novel attack vectors, and thus for advancing the defense against document-based cyber threats.
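
The deconstruction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes only that an OOXML file is a ZIP archive and uses the Python standard library. The function names `ooxml_to_json` and `build_prompt`, the JSON field names, the list of text suffixes, the truncation limit, and the prompt wording are all hypothetical choices made for illustration.

```python
import base64
import io
import json
import zipfile

# Assumption: parts with these suffixes are treated as text; everything else
# (for example vbaProject.bin or embedded objects) is recorded as binary.
TEXT_SUFFIXES = (".xml", ".rels", ".vml")


def ooxml_to_json(data: bytes, max_chars: int = 20000) -> str:
    """Flatten an OOXML package (a ZIP archive) into a JSON string."""
    parts = []
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        for info in package.infolist():
            if info.is_dir():
                continue
            raw = package.read(info.filename)
            entry = {"name": info.filename, "size": info.file_size}
            if info.filename.lower().endswith(TEXT_SUFFIXES):
                entry["kind"] = "text"
                # Truncate long parts so the result fits in a prompt.
                entry["content"] = raw.decode("utf-8", errors="replace")[:max_chars]
            else:
                entry["kind"] = "binary"
                # Only a short base64 preview is kept here; extracting VBA
                # source would need a dedicated parser, which is out of scope.
                entry["preview_b64"] = base64.b64encode(raw[:64]).decode("ascii")
            parts.append(entry)
    return json.dumps({"parts": parts}, indent=2)


def build_prompt(package_json: str) -> str:
    """Wrap the JSON in a hypothetical role-based prompt for an LLM."""
    return (
        "You are a malware analyst reviewing an Office Open XML document.\n"
        "Assess the parts below for indicators of malicious intent and "
        "answer with a verdict and a short justification.\n\n"
        + package_json
    )
```

Calling `build_prompt(ooxml_to_json(open(path, "rb").read()))` on a document would yield a single text prompt; sending it to a model is left out because the abstract does not name a specific LLM or API.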


Keywords

Pulmonary blastoma, Biphasic tumor, Lung neoplasm, Case report

How to Cite

A STATIC ANALYSIS FRAMEWORK UTILIZING LARGE LANGUAGE MODELS FOR IDENTIFYING MALICIOUS OFFICE OPEN XML FILES. (2024). European Journal of Emerging Cybersecurity and Information Protection, 1(01), 1-13. https://parthenonfrontiers.com/index.php/ejecip/article/view/80
