Utilising large language models for automated real-time cyber threat analysis

Kovalchuk, Denys; Ковальчук, Денис

doi:https://doi.org/10.62660/bcstu/1.2025.48

Please use this identifier to cite or link to this item: https://er.chdtu.edu.ua/handle/ChSTU/5869

Title:	Utilising large language models for automated real-time cyber threat analysis
Other Titles:	Використання великих мовних моделей для автоматизованого аналізу кіберзагроз у режимі реального часу
Authors:	Kovalchuk, Denys; Ковальчук, Денис
Keywords:	natural language processing;machine learning for security;phishing attack detection;anomaly detection;deep learning in cybersecurity;neural networks for security;cyber threat intelligence
Issue Date:	2025
Publisher:	Вісник Черкаського державного технологічного університету (Bulletin of Cherkasy State Technological University)
Abstract:	In the contemporary cybersecurity landscape, where the rapid growth in the quantity and complexity of threats has undermined the effectiveness of traditional rule- and signature-based detection methods, an urgent need has emerged for automated cyber threat analysis systems employing large language models. The objective of this study was to investigate the capabilities of large language models for automated cyber threat analysis, risk assessment, and improving incident response efficiency in corporate environments. To achieve this goal, machine learning and natural language processing techniques were employed, particularly the adaptation of large language models for threat classification, risk-level evaluation, and anomaly detection. A system was developed to analyse incoming and outgoing email communications, which during testing automatically identified phishing attacks and social engineering techniques, assigned risk scores to messages, and quarantined those exceeding a predefined threshold (e.g., 0.8) for further inspection. The system analysed a dataset of 100,000 emails, of which 70% were legitimate communications and 30% were phishing attacks. Additionally, real-time analysis of data streams from corporate logs and external sources enabled the detection of potential cyber incidents with an accuracy of up to 94%, while reducing the false-positive rate to 6.5%. The obtained results confirmed the efficacy of large language models, which achieved a threat classification accuracy of up to 97% with an F1-score of 95% and reduced incident response times by 30-40%. These findings can be leveraged by other researchers to refine phishing detection techniques, reduce false positives in corporate security systems, and integrate machine learning models with diverse data sources, including SIEM systems and external cybersecurity resources. 
In today's cybersecurity landscape, where the rapid growth in the number and complexity of threats has affected the effectiveness of traditional rule- and signature-based detection methods, an urgent need has been established for deploying automated cyber threat analysis systems that use large language models. The aim of the work was to investigate the capabilities of large language models for automated cyber threat analysis, risk assessment, and improving the efficiency of incident response in a corporate environment. To achieve this aim, machine learning and natural language processing methods were used, in particular the adaptation of large language models for threat classification, risk-level assessment, and anomaly detection. A system was developed to analyse incoming and outgoing email messages; during testing it automatically identified phishing attacks and social engineering techniques, assigned each message a risk score, and sent messages exceeding a threshold value (for example, 0.8) to quarantine for further inspection. The system analysed a dataset of 100,000 emails, of which 70% were safe messages and 30% were phishing attacks. In addition, data streams from corporate logs and external sources were analysed, which made it possible to detect potential cyber incidents with an accuracy of up to 94% and to reduce the false-positive rate to 6.5%. The results confirmed the effectiveness of large language models, which provided a threat classification accuracy of up to 97% with an F1-score of up to 95% and shortened incident response times by 30-40%. The results can be used by other researchers to improve phishing detection methods, reduce the number of false alarms in corporate security systems, and integrate machine learning models with various data sources, including SIEM systems and external cybersecurity resources.
URI:	https://er.chdtu.edu.ua/handle/ChSTU/5869
ISSN:	2306-4412 (print) 2708-6070 (online)
DOI:	https://doi.org/10.62660/bcstu/1.2025.48
Volume:	30
Issue:	1
First Page:	48
End Page:	58
Appears in Collections:	Vol. 30, No. 1/2025
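
The abstract describes a pipeline that assigns each email a risk score, quarantines messages above a threshold (0.8 is the example value given), and is evaluated by accuracy, F1-score, and false-positive rate. The sketch below illustrates only that thresholding rule and how such metrics are conventionally computed from predictions. It is not the author's implementation: `toy_score` is a hypothetical keyword-based stand-in for the language-model scorer, and the names `Verdict`, `triage`, and `classification_metrics` are assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

QUARANTINE_THRESHOLD = 0.8  # example threshold quoted in the abstract


@dataclass
class Verdict:
    message_id: str
    risk_score: float
    quarantined: bool


def toy_score(body: str) -> float:
    """Hypothetical placeholder scorer; a real system would call a language model."""
    return 0.9 if "verify your password" in body.lower() else 0.1


def triage(
    messages: Iterable[tuple[str, str]],
    score_fn: Callable[[str], float],
    threshold: float = QUARANTINE_THRESHOLD,
) -> list[Verdict]:
    """Score each (message_id, body) pair; quarantine scores strictly above threshold."""
    verdicts = []
    for message_id, body in messages:
        score = score_fn(body)
        verdicts.append(Verdict(message_id, score, score > threshold))
    return verdicts


def classification_metrics(y_true: list[bool], y_pred: list[bool]) -> dict[str, float]:
    """Compute standard binary metrics, treating True as the phishing class."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    tn = sum((not t) and (not p) for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true) if y_true else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
    }


if __name__ == "__main__":
    # Toy inputs for illustration only; these are not data from the study.
    inbox = [
        ("m1", "Please verify your password now"),
        ("m2", "Lunch at noon?"),
    ]
    for verdict in triage(inbox, toy_score):
        print(verdict)
    print(classification_metrics([True, True, False, False], [True, False, True, False]))
```

Passing the scorer in as a callable keeps the quarantine rule independent of whichever model produces the score, so the threshold can be tuned against the false-positive rate without touching the model code.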

Files in This Item:

File	Size	Format
титул.pdf	269.53 kB	Adobe PDF
зміст.pdf	123.86 kB	Adobe PDF
6.pdf	422.02 kB	Adobe PDF
