Deduplication of error reports in software malfunction: Algorithms for comparing call stacks

Pavlenko, Serhii; Kuliabko, Petro

doi:10.62660/2306-4412.4.2023.59-69

Please use this identifier to cite or link to this item: https://er.chdtu.edu.ua/handle/ChSTU/4887

Title:	Deduplication of error reports in software malfunction: Algorithms for comparing call stacks
Other Titles:	Дедублiкацiя звiтiв про помилки в роботi програмного забезпечення: алгоритми порiвняння стекiв викликiв
Authors:	Pavlenko, Serhii; Kuliabko, Petro
Keywords:	automatic monitoring; fault detection systems; duplication removal; computer failures; analysis of interacting context structures
Issue Date:	2023
Publisher:	Вісник Черкаського державного технологічного університету. Технічні науки
Abstract:	In the software industry, automatic fault monitoring systems are recognised as a standard that is mandatory to implement. Given the constant development of technologies and the high complexity of programmes, optimising the processes of detecting and eliminating errors is a relevant task, driven by the need for reliable and stable software. The purpose of this study is to conduct a detailed analysis of existing algorithms for deduplicating reports from systems that automatically collect information about software failures. The algorithms considered were the longest common subsequence method, Levenshtein distance, deep learning methods, Siamese neural networks, and hidden Markov models. The results indicate great potential for optimising error detection and elimination in software. The developed comprehensive approach to analysing and detecting duplicate call stacks in failure reports allows such issues to be addressed effectively. The deep learning methods and hidden Markov models demonstrated their effectiveness and feasibility for real-world use. Effective methods for comparing key parameters of reports are identified, which helps to identify and group recurring errors. Call stack comparison algorithms proved critical for accurately identifying similar error cases in products with large audiences and highly parallel execution. Siamese neural networks and the Scream Tracker 3 Module algorithm are used to determine the similarity of call stacks, including with recurrent neural networks (long short-term memory, bidirectional long short-term memory). Optimising report processing and clustering significantly increases the speed and efficiency of responding to new failure cases, allowing developers to improve system stability and focus on high-priority issues. 
The study is useful for software developers, software development companies, system administrators, research groups, algorithm and tool development companies, cybersecurity professionals, and educational institutions.
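Two of the algorithms named in the abstract, Levenshtein distance and the longest common subsequence, can be illustrated on call stacks treated as sequences of frame names. The following is a minimal sketch and not the authors' implementation: the function names, the normalisation by the longer stack, and the 0.7 duplicate threshold are all assumptions chosen for illustration.

```python
from typing import Sequence


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Edit distance between two frame sequences.

    Insertion, deletion and substitution of a frame each cost 1.
    """
    prev = list(range(len(b) + 1))
    for i, frame_a in enumerate(a, start=1):
        curr = [i]
        for j, frame_b in enumerate(b, start=1):
            cost = 0 if frame_a == frame_b else 1
            curr.append(min(prev[j] + 1,          # delete frame_a
                            curr[j - 1] + 1,      # insert frame_b
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of two frame sequences."""
    prev = [0] * (len(b) + 1)
    for frame_a in a:
        curr = [0]
        for j, frame_b in enumerate(b, start=1):
            if frame_a == frame_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Edit distance normalised to [0, 1]; 1.0 means identical stacks.

    Dividing by the longer stack is an assumed normalisation.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def is_duplicate(a: Sequence[str], b: Sequence[str],
                 threshold: float = 0.7) -> bool:
    """Flag two reports as duplicates; the threshold is a hypothetical value."""
    return similarity(a, b) >= threshold
```

Comparing whole frames rather than characters keeps a renamed or inlined function to a single edit. A real deduplication system would likely also weight frames by position and strip addresses or line numbers before comparison.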
URI:	https://er.chdtu.edu.ua/handle/ChSTU/4887
ISSN:	2306-4412 (print) 2708-6070 (online)
DOI:	10.62660/2306-4412.4.2023.59-69
Volume:	28
Issue:	4
First Page:	59
End Page:	69
Appears in Collections:	Volume 28, No. 4/2023

Files in This Item:

File	Size	Format
8.pdf	893.84 kB	Adobe PDF
зміст.pdf	144.6 kB	Adobe PDF
титул.pdf	216.63 kB	Adobe PDF
