Continuous feedback loops: Online fine-tuning of LLMs with user signals

Shvets, Sofiia; Швець, Софія

DOI: https://doi.org/10.62660/bcstu/3.2025.106

Please use this identifier to cite or link to this item: https://er.chdtu.edu.ua/handle/ChSTU/9062

Title:	Continuous feedback loops: Online fine-tuning of LLMs with user signals
Other Titles:	(Ukrainian-language title) Continuous feedback loops: online tuning of LLMs using user signals
Authors:	Shvets, Sofiia; Швець, Софія
Keywords:	adaptive relearning; generative transformers; dynamic model adaptation; Python implementation; hybrid learning; quality assessment metrics; language model stability
Issue Date:	2025
Publisher:	Bulletin of Cherkasy State Technological University (Вісник Черкаського державного технологічного університету)
Abstract:	The rapid growth in the use of real-time language models calls for mechanisms that let these models adapt dynamically to changes in queries, terminology, and user expectations. The study aimed to investigate approaches to retraining large language models on continuous user feedback. To achieve this goal, the study applied theoretical and structural-functional modelling of the adaptation architecture, an experimental implementation of the language model retraining cycle that processes and classifies different types of feedback, and a quantitative evaluation of the results using automatic and user metrics. The results showed that the continuous online learning architecture is effective: it keeps the language model relevant and stable in real time. The study found that implicit feedback is 4 to 10 times more common than explicit feedback, but explicit feedback yields a larger gain in answer accuracy. The proposed system integrated different types of user signals, generating training examples dynamically and performing hybrid retraining while preserving the quality and consistency of the results. The Python implementation of the adaptive retraining cycle processed and filtered user signals to build a high-quality buffer of training pairs. After 500 retraining steps on 52,912 query-response pairs, the model improved, as shown by a decrease in the loss function from 3.82 to 3.15, and the fine-tuning process remained stable with no signs of overfitting.
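The abstract does not publish the filtering rules, so the following is only a minimal Python sketch, under stated assumptions, of how mixed explicit and implicit signals might be scored and filtered into a bounded buffer of query-response pairs. The signal names, their weights, the threshold of 0.5, and the names `Interaction`, `TrainingBuffer`, and `score` are all hypothetical and are not taken from the paper; the weights merely echo the reported finding that explicit feedback is rarer but more informative than implicit feedback.

```python
from collections import deque
from dataclasses import dataclass, field

# Hypothetical weights: explicit ratings count more than implicit behaviour
# (illustrative values only, not taken from the paper).
SIGNAL_WEIGHTS: dict[str, float] = {
    "thumbs_up": 1.0,     # explicit positive rating
    "thumbs_down": -1.0,  # explicit negative rating
    "copy": 0.3,          # implicit positive: the user copied the answer
    "regenerate": -0.3,   # implicit negative: the user asked for a new answer
}


@dataclass
class Interaction:
    query: str
    response: str
    signals: list[str] = field(default_factory=list)


def score(interaction: Interaction) -> float:
    """Sum the weights of the recognised signals; unknown signals count as 0."""
    return sum(SIGNAL_WEIGHTS.get(s, 0.0) for s in interaction.signals)


class TrainingBuffer:
    """Bounded FIFO buffer of (query, response) pairs that passed the filter."""

    def __init__(self, capacity: int = 1000, threshold: float = 0.5) -> None:
        # deque(maxlen=...) silently evicts the oldest pair when full.
        self.pairs: deque[tuple[str, str]] = deque(maxlen=capacity)
        self.threshold = threshold

    def add(self, interaction: Interaction) -> bool:
        """Store the pair if both texts are non-empty and the score is high enough."""
        if not interaction.query.strip() or not interaction.response.strip():
            return False
        if score(interaction) < self.threshold:
            return False
        self.pairs.append((interaction.query, interaction.response))
        return True

    def drain(self, batch_size: int) -> list[tuple[str, str]]:
        """Remove and return up to batch_size of the oldest pairs for one update step."""
        n = min(batch_size, len(self.pairs))
        return [self.pairs.popleft() for _ in range(n)]


if __name__ == "__main__":
    buf = TrainingBuffer(capacity=4, threshold=0.5)
    buf.add(Interaction("What is BLEU?", "An n-gram overlap metric.", ["thumbs_up"]))
    buf.add(Interaction("Hi", "Hello", ["copy"]))  # score 0.3 < 0.5, rejected
    print(buf.drain(batch_size=8))
```

Keeping the buffer bounded means the oldest accepted pairs are evicted first, so each fine-tuning step draws on recent user behaviour; whether the original system works this way is not stated in the abstract.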
The results of the preliminary training showed a moderate improvement in answer quality after adaptation: lexical similarity measured by ROUGE (Recall-Oriented Understudy for Gisting Evaluation) was 0.102, the BLEU (Bilingual Evaluation Understudy) score was 0.006, and subjective user satisfaction rose to 0.24, while the model remained stable, with a mean cosine similarity of 0.396. The proposed approach improves the quality and relevance of language models' real-time responses while preserving their stability, and it can be used in production systems to improve the user experience.
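The abstract reports ROUGE, BLEU, and cosine similarity without implementation details. As a rough illustration only, the sketch below computes ROUGE-1 recall over whitespace tokens and a mean cosine similarity between paired responses, assuming (not confirmed by the source) that stability is measured by comparing each response before and after adaptation. Bag-of-words count vectors stand in for sentence embeddings, and the names `rouge1_recall`, `cosine_similarity`, `bow_vector`, and `stability` are hypothetical. BLEU is omitted; published evaluations would normally rely on maintained packages such as `rouge-score` or `sacrebleu` rather than hand-written code.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Simplistic lowercase whitespace tokenizer (assumption); real ROUGE
    # implementations also strip punctuation and may apply stemming.
    return text.lower().split()


def rouge1_recall(reference: str, candidate: str) -> float:
    """Fraction of reference unigrams (with multiplicity) found in the candidate."""
    ref, cand = Counter(tokenize(reference)), Counter(tokenize(candidate))
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand[tok]) for tok, count in ref.items())
    return overlap / total


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def bow_vector(text: str, vocab: list[str]) -> list[float]:
    # Bag-of-words counts as a stand-in for sentence embeddings (assumption).
    counts = Counter(tokenize(text))
    return [float(counts[w]) for w in vocab]


def stability(before: list[str], after: list[str]) -> float:
    """Mean cosine similarity between paired responses before and after adaptation."""
    sims = []
    for b, a in zip(before, after):
        vocab = sorted(set(tokenize(b)) | set(tokenize(a)))
        sims.append(cosine_similarity(bow_vector(b, vocab), bow_vector(a, vocab)))
    return sum(sims) / len(sims) if sims else 0.0
```

With real embeddings, a mean similarity well below 1.0 would indicate that adapted answers differ substantially from the originals, so how the reported 0.396 should be read depends on which vectors were compared.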
URI:	https://er.chdtu.edu.ua/handle/ChSTU/9062
ISSN:	2306-4412 (print); 2708-6070 (online)
DOI:	https://doi.org/10.62660/bcstu/3.2025.106
Volume:	30
Issue:	3
First Page:	106
End Page:	120
Appears in Collections:	Vol. 30, No. 3/2025

Files in This Item:

File	Size	Format
зміст.pdf (table of contents)	161.04 kB	Adobe PDF
титул.pdf (title page)	234.55 kB	Adobe PDF
11.pdf	3.65 MB	Adobe PDF
