Information technology for determining useful data while optimizing the structure and minimizing the volume of the distributed database node

Дворецький, Михайло Леонідович; Дворецька, Світлана Володимирівна; Давиденко, Євген Олександрович

doi:10.24025/2306-4412.4.2019.184808

Please use this identifier to cite or link to this item: https://er.chdtu.edu.ua/handle/ChSTU/2229

Title:	Information technology for determining useful data while optimizing the structure and minimizing the volume of the distributed database node
Other Titles:	Інформаційна технологія визначення корисних даних при оптимізації структури та мінімізації обсягів вузла розподіленої БД (original Ukrainian title)
Authors:	Дворецький, Михайло Леонідович; Дворецька, Світлана Володимирівна; Давиденко, Євген Олександрович
Keywords:	distributed transaction; database management system; distributed database; distributed SQL query; data replication; text parsing; parse tree; profiling; multidimensional analysis; classification task; neural network; data mining; ANTLR; OLAP
Issue Date:	2019
Publisher:	Вісник Черкаського державного технологічного університету. Технічні науки
Abstract:	The paper addresses the trend of moving from "universal" accounting systems to specialized solutions, which requires synchronizing data across a distributed database. Among the strategies for distributing data between distributed database nodes, the combined one is noted to be the most justified; its main disadvantage is that handling data involves distributed transactions.
The research aims to improve the overall availability of data in an individual node of a distributed database, and the efficiency of the software systems that work with that data, by reducing the number of distributed queries. The goal is achieved by optimizing the structure of the distributed database node and minimizing the amount of data stored in it. To this end, a subsystem for logging user queries and a T-SQL grammar were created, and the code of the SQL queries was parsed. As a result, queries are classified by the list of database tables that occur in them and, after a more detailed analysis, by the lists of attributes and tuples of each relation. The latter is achieved by executing a set of queries that retrieve the primary key of each relation included in the query. A complete analysis of the usefulness of table attributes and tuples is resource-intensive, so it cannot be performed on every data change. The research therefore proposes to classify new data using a feedforward neural network (perceptron) trained on data previously evaluated through SQL query parsing. Because the accumulated data must be analyzed along multiple dimensions and is likely to be large, the data required for the analysis was represented as a multidimensional model.
URI:	https://er.chdtu.edu.ua/handle/ChSTU/2229
ISSN:	2306-4412 2306-4455
DOI:	10.24025/2306-4412.4.2019.184808
Issue:	4
First Page:	26
End Page:	35
Appears in Collections:	№4/2019
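
The table-level classification step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the paper parses queries with a full T-SQL grammar (ANTLR is listed among the keywords), whereas this sketch substitutes a simplified regular expression that only recognizes names following FROM or JOIN. The function names, the pattern, and the sample query log are all hypothetical.

```python
from __future__ import annotations

import re
from collections import Counter

# Assumption: a simplified stand-in for a real T-SQL parser. It only sees
# identifiers that directly follow FROM or JOIN and ignores subqueries,
# CTEs, table hints, and other constructs a full grammar would handle.
TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([\[\]\w.]+)", re.IGNORECASE)


def tables_in_query(sql: str) -> frozenset[str]:
    """Return the set of table names referenced after FROM/JOIN keywords."""
    names = (m.group(1) for m in TABLE_REF.finditer(sql))
    # Normalise: drop T-SQL brackets and lowercase for case-insensitive matching.
    return frozenset(n.replace("[", "").replace("]", "").lower() for n in names)


def table_usage(queries: list[str]) -> Counter:
    """Count how many logged queries touch each table."""
    usage: Counter = Counter()
    for sql in queries:
        usage.update(tables_in_query(sql))
    return usage


# Hypothetical query log, for illustration only.
QUERY_LOG = [
    "SELECT o.Id FROM dbo.Orders o JOIN dbo.Customers c ON o.CustomerId = c.Id",
    "SELECT Name FROM [dbo].[Customers] WHERE Id = 7",
    "SELECT * FROM dbo.Products",
]

usage = table_usage(QUERY_LOG)
for table, count in usage.most_common():
    print(table, count)
```

Tables that never appear in the usage counts for a given node would be candidates for exclusion from that node; the finer attribute-level and tuple-level analysis described in the abstract is outside the scope of this sketch.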

Files in This Item:

File	Description	Size	Format
6.pdf	Дворецький	798.62 kB	Adobe PDF
зміст.pdf		367.83 kB	Adobe PDF
титул.pdf		336.59 kB	Adobe PDF
