Подовження рядів даних за значеннями показників схожих рядів

Батурінець, Анастасія; Антоненко, Світлана

doi:10.24025/2306-4412.3.2021.244266

Please use this identifier to cite or link to this item: https://er.chdtu.edu.ua/handle/ChSTU/3699

Full metadata record

DC Field	Value	Language
dc.contributor.author	Батурінець, Анастасія	-
dc.contributor.author	Антоненко, Світлана	-
dc.date.accessioned	2022-02-04T12:33:53Z	-
dc.date.available	2022-02-04T12:33:53Z	-
dc.date.issued	2021	-
dc.identifier.issn	2306-4412 (print)	-
dc.identifier.issn	2708-6070 (online)	-
dc.identifier.uri	https://er.chdtu.edu.ua/handle/ChSTU/3699	-
dc.description.abstract	Проблема недостатності інформації суттєво впливає на вибір підходів та методів аналізу рядів даних, а також на якість отримуваних результатів. Зважаючи на таку проблему, автори роботи вважають, що актуальним є питання розробки й аналізу підходів та моделей для подовження рядів даних. Основною задачею є описання та реалізація технології подовження рядів даних. В основу реалізації технології закладено використання значень схожих рядів даних як ознак для подовження певного ряду даних, представленого тими ж показниками, що й схожі ряди даних. В роботі описано схему визначення схожих рядів даних. Згідно з цією схемою найбільш схожими рядами даних є такі, що мають найменше значення відстані та сильний прямий кореляційний зв’язок, обчислені між потенційно схожим рядом та рядом, для якого буде відбуватися подовження. Для подовження ряду розглядаються сім моделей. За результатами обчислювального експерименту встановлено, що найкращі результати отримано при використанні двох моделей: суми зважених значень по групі схожих рядів та середньозважених значень по групі схожих рядів, з коригуванням на середнє значення ряду, для якого виконується подовження. В результаті проведеного аналізу можна дійти висновку про можливість використання розробленої технології для вирішення задачі подовження рядів даних. При подальших дослідженнях планується використання отриманих результатів для розробки та аналізу методів поповнення пропущених значень у часових рядах.	uk_UA
dc.description.abstract	The problem of insufficient information significantly affects the choice of approaches and methods for analyzing data series, as well as the quality of the results obtained. Given this problem, the authors consider the development and analysis of approaches and models for lengthening data series to be relevant. The main task of this work is to describe and implement a technology for lengthening data series. The technology is based on using the values of similar data series as features for lengthening a given data series that is described by the same indicators as the similar series. The work describes a scheme for identifying similar data series. According to this scheme, the most similar data series are those that have the smallest distance and a strong positive correlation, both calculated between the potentially similar series and the series to be lengthened. Seven models are considered for lengthening the series: linear regression; sum of weighted values over a group of similar series; weighted average values over a group of similar series, corrected to the mean of the series being lengthened; random forest; k-nearest neighbors; support vector regression; gradient boosting. The computational experiment was carried out on series of water level values recorded at hydrological stations located on water bodies of the Dnieper River basin. The data series of post 79545, located on the Sluch River, Novograd-Volynsky, Zhytomyr region, is lengthened by one year, i.e. the length of the series increases by 365 values. The most similar series were found to be those of posts 79555 and 79694, which have the lowest calculated distances and correlation coefficients greater than 0.75. When the series is lengthened, the best results are obtained with two models: the sum of weighted values over a group of similar series, and the weighted average values over a group of similar series, corrected to the mean of the series being lengthened. In future research, the authors plan to use the obtained results to develop and analyze methods for filling in missing values in time series.	uk_UA
dc.language.iso	uk	uk_UA
dc.publisher	Вісник Черкаського державного технологічного університету. Технічні науки	uk_UA
dc.subject	часові ряди	uk_UA
dc.subject	регресія	uk_UA
dc.subject	поповнення даних	uk_UA
dc.subject	машинне навчання	uk_UA
dc.subject	недостатність даних	uk_UA
dc.subject	гідрологія	uk_UA
dc.subject	sklearn	uk_UA
dc.subject	time series	uk_UA
dc.subject	regression	uk_UA
dc.subject	data replenishment	uk_UA
dc.subject	machine learning	uk_UA
dc.subject	insufficient data	uk_UA
dc.subject	hydrology	uk_UA
dc.title	Подовження рядів даних за значеннями показників схожих рядів	uk_UA
dc.title.alternative	Lengthening the data series by values of similar data series samples	uk_UA
dc.type	Article	uk_UA
dc.citation.issue	3	uk_UA
dc.citation.spage	78	uk_UA
dc.citation.epage	86	uk_UA
dc.identifier.doi	10.24025/2306-4412.3.2021.244266	-
Appears in Collections:	№3/2021
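
The scheme summarized in the abstract (select similar series by distance and correlation, then lengthen the target from a weighted average with a mean correction) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the record does not state the distance metric, the weighting scheme, or the exact form of the correction, so Euclidean distance, caller-supplied weights, and an additive mean shift are assumptions, and the function names `select_similar` and `extend_weighted_mean` are hypothetical.

```python
import numpy as np


def select_similar(target, candidates, min_corr=0.75, top_k=2):
    """Rank candidate series by distance to the target over a shared period.

    Assumptions (not stated in the record): Euclidean distance, Pearson
    correlation, and a hard correlation threshold `min_corr`.
    Returns a list of (distance, correlation, name), closest first.
    """
    scored = []
    for name, series in candidates.items():
        dist = float(np.linalg.norm(series - target))
        corr = float(np.corrcoef(series, target)[0, 1])
        if corr > min_corr:  # keep only strongly, positively correlated series
            scored.append((dist, corr, name))
    scored.sort()  # smallest distance first
    return scored[:top_k]


def extend_weighted_mean(target, similar_overlap, similar_future, weights):
    """Lengthen `target` with a weighted average of similar series.

    `similar_overlap` holds the similar series over the period the target
    covers; `similar_future` holds their values over the period to be added.
    The additive shift that aligns the means is an assumed form of the
    "correction to the average value" mentioned in the abstract.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize so the weights sum to 1
    future = w @ np.asarray(similar_future, dtype=float)
    overlap = w @ np.asarray(similar_overlap, dtype=float)
    return future + (np.mean(target) - overlap.mean())
```

In the setting described in the abstract, `target` would be the series of post 79545, `candidates` the series of other posts over the same period, and `similar_future` would contain 365 values per selected post.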

Files in This Item:

File	Size	Format
1-2_титул 3-2021.pdf	271.63 kB	Adobe PDF
3-4_Зміст 3-2021.pdf	139.26 kB	Adobe PDF
78-86_Батурінець_Антоненко.pdf	1.01 MB	Adobe PDF
