MATHEMATICAL MODEL FOR DECISION MAKING SYSTEM BASED ON THREE-SEGMENTED LINEAR REGRESSION

Context. The problem of approximating empirical data in the decision-making system for safety management is considered. The object of the study is the verification of adequate coefficients of the mathematical model for data approximation using information technology.
Objective. The goal of the work is to create an adequate mathematical model, using information technology and based on an analysis of different approaches to approximating empirical data, that can be used to predict the current state of the operator in the flight safety system.
Method. A comparative analysis was made of descriptions of the transformation of information indicators with a non-standard structure. The following models of transformation of information indicators with similar visual representation were selected for comparison: parabolas of the second and third order, single regression and regression with jumps. New approaches to approximation are proposed, based on the criterion proposed by Kuzmin and on the Heaviside function. The adequacy of the approximation was checked using these criteria, which made it possible to choose an adequate mathematical model to describe the transformation of information indicators. The stages of obtaining the mathematical model were as follows: determining the minimum sum of squared deviations for all information indicators simultaneously; using the Heaviside function; optimizing the abscissa of the jump in certain areas; using the linearity test. The obtained mathematical model adequately describes the transformation of information indicators, which makes it possible to forecast changes in the medical and biological indicators of operators performing professional duties in aviation, as one of the methods of assessing the human factor in a proactive approach to flight safety.
Results. The results of the study can be used in constructing mathematical models to describe empirical data of this kind.
Conclusions. The experimental studies support recommending three-segment linear regression with jumps as an adequate mathematical model that can be used to formalize the description of empirical data with a non-standard structure and can be used in practice to build models for predicting operator dysfunction as one of the causes of adverse events in aviation. Prospects for further research include the creation of a multiparameter mathematical model that will predict violations of the functional state of the operator from informative parameters, as well as an experimental study of the proposed mathematical approaches on a wide range of practical problems of different nature and dimension.


INTRODUCTION
Statistical analysis that uses information indicators from unstable objects as empirical data is one of the most difficult tasks [1, 2]. Statistical analysis is closely related to mathematical statistics, spectral analysis, regression and variance analysis, splines, applied geometry, etc. [3-5]. However, empirical data obtained from the transformation of information indicators of unstable objects often cannot be adequately described by the standard methods that are widely used, for example, for stable technical objects.
The object of study is the process of approximating information indicators in order to determine an adequate model that can be used in decision-making on the functional state of operators. Deciding on the state of the object takes a long time, because information indicators of different informativeness must be analyzed, and the decision is often subjective. Therefore, reducing the time of decision-making and increasing its reliability requires a mathematical model that adequately describes the state of the object from its information indicators.
The subject of study is the process of modeling information indicators that can be used to predict changes in the functional state of the operator as one of the triggers of an adverse event in the aviation safety management system.
Because the object from which the information indicators are obtained is variable, unpredictable and unstable, many factors influence the correct choice of the transformation model and the accuracy of estimating the coefficients of the mathematical model. Such information indicators have structural features and cannot be adequately described by standard methods; therefore, adequate mathematical modeling requires approximating the experimentally obtained data with new non-standard approaches. Accordingly, the subject of the study includes modeling with non-standard approaches to the approximation of empirical data.
The purpose of the work is to increase the reliability of mathematical methods for describing changes in the functional state of the operator, using the transformation of information indicators in a decision-making system.

PROBLEM STATEMENT
The purpose of the European Plan for Aviation Safety (EPAS), which covers safety management at the European level, is to ensure that the principles of safety management are applied within the European aviation community so as to continually improve safety performance. Regulation (EU) 2018/1139, known as the European Union Aviation Safety Agency (EASA) Basic Regulation, is fundamental to the continuous improvement of civil aviation safety [6] and serves to ensure the application of ICAO safety management principles. The EPAS seeks to anticipate emerging industry safety risks and to make the best use of technical and mathematical resources and information technology for planning and implementing safety improvement actions. EASA develops the EPAS in close collaboration with the Member States and other relevant stakeholders; the plan is developed annually and looks ahead to the following four years. It prioritises issues and evaluates options to address them on the basis of relevant safety information sources (notably occurrences). It identifies the main areas of concern affecting the European aviation safety system; one such problem is the influence of the human factor [7] (Fig. 1).
The ability of a person to make erroneous or illogical decisions in specific situations, referred to as the human factor, is associated with limitations or errors that are characteristic of any person; moreover, the psychophysiological characteristics of an operator do not always correspond to the complexity of the tasks or problems to be solved. The resolution adopted by the ICAO Assembly on flight safety and the role of the human factor in the interaction with equipment, processes, the environment and other people aims to identify the features of assessing the human factor as a source of risk [8]. This assessment should rest on a practical solution to safety problems, based on the analysis of erroneous actions of all participants in the human-machine system that led or could have led to accidents [8]. The goal of a proactive approach to the human factor is to minimize aviation accidents caused by erroneous or illogical human decisions in non-standard situations. In this regard, an ideology of risk management is being developed that directs the search for ways of early detection of hazards and dangerous factors, which manifest themselves as certain events acting as predictors [9]. Since a number of circumstances and the complexity of the process make it difficult to quickly identify hazards associated with the human factor, a proactive approach to assessing the current functional state of the operator will make it possible to reduce the development of adverse events. Using mathematical methods to describe and formalize the current functional state of the operator will make it possible, on the basis of the obtained mathematical model, to predict an imbalance of the operator that can cause erroneous decisions. However, the mathematical description of such a complex object involves a number of difficulties [10].
Empirical data obtained by measuring the various information indicators that describe the current state of the operator rarely show a pronounced pattern and are limited in volume, which complicates the formalization process.
Thus, in [23], researchers used a mathematical model of two linear regression lines for one data set, but the peculiarity of that use was that the empirical data were clearly divided into clusters. Based on the research conducted in [23], the author developed a method for constructing two regression lines, but this method involves a number of technical difficulties in practical use for engineering problems.
In the books [15, 16], the authors solve the problem of constructing a multistage regression using a dummy variable. The peculiarity of these studies is that only the case of a single jump is considered, whereas empirical data may often contain several jumps. Therefore, this article solves the urgent scientific, technical and practical problem of constructing a multisegment regression with jumps, with optimization of the jump point abscissa and with the Heaviside function used to obtain the general equation.

MATERIALS AND METHODS
The functional purpose of the decision-making system is to ensure the maximum completeness of obtaining information about the parameter of the object by controlled physical quantities correlated with this parameter.
The fact that the primary information after transformation takes the form of quantitative judgments about the state of the object does not prevent us from considering any multi-parameter diagnostic system as an information system.
If we consider a complex object characterized by the parameter Y, then $X_1, \dots, X_k$ are measured physical quantities that reflect the properties of the physical object. Often, Y must be considered a random variable in a certain sense, because its given value cannot be reproduced accurately and in a metrologically justified way over the range $A_y$ of all its possible changes. However, the dispersion of Y at any point in the range is constant ($\sigma_y^2 = \mathrm{const}$).
Moreover, at the functional level, there is an a priori unknown relationship between the mean value of Y and $\{X_i\}$. In addition, for any of the controlled values there is a conditional density, which reflects the stochastic relationship between the value $X_i$ and the remaining controlled values. All quantities belong to the set of real numbers, and their number is theoretically considered unlimited, although for technical reasons the condition $k < \infty$ holds.
The generalized decision-making structure, based on transforming measurement information about the value Y obtained by measuring the controlled quantities, is shown in Fig. 2. According to the structure in Fig. 2, in order to make a decision about the state of a complex object, it is necessary at the first stage to obtain the values $X_1^*, \dots, X_k^*$ from primary converters, which at the second stage are converted into an estimate $Y^*$ of the value of the parameter Y.
The initial entropy depends on the distribution density $f(y)$ of the value Y in the range $A_y$. The conditional entropy is found from the conditional density, where the result of the solution $y_j$ gives the value $Y = Y_j$. The task then reduces to estimating the coefficients $a_0, \dots, a_k$ so as to give an adequate mathematical model based on empirical data. However, empirical data that can be used to assess the functional state of the operator often have structural features and therefore cannot be adequately described by standard methods.

EXPERIMENTS
The construction of the mathematical model was based on empirical data. When empirical data are obtained from biological objects, collecting the amount of data necessary for reliable statistical processing is always difficult. Therefore, methods that show good predictive performance on small samples are necessary.
In addition, finding mathematical approaches whose adequacy has been tested on real data is the main step in the decision-making system. The empirical data used in the work describe the dependence of systolic blood pressure on anthropometric characteristics, among which body weight was chosen. The measurements were carried out on male operators aged 25 to 30 years. The measurement results are presented in Table 1. Based on the measured values, a graph was constructed for visual interpretation of the results (Fig. 3). After visual analysis of the initial data, the following conclusion can be drawn. The distribution of the initial data has a non-standard form, which makes the data difficult to describe. Describing them by a single functional dependence with satisfactory accuracy is hampered by the presence of several clusters, which are clearly visible in Fig. 3 (indicated by a dash-dotted line).
As can be seen from Fig. 3, one of the clusters is shifted upward relative to the linear trend, which forces the choice of non-standard approaches to approximation. One such approach is to use the Heaviside function together with the criterion proposed by Kuzmin [25]. Kuzmin's criterion [25] is intended for detecting nonlinearity, but in this paper the authors used it to test the mathematical model for adequacy.
Several hypotheses have been put forward in the paper to determine the most adequate mathematical model.

RESULTS
Hypothesis 1. Using a simple linear regression. The least squares method for approximating a function of one variable was used to obtain the regression equation. A visualization of the obtained regression equation approximating the data is shown in Fig. 5. The nonlinearity test conducted in this work led to the conclusion that the data in Table 1 cannot be approximated by a single regression line. For this test, the sequence of deviations of the data from the line found by the least squares method was computed.
The cumulative residual curve is defined as the running sum of the deviations of the data from the line obtained by the least squares method. The results are visualized in Fig. 6. Features of the use of the cumulative curve are given in [25]. Calculation results: range of the cumulative residual curve: 101.7; relative range (ratio of the range to the standard deviation): 11.4. According to the table of critical values [26], the limit value is 10.44 (for a probability of 0.99). Since the relative range exceeds the limit value, the data are nonlinear and cannot be approximated by a single linear regression; that is, Hypothesis 1 is rejected.
Hypothesis 2. Using the second-order parabola. The second hypothesis suggests the possibility of using a second-order parabola as the mathematical model to approximate the data:
$$y = a_0 + a_1 x + a_2 x^2,$$
where $a_0$, $a_1$, $a_2$ are unknown coefficients of the mathematical model.
The unknown coefficients of the mathematical model were determined from the empirical data by the least squares method. The results of the second-order parabolic approximation are shown in Fig. 7.
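The least-squares fit of a parabola can be illustrated with a minimal sketch. It assumes NumPy is available; the helper names `fit_polynomial` and `predict_polynomial` are hypothetical, and the data are synthetic placeholders, not the measurements of Table 1.

```python
import numpy as np

def fit_polynomial(x, y, order):
    """Least-squares polynomial fit; returns a_0, a_1, ..., a_order (lowest power first)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # np.polyfit returns the highest power first, so reverse the result.
    return np.polyfit(x, y, order)[::-1]

def predict_polynomial(coeffs, x):
    """Evaluate a_0 + a_1*x + a_2*x**2 + ... at the points x."""
    x = np.asarray(x, dtype=float)
    return sum(a * x**k for k, a in enumerate(coeffs))

# Synthetic, illustrative data (NOT the values of Table 1).
x_demo = np.array([150.0, 160.0, 170.0, 180.0, 190.0, 200.0, 210.0, 220.0])
y_demo = 100.0 + 0.5 * x_demo - 0.001 * x_demo**2

a = fit_polynomial(x_demo, y_demo, order=2)
y_hat = predict_polynomial(a, x_demo)
```

The same helper with `order=3` would give a third-order parabola; whether either curve is physically plausible still has to be judged against the data, as the text does.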
As can be seen from Fig. 7, the curve decreases after passing the extremum, which contradicts the nature of the change in the empirical data. The analysis of the results indicates the need for a more accurate approximation. Hypothesis 2 is rejected.
Hypothesis 3. Using the third-order parabola. The analysis of the results obtained for the second hypothesis allows a new hypothesis to be put forward, namely that a third-order parabola can be used to describe the empirical data:
$$y = a_0 + a_1 x + a_2 x^2 + a_3 x^3,$$
where $a_0$, $a_1$, $a_2$, $a_3$ are unknown coefficients of the mathematical model.
The unknown approximation coefficients are found in the same way. The results of the third-order parabolic approximation are shown in Fig. 8. Visual analysis of the graph in Fig. 8 shows a discrepancy between the approximating curve and the empirical data. This discrepancy is especially pronounced at the beginning and at the end of the graph, where the curve does not correspond to the physical properties of changes in systolic pressure. Therefore, the hypothesis that the third-order parabola can serve as a mathematical model for describing the change in the values of the processed empirical data cannot be accepted.
Hypothesis 4. Using a three-segment linear regression with jumps. The analysis of Hypotheses 1-3 is the precondition for choosing a three-segment linear regression with jumps for approximating the empirical data.
As shown in Fig. 3, the empirical data can be divided into three clusters. The first cluster consists of points 1 to 9 (Table 1). However, the second and third clusters share a questionable point with coordinates (200; 148). Assigning this point to the wrong cluster can significantly affect both the accuracy of the approximation and, accordingly, the quality of the forecast made with the obtained mathematical model. To determine the cluster to which this point belongs, the following calculations were performed.
Because there are not enough data to calculate the slope angle within each segment, these angles are assumed to be the same in all segments of the multisegment regression, which simplifies the calculations when building the mathematical model. The general form of the equation of three-segment linear regression with jumps is:
$$y = a_0 + a_1 x + a_2 H(x - x_{jump1}) + a_3 H(x - x_{jump2}),$$
where $a_0$, $a_1$, $a_2$, $a_3$ are unknown coefficients of the mathematical model, $x_{jump1}$, $x_{jump2}$ are the abscissas of the jump cross sections, and $H(\cdot)$ is the Heaviside function. The system of normal equations for finding the unknown approximation coefficients is given below (it uses $H^2 = H$ and, for $x_{jump1} < x_{jump2}$, $H(x - x_{jump1}) H(x - x_{jump2}) = H(x - x_{jump2})$):
$$
\begin{cases}
a_0 n + a_1 \sum_{i=1}^{n} x_i + a_2 \sum_{i=1}^{n} H(x_i - x_{jump1}) + a_3 \sum_{i=1}^{n} H(x_i - x_{jump2}) = \sum_{i=1}^{n} y_i, \\
a_0 \sum_{i=1}^{n} x_i + a_1 \sum_{i=1}^{n} x_i^2 + a_2 \sum_{i=1}^{n} x_i H(x_i - x_{jump1}) + a_3 \sum_{i=1}^{n} x_i H(x_i - x_{jump2}) = \sum_{i=1}^{n} x_i y_i, \\
a_0 \sum_{i=1}^{n} H(x_i - x_{jump1}) + a_1 \sum_{i=1}^{n} x_i H(x_i - x_{jump1}) + a_2 \sum_{i=1}^{n} H(x_i - x_{jump1}) + a_3 \sum_{i=1}^{n} H(x_i - x_{jump2}) = \sum_{i=1}^{n} y_i H(x_i - x_{jump1}), \\
a_0 \sum_{i=1}^{n} H(x_i - x_{jump2}) + a_1 \sum_{i=1}^{n} x_i H(x_i - x_{jump2}) + a_2 \sum_{i=1}^{n} H(x_i - x_{jump2}) + a_3 \sum_{i=1}^{n} H(x_i - x_{jump2}) = \sum_{i=1}^{n} y_i H(x_i - x_{jump2}),
\end{cases}
$$
where $n$ is the total number of empirical data points. Solving the system of normal equations determines the unknown coefficients.
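A minimal sketch of fitting such a regression is given here for illustration. It assumes the model has the form y = a0 + a1*x + a2*H(x - x_jump1) + a3*H(x - x_jump2), that is, a common slope with two level shifts, and the convention H(z) = 1 for z >= 0. The function names are hypothetical, the data are synthetic rather than those of Table 1, and `numpy.linalg.lstsq` is used in place of an explicit solution of the normal equations (both minimize the same sum of squares).

```python
import numpy as np

def heaviside(z):
    # Convention assumed for this sketch: H(z) = 1 for z >= 0, otherwise 0.
    return (np.asarray(z, dtype=float) >= 0).astype(float)

def fit_three_segment(x, y, x_jump1, x_jump2):
    """Least-squares estimate of (a0, a1, a2, a3) for fixed jump abscissas."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Columns of the design matrix: 1, x, H(x - x_jump1), H(x - x_jump2).
    design = np.column_stack([np.ones_like(x), x,
                              heaviside(x - x_jump1),
                              heaviside(x - x_jump2)])
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs

def predict_three_segment(coeffs, x, x_jump1, x_jump2):
    """Evaluate the three-segment model at the points x."""
    x = np.asarray(x, dtype=float)
    a0, a1, a2, a3 = coeffs
    return (a0 + a1 * x
            + a2 * heaviside(x - x_jump1)
            + a3 * heaviside(x - x_jump2))

# Synthetic, illustrative data (NOT the values of Table 1).
x_demo = np.arange(150.0, 231.0, 5.0)
y_demo = (90.0 + 0.2 * x_demo
          + 12.0 * heaviside(x_demo - 177.5)
          - 8.0 * heaviside(x_demo - 202.5))

coeffs = fit_three_segment(x_demo, y_demo, 177.5, 202.5)
y_hat = predict_three_segment(coeffs, x_demo, 177.5, 202.5)
```

Because the jump abscissas enter only through the indicator columns, the fit is an ordinary linear least-squares problem once they are fixed; choosing them is a separate step.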
As can be seen from Fig. 3, the first cluster is clearly defined, and the abscissa of the cross section of the first jump $x_{jump1}$ can be calculated from the two limit values of the first and second clusters. The stages of the optimization calculations that determine the boundary between the second and third clusters are given below.
Step 1. Five options for the abscissa of the second jump are calculated. To do this, the range of variation along the abscissa axis is determined; the range of variation of $x_{jump2}$ is chosen on the basis of visual analysis.
Step 2. The unknown coefficients of the regression model $a_0$, $a_1$, $a_2$, $a_3$ are calculated for each value of $x_{jump2}$ obtained in Step 1. The results of the calculations are shown in Table 2.
Step 3. The sum of squared deviations is calculated for each variant of the three-segment regression with jumps according to the following formula, where $H(\cdot)$ is the Heaviside function:
$$S = \sum_{i=1}^{n} \left( y_i - a_0 - a_1 x_i - a_2 H(x_i - x_{jump1}) - a_3 H(x_i - x_{jump2}) \right)^2.$$
The obtained data are given in Table 3.
Step 4. The values from Table 3 are approximated on the basis of the least squares method [27], giving an equation for the sum of squared deviations as a function of $x_{jump2}$. The optimal abscissa of the second jump cross section is the point at which the first derivative of this equation is zero.
It can be concluded that the 21st point of the empirical data (with coordinates (200; 148)) belongs to the third cluster. This partition optimization technique can be used as a special approach to data clustering.
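Steps 1-4 can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the equation fitted in Step 4 is assumed to be a second-order parabola in x_jump2, the convention H(z) = 1 for z >= 0 is assumed, all function names are hypothetical, and the data and candidate abscissas are synthetic.

```python
import numpy as np

def heaviside(z):
    # Convention assumed for this sketch: H(z) = 1 for z >= 0, otherwise 0.
    return (np.asarray(z, dtype=float) >= 0).astype(float)

def sse_for_jump(x, y, x_jump1, x_jump2):
    """Steps 2-3: fit a0..a3 for fixed jumps, return the sum of squared deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones_like(x), x,
                              heaviside(x - x_jump1),
                              heaviside(x - x_jump2)])
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coeffs
    return float(residuals @ residuals)

def optimal_jump_abscissa(candidates, sse_values):
    """Step 4 (assumed form): fit a parabola to S(x_jump2) and return its minimum."""
    candidates = np.asarray(candidates, dtype=float)
    center = candidates.mean()  # centre the abscissas for numerical stability
    c2, c1, _ = np.polyfit(candidates - center,
                           np.asarray(sse_values, dtype=float), 2)
    if c2 <= 0:
        raise ValueError("fitted parabola has no minimum")
    return center - c1 / (2.0 * c2)  # point where dS/dx = 0

# Synthetic, illustrative data (NOT the values of Table 1).
x_demo = np.arange(150.0, 231.0, 5.0)
y_demo = (90.0 + 0.2 * x_demo
          + 12.0 * heaviside(x_demo - 177.5)
          - 8.0 * heaviside(x_demo - 202.5))

# Step 1: five candidate abscissas for the second jump (hypothetical values).
candidates = np.array([192.5, 197.5, 202.5, 207.5, 212.5])
sse_values = [sse_for_jump(x_demo, y_demo, 177.5, c) for c in candidates]
x_jump2_opt = optimal_jump_abscissa(candidates, sse_values)
```

For fixed data the sum of squared deviations changes only when a candidate abscissa crosses a data point, so the fitted parabola smooths a stepwise dependence; the questionable point is then assigned to the cluster on whose side of the optimum it falls.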
Using the ordinary least squares method, the final optimal three-segment linear regression with jumps is found; the resulting approximation is shown in Fig. 9. A comparative analysis of the four approximation methods considered above was then performed. The adequacy coefficient and the standard deviation were calculated in order to determine a mathematical model that adequately describes the empirical data of Table 1 and can be used in the decision-making system.
The ratio of the range (amplitude) of the cumulative residual curve to the standard deviation gives the relative maximum range, which is defined here as the adequacy coefficient. Criterion [25] was originally intended to test empirical data for nonlinearity. However, studies have shown that in some cases it can also be used to test adequacy, as was done in this article.
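The adequacy coefficient described above can be sketched as follows. The sketch assumes that residuals are accumulated in order of increasing abscissa and that the sample standard deviation of the residuals (`ddof=1`) is used; the source does not state either detail, and the function name `adequacy_coefficient` is hypothetical. The demo values are synthetic.

```python
import numpy as np

def adequacy_coefficient(x, y, y_fit, ddof=1):
    """Range of the cumulative residual curve divided by the residual standard deviation."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="stable")  # accumulate along increasing x (assumption)
    residuals = (np.asarray(y, dtype=float)
                 - np.asarray(y_fit, dtype=float))[order]
    cumulative = np.cumsum(residuals)     # cumulative residual curve
    curve_range = cumulative.max() - cumulative.min()
    return curve_range / residuals.std(ddof=ddof)

# Synthetic, illustrative residual pattern (NOT derived from Table 1).
x_demo = np.array([3.0, 1.0, 4.0, 2.0])
y_demo = np.array([1.0, 1.0, -1.0, -1.0])
ratio = adequacy_coefficient(x_demo, y_demo, np.zeros(4))
```

A value computed this way would be compared with a critical value from the tables referenced in the text (10.44 at a probability of 0.99 in the linear-regression case); smaller values indicate residuals that do not drift systematically along the abscissa.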
The calculation results are given in Table 4. As can be seen from Table 4, linear regression with jumps has lower values of the standard deviation and the adequacy coefficient: 3.6 and 4.05, respectively, compared to 7.5 and 8.4 for the single third-order parabola.
This indicates that the special approximation proposed by the authors, a three-segment linear regression with jumps, is a more reliable mathematical model for describing the studied empirical data and is promising for formalizing the description of biological parameters.

DISCUSSION
As can be seen from Fig. 1, real empirical data obtained from non-standard objects with stochastic influence are difficult to approximate by standard methods. Thus, with the three standard regression methods considered, it was not possible to achieve an adequacy coefficient below 8.3.
This gave impetus to the search for non-standard approaches to approximating empirical data. A reliable approximation of empirical data obtained from real biological objects is important because these values contain a component that can indicate a violation of the functional state of the biological object and plays an important role in deciding on the state of the operator as a possible trigger in aviation safety management.
Standard methods lead to approximation functions that often contradict the physical processes forming the empirical data obtained from biological objects, or that describe the pattern of their behavior inaccurately (Fig. 5, Fig. 7, Fig. 8). This is because the distribution of values obtained from biological objects often cannot be described by standard mathematical models: the behavior of the parameters that describe biological objects is influenced by many factors of both external and internal nature. Therefore, such complex objects must be described with the maximum similarity of the approximating curve to the real values. Another feature of empirical data from biological objects is their nonlinearity. Data clustering methods also have their own difficulties in determining the membership of doubtful points that can be attributed to different classes (the questionable point in Fig. 3).
Another reason may be that points located on the outer boundaries of classes, but not important for separating adjacent classes, can be recognized as individually significant if their class membership is neglected. Such points characterize separately the properties of an instance as informative with respect to the external and internal boundaries, which is their specific feature in tasks of visualization and data analysis.
It should also be noted that obtaining adequate coefficients of a mathematical model significantly affects its reliability. It has been established that an increase in the coefficients of mathematical models of various forms of approximation increases the value of the adequacy coefficient.
The methodology for calculating the information content indicators of individual specimens not only quantitatively, but also qualitatively affects the formed sample. Therefore, the paper presented a methodology for determining the boundaries of clusters, which should be performed taking into account the complexity of describing specific empirical data.
The closest analogue of the proposed method is the nonlinearity test proposed in [25]. In contrast to [25], in this paper the technique was also used to check the adequacy of the approximation by the mathematical model. In addition, a special approximation in the form of a three-segment linear regression with jumps has been proposed, which is promising for formalizing the description of biological parameters.

CONCLUSIONS
The article is devoted to the approximation of empirical data. Four data approximation options were considered: a single linear regression, the second-order parabola, the third-order parabola, and three-segment linear regression with jumps.
Analysis of empirical data obtained from biological objects indicates their unusual structure, which cannot be correctly described by standard methods; this became a prerequisite for the use of a non-standard approximating function, three-segment linear regression with jumps.
The equation for three-segment linear regression with jumps was obtained using the following methods: using the Heaviside function, determining the minimum sum of squared deviations for the whole set of empirical data simultaneously, optimizing the abscissa of the jump cross section, and using the linearity test.
The scientific novelty of the obtained results is that a method of approximating empirical data by three-segment linear regression with jumps, which characterizes the dependence between biological parameters of the operator, is proposed for the first time. This method gives a more reliable formalization of the dependence between biological parameters, which was demonstrated on specific examples. Optimizing the position of the jump cross sections can serve as a new approach to data clustering and makes it possible to take the dependence in the empirical data into account. The data can be used to relate changes in a parameter of the cardiovascular system to changes in anthropometric data for the purpose of prognosis.
The calculations showed that the standard deviation and the adequacy coefficient for three-segment linear regression with jumps are more than two times lower than for the other approximation methods considered. This supports the use of three-segment linear regression with jumps.
The results of the study can be used during the construction of mathematical models to describe empirical data of this kind.
The practical significance of the obtained results is that software implementing the proposed indicators has been developed and experiments studying their properties have been conducted. The experimental results make it possible to recommend the proposed indicators for use in practice and to determine effective conditions for their application.
Prospects for further research are to study the proposed set of indicators for a broad class of practical problems.