THE METHOD OF MULTIVARIATE STATISTICAL ANALYSIS OF THE TIME MULTIVARIATE CRITICAL QUALITY ATTRIBUTES OF MANUFACTURE PROCESS WITH THE DATA FACTORIZATION

Context. This paper presents a method for solving the problem of product quality assurance at the stage of the initial manufacture process design, in accordance with the process analytical technology used in the design of modern certified manufacturing (QbD). The method uses the information technologies of multivariate statistical analysis (MSA) to evaluate the influence of time multivariate critical process parameters (CPPs) on the time product critical quality attributes (CQAs). A preparatory transformation of clusters of critical process parameters of the manufacture process into factors of product critical quality attributes was carried out.
Objective. To present a method of multivariate statistical analysis for assessing the character and features of the influence of time multivariate critical process parameters on time multivariate critical quality attributes at the design stage of the manufacture process.
Method. The method consecutively uses: statistical procedures of exploratory multivariate data analysis; transformation of the homogeneous matrices of observed CPP and product CQA values into a data frame (table) with factorized data; construction of regression trees of multivariate CPPs with multivariate responses (CQAs). The method is implemented with R language software packages.
Results. Factorized time multivariate CPPs make it possible to use methods of multivariate statistical analysis for evaluating the influence of CPP factors on the time multivariate CQAs.
Conclusions. This method of statistical analysis, together with statistical multivariate canonical analysis, represents an up-to-date information technology for detailed estimation of the influence of time multivariate CPP objects and some CPP components on CQAs.


ABBREVIATIONS
CCA is canonical correspondence analysis;
CMAs are critical material attributes;
CPPs are critical process parameters;
CQAs are critical quality attributes;
DoE is design of experiment;
MBSE is model-based systems engineering;
MRT is multivariate regression trees;
MSA is multivariate statistical analysis;
NDI is a non-development item;
NMDS is nonmetric multidimensional scaling;
PAT is process analytical technology;
PCA is principal component analysis;
PDCA is the plan-do-check-act cycle;
QbD is quality by design;
QFD is quality function deployment;
RDA is redundancy analysis;
SEMMA is sample, explore, modify, model, assess;
TPQP is a target product quality profile;
VAT is visual assessment of cluster tendency;
VS is variables selection.
q v X is a q m × ( 1, 2, ..., q n = ) sample from the multivariate population at the v-th attribute.

INTRODUCTION
Market conditions and growing competition objectively set the task of developing and improving methods for ensuring high product quality at the stage of the initial manufacture process design, with further management of the entire production cycle (PDCA [1]) in accordance with the TPQP [2].
One current way to solve this problem is the QbD methodology [2,3]. This methodology shifts the emphasis of product quality assurance towards the information technology of statistical research and careful experiment planning at the stage of product development and manufacture process design. The QbD concept is a significant transformation in product quality regulation from an empirical process to a more scientific and risk-based approach. The QbD concept combines the categories CMAs, CPPs and CQAs with the TPQP under the general term "critical quality attributes". In turn, the TPQP determines the quantitative parameters of product safety and efficiency [2] and allows them to be compared with the requirements of standards, as shown in Fig. 1.
The dependence of product CQAs on CPPs and CMAs is considered as a quality function. This quality function is structured according to the main stages of the QFD [4] manufacture process, for which critical quality attributes are determined and QbD principles are applied, as shown in Fig. 1.
PAT [5,6] is considered an important tool for QbD. It actively uses MSA information technologies to bring and keep critical quality attributes within the requirements of standards in the design space. The results of the multivariate statistical analysis of the data of a multifactor experiment at the design stage make it possible to establish the permissible variability ranges of CPPs and CMAs based on the extent of their influence on CQAs.
It should be noted that, given the strict requirements of standards (ISO 9001 [7], ISO 22000 [8]) for CMAs, it is the influence of CPPs on CQAs that is more often considered [5]. In this case, the time multivariate statistical CPP data generally have a group (object) influence on CQAs and therefore require software factorization of the time multivariate data in computer format.
The object of the study is the process of product quality assurance at the stage of the initial manufacture process design.
In accordance with the above, the task of product quality assurance at the stage of the initial manufacture process design is specified as the determination of the CPPs and their objects that significantly affect CQAs.
It is proposed to solve this problem by consecutive application of the following MSA methods:
- defining the structure and hierarchy of the time multivariate data: CPPs and CQAs;
- defining qualitative and quantitative measures of the relations between the formed objects (factors) of CPPs and CQAs;
- comparing the results of factor and component analysis of the influence of time multivariate CPPs on time multivariate CQAs.
Together with computer-intensive information technologies of statistical canonical analysis [9], the factorized multivariate data of critical quality attributes provide additional fundamental opportunities for multivariate statistical analysis of the effect of CPPs on CQAs.
The Genichi Taguchi method is known for solving this problem by fractional and factorial planning of the experiment with subjective test sets of design parameters and noise factors. These data form matrix experiments that use orthogonal arrays [10]. According to the effects of the noise factors, the orthogonal design parameters are evaluated for robustness, and the possible losses of the quadratic quality function are analyzed.
In contrast to this approach, it is proposed to apply MSA and computer-intensive information technologies for carrying out a statistical full-factor experiment with direct factorization of each of the observation tables after the exploratory data analysis. At the same time, the statistical estimates of multivariate factor analysis are adequately supplemented by estimates of the component effect of CPPs on CQAs.
The subject of the study is the information technologies for estimating the factor influence of manufacture CPPs on product CQAs. The proposed method of MSA of the time multivariate critical quality attributes of the manufacture process with data factorization should be considered as an evolution of the QbD approach to product quality assurance at the design and development stage. The QbD concept and the method are aimed at an experimental study of the degree of influence of CPPs on CQAs.

PROBLEM STATEMENT
The statistics of the experiment operate with data from n object-oriented observations formed into two arrays of random multivariate variables X and Y, X → Y.
A random variable X potentially determines the properties of the random variable Y. The category X → Y is described by the quality function of the manufacture process. Formally, the results of monitoring (sampling) of random multivariate values of X and Y are data of the QbD development space, but do not represent a categorized information space of relations X → Y.
To determine and characterize the causal relations in the information space of the variables X and Y, it is necessary to inquire into the structure and hierarchy of observational data X and Y, as well as the qualitative and quantitative measures of the relations between them.
The difficulty in solving this task is that the empirical X and Y data arrays have a computer format in the form of time interrelated matrices that are not ready for use. To conduct multivariate statistical analysis, the X and Y data arrays must undergo a preparatory processing step in the form of statistical procedures of exploratory data analysis, clustering, factorization and interpretation of the results [11]. In practice, observations of exogenous variables are not isolated, but form groups of related CPPs. These groups form clusters with an interpretable dependence of variables, for which it is of interest to estimate the influence on the multivariate responses of the manufacture process (Y_j). It is proposed to cluster the time random multivariate CPP data (X_j) and identify the clusters with the factors of influence on the multivariate responses of the manufacture process, the CQAs (Y_j). These operations make it possible to determine the potential trends and relationships between the factorized observations of the variables X and the array of multivariate responses Y, and also allow MSA to be used to study the dependence X → Y as a quality function.

REVIEW OF THE LITERATURE
The QFD process methodology [4,12], the methods of assuring the TPQP with the detailed QbD [2,5,13] and the risk-oriented process approach [3,4,14] are the dominant elements of modern quality management systems [7].
High requirements for product quality and market competition lead to the need for product quality assurance at the QbD development stage [5,13] with decomposition of the quality function (QFD) [4]. The PAT methods [6,15], which are based on MSA information technologies, are an adequate tool for QbD.
At the design stage, the PAT methods are focused on the study of multifactorial relationships between CPPs and CQAs, exogenous factors and their influence on the final product quality [12,16].
In [17], an up-to-date systematic review of a large number of references on applications of MSA methods is given. The known methods for determining the nominal values of CQAs and CPPs and the permissible variation for them offer statistical solutions to the multivariate VS problem. The statistical VS methods (essentially MSA methods) study the contribution of exogenous factors to the variability of the quality function values (a multivariate endogenous variable).
Thus, MSA methods create an adequate, well-founded basis for transforming functional requirements into design parameters and for justifying the permissible variation of design parameters [18]. The adequate CPPs and CMAs are determined on the basis of a reasonable (standards-oriented) change in the characteristics of the quality function and the sensitivity of this function to critical quality attributes.
This problem can be considered from the position of system engineering [19,20,21], when models play a key role at one or several stages of the development process and are part of the process design by the MBSE [22]. The main stages of research and design of the product/process from the general requirements to the system [21] within the background of robust design [18] inevitably lead to the architecture (structure + logic) of the experiment [16,21,23]. One of the general trends of this logic is the decomposition of the quality function (QFD) [3,4,18] using the DoE technology [19], similar to the methodology of careful experiment planning (QbD) [2,3,5] with PAT tools [6].
It is also important to note one of the features of modern process design, namely the use of NDI technologies (for example, software packages of MSA methods in the R language) [24]. At the design and experimentation level, this direction is becoming widespread owing to the intensive introduction of information technologies of object-oriented MSA and the high confidence of engineers in developed and tested software packages.
Thus, the review of the literature makes it possible to emphasize the following features of the task:
- the requirements for methods of technological approximation to the specified values of CQAs, as a quality function of the CPP values, are defined at the stage of the initial manufacture process design;
- guaranteed and certified product quality at the hierarchical level of the initial manufacture process design is achieved by computer-intensive MSA information technologies applied to the manufacture process and products in the multivariate space of design parameters.

MATERIALS AND METHODS
To ensure the standardized quality of products at the initial manufacture process design, the task is to determine from the data of n observations the m CPPs - The cluster analysis methods combine the samples X_j of the observable data set into groups of objects that are in some sense homogeneous and are called clusters or classes, as shown in Fig. 2.
Suppose that each of the observations X_j is given as . The set of these points can be treated as a -the number of observations corresponding to the v-th attribute) from the multivariate population at the v-th attribute.
A number of observations of the q-th attribute form an object (cluster), which is a factor of the homogeneous effect of q observations on CQAs. Some observations Y_j are grouped into objects by this attribute by conducting a constrained ordination. Based on the results of such an analysis, a diagram is constructed, an "ordination triplot", which shows the variability of the CQAs in relation to the two projection axes CCA1-CCA2 (canonical correspondence axes) and the statistical relationships with each critical process parameter - The research used partitioning algorithms of the R language that decompose the data set of n observations into s clusters with previously unknown parameters. These algorithms search for centroids, i.e. the centers of point concentration with the minimum scatter within each cluster, located as far as possible from each other, and verify the clustering quality [11].
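The centroid search described above can be illustrated with a minimal sketch of Lloyd's k-means iteration. This is not the R implementation used in the study: it is a plain Python illustration, and the function names `kmeans` and `within_scatter`, the random initialization and the seed are assumptions made only for this example.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm (illustrative sketch): returns (centroids, labels)."""
    rng = random.Random(seed)
    # Assumption: initial centroids are k distinct observations chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each observation goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # partition is stable, so the centroids no longer move
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def within_scatter(points, centroids, labels):
    """Total squared distance of observations to their own centroid."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
```

A smaller value of `within_scatter` for a given number of clusters corresponds to the "minimum scatter within each cluster" mentioned in the text; verification of the clustering quality is a separate step.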
The interrelation between the components of the multivariate data of the X and Y arrays was established by multivariate statistical canonical analysis using software packages of the R language [11]. In particular, the mathematical apparatus of multivariate canonical ordination was used: RDA, CCA and NMDS.
These methods are positioned as an extension of multivariate statistical regression analysis in modeling the multivariate response of the manufacture process (Y_j) to the multivariate value of the critical process parameter.
Thus, the sequence of statistical procedures for clustering and factorization of the multivariate CPP and CQA data creates a model of a statistical full-factor experiment that is realized by methods of multivariate statistical causal analysis.

EXPERIMENTS
In the case under consideration, the experiment assumes the study of cause-effect or causal dependencies.
To determine the causal relationships of multivariate
4c. Formation of the MRT of an array of exogenous variables X with multivariate responses Y.
5. Estimation of the MSA results on the relations between the array of exogenous variables X and the array of multivariate responses Y.
In general, this algorithm corresponds to the SEMMA methodology [26] for determining physical regularities in statistical analysis. SEMMA focuses on structuring processes (such as QFD), as well as on applying statistical analysis and visualization of results at each stage of the design process, and gives the researcher the freedom to choose the concept and implementation of the exploration.

RESULTS
The results correspond to the multivariate statistical analysis of time-data arrays of predictors X and responses Y specified in the form of uniformly distributed random values of the design space within the ranges specified in standards.
The exploratory data analysis was used for descriptive statistics of sample properties. This analysis included: processing and systematization of empirical data, visual presentation of multivariate data in the form of tables and graphs, as well as quantitative characteristics of the basic statistical parameters. The weight (variable importance) of the predictors X_j and their tendency to cluster, taking into account the low multicollinearity, were evaluated using principal component analysis and VAT, as shown in Fig. 3a.
The search for an optimal scheme for combining predictors into clusters was performed by permutation of multisets over various combinations of the number of groups, distance metrics and clustering methods, using 30 quality parameters (indices).
Based on the results of the calculations, the optimal number of clusters was chosen, as shown in Fig. 3b.
The operation of transforming the homogeneous arrays of numerical variables X and Y into tables with factor (cluster) attributes was performed by identifying the factor with the cluster of observations X_j and the corresponding group of multivariate responses Y_j. As a result, tables (data.frame) with factorized data arrays X and Y were formed, as shown in Fig. 2.
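As an illustration of this factorization step, the sketch below attaches a categorical factor (the cluster label) to each observation and then summarizes the responses per factor level. It is a plain Python analogue of an R data.frame with a factor column, not the code used in the study; the names `factorize` and `group_means`, the level names F1, F2, ... and the column names x1, y1, ... are hypothetical.

```python
def factorize(X, Y, labels, prefix="F"):
    """Build a table (list of rows) in which the cluster label is a factor column."""
    levels = sorted(set(labels))
    # Assumption: factor levels are named F1, F2, ... in sorted label order.
    names = {lab: f"{prefix}{i + 1}" for i, lab in enumerate(levels)}
    table = []
    for x_row, y_row, lab in zip(X, Y, labels):
        row = {"factor": names[lab]}
        row.update({f"x{i + 1}": v for i, v in enumerate(x_row)})
        row.update({f"y{i + 1}": v for i, v in enumerate(y_row)})
        table.append(row)
    return table


def group_means(table, response_keys):
    """Mean of each response column within every factor level."""
    sums, counts = {}, {}
    for row in table:
        level = row["factor"]
        counts[level] = counts.get(level, 0) + 1
        acc = sums.setdefault(level, {k: 0.0 for k in response_keys})
        for k in response_keys:
            acc[k] += row[k]
    return {
        level: {k: sums[level][k] / counts[level] for k in response_keys}
        for level in sums
    }
```

Once the factor column exists, the responses can be compared between factor levels, which is what makes factor-based MSA procedures applicable to the originally homogeneous numeric arrays.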
Unlike unconstrained ordination, constrained ordination makes it possible to connect the variability of the multivariate values of the responses Y_j with the effect of the predictors X_j. Fig. 4 shows the ordination of the multivariate response components Y on the principal components plane (unconstrained ordination) and the distribution of these values over the objects of the X array factors on the principal components plane (constrained ordination). An unconstrained ordination analysis of the response matrix Y yields a mapping, in the orthogonal coordinate system, of only some particular structural features of the studied data array, in the form of graphic projections of the components of the multivariate response Y_j on the principal components plane, as shown in Fig. 4a.
The diagram in Fig. 4b shows the grouping of observations by factor characteristics against the background of the explanatory components x_m (dashed lines) associated with the eigenvectors of the X array (ordination triplot). Note that the use of PCA ordination is a method of smoothing random data fluctuations and contributes to the construction of more stable (robust) cluster structures. The practice of direct ordination makes it possible to combine the operations of hierarchical formation of objects on the separating predictors of the X array with the projection of the multivariate data of the response array Y into the space of the principal components.
To assess the component relationship of the two arrays of multivariate variables X and Y, the statistical canonical ordination procedures CCA and NMDS were used.
To represent the results of NMDS, when the least distorted distance metric between the two-dimensional
The results demonstrate the great potential of MSA in the analysis of time multivariate data for the practical application of product quality assurance at the stage of the initial manufacture process design.

DISCUSSION
The article presents the results of research at the design stage of the manufacture process, during which a multivariate statistical analysis was carried out of the cause-effect relationships between the variability of the critical quality attributes and the action of certain factors identified with the critical process parameters. The study contains elements of exploratory, descriptive and analytical statistical analysis.
The variable importance of the predictors x_m and the tendency of the X array to cluster, taking into account the possible multicollinearity of the components x_m, were estimated using the PCA method and visualization of the cluster formation trends (VAT). The PCA method yields the relative expected contribution of each component x_m (Com.m) to the total data variation, as shown in Table 1.
Note that all components of the multivariate X array are characterized by an almost equal contribution (proportion of variance 16.6%) to the expected distribution, and together they give a total contribution of 100%. The clustering tendency in this test is characterized by the Hopkins statistic (0.5087), which shows a moderate tendency to form clusters for randomly distributed data. The criterion for the effectiveness of clustering is based on the metric hypothesis of compactness: if it is possible to find a division of observations into groups such that the distances between observations from one group (intra-cluster distances) are less than a definite value ε > 0, and the distances between observations from different groups (cross-cluster distances) are greater than ε, then clustering is successful.
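The Hopkins statistic mentioned above can be sketched as follows. This is an illustrative Python version under one common convention, H = Σu / (Σu + Σw), in which values near 0.5 correspond to spatially random data and values near 1 to strongly clustered data; software implementations differ in convention (some report the complement or use powered distances), so the value is not claimed to be directly comparable with the one reported in the text. The function name `hopkins`, the sample size rule and the seed are assumptions.

```python
import math
import random


def hopkins(points, m=None, seed=0):
    """Hopkins statistic H = sum(u) / (sum(u) + sum(w)) (illustrative sketch)."""
    rng = random.Random(seed)
    n, d = len(points), len(points[0])
    m = m or max(1, n // 10)  # assumption: probe about 10% of the observations
    lows = [min(p[j] for p in points) for j in range(d)]
    highs = [max(p[j] for p in points) for j in range(d)]
    # u: distances from m uniformly placed artificial points
    #    to their nearest real observation.
    u = []
    for _ in range(m):
        q = [rng.uniform(lows[j], highs[j]) for j in range(d)]
        u.append(min(math.dist(q, p) for p in points))
    # w: distances from m sampled real observations
    #    to their nearest other real observation.
    w = []
    for i in rng.sample(range(n), m):
        w.append(min(math.dist(points[i], points[k]) for k in range(n) if k != i))
    return sum(u) / (sum(u) + sum(w))
```

Under this convention, a value close to 0.5, as reported for the X array, indicates data that are close to a random spatial distribution.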
According to this criterion, the results of assessing the clustering quality by the Zagoruiko test were obtained for the clusters of the multivariate data array X (Fig. 3b), as shown in Table 2. In Table 2, the main diagonal contains the average intra-cluster distances, which are smaller than the cross-cluster distances (the off-diagonal elements of the table); this indicates that the value space of the X array can be divided into two clusters. The operation of transforming the homogeneous table of observations of the endogenous variables Y into a table with factor variables was carried out after studying the characteristics of this array.
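A table of this kind, with average intra-cluster distances on the diagonal and average cross-cluster distances off the diagonal, can be computed as in the following sketch. It is a simplified Python illustration of the compactness check and not the Zagoruiko test as implemented in the software used in the study; the names `mean_distance_matrix` and `is_compact` and the use of Euclidean distance are assumptions.

```python
import math
from itertools import combinations, product


def mean_distance_matrix(points, labels):
    """Average pairwise distances: intra-cluster on the diagonal, cross-cluster elsewhere."""
    levels = sorted(set(labels))
    groups = {lab: [p for p, l in zip(points, labels) if l == lab] for lab in levels}
    matrix = []
    for a in levels:
        row = []
        for b in levels:
            if a == b:
                pairs = list(combinations(groups[a], 2))  # pairs inside one cluster
            else:
                pairs = list(product(groups[a], groups[b]))  # pairs across clusters
            dists = [math.dist(p, q) for p, q in pairs]
            row.append(sum(dists) / len(dists) if dists else 0.0)
        matrix.append(row)
    return matrix


def is_compact(matrix):
    """True if some threshold separates all diagonal from all off-diagonal entries."""
    size = len(matrix)
    diagonal = [matrix[i][i] for i in range(size)]
    off_diagonal = [matrix[i][j] for i in range(size) for j in range(size) if i != j]
    return max(diagonal) < min(off_diagonal)
```

When `is_compact` returns True, any value between the largest diagonal entry and the smallest off-diagonal entry can play the role of ε in the compactness hypothesis, applied here to average rather than individual distances.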
The results of the projection, in the orthogonal coordinate system of the principal components, of some features of the internal relations of the components of the multivariate response array Y (unconstrained ordination) are shown in Table 3. From Table 3
The NMDS method is most applicable to data arrays with a significant influence of random or unaccounted factors, as in the case under consideration.
The results of applying the NMDS method are summarized in Table 4. x of the data array X. Note that as the number of components increases, and when the X and Y array values have only a weak tendency to cluster, the analysis becomes more complicated.
At the final stage of the analysis, an MRT of the array of exogenous variables X with a multivariate response Y was formed, as shown in Fig. 6. For this MRT, the "leaves" are clusters of objects Y with minimal differences between points in the multivariate space within each cluster. Note that, in order to explain the effect of the variables x_m on the responses Y_j, the MRT software package has an option for selecting different numbers of clusters. For this MRT (Fig. 6), the splitting variable is the component x6, which forms equivalent clusters with an almost equal number of observations (47 and 53) in the absence of explicit dominance of the components y_k.
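The way an MRT chooses a splitting variable can be sketched with a single-split search: every predictor and every candidate threshold is tried, and the split that minimizes the total within-leaf sum of squared deviations of the multivariate responses from their leaf means is kept. This is a simplified Python illustration of one split only, not the R package used in the study; the names `sse` and `best_split` are hypothetical.

```python
def sse(Y):
    """Sum of squared deviations of response rows from their multivariate mean."""
    if not Y:
        return 0.0
    means = [sum(col) / len(Y) for col in zip(*Y)]
    return sum((v - m) ** 2 for row in Y for v, m in zip(row, means))


def best_split(X, Y):
    """Return (predictor index, threshold, impurity) of the best binary split."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds lie midway between consecutive observed values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for x, y in zip(X, Y) if x[j] <= t]
            right = [y for x, y in zip(X, Y) if x[j] > t]
            total = sse(left) + sse(right)
            if best is None or total < best[2]:
                best = (j, t, total)
    return best
```

Applying `best_split` recursively to each resulting leaf would grow a full tree; the leaves then correspond to clusters of responses with minimal differences between points, as described in the text.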
The discussion shows that the experimental data of the computer format require processing, analysis and interpretation procedures that determine the potential trends and dependencies between unit components and objects of the time multivariate data for the purpose of practical interpretation at the stage of the initial manufacture process design.

CONCLUSIONS
The idea of designing a quality function of the manufacture process at each of the QFD stages, in order to obtain predictable values of the product critical quality attributes under variability of the critical process parameter values within the tolerances, is realized using multivariate statistical analysis methods.
The scientific novelty of the obtained results is that the MSA method uses a preparatory transformation of clusters of critical process parameters of the manufacture process into factors of product critical quality attributes.
This method is used in statistical experiments with multivariate arrays of critical quality attributes in computer format for product quality assurance at the stage of the initial manufacture process design.
The research results are a consequence of combining the causal methods of multivariate statistical analysis, which make it possible to obtain numerical characteristics with an adequate practical interpretation of the influence of the critical process parameters on the product critical quality attributes.
The prospect of further research consists in the synthesis or application of robust design methods to provide standard values of product critical quality attributes that are resistant to the effects of various environmental factors and to the variability of critical process parameters.
KEYWORDS: quality by design, critical quality attributes, critical process parameters, design of experiment, multivariate statistical analysis.