METHOD OF SPECTRAL CLUSTERING OF PAYMENTS AND RAW MATERIALS SUPPLY FOR THE COMPLIANCE AUDIT PLANNING

Context. The analytical procedures used in auditing are currently based on data mining techniques. This work addresses the problem of increasing the efficiency and effectiveness of analytical audit procedures through clustering based on spectral decomposition. The object of the research is the process of auditing the compliance of payment and supply sequences for raw materials.
Objective. The aim of the work is to increase the effectiveness and efficiency of the audit by applying a method of spectral clustering to sequences of payment and supply of raw materials while automating the procedures for checking their compliance.
Method. Feature vectors are generated for the objects of the sequences of payment and supply of raw materials and are then used in the proposed method. The proposed method improves the traditional spectral clustering method through: automatic determination of the number of clusters based on the explained and sample variance rule; automatic determination of the scale parameter based on local scaling (the K-nearest neighbors rule is used); and resistance to noise and random outliers, achieved by replacing the k-means method with a modified PAM method, i.e. replacing centroid clustering with medoid clustering. As in the traditional approach, the data can be sparse, and the clusters can have different shapes and sizes. Characteristics for evaluating the quality of spectral clustering are selected.
Results. The proposed spectral clustering method was implemented in the MATLAB package. The results obtained made it possible to study the dependence of the clustering quality on the parameter values.
Conclusions. The experiments carried out have confirmed the efficiency of the proposed method and allow us to recommend it for practical use in solving audit problems. Prospects for further research may lie in the creation of intelligent parallel and distributed computer systems for general and special purposes that use the proposed method for segmentation, machine learning and pattern recognition tasks.


ABBREVIATIONS
NJW is the Ng, Jordan, Weiss method;
PAM is partitioning around medoids;
EM is expectation-maximization;
DBSCAN is density-based spatial clustering of applications with noise;
OPTICS is ordering points to identify the clustering structure;
DIANA is divisive analysis;
SOM is a self-organizing map;
ART is adaptive resonance theory;
TP is a true positive;
TN is a true negative;
FP is a false positive;
FN is a false negative.

NOMENCLATURE
$A$ is a set of clustering objects;
$a_i$ is the $i$-th object of clustering;
$n$ is the number of objects of clustering;
$X$ is a set of feature vectors from the space $R^q$;
$x_i$ is the feature vector of the $i$-th object of clustering from the space $R^q$;
$q$ is the number of features in the feature vector $x_i$;
$X$ is a set of $K$-nearest feature vectors;
$x_i$ is a feature vector among the $K$ nearest to a given feature vector;
$\delta$ is a threshold for determining the number of clusters;
$K$ is the number of nearest neighbors;
$\sigma_i$ is the scale parameter for the $i$-th feature vector;
$S$ is a symmetric similarity matrix;
$D$ is a diagonal degree matrix;
$L$ is a normalized symmetric Laplacian matrix;
$I$ is an identity matrix;
$\lambda_i$ is the $i$-th eigenvalue;
$w_i$ is the $i$-th eigenvector;
$c$ is the number of clusters;
$R^2$ is the coefficient of determination;
$V$ is a principal component matrix;
$X$ is a set of feature vectors from the space $R^c$;
$x_i$ is the $i$-th feature vector from the space $R^c$;
$X$ is a set of feature vectors from the space $R^c$ that do not correspond to medoids;
$A_k$ is the $k$-th cluster;
$\Lambda$ is a set of indicator functions;
$\chi_{A_k}(\cdot)$ is the indicator function of the cluster $A_k$ (returns 1 or 0 depending on whether the object belongs to the $k$-th cluster);
$D_{ik}$ is the squared distance between the $i$-th object and the medoid of the $k$-th cluster;
$F(\cdot)$ is the target function;
$y$ is the best target function value;
$y$ is a target function value;
$M$ is a set of cluster centroids;
$m_k$ is the centroid of the $k$-th cluster in the space $R^q$;
$M$ is a set of cluster medoids;
$m_k$ is the medoid of the $k$-th cluster in the space $R^c$;
$m$ is a preserved medoid in the space $R^c$;
$N(0,1)$ is a function that returns a random number with the standard normal distribution;
$v^2$ is the variance of the additive Gaussian noise;
Accuracy is the accuracy;
Precision is the precision;
Recall is the recall;
$F$ is the balanced F-measure.

INTRODUCTION
The analytical procedures used in auditing are currently based on data mining techniques [1,2]. In an automated audit system, the top-level task of auditing expenses is decomposed into middle-level tasks of checking sequences of data mappings. The first of these is the "paid-received" mapping: a mapping of the multidimensional data on payments to suppliers for raw materials into the set of multidimensional data on deliveries of raw materials. At the lower level, if there are no accounting violations, this mapping should be one-to-one. To reduce the volume of checks at the lower level, the audit system analyzes, at the middle level, either the aggregated payment and delivery indicators or the sets (clusters) formed from the lower-level multidimensional data. In addition, when designing an IT audit system, the goal is to automate the analysis so that it produces recommended solutions. According to the method of generalized set mapping, the generalized properties of the data sets (condensation points, isolated points) are analyzed at the middle level; that is, the density structure of each set is determined, and the sets are then compared.
For the analysis, payment and delivery sequence data can be aggregated over quantization periods: 1) for all suppliers; 2) by the nomenclature of raw materials. The data on payment and supply of raw materials are analyzed in order to form recommended solutions for the following audit tasks.
1. The task of the external audit is to check the completeness of accounting for settlements with suppliers.
2. The task of the internal audit is to verify compliance with the contractual policy. The contractual policy of the enterprise is a set of rules characterizing the delivery time after payment, the nomenclature of raw materials, technical or physical characteristics, and prices (discounts).
3. The task of the internal audit of the pricing policy applied when concluding contracts (identifying a significant share of unfavorable contracts, which may indicate "kickbacks" at their conclusion).
4. The task of internal audit of receivables from suppliers of raw materials in terms of timing and amounts.
Clustering methods are used to audit the compliance of the sequence of payments for raw materials and the sequence of deliveries of raw materials at the stage of identifying characteristic properties.
Object of study. The process of auditing the compliance of payment and raw material supply sequences.
Subject of study. Spectral clustering method for auditing sequences of payment and supply of raw materials.
The aim of the work is to increase the effectiveness and efficiency of the audit by automating the analysis of data from sets of parallel-sequential operations of payment and supply of raw materials based on the spectral clustering method.
To achieve this goal, it is necessary to solve the following tasks:
1. Generate feature vectors for objects of sequences of payment and supply of raw materials.
2. Create a method for spectral clustering of sequences of payment and supply of raw materials.
3. Select characteristics for assessing the quality of spectral clustering.
4. Conduct a numerical study of the proposed spectral clustering method.

PROBLEM STATEMENT
The problem of increasing the efficiency of audit based on the method of spectral clustering of sequences of payment and supply of raw materials is presented as the problem of finding such a partition of the set of clustering

REVIEW OF THE LITERATURE
Existing clustering methods have one or more of the following disadvantages [6,7]:
- high computational complexity;
- lack of robustness to noise and random outliers;
- inability to form clusters of different shapes and sizes;
- the need to specify the number of clusters;
- the need to set parameter values.
It is therefore relevant to create a clustering method that eliminates these disadvantages.
One such method is spectral clustering [13,14], which has already found application in the segmentation of signals of different physical nature [15]. Since spectral clustering methods initially did not provide a procedure for automatically determining the parameters and the number of clusters, attempts are being made to eliminate this drawback [16], which will allow these methods to be used in the IT audit of enterprises with different characteristics.

MATERIALS AND METHODS
We begin with the first task: forming feature vectors for the objects of the sequences of payment and supply of raw materials.
The features of the objects of the payment and supply sequences are formed from the lower-level accounting variables (Table 1), taking into account the possible ways of generalizing their values at the middle level over the quantization periods. The clustering objects are the payments (deliveries) for each supplier with which a long-term supply agreement is in force during the year being audited. To estimate the dimension of the feature vector and the number of objects to be analyzed, the nomenclature of purchases of raw materials (components) of large engineering enterprises was analyzed. This analysis shows that the nomenclature contains on average 8 to 12 sections, with 2 to 10 groups in each section. Given the homogeneity of the procurement nomenclature, a plant may have from 50 to 100 long-term contracts with suppliers to ensure continuous operation.
Clustering makes it possible to form subsets of payment and supply operations that are similar in terms of the features described above. The comparison can then be carried out on these subsets of operations, which reduces the computational complexity of solving the matching problem.
To form the rules for matching the sequences of payments and deliveries after clustering, the relationship rules must be selected. Based on an analysis of the terms of the payment and supply agreements for raw materials and of the rules for recording these transactions in the system, the following rules were identified:
1) Delivery operations are carried out after payment, in accordance with the contractual policy of the enterprise and with the payment orders.
2) Delivery under a new payment order is not carried out until the previous one is closed.
3) Low-level delivery data that corresponds to one payment order is aggregated before clustering.
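Rule 3 can be illustrated with a minimal sketch. The record layout (payment order identifier, quantity, amount) and the function name `aggregate_deliveries` are assumptions made for illustration only; the actual accounting variables are those listed in Table 1.

```python
from collections import defaultdict


def aggregate_deliveries(deliveries):
    """Sum low-level delivery records that belong to the same payment order.

    `deliveries` is an iterable of (payment_order_id, quantity, amount)
    tuples. This layout is hypothetical and chosen only for illustration.
    """
    totals = defaultdict(lambda: [0.0, 0.0])
    for order_id, quantity, amount in deliveries:
        totals[order_id][0] += quantity
        totals[order_id][1] += amount
    # One aggregated object per payment order, ready for feature extraction.
    return {order_id: tuple(values) for order_id, values in totals.items()}


# Hypothetical records: two partial deliveries under PO-1, one under PO-2.
records = [
    ("PO-1", 10.0, 500.0),
    ("PO-1", 5.0, 250.0),
    ("PO-2", 8.0, 640.0),
]
aggregated = aggregate_deliveries(records)
```

After aggregation, each payment order corresponds to a single delivery object, so the clustering step works with one object per order rather than with every partial delivery.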
We now turn to the second task: creating a method for spectral clustering of sequences of payment and supply of raw materials. Fig. 1 shows the structure of spectral clustering of sequences of payment and supply of raw materials. The method includes, among others, the following steps:
14. Calculate the initial best value of the target function.
16.2. Extract the next feature vector from the set $X$ and assign it to the vector $m_k$.
Calculate the squared distances between the objects and the medoids.
18. Modify the set of indicator functions, subject to the conditions imposed on the indicator functions.
19. Calculate the value of the target function.
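The medoid-related steps (squared distances, indicator functions, target function) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the target function is assumed to be the sum of squared distances from each object to its nearest medoid, the indicator functions are represented by integer cluster labels, and the names `assign_to_medoids` and `try_swap` are hypothetical.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def assign_to_medoids(vectors, medoids):
    """Assign every vector to its nearest medoid.

    Returns (labels, target), where labels[i] is the cluster index of
    vectors[i] and target is the assumed target function value: the sum
    of squared distances to the assigned medoids.
    """
    labels = []
    target = 0.0
    for x in vectors:
        dists = [squared_distance(x, m) for m in medoids]
        k = min(range(len(medoids)), key=dists.__getitem__)
        labels.append(k)
        target += dists[k]
    return labels, target


def try_swap(vectors, medoids, k, candidate, best_target):
    """Replace medoid k with a candidate and keep the swap only if it helps.

    The original medoid list plays the role of the preserved medoid: it is
    returned unchanged when the target function does not improve.
    """
    trial = list(medoids)
    trial[k] = candidate
    _, target = assign_to_medoids(vectors, trial)
    if target < best_target:
        return trial, target
    return medoids, best_target


# Hypothetical data: two well-separated groups in the plane.
vectors = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
medoids = [(0.0, 0.0), (5.0, 5.0)]
labels, best = assign_to_medoids(vectors, medoids)
medoids_after, best_after = try_swap(vectors, medoids, 0, (5.0, 6.0), best)
```

Because medoids are always actual objects of the set, a single distant outlier cannot drag a cluster representative away from the data, which is the motivation for preferring medoids to centroids.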
We now turn to the third task: selecting characteristics for assessing the quality of spectral clustering. The following characteristics were chosen: accuracy, precision, recall, and the balanced F-measure.
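These characteristics can be computed from the TP, TN, FP and FN counts listed in the abbreviations. The sketch below uses the standard definitions; how the counts are obtained from a clustering result is not specified here, and the numbers in the example are hypothetical.

```python
def clustering_quality(tp, tn, fp, fn):
    """Return (accuracy, precision, recall, balanced F-measure).

    Standard definitions are assumed. Ratios with a zero denominator
    are reported as 0.0 to avoid a division error.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return accuracy, precision, recall, f_measure


# Hypothetical counts, for illustration only.
accuracy, precision, recall, f_measure = clustering_quality(
    tp=90, tn=880, fp=10, fn=20
)
```

The balanced F-measure is the harmonic mean of precision and recall, so it is high only when both are high.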

EXPERIMENTS
A numerical study of the proposed spectral clustering method was carried out in the MATLAB package.
The work used the standard database of handwritten digits digit1000 (http://www.stat.washington.edu/spectral/datasets.html). There were 100 objects for each of the 10 digits, i.e. the number of clustering objects was $n = 1000$. For each object, the length of the feature vector was $q = 64$. The objects were corrupted with additive Gaussian noise of variance $v^2$. The threshold for determining the number of clusters was $\delta = 0.05$, and the number of nearest neighbors was $K = 7$.
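The noise model can be sketched as follows, assuming that each feature receives an independent component $v \cdot N(0,1)$, in line with the nomenclature. The function name, the seed argument and the variance value in the example are hypothetical; the variance used in the experiments is not given here.

```python
import random


def add_gaussian_noise(vectors, variance, seed=0):
    """Add independent zero-mean Gaussian noise to every feature.

    Each feature x becomes x + v * N(0, 1), where v is the square root
    of `variance`. A fixed seed makes the result reproducible.
    """
    rng = random.Random(seed)
    std = variance ** 0.5
    return [[x + std * rng.gauss(0.0, 1.0) for x in vec] for vec in vectors]


# Hypothetical two-feature objects and a hypothetical variance.
clean = [[0.0, 1.0], [2.0, 3.0]]
noisy = add_gaussian_noise(clean, variance=0.25)
```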

RESULTS
The function reflecting the dependence of the determination coefficient on the number of clusters is shown in Fig. 2. The determination coefficient increases as the number of clusters increases.
The results of comparing the qualitative characteristics of the proposed method with those of the NJW method described in [13] are presented in Table 1. The results of comparing the quantitative characteristics of the proposed method with those of the NJW method described in [13] are presented in Table 2.

DISCUSSION
The selected values of the parameters of the proposed spectral clustering method provide high accuracy of clustering.
The traditional NJW spectral clustering method [13]:
- requires the number of clusters to be specified;
- requires the scale parameter to be specified;
- is not robust to noise and random outliers.
The proposed method eliminates these disadvantages (Table 2). In particular, robustness to noise and random outliers is achieved by using a modified PAM method instead of the k-means method, i.e. by replacing centroid clustering with medoid clustering.
In terms of accuracy, precision, recall, and the balanced F-measure, the proposed method is more effective than the NJW method (Table 2).
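The way local scaling removes the need for a manually chosen scale parameter can be sketched as follows. The sketch assumes the common self-tuning rule, in which $\sigma_i$ is the distance from $x_i$ to its $K$-th nearest neighbor and $S_{ij} = \exp(-\|x_i - x_j\|^2 / (\sigma_i \sigma_j))$ with a zero diagonal; the exact formula of the proposed method is not stated in this text, and the function names are hypothetical.

```python
import math


def local_scales(vectors, k):
    """sigma_i is the distance from vectors[i] to its k-th nearest neighbor."""
    scales = []
    for i, x in enumerate(vectors):
        dists = sorted(math.dist(x, y) for j, y in enumerate(vectors) if j != i)
        scales.append(dists[k - 1])
    return scales


def similarity_matrix(vectors, k):
    """Symmetric similarity matrix with locally scaled Gaussian weights.

    Assumes no scale is zero, i.e. no object has k exact duplicates.
    """
    sigma = local_scales(vectors, k)
    n = len(vectors)
    s = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = math.dist(vectors[i], vectors[j]) ** 2
            s[i][j] = s[j][i] = math.exp(-d2 / (sigma[i] * sigma[j]))
    return s


# Hypothetical data: three nearby points and one distant point.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
sigma = local_scales(points, k=2)
S = similarity_matrix(points, k=2)
```

With a per-object scale, dense and sparse regions of the data are each measured in their own units, so a single global scale parameter does not have to be tuned by hand.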

CONCLUSIONS
The urgent task of increasing the effectiveness and efficiency of the audit was solved by creating a method of spectral clustering of sequences of payment and supply of raw materials.
The scientific novelty of the obtained results is that the proposed spectral clustering method improves the quality of clustering through:
- automatic determination of the number of clusters based on the explained and sample variance rule;
- automatic determination of the scale parameter based on local scaling;
- resistance to noise and random outliers, achieved by replacing the k-means method with a modified PAM method, i.e. replacing centroid clustering with medoid clustering.
The practical significance of the obtained results is that the proposed method makes it possible to expand the scope of clustering methods based on spectral decomposition, which is confirmed by its adaptation to the audit task, and contributes to increasing the efficiency of intelligent computer systems for general and special purposes.
Prospects for further research are the study of the proposed method for a wide class of artificial intelligence tasks, as well as the creation of a method for matching payment and delivery sequences after clustering to solve audit problems.