MODULAR HIGH-EFFICIENCY MULTIPROCESSOR SYSTEM WITH MULTIDIMENSIONAL AGGREGATION OF NETWORK INTERFACE CHANNELS

Context. In constructing multiprocessor systems today, the use of standard, widely available technologies and components is of particular importance, because such systems have become popular and inexpensive hardware platforms for high-performance computing. In addition, practice raises problems whose complete solution is in most cases possible only through high-performance computing. Consequently, the construction of cluster multiprocessor systems is a relevant and interesting topic that is under active development. At the same time, a qualitatively new stage in the development of multiprocessor cluster systems lies in the use of modern network technologies. So far, the problem of choosing and analyzing network technologies for modular multiprocessor cluster systems has not been adequately developed, nor has the problem of reorganizing the structure of the network interface by aggregating its channels. Objective. The aim of the work is to improve the structure and increase the performance of a multiprocessor computing system through multidimensional aggregation of network interface channels, adapted to solving problems of the class under study. Method. The efficiency of the modular multiprocessor computing system is increased through multidimensional aggregation of network interface channels. The proposed approach not only improves the efficiency of parallelization but also substantially reduces computation time. These results were achieved by reducing the time of boundary data exchange between the computing nodes of the cluster system. Results. A feature of the proposed approach is that it implements direct data exchange between the main memory of the nodes of the multiprocessor system, which speeds up computation and provides high-speed access to the memory of its slave nodes. During data exchange between nodes, the system CPU is offloaded and the load on the channel between the nodes decreases, which helps reduce the time of boundary data exchange between the computing nodes. Conclusions. The experiments showed that the developed multiprocessor system can be used to create new technological processes. In particular, it is used in an installation for intensifying the spheroidizing annealing of long steel products. The heat treatment process itself gains such advantages as high productivity and substantially reduced energy consumption, and it allows technological parameters to be controlled in non-isothermal metal treatment modes.


INTRODUCTION
Today the number of multiprocessor computing systems in the world, and their total performance, is growing rapidly. This is because such systems have become popular and inexpensive hardware platforms for high-performance computing. In addition, practice confronts applied scientists with various problems whose complete solution is in most cases possible only through the use of multiprocessor computing systems. Many different ways of building modular multiprocessor computing systems exist today. One promising direction is to build such systems on the basis of "blade" technologies. One of the main features of their construction is associated with the choice of network technology, which depends primarily on the class of problems solved by users. Consequently, there arises the problem of designing the architecture of modular multiprocessor computing systems aimed at solving a wide range of applied problems.
Consequently, the construction of cluster multiprocessor systems is a relevant and interesting topic that is under active development. It is also clear that high-performance modular systems provide an effective way of solving a class of pressing problems.
At the same time, a qualitatively new stage in the development of multiprocessor cluster systems lies in the use of modern network technologies. The efficiency of the parallelization of computations required at this stage is determined by many factors, among which the choice and organization of the network interface are decisive.
The object of research is the information and computational processes in multiprocessor computing systems.
The subject of research is the concept of constructing new modular multiprocessor computing systems by reorganizing the structure of the network interface, oriented toward solving a certain class of problems.
The aim of the work is to improve the structure and increase the performance of a multiprocessor computing system by aggregating network interface channels, adapted to solving problems of the class under study.

PROBLEM STATEMENT
Today, production practice raises various problems whose complete solution is in most cases possible only through the use of multiprocessor computing complexes. Solving applied problems with well-known standard approaches is difficult, and this difficulty can be overcome only by using modern multiprocessor computer technologies. One of the main reasons for applying such technologies is to increase the speed and performance of computers. High performance makes it possible to solve multidimensional problems, as well as problems that require a large amount of processor time. Speed makes it possible to manage technological processes effectively or, more generally, to create the prerequisites for developing new, promising technological processes.
When designing new multiprocessor systems, the following must be taken into account. Modern HDR expansion technology allows data exchange between computing nodes at a speed of up to 200 Gbit/s. On the other hand, according to the manufacturer's data, HDR4 technology has a latency of 0.4-0.5 μs. In this case, the efficiency and speed of the multiprocessor system need to be increased by means of multidimensional aggregation of network interface channels. To this end, data exchange between computing nodes must be moved to a separate network that works at the data link (second) layer using channel bonding technology. Such an approach is aimed at increasing the speed of data exchange between cluster nodes and reducing the load on the channel that connects them. Owing to the use of HDR4x2 technology, the speed of data exchange will increase from 200 Gbit/s to 400 Gbit/s. In addition, owing to the separation of aggregated channels for the symmetrical use of controllers with aggregated components, the latency will decrease to 0.1 μs.
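The effect of the quoted figures on a single message can be illustrated with a simple cost model, offered here only as a sketch. It assumes that the delivery time of one message is the link latency plus the serialization time (message size divided by bandwidth); this model and the function names are illustrative assumptions, not taken from the paper. The bandwidth and latency values are the ones stated above.

```python
def transfer_time_us(message_bytes: int, bandwidth_gbit_s: float, latency_us: float) -> float:
    """Assumed model: delivery time = fixed latency + serialization time."""
    bits = message_bytes * 8
    # 1 Gbit/s equals 1e3 bits per microsecond.
    serialization_us = bits / (bandwidth_gbit_s * 1e3)
    return latency_us + serialization_us


# Figures quoted in the text: one HDR link versus the aggregated HDR4x2 channel.
SINGLE = {"bandwidth_gbit_s": 200.0, "latency_us": 0.5}
BONDED = {"bandwidth_gbit_s": 400.0, "latency_us": 0.1}


def speedup(message_bytes: int) -> float:
    """Ratio of single-link time to aggregated-channel time for one message."""
    return transfer_time_us(message_bytes, **SINGLE) / transfer_time_us(message_bytes, **BONDED)


if __name__ == "__main__":
    for size in (64, 4096, 1_048_576):
        print(f"{size:>9} bytes: speedup {speedup(size):.2f}x")
```

Under this model, small messages benefit mainly from the lower latency, while large messages approach the factor of two given by the doubled bandwidth.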
An Intel Core i5 processor was used to design the multiprocessor system (with DDR4 memory and RDMA support, which allows up to 4 RAM channels to be combined), providing direct memory access with support for InfiniBand technology. Such an approach will reduce congestion on the channel between the nodes of the computing system. In addition, the use of RDMA technology in the system will make it possible to eliminate the delay in sending data directly to the InfiniBand adapter.

REVIEW OF THE LITERATURE
Under present conditions, building multiprocessor computing systems on the basis of standard, widely available technologies and components is of particular importance [1−3]. Owing to the high demand for and supply of blade configurations, a "blade" cluster computing complex has been proposed in scientific practice specifically for solving problems with an expandable computational domain [4,5]. It becomes clear that high-performance clusters provide an effective way of solving a wide class of pressing problems.
In designing a modular multiprocessor system, its construction is of great importance [6]. It is at the construction stage that the possibility of future expansion or modification of the system must be provided for [7]. We note that the most successful solution is considered to be placing the multiprocessor system in a rack [8]. This arrangement has proved appropriate even for a small computing system. The rack holds the nodes, the equipment for efficiently connecting the components, the tools for managing the internal network of the system, and so on. Every blade runs under its own copy of a standard operating system. The composition and capacity of the nodes of a multiprocessor system may differ. This paper considers a homogeneous system. The interaction between the nodes of the computing system is established with the help of specialized libraries at the level of the OS kernel and switching hardware.
However, in designing and effectively using a multiprocessor system, the main attention is paid to the interconnect network of the system and its topology [9,10]. The topology of a cluster and its speed in solving computational problems are undoubtedly interrelated.
In our opinion, a qualitatively new stage in the development of multiprocessor cluster systems lies in the use of modern network technologies [11]. The efficiency of parallelization depends on many factors, but one of the determining ones is the choice and organization of the network interface. This can be explained as follows. The network of a cluster computing system differs fundamentally from a network of workstations, although building a cluster requires the same ordinary network cards and switches that are used to organize a workstation network. In a cluster computing system, however, there is one fundamental feature: the cluster network is intended, first of all, not for connecting computers but for connecting computational processes. Therefore, the higher the throughput of the cluster's network, the faster the users' parallel tasks running on the cluster will be computed. Thus, the technical characteristics of the network are of primary importance for multiprocessor cluster systems.
So far, the problem of choosing and analyzing network technologies for modular multiprocessor cluster systems has not been adequately developed, nor has the problem of reorganizing the structure of the network interface by aggregating its channels. In addition, works devoted to studying the influence of network technologies on the efficiency of parallelization in modular multiprocessor cluster systems are practically absent. For this reason, the research presented in this paper is relevant and of primary importance.

MATERIALS AND METHODS
The module of the high-efficiency multiprocessor system consists of a master node (PM001) and slave nodes (PN001, PN002, PN003, ..., PN00N), which perform the computational operations; two managed switches (Switch IB1, Switch IB2) in the networks for data exchange between the computing nodes; a hybrid gateway (SkyWay IB3), oriented toward managing the system, booting it, and diagnosing it; virtual local area networks; intermediate memory buffers of the managed switches; networks for data exchange among the slave nodes; and a redundancy mechanism for the main components of the system. The nodes of the system are booted through the corresponding switched local networks. The block diagram of the system is shown in Fig. 1.
The structural feature of the system is as follows. In the interconnect network for data exchange between the slave nodes of the multiprocessor system, copper cables are used to implement InfiniBand technology [12]. The function of network adapters is performed by network cards that support operation of the module of the multiprocessor system under the InfiniBand standard.
The multiprocessor system was built with an Intel Core i5 processor (10th generation, using DDR4 memory with RDMA support, allowing up to 4 main memory channels to be combined, with a frequency of up to 4 GHz and up to 128 GB of RAM).
The computing system provides for a vertical arrangement of system boards, parallel to one another. This approach corresponds to the idea behind "blade" servers. After the OS has booted, the multiprocessor system is accessed using standard network protocols (telnet, ssh, rsh). To organize parallel computations involving both a workstation and the multiprocessor system, a network connection must be established between them. This connection is organized using a point-to-point topology.
After the START signal is issued from the control panel of the master node (PM001), power is supplied to it by the corresponding power module (Fig. 1).
The start-up and initialization of the master node (PM001) of the modular multiprocessor system then begins. The system provides two modes of booting the OS: from a hard disk or from external media. A special configuration script is responsible for assigning unique dynamic IP addresses, setting up the corresponding local networks, and booting the blades of the system by default. It is started immediately after the operating system boots. In addition, this script configures the DHCP server as a whole. It is at this stage that the number of slave nodes of the computing system (PN001, PN002, PN003, ..., PN00N) is set and, if necessary, access to external networks or to the Internet is configured.
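The paper does not reproduce the configuration script. As a rough illustration of one step such a script might perform, the following sketch derives an address plan for the master node and N slave nodes from a subnet. The subnet value, the function name, and the sequential assignment order are hypothetical; a real DHCP setup would hand out leases dynamically rather than from a fixed table.

```python
import ipaddress


def address_plan(subnet: str, slave_count: int) -> dict:
    """Map PM001 and PN001..PN00N to consecutive host addresses of a subnet.

    Illustrative only: node names follow the paper, everything else is assumed.
    """
    hosts = iter(ipaddress.ip_network(subnet).hosts())
    names = ["PM001"] + [f"PN{i:03d}" for i in range(1, slave_count + 1)]
    plan = {}
    for name in names:
        address = next(hosts, None)
        if address is None:
            raise ValueError(f"subnet {subnet} is too small for {len(names)} nodes")
        plan[name] = str(address)
    return plan


if __name__ == "__main__":
    # Hypothetical private subnet and node count.
    for node, address in address_plan("10.0.0.0/24", 4).items():
        print(node, address)
```

The number of slave nodes is the only input that changes between deployments here, which mirrors the statement that the node count is fixed at this stage of start-up.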
In this way the relevant system parameters and settings are determined. The slave nodes of the computing system (PN001, PN002, PN003, ..., PN00N) are started and their OS is booted once power is supplied to them from the corresponding power units (Fig. 1). After these operations are completed, the configuration script finishes, and the computing system is ready to perform its functions.
The multiprocessor system includes six virtual local area networks (VLANs): Net1 − Net6. The virtual network Net1 is the network for inter-channel aggregation of the network interface of the multiprocessor system; it implements inter-channel aggregation through ports IB3.Agr1 and IB3.Agr2 of the SkyWay IB3 hybrid gateway. The virtual network Net2 is the network of external and cloud interfaces; it is oriented toward loading system management data, configuring the system, and cloud conversion of heterogeneous data for further processing. The virtual network Net3 is the system network, which implements a fast and energy-efficient method of storing and exchanging information. The system network is oriented toward fast booting of the multiprocessor system and its configuration. Through this network the OS configuration files are tuned and the OS kernel, firewall security elements, and so on are stored. The virtual networks Net1 − Net3 are formed on the basis of the new NVIDIA InfiniBand hybrid gateway. The NVIDIA InfiniBand family of modular switches provides very high performance and port density with high non-blocking throughput in a single chassis. Such switches are promising for high-performance computing, cloud computing, and artificial intelligence tasks at lower cost and complexity. They are distinguished by scalable hierarchical aggregation, self-healing network capability, and guaranteed quality of service. The NVIDIA Skyway InfiniBand to Ethernet Gateway is used in this development. The main characteristics of this gateway are presented in Table 1. The virtual network Net4 is the inter-channel network that connects the aggregated modules and thus performs the functions of channel aggregation.
The virtual network Net5 is the data virtualization network; it increases the reliability of data storage and speeds up the reading and writing of information. The virtual network Net6 is the network for layer-level aggregation of network and interface data.
The diagram of the network interface topology of the multiprocessor system is presented in Fig. 2. Maximum efficiency of the module of the multiprocessor system is achieved by reconfiguring the local network structure in accordance with the specifics of the problems being solved (Fig. 2). Reconfiguration provides for six network operating modes. The first mode corresponds to a "star" local network topology, the second to a "line", the third to a "complete graph", the fourth to a "ring", the fifth to a "lattice", and the sixth to a "closed lattice". Switching between the topology modes is performed by configuring the routers at the hardware-software level. To that end, the operating modes of the network card ports are changed from full-duplex to half-duplex, and the network ports of the switches are symmetrically aggregated in pairs for data reception and/or transmission. The specific features of each of these operating modes of the reconfigurable network are described in detail in [13,14], which also cover the features of data exchange between the slave nodes.
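To make the six topology modes concrete, the following sketch builds each one as a set of undirected links between numbered nodes and lets the link counts be compared. It is an illustration only: the function names are hypothetical, the fifth mode is read as a rectangular grid, and the sixth mode is assumed to be a grid with wrap-around links (a torus); the paper defers the exact definitions to [13,14].

```python
from itertools import combinations


def star(n):
    """Node 0 is the hub; every other node links to it."""
    return {(0, i) for i in range(1, n)}


def line(n):
    """Nodes are chained one after another."""
    return {(i, i + 1) for i in range(n - 1)}


def complete_graph(n):
    """Every pair of nodes is linked directly."""
    return set(combinations(range(n), 2))


def ring(n):
    """A line whose two ends are also linked (meaningful for n >= 3)."""
    return line(n) | {(0, n - 1)}


def lattice(rows, cols):
    """Rectangular grid; node id = row * cols + col."""
    links = set()
    for r in range(rows):
        for c in range(cols):
            node = r * cols + c
            if c + 1 < cols:
                links.add((node, node + 1))
            if r + 1 < rows:
                links.add((node, node + cols))
    return links


def closed_lattice(rows, cols):
    """Assumed reading of the sixth mode: grid plus wrap-around links."""
    links = lattice(rows, cols)
    for r in range(rows):
        links.add((r * cols, r * cols + cols - 1))
    for c in range(cols):
        links.add((c, (rows - 1) * cols + c))
    return links


if __name__ == "__main__":
    print("star", len(star(9)), "line", len(line(9)), "ring", len(ring(9)))
    print("complete", len(complete_graph(9)))
    print("lattice", len(lattice(3, 3)), "closed", len(closed_lattice(3, 3)))
```

Comparing link counts for the same number of nodes shows the trade-off the modes represent: a complete graph gives every pair a direct path at the cost of many ports, while star, line, and ring use few links but lengthen or concentrate the paths.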
We now consider some fundamental advantages of the computation process in the proposed system. First, in the slave nodes, data reception and transmission are performed by the managed IB switch without the use of buffering. Second, intermediate and final calculation results reach the master node through the InfiniBand switch. The control and transmission of these data from the computing slave nodes are carried out by the HCA network adapters. Finally, a fundamental feature of the use of InfiniBand technology should be noted: the acquisition of data from input devices, their processing, and the transmission of control actions to output devices are all carried out through the processor module with the TCA interface (Fig. 2). In addition, the storage of these data, managed over the local network, is performed on an SSD with an NVMe interface. This approach substantially increases the efficiency of computations.
To configure the network interfaces considered here, the basic operations for setting up Link Aggregation mode are performed. The channel bonding procedure itself, a technology that combines several network adapters into a single high-speed channel, is covered in full in [15]. It is shown there that data exchange between the computing nodes of the cluster takes place in a separate network that works on the basis of this channel bonding technology. The proposed approach speeds up data exchange between the slave nodes of the multiprocessor system and reduces the load on the channel connecting the nodes of the system.
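The idea behind channel bonding can be sketched as follows. The example distributes a stream of packets over several links in rotation and computes the nominal aggregate bandwidth. Round-robin is only one common bonding policy; the paper does not state which policy is used, and the link names and function names below are hypothetical.

```python
from itertools import cycle


def stripe_round_robin(packets, link_names):
    """Assign packets to links in rotation (one possible bonding policy)."""
    if not link_names:
        raise ValueError("at least one link is required")
    queues = {name: [] for name in link_names}
    for packet, name in zip(packets, cycle(link_names)):
        queues[name].append(packet)
    return queues


def aggregate_bandwidth(per_link_gbit_s, link_count):
    """Nominal upper bound: bonded links add their bandwidths."""
    return per_link_gbit_s * link_count


if __name__ == "__main__":
    queues = stripe_round_robin(list(range(8)), ["ib0", "ib1"])
    for name, queue in queues.items():
        print(name, queue)
    print("nominal bandwidth, Gbit/s:", aggregate_bandwidth(200, 2))
```

With two links of equal speed, each link carries half of the packets, which is the even distribution of load that bonding aims for. Real throughput stays below the nominal sum because of protocol overhead and possible packet reordering.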
NVIDIA hardware was used to design the multiprocessor computing system. This makes it possible to use the software and hardware architecture of parallel computing based on the CUDA platform. This platform allows the resources of video cards to be used for non-graphics computations, and the technology is becoming increasingly relevant.
The CUDA platform is supported on all NVIDIA video cards, from the simplest consumer models to specialized high-powered ones. Acceleration through the CUDA platform is used by many applications, for example MatLab, TensorFlow, Keras, and others. Programming that uses the resources of the video card is therefore relevant and promising, since it yields high performance at a moderate price.
The main advantages of the CUDA platform are that it is free of charge (software for all major platforms can be freely downloaded from developer.nvidia.com), simple, and flexible. NVIDIA CUDA technology is an environment that allows developers to create software that solves complex computational problems in less time, thanks to the multicore computing power of graphics processors.
The programming model used in CUDA differs from traditional ones in that it fully hides the graphics pipeline from the programmer, allowing programs to be written in more familiar terms. In addition, CUDA gives the programmer a more convenient model for working with memory. There is no need to store data in 128-bit textures, because CUDA allows data to be read directly from the memory of the video card.
The multiprocessor computing system was built with the NVIDIA GeForce GTX 1080 video card. The main characteristics of this video card are presented in Table 2. Algorithms for calculating the heat transfer process were implemented using CUDA technology [16]. An analysis of the execution time of the parallel algorithms showed that the use of CUDA reduces the processing time of these experimental data several-fold. Practical testing showed that the efficiency of such calculations exceeded by an order of magnitude the figures for similar calculations on the central processing unit, even with the use of OpenMP. At the same time, it is evident that it makes no sense to use CUDA for small volumes of data: for small data volumes practically no acceleration is observed.
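Why heat transfer calculations suit a GPU can be seen from a single time step of a finite-difference scheme, sketched below in plain Python. The explicit one-dimensional scheme, the fixed boundary values, and all names are assumptions chosen for illustration; the actual algorithms are those of [16] and are not reproduced in this paper. Each interior cell is updated only from the previous time level, so all cells can be computed independently, for example one CUDA thread per cell.

```python
def heat_step(temps, alpha, dx, dt):
    """One explicit finite-difference step of the 1D heat equation.

    Assumed scheme: T_new[i] = T[i] + r * (T[i-1] - 2*T[i] + T[i+1]),
    with r = alpha * dt / dx**2 and fixed boundary values.
    """
    r = alpha * dt / dx ** 2
    if r > 0.5:
        raise ValueError("time step too large: the explicit scheme needs r <= 0.5")
    new_temps = temps[:]  # boundary cells keep their values
    for i in range(1, len(temps) - 1):
        # Each iteration reads only the old array, so iterations are independent.
        new_temps[i] = temps[i] + r * (temps[i - 1] - 2 * temps[i] + temps[i + 1])
    return new_temps


if __name__ == "__main__":
    field = [0.0, 0.0, 100.0, 0.0, 0.0]
    print(heat_step(field, alpha=1.0, dx=1.0, dt=0.25))
```

The loop body is the part that would become the kernel. For a handful of cells, as here, the cost of launching a kernel and copying data would outweigh the arithmetic, which is consistent with the observation that small data volumes show practically no acceleration.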
On the other hand, it should be noted that the NVIDIA GeForce GTX 1080 video card was selected taking into account the compatibility of the scalable SLI communication interface. When creating the latest technological processes [16], this approach is extremely relevant. The proprietary SLI technology allows you to distribute the calculation between two video cards. Quad SLI extends this technology by allowing two dual-GPU graphics cards to use four GPUs simultaneously. In the above development of a multiprocessor system, two NVIDIA GeForce GTX 1080 video cards were "linked". This approach is aimed not only at a significant increase in computing performance, but also at a significant decrease in latency and a significant unloading of the system bus.
Installing a Quad SLI system is fairly simple. Once the two video cards are inserted into their slots and the SLI bridge is connected, the new video cards are detected after system start-up as an ordinary video adapter. After the drivers are installed and Quad SLI is detected, Quad SLI rendering is enabled by default, and no additional configuration is required. We note that the control panel of the multiprocessor system provides an option to disable Quad SLI, which allows the two NVIDIA GeForce GTX 1080 video cards to work independently of each other. This option is very important when carrying out new installation procedures with video cards of this type.
We note that an SLI bridge is used to "link" the two video cards. NVIDIA uses a physical connector to join the video cards, which allows them to interact with each other without using the bandwidth of the expansion slots. One of two SLI bridges may be needed: either a standard bridge (for less powerful cards) or a high-bandwidth bridge (for more powerful cards). With the more powerful card used here (NVIDIA GeForce GTX 1080), a standard bridge can be used, but it will not deliver the full performance of the video cards. The characteristics of the SLI bridges are given in Table 3.
When an SLI system is installed, additional cooling of the case of the multiprocessor system is provided. The case used for such a system has a 120 mm fan located opposite the video card slots. Placing a 120 mm fan opposite the two video cards considerably reduces the GPU temperature. The video cards are mounted so that they blow hot air out through an opening in the back panel of the case. There is therefore no need for additional special cooling; it is only necessary to provide normal ventilation. As for energy consumption, systems of this kind require high-quality power supplies. The multiprocessor system uses a Corsair HX 1200 power module (1200 W), which powers Quad SLI systems without problems. The proposed multiprocessor system did not force the power module to work at full capacity. The video cards also run quietly enough that noise is not a problem even in Quad SLI configurations.
In addition, it should be emphasized that a graphics processor, despite its powerful computing capabilities, cannot fully replace the central processing unit, whose absolute advantage is its universality. It can, however, substantially offload the CPU by taking on the most labor-intensive and difficult tasks.

EXPERIMENTS
The developed multiprocessor system is used in an installation for intensifying the spheroidizing annealing of long steel products [16]. The purpose of the installation is to substantially shorten the duration of the spheroidizing annealing of metal through the use of non-isothermal holding, which improves the technological properties of rolled metal by providing a highly dispersed and homogeneous structure across the entire cross-section of the sample. The heat treatment process itself gains such advantages as high productivity and substantially reduced energy consumption, and it allows technological parameters to be controlled in non-isothermal metal treatment modes.
This is achieved by equipping the installation for intensifying the spheroidizing annealing of a long steel product with a multiprocessor computing system on which specially oriented software is installed. The multiprocessor computing system is connected to a process control unit via a bidirectional information communication interface. The multiprocessor computing system is built as a separate module. Using special software, it makes it possible to set and control the necessary temperature conditions over the entire cross-sectional plane of the sample during heating and holding of the metal, and to control the mode of non-isothermal annealing of steel, while being aimed at constant control of the thermal treatment regime in the annealing temperature range.
The use of a multiprocessor computer system with its software makes it possible, on the basis of a mathematical model of the process of heating a sample, already under production conditions to control the heating of the sample to the transition to the austenitic region and the temperature of phase recrystallization on the entire plane of the cut of a long steel product, and then, having solved the inverse problem of heat conductivity, to control the necessary non-isothermal exposure mode in the annealing temperature range over the entire sample cut plane. The use of the installation for the implementation of the intensive mode of spheroidizing annealing predetermines the uniform distribution of cementite globules in the ferrite matrix, which provides the necessary mechanical properties of the metal required for further cold deformation.

RESULTS
From the experimental results, temperature distribution curves were obtained over the cross-sectional plane of the sample, where T_h is the heating temperature of the sample surface and T_k is the phase transformation temperature of the metal, controlled by means of the multiprocessor system. These temperature fields are modeled taking into account the change in the thermophysical properties of the material during heating. After heat treatment, the hardness of the samples reached 150−169 HB.
The spheroidization of the carbide phase of the metal under the corresponding heat treatment modes gives the material a structure of granular pearlite. Moreover, high-speed spheroidization predetermines a more uniform distribution of cementite globules in the ferrite matrix. After heat treatment, steel samples of almost the same hardness acquired a finely dispersed structure, which ensures a high level of metal ductility. Owing to the rapid heating of the sample and incomplete austenitization of the steel, the morphology of the carbide phase changes from lamellar to finely dispersed globular.
The use of the proposed multiprocessor system can significantly improve the operational and technical characteristics of the technological process as a whole. This is due to the introduction of the following factors. Thus, in comparison with the well-known approach [17], due to the use of a processor module with a new generation TCA interface and an SSD hard disk with an NVMe interface, it was possible to reduce the operating system boot time on the main node by 180%, on the slave nodes by 320%; network interface software reorganization time decreased by 530%; the time for processing, sending and storing intermediate and final calculation results has decreased by 250%; 240% reduction in processing time for system statistics.
Through the use of VLANs and multidimensional network interface link aggregation, it was possible to increase the throughput of the network interface port from 200 Mb / s to 800 Mb / s, which increases the speed of data exchange between the nodes of a multiprocessor system four times.
Compared to the well-known approach [17], the use of NVIDIA's software and hardware architecture for parallel computing based on the CUDA platform made it possible to increase the amount of video memory by 16 GB on each computing node of the multiprocessor system and to increase the overall performance of a system node by 350 Gflops.

DISCUSSION
An analysis of methods for solving modern applied problems established that the use of parallel computing systems is one of the strategic directions in the development of computer technology. This can be explained by the constant growth in the number of applied problems for which the capabilities of existing computing facilities are insufficient. Evidently, high-efficiency modular systems provide a successful way of solving a wide class of pressing problems. For this reason, this paper addresses the problem of constructing a high-efficiency multiprocessor system.
This paper considers a comprehensive formalized approach to designing a modular multiprocessor system with multidimensional network interface aggregation. An analysis of approaches to the design of multiprocessor systems showed that computer equipment manufacturers have recently been offering devices based on blade technologies. By constructing a multiprocessor system based on blade servers, a turnkey solution is obtained, equipped with the necessary management tools and a network interface. We note some of the main advantages of such design solutions compared to others: blade systems are more compact and easier to maintain, and their design features make it convenient to form the required configuration.
The analysis of the construction of multiprocessor systems showed that a qualitatively new stage in the development of multiprocessor computing systems lies in the use of modern network technologies. This can be explained by the substantial differences between the network of a cluster computing system and a network of workstations. The network of a computing system is intended not for connecting computers but for connecting computational processes. Hence, the higher the throughput of the system's network, the faster the users' parallel tasks will be executed. Consequently, the technical characteristics of the network are of primary importance for multiprocessor computing systems. In view of this, it was decided to use InfiniBand network interface technology, and data exchange between the nodes of the multiprocessor system is organized by means of the InfiniBand standard. It is shown that, in comparison with other multiprocessor systems, the developed system has the following fundamental differences: network booting of processors, support for VLAN mode, a redundancy mechanism for the key components of the module, and a specially developed mode of data exchange between the nodes of the system in the network of InfiniBand switches.
When designing the multiprocessor system, special attention was paid to traffic between its neighboring nodes, which is the slowest part of the computational algorithm and can significantly reduce the benefit of increasing the number of processors involved. This circumstance suggests that one of the basic ways to increase the efficiency of multiprocessor systems is to aggregate the channels of the network interface in the data exchange network between the slave nodes of the system. Thus, increasing the efficiency of multiprocessor systems by reorganizing the structure of the network interface is a relevant and interesting topic that is under active development.
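The effect of inter-node exchange on scalability can be illustrated with a simple model. The sketch below is not taken from the experiments in this work. It assumes that computation divides evenly across processors, that boundary exchange does not overlap with computation, and that exchange time falls in inverse proportion to the number of aggregated channels. The function name `speedup` and all numeric values are hypothetical.

```python
def speedup(t_compute, t_exchange, processors, channels=1):
    """Estimated speedup under a simple illustrative model.

    Assumptions (not from the paper): computation divides evenly across
    processors, boundary exchange is not overlapped with computation, and
    exchange time scales inversely with the number of aggregated channels.
    """
    if processors < 1 or channels < 1:
        raise ValueError("processors and channels must be at least 1")
    # Time of one parallel run: share of computation plus exchange time.
    parallel_time = t_compute / processors + t_exchange / channels
    return t_compute / parallel_time


# Hypothetical figures: 100 time units of computation, 5 units of exchange.
single_channel = speedup(100.0, 5.0, processors=16)
two_channels = speedup(100.0, 5.0, processors=16, channels=2)
print(f"1 channel:  {single_channel:.2f}")
print(f"2 channels: {two_channels:.2f}")
```

Under these assumptions the exchange term does not shrink as processors are added, so it bounds the achievable speedup. Aggregating channels reduces that term directly.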
A comparative analysis showed that the problem of channel aggregation in modular multiprocessor cluster systems has not yet been properly solved. Moreover, there are very few works that investigate the influence of the network architecture of a cluster system on the efficiency of parallelization of calculations.
In the proposed system, multidimensional aggregation of the channels of the network interface is implemented on the basis of six VLANs. The inter-channel aggregation network of the network interface, the external and cloud interfaces, and the control and diagnostics network are formed on the basis of the new NVIDIA InfiniBand hybrid gateway. The modular NVIDIA InfiniBand switches support a standard set of network technologies, in particular virtual networks, traffic prioritization, aggregated links, and multicast traffic filtering. This family of switches is promising for high-performance computing, cloud computing, and artificial intelligence tasks at lower cost and complexity. In addition, the switches are distinguished by scalable hierarchical aggregation, self-healing networking, and guaranteed quality of service.
The channel bonding technology for the network interface of the multiprocessor cluster system developed in this work makes it possible to join the cluster nodes into a network in which each node is connected to the switch by more than one channel. The described technology is similar to the trunking mode used to connect switches, which increases the data transfer rate between two or more switches. Channel bonding distributes the load (receiving and transmitting data) evenly between the channels of the multiprocessor system and increases the speed of data exchange between its nodes.
It is worth emphasizing the main advantage of the channel aggregation mode: it substantially increases the speed of data exchange and also improves the reliability of the cluster system. If an adapter fails, traffic is sent to the next working adapter without interruption of service. When the failed adapter resumes operation, data transmission through it resumes.
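The load distribution and failover behavior described above can be sketched as a toy model. This is an illustration only: it does not reproduce the actual bonding driver or any InfiniBand API, and the class names `Adapter` and `BondedLink` and the adapter names `ib0` and `ib1` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Adapter:
    """Hypothetical network adapter that records the packets it sends."""
    name: str
    healthy: bool = True
    sent: list = field(default_factory=list)


class BondedLink:
    """Toy model of several adapters aggregated into one logical link."""

    def __init__(self, adapters):
        self.adapters = list(adapters)
        self._next = 0  # index of the adapter to try first for the next packet

    def send(self, packet):
        """Send via the next healthy adapter in round-robin order."""
        n = len(self.adapters)
        for offset in range(n):
            adapter = self.adapters[(self._next + offset) % n]
            if adapter.healthy:
                adapter.sent.append(packet)
                self._next = (self._next + offset + 1) % n
                return adapter.name
        raise RuntimeError("no healthy adapter available")


link = BondedLink([Adapter("ib0"), Adapter("ib1")])
normal = [link.send(i) for i in range(4)]          # both adapters working
link.adapters[0].healthy = False                   # ib0 fails
degraded = [link.send(i) for i in range(4, 7)]     # traffic moves to ib1
link.adapters[0].healthy = True                    # ib0 recovers
restored = [link.send(i) for i in range(7, 11)]    # ib0 is used again
print(normal, degraded, restored, sep="\n")
```

Round-robin selection is only one possible distribution policy. It is chosen here because it makes the even split and the fallback to the next working adapter easy to follow.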
To implement multidimensional link aggregation of the network interface, Mellanox adapters were chosen. Specifically, network adapters of the MHQH29C-XTR type were selected; by supporting Virtual Protocol Interconnect (VPI), they provide flexible connectivity in computing systems. With these adapters, the multiprocessor system provides high-quality computing, high-speed access to data storage resources, guaranteed high throughput, and low data transfer latency.
When designing the multiprocessor system, special attention was paid to the practical aspects of its operation. The use of the hybrid NVIDIA Skyway InfiniBand to Ethernet Gateway, a processor module with a new-generation TCA interface, and the CUDA platform made it possible to significantly increase the computing power of the multiprocessor system without spending time on reconfiguring the operating modes of the network interface for the required class of applied tasks.
Computational experiments were carried out under cluster operating systems, using VLAN technology and a set of MPI libraries in the object-oriented C# programming environment.

CONCLUSIONS
This article shows ways of increasing the efficiency of a multiprocessor cluster system by reorganizing the architecture of its network interface. The proposed approach made it possible not only to increase the efficiency of parallelization but also to substantially reduce computation time. These results were achieved by reducing the time of boundary data exchange between the computing nodes of the cluster system.
The listed features of the developed system improve on existing systems and distinguish it from them, namely:
- first, the use of InfiniBand technology provides ultra-low latency and high speed;
- second, the interface configuration of the local networks of the system can be changed through a console or web interface, adapting their structure to tasks of one type or another;
- third, on the basis of the RDMA principle of InfiniBand technology, data is exchanged directly between the main memory of the nodes of the multiprocessor system, which speeds up calculations and provides high-speed access to the memory of the slave nodes and data exchange between them, while offloading the system CPU during data exchange and reducing the load on the channel between the nodes of the computing system;
- fourth, the multichannel hybrid NVIDIA Skyway InfiniBand gateway, combined with a TCA processor module with an NVMe interface and an SSD drive, creates fundamentally new possibilities for the "connectivity" of such a system with other computing environments and substantially improves its manageability; in particular, it offloads the central processing unit (by handling InfiniBand traffic) and reduces the time spent on switching the operating modes of the virtual networks and on collecting, transmitting, and storing the results of calculations, which in turn raises the efficiency of the multiprocessor system as a whole;
- fifth, the modular design principle simplifies planning, expansion, and replacement of failed computing nodes, as well as the operation of the constructed system as a whole.
The scientific novelty. For the first time, a procedure for multidimensional aggregation of network interface channels of a multiprocessor computing system was proposed, which made it possible to increase its performance and speed.
The practical significance. The results of the experiments carried out make it possible to recommend the developed multiprocessor system for creating new technological processes.