DETECTING OBJECTS MOVING IN SPACE FROM A MOBILE VISION SYSTEM

Context. The study considers the task of identifying objects moving in space from a mobile vision system. An analysis of modern methods for identifying dynamic objects from both stationary and moving platforms is conducted. It establishes the need for a new method of identifying dynamic objects with a mobile optical-electronic system that adapts to changing observation conditions, which is a relevant scientific and technical problem. The object of the study is the model of moving object detection from a mobile vision system. Objective. The objective of this article is to analyze modern methods of moving object identification and to create a new method that allows observation from a mobile vision system and adapts to changing observation conditions. Method. A method for identifying objects moving in space from a mobile vision system is proposed that automatically detects moving objects, determines their three-dimensional coordinates with a given accuracy, and adapts to changing observation conditions. The method is based on the developed mathematical model of stereoscopic determination of the motion parameters of objects in space, which increases the detection accuracy. Results. The proposed method was implemented in software, and an experiment confirming the adequacy of the mathematical model was conducted. The experiment yielded data on the movement of the object and of the mobile coordinate system. Conclusions. The experiments confirmed the performance of the proposed method and allow us to recommend it for building mobile automatic systems for tracking and identifying objects. The method automatically isolates moving objects, determines their three-dimensional coordinates, and adapts to changing observation conditions.
Prospects for further research include the creation of hardware tools for isolating moving objects, which would improve the accuracy of the isolation.

NOMENCLATURE
E_i is the contour energy; α, β are constants that provide relative energy correction; E_int(v_i) is an energy function depending on the shape of the contour; E_ext(v_i) is an energy function depending on the properties of the image and the type of gradient in the neighborhood of the point v_i; α_d, α_u, α_v are threshold constants; u_p, v_p are the velocity components of the point p with coordinates (x_p, y_p); E is a given non-negative threshold; A is a given non-negative angular threshold; H is the maximum brightness level in the image; f(p), f(q) are the brightness values of the pixels p and q, respectively; P(X, Y, Z) is a point of the object in three-dimensional space; F is the focal length of the lens; B is the distance between the optical axes; Dispar is the disparity; Z is an unknown parameter; (ΔX, ΔY, ΔZ) is the direction of movement of the observation system relative to the environment; (k_x, k_y) is the direction of the gradient vector; X_n is the normal flow vector.

INTRODUCTION
Due to the intensifying implementation of vision systems in industry, developments connected with the visual perception of moving objects are relevant [1]. An important task in this field is the detection of objects moving in space. Another reason why this task is interesting and significant is the possibility of widely applying such methods in robotic vision systems [2]. If a vision system is stationary, and an object moving relative to it enters its field of vision, the task of object selection narrows down to the analysis of a sequence of images and the detection of changes [3]. A more complicated situation is the case of a dynamic vision system, when not only the target object moves but the observation system itself also moves relative to the surrounding environment.
Therefore, even the static parts of the scene are subject to dynamic changes depending on the movement of the observation system. Dynamic vision systems are of the greatest interest, since they can be used in mobile observation systems [4].
The object of the research is the model of moving object detection from a mobile vision system.
The subject of the research is the methods of separation of objects moving in space.
The aim of the work is to analyze modern methods of identifying objects moving in space and to create a method that allows observation from a mobile vision system and is adaptive to changing observation conditions.

PROBLEM STATEMENT
A stereo image comprises two separate views of the imaged object. It is required to determine the coordinates (X, Y, Z) of the point P given by the projections p_1(x_1, y_1) and p_2(x_2, y_2) of its image on the matrix photodetectors of the image sensors.
Let us consider in more detail the coordinate system of a stereoscopic vision system and construct its geometrical model. It is possible to establish the relationship between the point P (X, Y, Z) and the coordinates (x, y) of its projection on the matrix photodetector.

REVIEW OF THE LITERATURE
In the case of the analysis of the two-dimensional movement of the object, any of its points can be defined as P(x, y), where x and y are the two-dimensional coordinates of the object. Such objects can be detected by analyzing the changing sequence of images, adjusted for the change of the observation system's position relative to the target object's plane of movement since the changes to the scene depth are negligible in comparison to the distance between the vision system and the target object [5].
Usually, image analysis involves obtaining the outer contour of the depicted objects and recording the coordinates of points of this contour. Most often it is necessary to get the outer contour in the form of a closed curve or a set of segments of arcs [6].
Consider the various methods of contour analysis. Active contours are widely used in the tasks of selecting contours and borders and in image segmentation. To detect contours in the image, minimum-energy curves, or snakes, are used. The algorithm is as follows: first, the contour is initialized as a simple line, and then it is deformed to enclose the area of the object. Points of the contour tend to the boundary of the object while minimizing the energy of the contour. For each point v_i, the energy is

E_i = α·E_int(v_i) + β·E_ext(v_i),

where α, β are constants providing relative energy correction; E_int(v_i) is an energy function depending on the shape of the contour; E_ext(v_i) is an energy function depending on the properties of the image and the type of gradient in the neighborhood of the point v_i [7].
The values E_i, E_int(v_i), E_ext(v_i) are square matrices. The value at the center of each matrix corresponds to the energy of the contour at the point v_i [8].
Each vertex v_i can potentially move to any point v_i' corresponding to the minimum energy E_i. This method has the following disadvantages:
- if the object does not have clear boundaries, or the area is heterogeneous and contains smooth gradients, the algorithm will not solve the segmentation problem correctly, making further automated analysis impossible;
- the normal of the tangent vector at a point can vary greatly in direction, which can lead to the merging of points; because of this, the contour can turn out to be rough and very different from the borders of the selected object.
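The greedy minimization performed by such an active contour can be sketched as follows. This is a minimal illustration, not the exact formulation above: the toy image, the values of α and β, and the concrete choices of E_int (distance to the previous vertex) and E_ext (negative gradient magnitude) are all assumptions made for the example.

```python
import math

# Toy 8x8 image: a bright square object on a dark background.
IMG = [[255 if 2 <= r <= 5 and 2 <= c <= 5 else 0 for c in range(8)]
       for r in range(8)]

def grad_mag(img, r, c):
    """Central-difference gradient magnitude (zero at the border)."""
    if not (0 < r < len(img) - 1 and 0 < c < len(img[0]) - 1):
        return 0.0
    gx = (img[r][c + 1] - img[r][c - 1]) / 2.0
    gy = (img[r + 1][c] - img[r - 1][c]) / 2.0
    return math.hypot(gx, gy)

def greedy_step(img, contour, alpha=1.0, beta=1.0):
    """Move each vertex to the 3x3 neighbour minimising its energy
    E_i = alpha*E_int + beta*E_ext.  E_int penalises distance to the
    previous vertex (smoothness); E_ext is minus the gradient magnitude
    (attracts vertices to edges).  Previous vertices are taken from the
    old contour, which is enough for a sketch."""
    new = []
    for i, (r, c) in enumerate(contour):
        pr, pc = contour[i - 1]          # previous vertex (wraps around)
        best, best_e = (r, c), float("inf")
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if not (0 <= nr < len(img) and 0 <= nc < len(img[0])):
                    continue
                e_int = math.hypot(nr - pr, nc - pc)
                e_ext = -grad_mag(img, nr, nc)
                e = alpha * e_int + beta * e_ext
                if e < best_e:
                    best_e, best = e, (nr, nc)
        new.append(best)
    return new

# Initialise the contour at the image corners and iterate a few steps;
# the vertices drift toward the high-gradient border of the square.
contour = [(0, 0), (0, 7), (7, 7), (7, 0)]
for _ in range(5):
    contour = greedy_step(IMG, contour)
print(contour)
```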
Unlike the usual active contour model, the active contour model without prior selection of boundaries does not require prior selection of the boundaries of the image object, and it is not necessary to smoothen the original image. The curve moves, starting from an arbitrary point of the image. When crossing the border, it begins to deform and take the form of an object in the image, as if filling its internal part [9].
J. Canny studied the mathematical problem of obtaining a filter that is optimal in terms of the selection, localization, and minimization of several responses of one edge. This means that the detector (known as the Canny edge detector) should react to the borders, but at the same time ignore the false ones, accurately determine the boundary line and react to each border only once, which allows avoiding the perception of wide bands of brightness as a combination of borders [10].
The algorithm includes:
- anti-aliasing: blurring the image to remove noise;
- gradient search: borders are marked where the gradient of the image reaches its maximum value;
- non-maximum suppression: only local maxima are marked as borders;
- double threshold filtering: potential boundaries are determined by thresholds;
- edge tracking by hysteresis: the final boundaries are established by suppressing all edges that are not connected to certain (strong) boundaries.
To reduce the sensitivity of the algorithm to noise, the first derivative of the Gaussians is applied [11]. After applying the filter, the image becomes slightly blurred.
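The last two stages of the pipeline above, double thresholding and tracking weak edges connected to strong ones, can be sketched on a toy gradient-magnitude grid. The grid and the two threshold values below are illustrative assumptions; smoothing, gradient search, and non-maximum suppression are presumed to have already run.

```python
from collections import deque

MAG = [                     # assumed gradient magnitudes after earlier stages
    [0,  0,   0,  0, 0],
    [0, 40, 120, 30, 0],
    [0, 90, 200, 80, 0],
    [0, 20, 110, 35, 0],
    [0,  0,   0,  0, 0],
]
LOW, HIGH = 50, 150         # weak / strong thresholds (assumed values)

def hysteresis(mag, low, high):
    """Keep strong pixels (>= high) and any weak pixels (>= low) that are
    8-connected to a strong pixel; discard everything else."""
    rows, cols = len(mag), len(mag[0])
    strong = {(r, c) for r in range(rows) for c in range(cols)
              if mag[r][c] >= high}
    keep, queue = set(strong), deque(strong)
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (0 <= nr < rows and 0 <= nc < cols
                        and (nr, nc) not in keep
                        and mag[nr][nc] >= low):
                    keep.add((nr, nc))       # weak edge linked to a strong one
                    queue.append((nr, nc))
    return keep

edges = hysteresis(MAG, LOW, HIGH)
print(sorted(edges))
```

Isolated weak responses such as the 40 and 90 in the example are dropped, while the 110 and 120 survive because they touch the strong 200.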
The tracing contours method consists of sequentially drawing the border between the object and the background. A tracking point moves along the image until it reaches the dark area (the object). Then it turns left and moves along the curve until it reaches the borders of the object, and after that, it turns right and repeats the process until it reaches the vicinity of the starting point [12].
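The turn-left-on-object, turn-right-on-background scheme just described can be sketched as square tracing on a small binary grid. The grid, the stopping rule (first return to the start pixel facing the start direction), and the safety bound are illustrative assumptions.

```python
GRID = [                   # 1 = object (dark area), 0 = background
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

def trace(grid):
    rows, cols = len(grid), len(grid[0])

    def inside(r, c):
        return 0 <= r < rows and 0 <= c < cols and grid[r][c] == 1

    # Scan row by row for the first object pixel: the entry point.
    start = next((r, c) for r in range(rows) for c in range(cols)
                 if grid[r][c] == 1)
    dirs = [(-1, 0), (0, 1), (1, 0), (0, -1)]   # up, right, down, left
    r, c = start
    d = 0                                        # entered moving "up"
    contour = set()
    for _ in range(4 * rows * cols):             # safety bound
        if inside(r, c):
            contour.add((r, c))
            d = (d - 1) % 4                      # on the object: turn left
        else:
            d = (d + 1) % 4                      # on the background: turn right
        r, c = r + dirs[d][0], c + dirs[d][1]
        if (r, c) == start and d == 0:           # back at the start, same heading
            break
    return contour

print(sorted(trace(GRID)))
```

For the 3x3 square the tracker visits exactly the eight border pixels and never the interior pixel (2, 2).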
For clustering by velocity and distance, nearest neighbor clustering is used. Let us denote two lines as {p_1, ..., p_m} ∈ S_1 and {q_1, ..., q_n} ∈ S_2, provided that they satisfy the nearest-neighbor conditions. Nearest neighbor clustering is the most efficient method for scenes with interference; the interference is processed during the tracking phase [13].
Border detection methods highlight in the image only the pixels lying on the contour. In practice, this set of pixels rarely represents the contour accurately because of noise, contour breaks due to inhomogeneous illumination, and similar effects. Therefore, contour detection algorithms are usually supplemented with linking procedures that form sets of contour points [14].

One way to link contour points is to analyze the characteristics of pixels in a small vicinity of each image point that has been marked as a contour point. All points that are similar according to certain criteria are connected and form a contour consisting of pixels that meet these criteria. Two main parameters are used to establish the similarity of contour pixels: the response of the gradient operator, which determines the value of the contour pixel, and the direction of the gradient vector. A contour pixel (x_0, y_0) located inside a given vicinity of a point (x, y) is considered similar to the pixel (x, y) in gradient magnitude if

|∇f(x, y) - ∇f(x_0, y_0)| ≤ E,

where E is a given non-negative threshold, and in gradient direction if

|α(x, y) - α(x_0, y_0)| < A,

where α(x, y) = arctg((∂f/∂y)/(∂f/∂x)) is the direction of the gradient vector and A is a given non-negative angular threshold. A pixel in the given vicinity is combined with the central pixel (x, y) if the similarity criteria are met both in value and in direction. This process is repeated at each point of the image, while the found linked pixels are memorized as the center of the vicinity moves.
A simple way to record these data is to assign its own brightness value to each set of linked contour pixels [15].
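The two similarity tests above, closeness in gradient magnitude within the threshold E and closeness in gradient direction within the angular threshold A, can be sketched directly. The gradient values and the threshold settings below are illustrative assumptions.

```python
import math

def similar(g1, a1, g2, a2, E=25.0, A=math.radians(15)):
    """True if the magnitudes differ by at most E and the directions
    differ by less than A (with wrap-around at 2*pi)."""
    dmag = abs(g1 - g2)
    dang = abs(a1 - a2) % (2 * math.pi)
    dang = min(dang, 2 * math.pi - dang)   # wrap-around angular difference
    return dmag <= E and dang < A

# Two neighbouring pixels with close magnitude and direction are linked:
print(similar(120.0, math.radians(40), 110.0, math.radians(45)))
# A large magnitude difference breaks the link even if directions agree:
print(similar(120.0, math.radians(40), 60.0, math.radians(45)))
```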
Finally, the approach to detecting and linking contours based on the representation in the form of a graph and finding the least-cost paths on this graph, which correspond to significant contours, allows us to construct a method that works well in the presence of noise. Such a procedure is rather complicated and requires a lot of processing time [16].
The outline element is the border between two pixels p and q, which are neighbors. Contour elements are identified by the coordinates of the points p and q. A contour is a sequence of interconnected elements.
Each contour element defined by pixels p and q corresponds to the cost

c(p, q) = H - |f(p) - f(q)|,

where H is the maximum brightness level in the image, and f(p), f(q) are the brightness values of the pixels p and q, respectively.
The task of finding the minimum-cost path on a graph is computationally nontrivial, which makes it necessary to sacrifice optimality in favor of computational speed.
The complexity of implementation and high resource intensity are the main disadvantages of such an analysis. Its main advantage is low sensitivity to noise [17].
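The graph-based search described above can be sketched as follows: each node is a contour element between horizontal neighbours p and q, its cost is H - |f(p) - f(q)|, and Dijkstra's algorithm finds the least-cost top-to-bottom chain of elements. The small image and the rule that consecutive elements may shift by at most one column are illustrative assumptions.

```python
import heapq
import itertools

H = 255        # maximum brightness level
IMG = [        # a vertical intensity edge between columns 1 and 2
    [10, 12, 200, 205],
    [11, 14, 198, 202],
    [ 9, 13, 201, 204],
]

def element_cost(img, r, c):
    """Cost of the contour element between pixels (r, c) and (r, c + 1)."""
    return H - abs(img[r][c] - img[r][c + 1])

def best_contour(img):
    """Least-cost chain of contour elements, one per row."""
    rows, cols = len(img), len(img[0]) - 1
    tie = itertools.count()            # tie-breaker: never compare parents
    pq = [(element_cost(img, 0, c), next(tie), 0, c, None) for c in range(cols)]
    heapq.heapify(pq)
    seen, back = set(), {}
    while pq:
        cost, _, r, c, parent = heapq.heappop(pq)
        if (r, c) in seen:
            continue
        seen.add((r, c))
        back[(r, c)] = parent          # first pop = cheapest way here
        if r == rows - 1:              # reached the bottom row: rebuild path
            path, node = [], (r, c)
            while node is not None:
                path.append(node)
                node = back[node]
            return cost, path[::-1]
        for nc in (c - 1, c, c + 1):   # allowed column shifts per row
            if 0 <= nc < cols:
                heapq.heappush(pq, (cost + element_cost(img, r + 1, nc),
                                    next(tie), r + 1, nc, (r, c)))
    return None

print(best_contour(IMG))
```

On this image the cheapest chain follows the strong edge between columns 1 and 2 in every row, since its elements have the smallest cost H - |f(p) - f(q)|.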

MATERIALS AND METHODS
When selecting objects moving in three-dimensional space, a point of the object is defined as P(X, Y, Z), i.e., it is necessary to determine an additional variable characterizing the depth of the scene points [18]. To solve this problem, we use a monocular vision system, the coordinate system of which is shown in Fig. 1. Supposing that the optical axis of the camera coincides with the Z-axis, the coordinates of the point P have the following form:

X = x·Z/F, Y = y·Z/F,

where F is the focal length of the lens, and x, y are the coordinates of the projection of the image element onto the image plane (Fig. 1).

Figure 1 -Coordinate system of the monocular vision system
It can be observed from Equation (1) that to determine the coordinates of the point P of the object it is necessary to determine the unknown parameter Z, meaning that in order to solve this problem we need to use a stereoscopic vision system (Fig. 2).
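The ambiguity stated above can be checked numerically: for a single view, every depth Z yields a world point consistent with the same projection, so Z cannot be recovered monocularly. The focal length and projection coordinates below are illustrative values.

```python
F = 50.0          # focal length of the lens (assumed value)
x, y = 2.0, 1.0   # projection of P on the photodetector (assumed values)

def back_project(x, y, Z, F):
    """World coordinates of the point at depth Z that projects to (x, y)."""
    return (x * Z / F, y * Z / F, Z)

# Two different depths produce world points with the same projection:
for Z in (100.0, 500.0):
    X, Y, _ = back_project(x, y, Z, F)
    print((X, Y, Z), "->", (F * X / Z, F * Y / Z))
```

Both candidate points re-project to the same (x, y) = (2.0, 1.0), which is exactly why a second view is needed.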

Figure 2 -Coordinate system of the stereoscopic vision system
Knowing the focal length F and the distance B between the optical axes, we can find Z:

Z = F·B/Dispar,

where Dispar stands for the disparity [19]. The equations above are correct only for static vision systems. Let us consider a dynamic vision system, which allows us to detect objects moving in three-dimensional space. Supposing that the observation system moves relative to the surrounding environment in the direction (ΔX, ΔY, ΔZ) and rotates with angular velocity (ω_x, ω_y, ω_z), the equations connecting the three-dimensional coordinates of the point P(X, Y, Z) of the dynamic object with the velocity components (u, v) of its projection p(x, y) take the following form:

u = -F·ΔX/Z + x·ΔZ/Z + x·y·ω_x/F - (x²/F + F)·ω_y + y·ω_z; (3)
v = -F·ΔY/Z + y·ΔZ/Z + (y²/F + F)·ω_x - x·y·ω_y/F - x·ω_z. (4)

In order to detect a dynamic object, we designate the direction of the gradient vector as (k_x, k_y); then the normal flow vector takes the form

X_n = u·k_x + v·k_y. (5)

Combining (5) with (3), (4), we obtain:

X_n = -k_x·F·ΔX/Z - k_y·F·ΔY/Z + (x·k_x + y·k_y)·ΔZ/Z + (x·y·k_x/F + (y²/F + F)·k_y)·ω_x - ((x²/F + F)·k_x + x·y·k_y/F)·ω_y + (y·k_x - x·k_y)·ω_z,

from which the unknown depth Z can be expressed.

Consider the process of detecting a moving object from a moving base. Let the coordinate system 0xyz be associated with the object space, and the coordinate system 0x_0y_0z_0 with the moving base of the optical instrument. The order of rotation of the moving coordinate system is as follows: angle ψ in the plane 0xy, angle υ in the plane 0y_0z, angle γ in the plane 0x_0y_0. If the angular velocities of the turns are designated ψ', υ', γ', respectively, then the projections of the angular velocity of the base onto the axes of the moving coordinate system can be written in the form:

ω_x0 = υ'·cos γ - ψ'·cos υ·sin γ;
ω_y0 = γ' + ψ'·sin υ;
ω_z0 = υ'·sin γ - ψ'·cos υ·cos γ.
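The stereoscopic depth relation introduced above, in its standard form Z = F·B/Dispar, can be checked numerically. The focal length, baseline, and projection coordinates below are illustrative values.

```python
def depth_from_disparity(F, B, dispar):
    """Depth of a point from the disparity between its two projections."""
    if dispar == 0:
        raise ValueError("zero disparity: point at infinity")
    return F * B / dispar

F = 50.0                    # focal length (assumed value)
B = 120.0                   # distance between the optical axes (assumed value)
x1, x2 = 14.0, 8.0          # the two projections of the same point
dispar = x1 - x2            # disparity between the projections
print(depth_from_disparity(F, B, dispar))
```

A larger disparity means a closer point; as the disparity tends to zero the estimated depth grows without bound, which is why distant points are hard to range stereoscopically.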
Angular velocities ω x0 , ω y0 , ω z0 can be measured using gyroscopes oriented along the axes of the mobile coordinate system and fixed on the base [20].
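The angular-velocity projections above can be transcribed directly into code; the sign convention follows the text, and the Euler-angle rates ψ', υ', γ' and the angles themselves are illustrative values.

```python
import math

def base_rates(psi_d, ups_d, gam_d, psi, ups, gam):
    """Project the Euler-angle rates onto the moving base axes,
    following the formulas for omega_x0, omega_y0, omega_z0 in the text."""
    wx0 = ups_d * math.cos(gam) - psi_d * math.cos(ups) * math.sin(gam)
    wy0 = gam_d + psi_d * math.sin(ups)
    wz0 = ups_d * math.sin(gam) - psi_d * math.cos(ups) * math.cos(gam)
    return wx0, wy0, wz0

# With gamma = 0 the formulas reduce to wx0 = ups', wy0 = gam' + psi'*sin(ups),
# wz0 = -psi'*cos(ups), which is easy to check by hand:
print(base_rates(0.1, 0.2, 0.3, 0.0, math.pi / 6, 0.0))
```

In a real system these values would be compared against the gyroscope readings along the base axes.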
The orientation of the vision system is defined by two angles: σ and φ (Fig. 3).
At these angles, the coordinate system O'x_p y_p z_p associated with the vision system is rotated relative to the base. We choose this coordinate system so that the axis O'x_p coincides with the main optical axis of the device, and the axes O'y_p and O'z_p are oriented along and across the frame. The linear velocity vector of the center of gravity of the base can be represented by its projections v_x0, v_y0, v_z0 onto the base axes [21].
Consider the equations of motion of the system at the initial moment of time, when the axes of the base coincide with the axes of the fixed coordinate system, i.e., ψ = υ = γ = 0. Let us place two additional coordinate systems, translated in parallel from the point O into the field of images O_i x_i y_i z_i and into the field of objects O_p x_p y_p z_p (Fig. 3).
Then, taking into account the distance between the origins of the coordinate systems, we can write down the relations between the coordinates in the two systems. Differentiating the last two equations with respect to time, performing transformations with respect to the variables y_i', z_i', and rearranging equations (8), we get:

y_p = -H·y_i/(f·(1 + (1/f)·z_i·ctg φ));
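The surviving image-to-object relation above can be sketched numerically. H, f, φ, and the image coordinates below are illustrative values, and the companion relation for z_p, not recoverable from the source, is omitted.

```python
import math

def y_object(H, f, y_i, z_i, phi):
    """Object-space y from image-space (y_i, z_i) at tilt angle phi,
    following y_p = -H*y_i / (f * (1 + (1/f)*z_i*ctg(phi)))."""
    return -H * y_i / (f * (1.0 + (z_i / f) * (1.0 / math.tan(phi))))

# On the optical axis (z_i = 0) the expression reduces to -H*y_i/f:
print(y_object(H=100.0, f=50.0, y_i=2.0, z_i=0.0, phi=math.pi / 4))
```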

EXPERIMENTS
To confirm the adequacy of the mathematical model, the proposed method was implemented in software. A program was developed in which we specified the equations of motion of the moving reference system (the trolley carrying the cameras that tracked the object) and of the observed object.
With this software, an experiment was performed in which the motion of the object and of the moving reference system was modeled, and data on the change of their coordinates in space were obtained. The coordinates were measured 30 times per second, and the total duration of the experiment was 3 minutes.

RESULTS
The data obtained during the experiment are shown in Tables 1-4.

DISCUSSION
As a result of the experiment, data were obtained on the motion of the object and of the moving coordinate system. In particular, we have found:
- the coordinates x, y, z of the moving reference system relative to the static coordinate system;
- the coordinates x, y, z of the moving object relative to the moving reference system;
- the coordinates x, y, z of the moving object in the static coordinate system.
As can be seen, the coordinates of the moving object calculated using the proposed mathematical model completely coincide with the coordinates obtained experimentally.
Based on these results, it can be concluded that the proposed mathematical model adequately describes the change in the coordinates of the moving object and the moving reference system.
Most vision systems in use today have a number of issues that limit their practical use. In particular, some methods, such as the active contour model and border detection methods, have limited accuracy, especially in the presence of interference. Other systems cannot be implemented on a mobile platform, which limits their ability to detect moving objects. The proposed method solves these problems and obtains accurate coordinates of a target object.
The proposed method can be used in practice for constructing mobile automatic systems for tracking and identifying objects.

CONCLUSIONS
The proposed method allows determining the coordinates of dynamic objects from mobile bases of vision systems.
The scientific novelty of the results obtained is that a method has been proposed for isolating objects moving in space from a mobile vision system, which makes it possible to automatically isolate moving objects, determine their three-dimensional coordinates with a given accuracy, and adapt to changing observation conditions.
The practical significance of the results obtained is that experiments have been conducted that confirm the adequacy of the proposed mathematical model. The results of the experiment allow us to recommend the proposed method for constructing mobile automatic systems for tracking and identifying objects.
Prospects for further research lie in implementing this method in a combined software and hardware system, which would improve the accuracy of object isolation.