METHOD OF DATA EXPRESSION FROM THE UKRAINIAN CONTENT BASED ON THE ONTOLOGICAL APPROACH

Context. Nowadays there is constantly increasing interest in the application of intelligent systems (IS) in various areas such as information technologies (IT), engineering, medicine, biology, ecology, geography, jurisprudence, etc. The architecture of modern IS is built around knowledge bases (KB), which are formed according to the subject area (SA) in which the given IS is used. The main part of a KB is an ontology: a clearly structured model of the SA, that is, a systematic set of terms that describe the connections between the objects of this SA. Ontologies are generally accepted and widely used in many branches of science, such as knowledge engineering, knowledge representation, information retrieval, knowledge management, database design, information modeling and object-oriented analysis. In particular, Gartner, in its research of the IT market, noted the use of taxonomies/ontologies in this area. Consequently, research on the syntactic ontological structures of KB, the construction and study of an optimal algorithm for the syntactic analysis of Ukrainian-language texts, and the development of software and algorithmic means for content processing, automatic referencing of texts, knowledge gathering, translation, etc. are relevant. Objective. The goal of the work is to develop a software system for formalizing the rules of the syntax of the Ukrainian language in the form of an ontological KB for the purpose of processing natural-language texts in the Ukrainian language. Method. Decision trees, the IDEF5 methodology and an ontology construction methodology were chosen as the methods for creating a consolidated resource based on an ontological KB. The results of the syntactic analysis are taken into account by associative-semantic context analysis to optimize the construction of associative context relationships between words and sentence combinations within the hierarchical network of the ontological KB. Results. 
A consolidated information resource has been created: an ontological KB for the syntactic (parsing) analysis of Ukrainian-language text documents, built with Protégé 3.4.7. Conclusions. A method of data extraction based on an ontological KB and RSUL has been developed for the further development of a consolidated information resource for the syntactic processing of text documents. As a result, an ontological KB with RSUL was created. The syntactic structure of the input sentence is the foundation and framework for the next, no less important step: semantic analysis. This ontological KB of the consolidated LR for the syntactic processing of Ukrainian-language text documents serves as a solid basis for the further development of an automated IS for parsing Ukrainian-language texts.


ABBREVIATIONS
KB is a knowledge base; AI is artificial intelligence; IS is an information system; IT is information technology; SA is a subject area; PIR is the processing of information resources; LR is a linguistic resource; RSUL is the rules of the syntax of the Ukrainian language.

NOMENCLATURE
O is an ontology; X is a finite set of concepts of the subject area describing the Ukrainian language; R is a finite set of relations between concepts; F is a set of interpretation functions; Morphology is a finite set of concepts of the morphology of the Ukrainian language; Punctuation is a finite set of concepts of the punctuation of the Ukrainian language; Structure is a finite set of concepts of the structure of the Ukrainian language; Syntax is a finite set of concepts of the syntax of the Ukrainian language; Semantic is a finite set of concepts of the semantics of the Ukrainian language; WordsCombination is a finite set of concepts of the formation of phrases; Sentence is a finite set of concepts of the formation of sentences in the Ukrainian language; SignWords is a finite set of signs of the formation of phrases; LexicalSign is a finite set of lexical signs of the formation of phrases; SyntacticSign is a finite set of syntactic signs of the formation of phrases; Noun is a finite set of noun signs of the formation of phrases; Adjective is a finite set of adjective signs of the formation of phrases; Numeral is a finite set of numeral signs of the formation of phrases; Pronoun is a finite set of pronoun signs of the formation of phrases; Verb is a finite set of verb signs of the formation of phrases; Adverb is a finite set of adverbial signs of the formation of phrases; Coordinated is a finite set of coordinated signs of the formation of phrases; Inferior is a finite set of subordinated signs of the formation of phrases; SimpleWord is a finite set of simple signs of the formation of phrases; ComplicatedWord is a finite set of complex signs of the formation of phrases; AdversativeComt is a finite set of signs of adversative (opposing) connection; ConnectiveComt is a finite set of signs of connective connection; DividingComt is a finite set of signs of dividing connection; ContactComt is a finite set of signs of agreement; ManagementComt is a finite set of signs of government (management); AdjoiningComt is a finite set of signs of adjoining; SignSent is a finite set of signs of the formation of sentences in the Ukrainian language; SentenceMembers is a finite set of signs for identifying sentence members; NarrativeSent is a finite set of signs of the formation of narrative (declarative) sentences; PronouncedSent is a finite set of signs of the formation of interrogative sentences; IncentiveSent is a finite set of signs of the formation of incentive (imperative) sentences; EmotionallyNeutral is a finite set of signs of the formation of emotionally neutral sentences; EmotionallyColored is a finite set of signs of the formation of emotionally colored sentences; SimpleSent is a finite set of concepts of the formation of simple sentences; ComplicatedSent is a finite set of concepts of the formation of complex sentences; MainSentMemb is a finite set of signs for identifying the main members of the sentence; SecondSentMemb is a finite set of signs for identifying the secondary members of the sentence; AffirmativeSent is a finite set of signs of the formation of affirmative sentences; NegativeSent is a finite set of signs of the formation of negative sentences; SgSpSt is a finite set of signs of the formation of simple sentences.

INTRODUCTION
In the study of ontologies, questions arise from the very first steps. To date, there is no single definition of the concept of ontology. The term comes from the Greek "ontos" (existence) and "logos" (doctrine, concept); in philosophy, ontology is the branch that studies existence. In computer science, it is an attempt at a comprehensive and detailed formalization of a certain area of knowledge by means of a conceptual scheme [1]. A conceptual scheme is understood as a set of concepts plus information about them (properties, relationships, constraints, axioms and assertions about the concepts that are needed to describe the processes of solving problems in the selected subject area) [2]. Among specialists in computational linguistics, the most established (classical) definition of ontology is the one given by T. Gruber: "An ontology is an explicit specification of a conceptualization" [3][4]. However, not all existing ontological LR fall under this definition. In addition to the problem of defining the concept of "ontology" exactly, there are a number of problems in describing an ontology model in a formal language [1]. Today, applied IS are evolving toward greater intelligence. This significantly affects the direction of scientific and technological research related to the use of computers and also gives society practically important results. However, at a certain stage of development, further improvement of IT with currently available resources becomes impossible. In such periods, a qualitative leap in development tools is required. One such leap in the field of AI, aimed at further intellectualization of the interaction between systems and users, was the emergence of ontologies.
The purpose of the work is to design and develop a system for formalizing RSUL in the form of an ontological KB in order to use it for processing the Ukrainian-language content of Web resources and extracting data from it.
The object of the research is the process of extracting data from the Ukrainian-language content of Web resources based on the ontological approach, taking into account the syntax and semantics of texts.
The subject of the research is the methods and means of technology for processing information resources and extracting data from them based on the ontological approach.

PROBLEM STATEMENT
To develop a software system S for formalizing RSUL in the form of an ontological knowledge base to be used for processing natural-language texts written in Ukrainian (for example, for automated referencing, extracting knowledge from texts, translating texts into other languages, etc.). The input of the system is the verbal rules of Ukrainian syntax given in textbooks and other books on Ukrainian grammar.
The output of the system is an ontological model of the rules of Ukrainian syntax, O = <X, R, F>. The taxonomy of ontology concepts X describes the syntax of the language, with Syntax as the root concept of the ontology. An optimal definition of the set of relations R between these concepts and of the set of rules F of Ukrainian syntax, formalized by means of description logic (DL), will make it possible to process natural-language texts in Ukrainian effectively, that is, S: RSUL → O.
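To make the triple O = <X, R, F> and the mapping S: RSUL → O more concrete, the following Python sketch stores X as a set of concept names, R as (concept, relation, concept) triples and F as named rule functions. It assumes that each verbal rule has already been hand-encoded as a triple; the relation name `isA`, the transitivity rule and the helper names are illustrative assumptions, not part of the authors' Protégé model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject concept, relation, object concept)


@dataclass
class Ontology:
    """Sketch of O = <X, R, F>."""
    concepts: Set[str] = field(default_factory=set)      # X
    relations: Set[Triple] = field(default_factory=set)  # R
    rules: Dict[str, Callable[[Set[Triple]], Set[Triple]]] = field(default_factory=dict)  # F

    def add_relation(self, subj: str, rel: str, obj: str) -> None:
        # Both ends of a relation are registered as concepts of X.
        self.concepts.update({subj, obj})
        self.relations.add((subj, rel, obj))


def isa_transitivity(triples: Set[Triple]) -> Set[Triple]:
    """Illustrative rule in F: (a isA b) and (b isA c) imply (a isA c).
    Returns only the newly inferred triples."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        isa = [(s, o) for s, r, o in inferred if r == "isA"]
        for a, b in isa:
            for b2, c in isa:
                if b == b2 and (a, "isA", c) not in inferred:
                    inferred.add((a, "isA", c))
                    changed = True
    return inferred - triples


def build_ontology(encoded_rules: List[Triple]) -> Ontology:
    """Hypothetical S: RSUL -> O for rules already encoded as triples."""
    onto = Ontology()
    onto.concepts.add("Syntax")  # root concept of the taxonomy
    for subj, rel, obj in encoded_rules:
        onto.add_relation(subj, rel, obj)
    onto.rules["isA-transitivity"] = isa_transitivity
    return onto


# Hypothetical hand-encoded fragment of RSUL.
rsul = [
    ("WordsCombination", "isA", "Syntax"),
    ("Sentence", "isA", "Syntax"),
    ("SimpleSent", "isA", "Sentence"),
    ("Sentence", "hasPunctuation", "Punctuation"),
]
O = build_ontology(rsul)
new_facts = O.rules["isA-transitivity"](O.relations)
print(sorted(new_facts))  # [('SimpleSent', 'isA', 'Syntax')]
```

Keeping F as separate functions rather than baking inferences into R mirrors the separation between asserted relations and interpretation rules in the model.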

LITERATURE REVIEW
The authors of works [5][6][7][8][9][10][11][12][13] note that two directions are conventionally distinguished in ontology design, which for some time developed separately. The first (formal) is based on logic (first-order predicate logic, description logic, modal logic, etc.) [14][15]. The second (linguistic) is based on the study of natural language (in particular, semantics) and on building ontologies from large text collections, the so-called corpora [16][17][18][19][20][21][22][23][24][25][26][27]. A formal ontology is a set of concepts and assertions about these concepts, on the basis of which classes, objects, relations, functions and theories are constructed [28][29][30][31]. Most ontology models contain the following components: concepts (classes); properties of concepts (attributes, roles); relations between concepts (dependencies, functions); and additional constraints defined by axioms [32][33][34][35][36][37][38][39]. A concept may describe a task, function, action, strategy, reasoning process, etc. The main difference between an ontological system and an ordinary vocabulary is the internal unity, logical interconnection and consistency of the concepts used. The second kind of ontology is a hierarchical lexical resource such as WordNet. Such resources describe the lexical relations between word meanings, which are represented as individual units (synsets) in a hierarchical network. Because the relations between lexical units reflect relations between objects of the outside world, such resources are often regarded as a special kind of ontology: lexical or linguistic ontologies. The main characteristic of linguistic ontologies is that they are tied to the meanings of verbal expressions (words, noun groups, etc.). Linguistic ontologies cover most words of a language and at the same time have an ontological structure that manifests itself in the relations between concepts. Therefore, linguistic ontologies are considered both a special type of lexical database and a special type of ontology. The main difference between linguistic and formal ontologies is the degree of formalization. Developing such resources is assumed to build a hierarchy of the lexical meanings of a natural language; for a more rigorous description of knowledge about the world, such resources are then aligned with formal ontologies. Thus, one project aims to establish the relationship between WordNet and EuroWordNet, on the one hand, and the formal ontology SUMO (Suggested Upper Merged Ontology), on the other. The project establishes a correspondence between WordNet synsets and ontology concepts, in which each WordNet synset is either directly related to an ontology concept, or is a hyponym of some concept, or is an instance (element) of an ontology concept. Participants in another project, OntoWordNet, consider that it is not enough to formally "glue" a resource such as WordNet to a formal ontology: a significant restructuring of the source lexical resource is required.
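The three kinds of correspondence between WordNet synsets and ontology concepts described above (direct relation, hyponym, instance) can be held in a small lookup structure. In the Python sketch below, the synset identifiers and concept names are invented for illustration and are not taken from the actual SUMO-WordNet mapping files.

```python
from enum import Enum
from typing import Dict, List, Tuple


class MatchKind(Enum):
    EQUIVALENT = "equivalent"  # synset corresponds directly to the concept
    HYPONYM = "hyponym"        # synset is narrower than the concept
    INSTANCE = "instance"      # synset denotes an instance (element) of the concept


# Invented entries: synset identifier -> (ontology concept, kind of match).
MAPPING: Dict[str, Tuple[str, MatchKind]] = {
    "water.n.01": ("Water", MatchKind.EQUIVALENT),
    "poodle.n.01": ("Dog", MatchKind.HYPONYM),
    "dnipro.n.01": ("River", MatchKind.INSTANCE),
}


def describe(synset: str) -> str:
    concept, kind = MAPPING[synset]
    return f"{synset} -> {concept} ({kind.value})"


def synsets_for(concept: str) -> List[str]:
    """All synsets linked to a concept, whatever the kind of match."""
    return sorted(s for s, (c, _) in MAPPING.items() if c == concept)


print(describe("poodle.n.01"))  # poodle.n.01 -> Dog (hyponym)
```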
MATERIALS AND METHODS
The primary task in creating an ontological KB of syntactic analysis is to build diagrams of the syntax classes of the Ukrainian language, which are then transformed into the taxonomy of concepts X. Such diagrams are shown in Fig. 1-6.
When extracting data on the basis of the ontological KB and the rules of Ukrainian syntax, for the further development of a consolidated information resource for the syntactic processing of text documents, it is necessary to focus first of all on the concept of , . We represent the finite set of concepts of phrase formation in the Ukrainian language as a tuple: , (3) Accordingly, the finite set of concepts of sentence formation in the Ukrainian language (Fig. 2) is represented as a tuple , , , , where the signs of sentence formation in the Ukrainian language are divided into several main groups as To determine a simple sentence, it is necessary to analyze the sentences using the eight signs (Fig. 3), as described by the tuples (6). Similarly, classes have been constructed to determine the members of a sentence and a complex sentence (Fig. 4-6).
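Once the class diagrams of Fig. 1-6 are turned into the taxonomy X, each concept can be traced up to the root Syntax, which is what IS-A checks rely on. Because the tuples themselves are not reproduced here, the Python sketch below assumes a parent-child arrangement inferred only from the concept names in the Nomenclature, and covers only part of them.

```python
from typing import Dict, List

# Assumed parent -> children arrangement (not the authors' exact Fig. 1-6 tree).
TAXONOMY: Dict[str, List[str]] = {
    "Syntax": ["WordsCombination", "Sentence"],
    "WordsCombination": ["SignWords"],
    "SignWords": ["LexicalSign", "SyntacticSign"],
    "LexicalSign": ["Noun", "Adjective", "Numeral", "Pronoun", "Verb", "Adverb"],
    "SyntacticSign": ["Coordinated", "Inferior"],
    "Coordinated": ["AdversativeComt", "ConnectiveComt", "DividingComt"],
    "Inferior": ["ContactComt", "ManagementComt", "AdjoiningComt"],
    "Sentence": ["SignSent", "SentenceMembers"],
    "SentenceMembers": ["MainSentMemb", "SecondSentMemb"],
    "SignSent": ["SimpleSent", "ComplicatedSent", "NarrativeSent",
                 "PronouncedSent", "IncentiveSent"],
}

# Invert the tree once so that every concept knows its parent.
PARENT: Dict[str, str] = {
    child: parent for parent, children in TAXONOMY.items() for child in children
}


def path_to_root(concept: str) -> List[str]:
    """Concept followed by all of its ancestors up to the root."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def is_a(concept: str, ancestor: str) -> bool:
    return ancestor in path_to_root(concept)


print(path_to_root("ManagementComt"))
# ['ManagementComt', 'Inferior', 'SyntacticSign', 'SignWords', 'WordsCombination', 'Syntax']
```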
One of the promising directions for further improvement of IS for PIR is the development of the methodological, ontological and logical foundations for designing KB intended for the analysis of text documents. Ontological aspects cover a range of issues, from the scope of application to the formal description of the components of a computer ontology of the SA. The main vector of research is aimed at formalizing the stages of constructing, structuring and presenting SA material during analysis, integrated with the information resource of the problem space, which allows processed text materials to be combined effectively. In turn, these stages cannot be implemented effectively, nor the final result (a library of ontological KB for the SA) obtained, without a system-ontological analysis of a given set of information LR. The information model of the SA, which underlies the functioning of this system, must contain a time component in order to record the system state over time. Hence our information model has the following characteristics: decomposition of entities depending on time parameters; fixing the status of an object (registering changes in the values of a subset of the object's attributes as changes in its status); and object archiving (removing an object from the current state of the information model). The main context diagram reflects the external connections of the top-level IS (Fig. 7a). An object is a real entity of the subject area that changes its state over time. The developed SA contains the following classes of objects: Text and Worked Out Sentence. The interaction of these two classes of objects, that is, the structure of the IS, is shown in the context diagram of Fig. 7b. At a more detailed level of the information model, each object class contains a number of specific external entities, each of which describes its attributes and specifies the relationships between the object classes (Fig. 8). The entities that make up the IS and are real objects of the specific SA are: IS, text and rule. Their properties are as follows (all object attributes are static values).
1. Information system: a description of the IS that processes Ukrainian-language text.
2. Text: information about the text being analyzed.
3. Rule: the rules of the Ukrainian language that are applied when analyzing the text.
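The time component of the information model (the status of an entity fixed at moments in time, and archiving that removes an object from the current state) can be sketched in Python as follows. Only the entity names IS, Text and Rule come from the text; the class names, status values and timestamps are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional, Tuple


@dataclass
class Entity:
    name: str
    # Each entry fixes the object's status at a moment in time.
    history: List[Tuple[datetime, str]] = field(default_factory=list)
    archived: bool = False

    def set_status(self, status: str, at: datetime) -> None:
        if self.archived:
            raise ValueError(f"{self.name} is archived and can no longer change")
        self.history.append((at, status))

    def status_at(self, moment: datetime) -> Optional[str]:
        """Latest status recorded at or before `moment`, or None."""
        current = None
        for ts, status in sorted(self.history):
            if ts <= moment:
                current = status
        return current


@dataclass
class InformationModel:
    current: Dict[str, Entity] = field(default_factory=dict)
    archive: Dict[str, Entity] = field(default_factory=dict)

    def add(self, entity: Entity) -> None:
        self.current[entity.name] = entity

    def archive_entity(self, name: str) -> Entity:
        # Archiving removes the object from the current state of the model.
        entity = self.current.pop(name)
        entity.archived = True
        self.archive[name] = entity
        return entity


model = InformationModel()
for name in ("IS", "Text", "Rule"):
    model.add(Entity(name))

text = model.current["Text"]
text.set_status("received", datetime(2018, 1, 1, 10, 0))
text.set_status("parsed", datetime(2018, 1, 1, 10, 5))
print(text.status_at(datetime(2018, 1, 1, 10, 2)))  # received
model.archive_entity("Text")
print(sorted(model.current))  # ['IS', 'Rule']
```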
Classes are at the center of most ontologies. Protégé and other frame systems describe an ontology declaratively, explicitly defining the class hierarchy and the membership of individual concepts in the corresponding classes. An OWL ontology has components similar to those of frame-based ontologies. OWL terminology is built on individuals (objects) and properties, which broadly correspond to Protégé's class instances and slots, respectively. Individuals are the separate instances of the subject domain. An important difference between Protégé and OWL is that OWL does not use the Unique Name Assumption (UNA). This means that two different names can, in fact, refer to the same object. For example, the names "Queen Elizabeth", "Queen" and "Elizabeth Windsor" may denote the same object. In OWL it must be stated explicitly whether objects are the same as or different from each other; otherwise the names may refer either to the same object or to different objects. Properties are binary links between objects. For example, the property "has color" links the object "gold" with the object "yellow", and the property "used in" links the object "gold" with the object "electrical engineering" (Fig. 9a). In Protégé, properties are represented by slots; in description logic, by roles; in UML and other object-oriented notations, by links. Properties can have inverses: for example, the inverse of "has color" is "is color of". Properties can also be functional (limited to a single value), transitive or symmetric. Classes in OWL are regarded as sets of objects and are described formally (mathematically) so that membership in a particular class is stated exactly. Classes are organized by class-subclass relations into a hierarchy (taxonomy). In OWL, subclass means necessary implication: every instance of a subclass is necessarily an instance of its superclass. For example, the objects "cast iron" and "steel" (Fig. 9b) belong to the class "iron-carbon alloys", which, together with "metal-ceramic alloys" and "non-ferrous alloys", is a subclass of "alloys". In a deeper hierarchy, "cast iron" and "steel" are treated as separate classes with their own subclasses and objects. OWL classes are defined by descriptions that specify the conditions an object must satisfy to become an instance of the class.
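Three OWL features discussed above (no Unique Name Assumption, inverse properties, and subclass as necessary implication) can be illustrated with plain Python over the paragraph's own examples. This is a toy model of the semantics, not an OWL reasoner and not the Protégé API; all identifiers are assumptions.

```python
from typing import Dict, List, Optional, Set, Tuple

# Subclass axioms (child -> parent) from the alloy example.
SUBCLASS_OF: Dict[str, str] = {
    "IronCarbonAlloys": "Alloys",
    "MetalCeramicAlloys": "Alloys",
    "NonFerrousAlloys": "Alloys",
}
INVERSE: Dict[str, str] = {"hasColor": "isColorOf"}
SAME_AS: List[Tuple[str, str]] = [("QueenElizabeth", "Queen"), ("Queen", "ElizabethWindsor")]


def inferred_types(asserted_class: str) -> List[str]:
    """Subclass means necessary implication: an instance of a class is
    also an instance of every superclass."""
    types = [asserted_class]
    while types[-1] in SUBCLASS_OF:
        types.append(SUBCLASS_OF[types[-1]])
    return types


def with_inverses(triples: Set[Tuple[str, str, str]]) -> Set[Tuple[str, str, str]]:
    """Add (o, inverse(p), s) for every (s, p, o) whose property has an inverse."""
    return triples | {(o, INVERSE[p], s) for s, p, o in triples if p in INVERSE}


def equivalents(name: str) -> Set[str]:
    """All names linked to `name` through sameAs assertions, in either direction."""
    group, frontier = {name}, [name]
    while frontier:
        cur = frontier.pop()
        for a, b in SAME_AS:
            for x, y in ((a, b), (b, a)):
                if x == cur and y not in group:
                    group.add(y)
                    frontier.append(y)
    return group


def same_individual(a: str, b: str, different_from: Set[Tuple[str, str]]) -> Optional[bool]:
    """True/False only when asserted; None otherwise, because OWL has no UNA."""
    ga, gb = equivalents(a), equivalents(b)
    if ga & gb:
        return True
    if any((x, y) in different_from or (y, x) in different_from for x in ga for y in gb):
        return False
    return None


print(inferred_types("IronCarbonAlloys"))  # ['IronCarbonAlloys', 'Alloys']
print(same_individual("QueenElizabeth", "ElizabethWindsor", set()))  # True
print(same_individual("Queen", "Gold", set()))  # None
```

The `None` result is the practical consequence of dropping UNA: two distinct names are neither equal nor different until an assertion says so.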

EXPERIMENTS
The main stage of the task is to create an ontological KB based on the rules of Ukrainian syntax for the further development of a consolidated LR for the syntactic processing of Ukrainian-language text documents. For this purpose, Protégé 3.4.7 was used to create a hierarchy of classes and subclasses of syntactic concepts based on the rules of Ukrainian syntax (Fig. 10a). Information about the selected class is displayed on the right side of the window. The upper part of this area allows users to add comments, labels and other annotations. The lower part displays the logical characteristics of the selected class, which are specified with the buttons on the panel after clicking the "Create new expression" icon (Fig. 10b).
An example of a logical class characteristic is the expression "A phrase with a junction bond has some united connection" (Fig. 11a-e). The next step in building the KB is to enter class representatives in the Individuals tab (Fig. 11e), for example, representatives of the class of lexical unbound phrases. Next, relationships between classes and subclasses are created in the Properties tab of the main panel (Fig. 11f). The created KB has the following relationships: compoundOf (consists of); hasConjunctive (has a connector); hasMembersOfTheSentense (has sentence members); hasPunctuation (has a punctuation mark). The SWRL Rules tab is used to create parsing rules in the Semantic Web Rule Language (SWRL) with a convenient expression editor (Fig. 12). When writing rules, the classes (subclasses), relationships and representatives that interact with specific operands in the expression editor panel are used.
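SWRL rules have the form body -> head, for example a rule that could be written in SWRL syntax as `WordsCombination(?p) ^ hasConjunctive(?p, ?c) -> Coordinated(?p)`. That particular rule is a hypothetical illustration, not one of the authors' rules. The Python sketch below shows what such a rule does when applied by naive forward chaining over facts; it is not Protégé's own rule engine, and the individuals are invented.

```python
from typing import Dict, Iterator, List, Set, Tuple

Atom = Tuple[str, ...]          # (predicate, arg1[, arg2]); "?x" marks a variable
Rule = Tuple[List[Atom], Atom]  # (body atoms, head atom)


def match(body: List[Atom], facts: Set[Atom], binding: Dict[str, str]) -> Iterator[Dict[str, str]]:
    """Yield every variable binding under which all body atoms are facts."""
    if not body:
        yield binding
        return
    first, rest = body[0], body[1:]
    for fact in facts:
        if fact[0] != first[0] or len(fact) != len(first):
            continue
        new = dict(binding)
        ok = True
        for term, value in zip(first[1:], fact[1:]):
            if term.startswith("?"):
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, facts, new)


def forward_chain(rules: List[Rule], facts: Set[Atom]) -> Set[Atom]:
    """Apply rules until no new fact can be derived."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            for b in list(match(body, facts, {})):
                derived = (head[0],) + tuple(b.get(t, t) for t in head[1:])
                if derived not in facts:
                    facts.add(derived)
                    changed = True
    return facts


# Hypothetical rule: WordsCombination(?p) ^ hasConjunctive(?p, ?c) -> Coordinated(?p)
rules: List[Rule] = [
    ([("WordsCombination", "?p"), ("hasConjunctive", "?p", "?c")], ("Coordinated", "?p")),
]
facts: Set[Atom] = {
    ("WordsCombination", "mama_i_tato"),  # "мама і тато" (mother and father)
    ("hasConjunctive", "mama_i_tato", "i"),
    ("WordsCombination", "zelenyi_lis"),  # "зелений ліс" (green forest), no conjunction
}
result = forward_chain(rules, facts)
print(("Coordinated", "mama_i_tato") in result)  # True
print(("Coordinated", "zelenyi_lis") in result)  # False
```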
One of the benefits of Protégé 3.4.7 is the ability to create queries through the Open SPARQL Query panel. Two more windows then appear at the bottom of the main window: Query, for the query itself, and Results, for the query output (shown after pressing the Execute query button) (Fig. 13). In Fig. 14a
The proposed procedure for extracting data from Ukrainian-language text on the basis of parsing analysis makes it possible to supplement the conceptual graphs of text documents in accordance with the SA context defined by the ontology. In the first stage, recognizing the content of a text document means "recognizing" the concepts and statements of the document by determining the degree of similarity of these concepts to their likely counterparts in the ontology of the information retrieval system (IRS), taking into account the results of the parsing analysis. The set of recognized concepts is then supplemented from the ontology with all concepts linked to its elements by generalizing "IS-A" links one level up, as well as by other semantic links whose weight exceeds a given threshold. This supplement provides the recognized text with the conceptual context of the given SA. The relationships between concepts in the text under investigation are in turn recognized and used to resolve ambiguity in concept recognition when the ontology contains terms with a similar name in a different context and, accordingly, with a different meaning. The text recognized in this way, supplemented by a semantically related conceptual structure from the ontology, forms a coherent graph: the semantic image of the text. The similarity of texts is then assessed by calculating the semantic distance between the documents (Fig. 14). 
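The recognition-and-supplement step described above can be sketched as follows. The sketch swaps in `difflib.SequenceMatcher` string similarity as a stand-in for the unspecified similarity measure, ignores the parsing results, and uses a tiny invented ontology that reuses the alloy example; the thresholds 0.8 and 0.5 are assumptions.

```python
from difflib import SequenceMatcher
from typing import Dict, Iterable, Set, Tuple

# Invented toy ontology.
CONCEPTS: Set[str] = {"steel", "cast iron", "iron-carbon alloy", "alloy", "construction", "rust"}
IS_A: Dict[str, str] = {
    "steel": "iron-carbon alloy",
    "cast iron": "iron-carbon alloy",
    "iron-carbon alloy": "alloy",
}
LINKS: Dict[Tuple[str, str], float] = {("steel", "construction"): 0.8, ("steel", "rust"): 0.3}


def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a, b).ratio()


def recognize(terms: Iterable[str], min_sim: float = 0.8) -> Set[str]:
    """Map each text term to its most similar ontology concept, if close enough."""
    recognized = set()
    for term in terms:
        best = max(CONCEPTS, key=lambda c: similarity(term, c))
        if similarity(term, best) >= min_sim:
            recognized.add(best)
    return recognized


def supplement(recognized: Set[str], threshold: float = 0.5) -> Set[str]:
    """Add IS-A parents one level up and neighbours over links heavier than threshold."""
    extended = set(recognized)
    for c in recognized:
        if c in IS_A:
            extended.add(IS_A[c])  # one level up only
        for (a, b), w in LINKS.items():
            if w > threshold and c in (a, b):
                extended.add(b if c == a else a)
    return extended


found = recognize(["steels", "xyz"])
print(sorted(found))              # ['steel']
print(sorted(supplement(found)))  # ['construction', 'iron-carbon alloy', 'steel']
```

Note that "alloy" (two IS-A levels up) and "rust" (link weight 0.3) are deliberately not added.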
Figure 14 - The scheme of semantic comparison of documents
The procedure for comparing texts and ranking them by similarity is part of the general algorithm of the system for searching text documents by a model (sample) document and consists of the following steps:
1) constructing the weighted graph G of the text document;
2) constructing the weighted graph of the model document, supplemented from the ontology by applying to each vertex of graph G the procedure of finding its parent according to the links between concepts;
3) taking into account the results of the parsing analysis;
4) reducing the redundant elements of the graph;
5) constructing the graphs of the other documents to be ranked by applying steps 1-4;
6) calculating the three centers of importance of the graphs and the semantic distance between graphs G and G′.
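The six-step procedure above can be sketched in Python as follows. The paper does not give its weighting scheme, the definition of the centers of importance or the semantic distance formula, so this sketch substitutes weighted degree for node importance and a weighted Dice-style overlap of edges for similarity (distance = 1 - similarity). The example graphs and the pruning threshold are invented.

```python
from typing import Dict, FrozenSet, List, Tuple

Graph = Dict[FrozenSet[str], float]  # undirected edge {u, v} -> weight


def build_graph(edges: List[Tuple[str, str, float]]) -> Graph:
    return {frozenset((u, v)): w for u, v, w in edges}


def prune(g: Graph, min_weight: float = 0.2) -> Graph:
    """Stand-in for step 4: drop weak (redundant) edges."""
    return {e: w for e, w in g.items() if w >= min_weight}


def importance(g: Graph) -> Dict[str, float]:
    """Stand-in for the centers of importance: weighted degree of each node."""
    score: Dict[str, float] = {}
    for edge, w in g.items():
        for node in edge:
            score[node] = score.get(node, 0.0) + w
    return score


def semantic_distance(g1: Graph, g2: Graph) -> float:
    """1 - weighted Dice overlap of shared edges: 0 = identical, 1 = nothing shared."""
    shared = sum(min(g1[e], g2[e]) for e in g1.keys() & g2.keys())
    total = sum(g1.values()) + sum(g2.values())
    return 1.0 - (2 * shared / total if total else 0.0)


def rank(model: Graph, docs: Dict[str, Graph]) -> List[str]:
    """Steps 5-6: order documents by increasing distance to the model graph."""
    return sorted(docs, key=lambda name: semantic_distance(model, prune(docs[name])))


model = prune(build_graph([("Shevchenko", "poetry", 1.0), ("poetry", "Kobzar", 0.8)]))
docs = {
    "doc_a": build_graph([("Shevchenko", "poetry", 0.9), ("poetry", "Kobzar", 0.7)]),
    "doc_b": build_graph([("Shevchenko", "painting", 1.0), ("painting", "noise", 0.1)]),
}
print(rank(model, docs))  # ['doc_a', 'doc_b']
print(importance(model)["poetry"])  # 1.8
```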
The effectiveness of the developed method is demonstrated on an example of annotation analysis by comparing it with the vector space model method and the Dice coefficient [2]. An experiment with annotations of publications on the works of T. G. Shevchenko (Fig. 15) showed that the proposed approach based on an adaptive ontology increases the accuracy of document search by 20% on average. For this purpose, a query was generated on the Internet from the keywords of a model annotation. As a result, 25 annotations were obtained from the sites of the corresponding publications. The annotations obtained were compared with the model by three methods: the method of conceptual graphs (Montes-y-Gómez), the Dice coefficient (a variant of the vector space model) and the method of adaptive ontologies (Table 1). The effectiveness of these methods for information search was estimated by the search accuracy: method based on the Dice coefficient, 10/15 ≈ 0.67 (67%); method based on the vector space model, 9/12 = 0.75 (75%); method of adaptive ontologies (developed), 11/12 ≈ 0.92 (92%).
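The accuracy figures above are simple ratios. Assuming each ratio is the number of annotations judged relevant over the number returned by a method, they can be checked with the Python sketch below, which also computes the Dice coefficient on word sets, 2|A ∩ B| / (|A| + |B|). Whitespace tokenization without lemmatization is a simplifying assumption; real Ukrainian text would need morphological normalization first.

```python
def dice(text_a: str, text_b: str) -> float:
    """Dice coefficient 2|A & B| / (|A| + |B|) over lower-cased word sets."""
    a, b = set(text_a.lower().split()), set(text_b.lower().split())
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def precision(relevant_retrieved: int, retrieved: int) -> float:
    return relevant_retrieved / retrieved


# Counts as reported in the experiment.
reported = {
    "Dice coefficient": (10, 15),
    "vector space model": (9, 12),
    "adaptive ontologies": (11, 12),
}
for method, (hits, total) in reported.items():
    print(f"{method}: {hits}/{total} = {precision(hits, total):.1%}")
# Dice coefficient: 10/15 = 66.7%
# vector space model: 9/12 = 75.0%
# adaptive ontologies: 11/12 = 91.7%

print(round(dice("Shevchenko poetry and prose", "Shevchenko poetry and painting"), 2))  # 0.75
```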

DISCUSSION
The need for ontologies stems from the inability of existing tools to process natural-language texts automatically and adequately. Creating thesauri does not solve the problem, since different user groups and communities use special terminology that others use in a different context when processing and analyzing information. Different communities also often use different notations for the same concepts. Therefore, high-quality text processing requires a detailed description of the SA with a set of logical connections that show the relationships between terms. Ontologies make it possible to represent natural-language text in a form suitable for automatic processing. In addition, ontologies serve as an intermediary between the user and the IS, which allows the terms used to be formalized for all users of the project.
This approach takes into account both the context of the documents and the semantic context of the terms and phrases they contain. This makes it possible to automate the search for the documents most relevant to the prototype query and to reject those that are of minor importance or do not correspond to the SA.
According to the results of the experiment, the Dice comparison method in 40% of cases judged as most similar to the model those annotations that shared the largest number of words with it but were least consistent with the prototype in content. The vector space model method did not give a satisfactory result either. At the same time, taking prior information about the SA into account, by weighting the vertices and links of the conceptual graphs of the reference annotation and of the annotations under study, made it possible to select the annotations most relevant to the model.
This experiment illustrates the effectiveness of the developed approach for automating the search for documents most relevant to a prototype query; the approach can be used in building intelligent metasearch systems.

CONCLUSIONS
The article deals with the scientific and practical task of extracting data from Ukrainian-language content based on the ontological approach, taking into account the features of syntax and semantics of this language.
The scientific novelty of the results lies in the fact that, for the first time, an ontological KB has been created on the basis of RSUL for the further development of a consolidated LR for the syntactic processing of Ukrainian-language text documents. As a result of a system analysis of the subject area, a KB containing consolidated information on RSUL, taking its features into account, was designed for the first time. The ontological aspect of designing an analytical KB is considered, which is one of the important practical applications of ontological engineering. The proposed approach solves, and improves the results of solving, the following urgent tasks in the processing of text documents: automated development of analytical and syntactic KB on the basis of linguistic-semantic analysis of large volumes of texts using original tools (source texts are taken from a variety of sources, for example from Ukrainian-language textbooks on the SA approved for use in educational institutions); structuring of terms and concepts in information resources of a specific SA; and a significant reduction in the complexity of compiling analytical and syntactic KB. For the first time, a method of data extraction based on an ontological KB and RSUL has been created for forming a consolidated information resource for the syntactic processing of text documents. As a result, an ontological KB with RSUL was created. It serves as a solid foundation for the further development of an automated system for the highly complex task of parsing Ukrainian texts. The practical significance of the results lies in the development of a software system that formalizes RSUL by means of Protégé 3.4.7 in the form of an ontological knowledge base for processing natural-language texts in Ukrainian (for example, for annotation, referencing, knowledge extraction, translation, etc.). The created KB is sufficiently developed and allows the following important functions to be performed: creating a hierarchy of classes and subclasses of SA concepts; introducing representatives of classes and subclasses, which extends the possibilities of understanding and using the KB; creating a system of links between classes and subclasses; creating and executing queries of various kinds; constructing data-processing rules; applying IT in application development, etc. Prospects for further research include developing rules for analyzing the semantics of Ukrainian texts for more efficient knowledge extraction from Ukrainian-language Web resources. Evaluating the content similarity of text documents by adapting the ontology to the user's SA makes it possible to increase the efficiency of automated search for relevant documents. However, the developed method is not an alternative to well-known document search methods but a complement to them: for example, a keyword search can be performed first, and its results then refined by the contextual search.

Figure 2 -A class diagram for representing the hierarchy of the classes "Sentence"

Figure 4 -A class diagram for "The members of sentence"

Figure 8 -Detailing of the process "Analysis"

Figure 15 -The result of using syntactic analysis in Ukrainian language