EFFECTIVE ALGORITHM FOR PARSING SENTENCES USING SEMANTICALLY ATTRIBUTED WEIGHTED AFFIX CONTEXT FREE
Keywords:Weighted affix context-free grammar, semantic parsing, ontology-driven sentence parsing, template productions.
Context. The problem of increasing efficiency of affix grammars over a finite lattice (AGFL) is considered. AGFL is a context-free grammar with flexible and compact form of productions for parsing texts in natural languages.
Objective. The goal of the work is to increase efficiency of parsing sentences by means of AGFL with a modification that adds semantical attributes to the productions and introduces a new form of production called the “template production”. This modification helps to decrease the number of productions that are required to describe a language and lets reduce the computational complexity of the parsing algorithm.
Method. A mathematical model of the template production is developed and the theorem is proved that claims that the normal form of the template production exists and the normalization procedure produces an equivalent grammar. The normal form is utilized to increase efficiency of parsing Ukrainian sentences. The template production helps to represent ontology-based rules in a short and computationally inexpensive way. The normal form of template production is studied, and an effective algorithm for parsing sentences is proposed. The worst-case complexity of the proposed algorithm is 0(n3 m3p mr), where n is the length of input string of terminals, mp is the maximum number of combinations of symbol and attributes that can produce the same string of terminals, and mr is the maximum number of productions that have the same starting non-terminal symbol in the right part. The growth of parsing time turned out to be almost linear function of the number of words in a sentence when parsing of sentences from the test database of Ukrainian fiction literature.
Results. The developed method has been implemented in the UkrParser software that is available open-source on GitHub.Conclusions. The developed algorithm was tested on the database of Ukrainian sentences and demonstrated ten times faster parsing speed than Stanford parser. The future research can be focused on the development of grammatically attributed ontologies for wider set of topics that should improve results of semantical sentence parsing.
Najmi E., Hashmi K., Malik Z., Rezgui A., Khanz H. U. ConceptOnto: An upper ontology based on ConceptNet, Computer Systems and Applications (AICCSA), 2014 IEEE/ACS 11th International Conference on Computer Systems and Applications (AICCSA), November 10–13, Doha. Qatar, 2014, pp. 366–372. DOI: 10.1109/AICCSA.2014.7073222.
Eddy S. R., Durbin R. RNA sequence analysis using covariance models, Nucleic Acids Research, 1994, Vol. 22, No. 11, pp. 2079–2088.
Koster C.H.A. Affix Grammars for natural languages / C.H.A. Koster, In: Attribute Grammars, Applications and Systems, International Summer School SAGA, Lecture Notes in Computer Science, Prague, Czechoslovakia, 1991, Vol. 545, pp. 469–484.
Smith N. A., Johnson M. Weighted and Probabilistic Context-Free Grammars Are Equally Expressive, Computational Linguistics, 2007, Vol. 33, No. 4, pp. 477–491. DOI:10.1162/coli.2007.
Chomsky N. Three models for the description of language, IRE Transactions on Information Theory, No. 2 (3), 1956, pp. 113–124. DOI:10.1109/TIT.1956.1056813.
Oostdijk N. An Extended Affix Grammar for the English Noun Phrase, In: Jan Aarts and Wim Meijs (eds), Corpus Linguistics. Recent Developments in the Use of Computer Corpora in English Language Research, Amsterdam, Rodopi, 1984.
Smith T. C., Cleary J. G. Probabilistic Unification Grammars, In Australasian Natural Language Processing Summer Workshop, 1997, pp. 25–32.
Lozynska O., Davydov M. Information technology for Ukrainian Sign Language translation based on ontologies, ECONTECHMOD: an international quarterly journal on economics of technology and modelling processes, 2015, Vol. 04, No. 2, pp. 13–18.
How to Cite
Copyright (c) 2017 M. V. Davydov, O. V. Lozynska, V. V. Pasichnyk
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Creative Commons Licensing Notifications in the Copyright Notices
The journal allows the authors to hold the copyright without restrictions and to retain publishing rights without restrictions.
The journal allows readers to read, download, copy, distribute, print, search, or link to the full texts of its articles.
The journal allows to reuse and remixing of its content, in accordance with a Creative Commons license СС BY -SA.
Authors who publish with this journal agree to the following terms:
Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License CC BY-SA that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.