EFFECTIVE ALGORITHM FOR PARSING SENTENCES USING SEMANTICALLY ATTRIBUTED WEIGHTED AFFIX CONTEXT FREE
Context. The problem of increasing efficiency of affix grammars over a finite lattice (AGFL) is considered. AGFL is a context-free grammar with flexible and compact form of productions for parsing texts in natural languages.
Objective. The goal of the work is to increase efficiency of parsing sentences by means of AGFL with a modification that adds semantical attributes to the productions and introduces a new form of production called the “template production”. This modification helps to decrease the number of productions that are required to describe a language and lets reduce the computational complexity of the parsing algorithm.
Method. A mathematical model of the template production is developed and the theorem is proved that claims that the normal form of the template production exists and the normalization procedure produces an equivalent grammar. The normal form is utilized to increase efficiency of parsing Ukrainian sentences. The template production helps to represent ontology-based rules in a short and computationally inexpensive way. The normal form of template production is studied, and an effective algorithm for parsing sentences is proposed. The worst-case complexity of the proposed algorithm is 0(n3 m3p mr), where n is the length of input string of terminals, mp is the maximum number of combinations of symbol and attributes that can produce the same string of terminals, and mr is the maximum number of productions that have the same starting non-terminal symbol in the right part. The growth of parsing time turned out to be almost linear function of the number of words in a sentence when parsing of sentences from the test database of Ukrainian fiction literature.
Results. The developed method has been implemented in the UkrParser software that is available open-source on GitHub.Conclusions. The developed algorithm was tested on the database of Ukrainian sentences and demonstrated ten times faster parsing speed than Stanford parser. The future research can be focused on the development of grammatically attributed ontologies for wider set of topics that should improve results of semantical sentence parsing.
Najmi E., Hashmi K., Malik Z., Rezgui A., Khanz H. U. ConceptOnto: An upper ontology based on ConceptNet, Computer Systems and Applications (AICCSA), 2014 IEEE/ACS 11th International Conference on Computer Systems and Applications (AICCSA), November 10–13, Doha. Qatar, 2014, pp. 366–372. DOI: 10.1109/AICCSA.2014.7073222.
Eddy S. R., Durbin R. RNA sequence analysis using covariance models, Nucleic Acids Research, 1994, Vol. 22, No. 11, pp. 2079–2088.
Koster C.H.A. Affix Grammars for natural languages / C.H.A. Koster, In: Attribute Grammars, Applications and Systems, International Summer School SAGA, Lecture Notes in Computer Science, Prague, Czechoslovakia, 1991, Vol. 545, pp. 469–484.
Smith N. A., Johnson M. Weighted and Probabilistic Context-Free Grammars Are Equally Expressive, Computational Linguistics, 2007, Vol. 33, No. 4, pp. 477–491. DOI:10.1162/coli.2007.
Chomsky N. Three models for the description of language, IRE Transactions on Information Theory, No. 2 (3), 1956, pp. 113–124. DOI:10.1109/TIT.1956.1056813.
Oostdijk N. An Extended Affix Grammar for the English Noun Phrase, In: Jan Aarts and Wim Meijs (eds), Corpus Linguistics. Recent Developments in the Use of Computer Corpora in English Language Research, Amsterdam, Rodopi, 1984.
Smith T. C., Cleary J. G. Probabilistic Unification Grammars, In Australasian Natural Language Processing Summer Workshop, 1997, pp. 25–32.
Lozynska O., Davydov M. Information technology for Ukrainian Sign Language translation based on ontologies, ECONTECHMOD: an international quarterly journal on economics of technology and modelling processes, 2015, Vol. 04, No. 2, pp. 13–18.
GOST Style Citations
1. ConceptOnto: An upper ontology based on ConceptNet / [E. Najmi, K. Hashmi, Z. Malik et al.] // Computer Systems and Applications (AICCSA), 2014 IEEE/ACS 11th International Conference on Computer Systems and Applications (AICCSA), November 10–13, Doha, Qatar. – 2014. – P. 366–372. DOI: 10.1109/AICCSA.2014.7073222.
2. Eddy S.R. RNA sequence analysis using covariance models / S. R. Eddy, R Durbin // Nucleic Acids Research. – 1994. – Vol. 22. – № 11. – P. 2079–2088.
3. Koster C.H.A. Affix Grammars for natural languages / C.H.A. Koster // In: Attribute Grammars, Applications and Systems, International Summer School SAGA, Lecture Notes in Computer Science, Prague, Czechoslovakia. – 1991. – Vol. 545. – P. 469–484.
4. Smith N. A. Weighted and Probabilistic Context-Free Grammars Are Equally Expressive / N. A. Smith, M. Johnson // Computational Linguistics. – 2007. – Vol. 33, № 4. – P. 477–491. DOI:10.1162/coli.2007.
5. Chomsky N. Three models for the description of language / N. Chomsky // IRE Transactions on Information Theory – № 2 (3). – 1956. – P. 113–124. DOI:10.1109/TIT.1956.1056813.
6. Oostdijk N. An Extended Affix Grammar for the English Noun Phrase / N. Oostdijk // In: Jan Aarts and Wim Meijs (eds), Corpus Linguistics. Recent Developments in the Use of Computer Corpora in English Language Research, Amsterdam: Rodopi. – 1984.
7. Smith T.C. Probabilistic Unification Grammars / T. C. Smith, J. G. Cleary // In Australasian Natural Language Processing Summer Workshop. – 1997. – P. 25 32.8. Lozynska O. Information technology for Ukrainian Sign Language translation based on ontologies / O. Lozynska, M. Davydov // ECONTECHMOD: an international quarterly journal on economics of technology and modelling processes. – 2015. – Vol. 04, No. 2. – P. 13–18.
Copyright (c) 2017 M. V. Davydov, O. V. Lozynska, V. V. Pasichnyk
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Address of the journal editorial office:
Editorial office of the journal «Radio Electronics, Computer Science, Control»,
National University "Zaporizhzhia Polytechnic",
Zhukovskogo street, 64, Zaporizhzhia, 69063, Ukraine.
Telephone: +38-061-769-82-96 – the Editing and Publishing Department.
The reference to the journal is obligatory in the cases of complete or partial use of its materials.