THE SCIENTIFIC AND TECHNICAL PUBLICATIONS TEXT AUTHORIZATION METHOD BASED ON LINGUISTIC ANALYSIS OF LANGUAGE DIVERSITY COEFFICIENTS
DOI: https://doi.org/10.15588/1607-3274-2020-1-12
Keywords: Text content, NLP, content monitoring, stop words, content analysis, statistical linguistic analysis, quantitative linguistics.
Abstract
Context. Authorship attribution is a technique for determining the author of a text when it is unclear who wrote it. It is useful when several people claim to be the authors of one publication, or when nobody claims authorship of a text, for example, so-called trolls in social networks during information warfare. The complexity of the problem obviously grows exponentially as the number of probable authors increases. The availability of samples of each candidate author's texts is also significant for solving this problem. Attribution of a text's author includes the following three problems:
– identifying the author of a text from a group of probable or expected authors, where the author is always in the group of suspects;
– identifying the author of a text from a group of probable or expected authors, where the author may not be in the group of suspects, so the text may remain unattributed;
– assessing whether or not a given text was written by a particular author.
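The three settings above differ mainly in the decision made after a text has been compared with the candidate authors. The following minimal Python sketch illustrates this under stated assumptions: each text and each author is represented as a dictionary of feature frequencies (the `Profile` type), and cosine distance with a fixed `threshold` serves as the decision rule. The representation, the threshold, and all function names are hypothetical illustrations, not part of the described method.

```python
import math
from typing import Dict, Optional

# Hypothetical representation: feature name -> relative frequency.
Profile = Dict[str, float]

def cosine_distance(a: Profile, b: Profile) -> float:
    """Return 1 - cosine similarity of two sparse profiles (1.0 if either is empty)."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)

def identify_closed_set(text: Profile, authors: Dict[str, Profile]) -> str:
    """Problem 1: the true author is assumed to be among the candidates."""
    return min(authors, key=lambda name: cosine_distance(text, authors[name]))

def identify_open_set(text: Profile, authors: Dict[str, Profile],
                      threshold: float) -> Optional[str]:
    """Problem 2: return None when no candidate is close enough."""
    best = identify_closed_set(text, authors)
    if cosine_distance(text, authors[best]) <= threshold:
        return best
    return None

def verify(text: Profile, author: Profile, threshold: float) -> bool:
    """Problem 3: decide whether one given author wrote the text."""
    return cosine_distance(text, author) <= threshold
```

The only difference between the first two settings is the rejection option; verification reduces to a single comparison against one profile.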
Therefore, the task of automatically determining the author of scientific and technical text content is relevant and requires new, more refined approaches.
Objective of the study is to develop a method for determining the author of Ukrainian-language texts based on linguometry technology.
Method. A linguometric method of algorithmic support for content monitoring processes is developed to solve the problem of automatically determining the author of Ukrainian-language text content on the basis of statistical analysis of linguistic diversity coefficients. The method for determining the author is decomposed on the basis of an analysis of such speech factors as lexical diversity, degree of syntactic complexity, speech connectivity, and the singularity (exclusivity) and concentration indices of the text. The following parameters of the author's style are also analyzed: the number of different words in a particular text, the total number of words in this text, the number of sentences, the number of prepositions, the number of conjunctions, the number of words with a frequency of 1, and the number of words with a frequency of 10 or more. A distinctive feature of the developed method is the adaptation of the morphological and syntactic analysis of lexical units to the structure of Ukrainian-language words and texts. That is, when analyzing linguistic units such as words, their part of speech and their declension within that part of speech were taken into account. To do this, the inflections of these words were analyzed in order to classify them and to extract the stems used to build the corresponding alphabetical frequency dictionaries. The contents of these dictionaries were then used in the subsequent steps of determining authorship, namely in calculating the parameters and coefficients of the author's speech. For a writer's individual style, it is precisely function (service, or stop) words that are indicative, because they are not related to the topic and content of the publication.
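The style parameters and coefficients listed above can be derived from simple counts. The sketch below is a minimal illustration, not the author's implementation. The coefficient formulas are assumed definitions (lexical diversity as different words divided by total words, syntactic complexity as mean sentence length, connectivity as prepositions plus conjunctions per sentence, singularity and concentration indices as the shares of frequency-1 and frequency-10-or-more words relative to total words). The `PREPOSITIONS` and `CONJUNCTIONS` sets are small hypothetical samples, and tokenization uses regular expressions instead of the morphological analysis the method relies on.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Dict

# Hypothetical, deliberately incomplete word lists (the method itself uses
# part-of-speech analysis rather than fixed lists).
PREPOSITIONS = {"в", "у", "на", "з", "із", "до", "від", "для", "по", "при",
                "про", "під", "над", "без", "через", "між"}
CONJUNCTIONS = {"і", "й", "та", "а", "але", "або", "чи", "що", "щоб", "бо", "якщо"}

WORD_RE = re.compile(r"[^\W\d_]+(?:['’ʼ][^\W\d_]+)*")  # letters, inner apostrophes
SENTENCE_RE = re.compile(r"[^.!?…]+")                   # naive sentence splitter

@dataclass
class StyleProfile:
    n_words: int         # total number of words (tokens)
    n_distinct: int      # number of different words
    n_sentences: int
    n_prepositions: int
    n_conjunctions: int
    n_hapax: int         # words with frequency 1
    n_frequent: int      # words with frequency >= 10

    # All coefficient definitions below are assumptions for illustration.
    @property
    def lexical_diversity(self) -> float:
        return self.n_distinct / self.n_words

    @property
    def syntactic_complexity(self) -> float:
        return self.n_words / self.n_sentences  # mean sentence length

    @property
    def connectivity(self) -> float:
        return (self.n_prepositions + self.n_conjunctions) / self.n_sentences

    @property
    def singularity_index(self) -> float:
        return self.n_hapax / self.n_words

    @property
    def concentration_index(self) -> float:
        return self.n_frequent / self.n_words

def style_profile(text: str) -> StyleProfile:
    """Count the style parameters of one text."""
    words = WORD_RE.findall(text.lower())
    if not words:
        raise ValueError("text contains no words")
    freq = Counter(words)
    # Keep only fragments that contain at least one word.
    sentences = [s for s in SENTENCE_RE.findall(text) if WORD_RE.search(s)]
    return StyleProfile(
        n_words=len(words),
        n_distinct=len(freq),
        n_sentences=len(sentences),
        n_prepositions=sum(freq[w] for w in PREPOSITIONS),
        n_conjunctions=sum(freq[w] for w in CONJUNCTIONS),
        n_hapax=sum(1 for c in freq.values() if c == 1),
        n_frequent=sum(1 for c in freq.values() if c >= 10),
    )

def function_word_profile(text: str) -> Dict[str, float]:
    """Relative frequency of each listed function word that occurs in the text."""
    words = WORD_RE.findall(text.lower())
    freq = Counter(words)
    n = len(words) or 1
    return {w: freq[w] / n for w in PREPOSITIONS | CONJUNCTIONS if freq[w]}
```

Because a word such as "що" can also be a pronoun, a fixed list overcounts conjunctions; this is one reason for taking the part of speech into account, as the method does.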
Results. A comparison was made on a set of 200 individual technical works by about 100 different authors over the period 2001–2017 to determine whether these authors' text diversity coefficients differ across time intervals.
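One simple way to check whether an author's coefficients differ between periods is to average them per interval and compare the averages. The sketch assumes records of the form (author, year, coefficient value) and hypothetical interval cut points (`bounds`). It reports the relative spread between the highest and lowest interval means, which is an illustrative measure, not the statistic used in the study.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

def interval_of(year: int, bounds: List[int]) -> int:
    """Index of the interval a year falls into, given sorted cut points."""
    return sum(year >= b for b in bounds)

def coefficient_drift(
    records: Iterable[Tuple[str, int, float]], bounds: List[int]
) -> Dict[str, Tuple[Dict[int, float], float]]:
    """Per author: mean coefficient in each interval and the relative spread
    (max mean - min mean) / max mean across intervals."""
    groups: Dict[str, Dict[int, List[float]]] = defaultdict(lambda: defaultdict(list))
    for author, year, value in records:
        groups[author][interval_of(year, bounds)].append(value)
    result = {}
    for author, by_interval in groups.items():
        means = {i: mean(vals) for i, vals in sorted(by_interval.items())}
        lo, hi = min(means.values()), max(means.values())
        spread = (hi - lo) / hi if hi else 0.0
        result[author] = (means, spread)
    return result
```

For example, with `bounds=[2009]` the works are split into 2001–2008 and 2009–2017; an author whose spread stays near zero has stable coefficients across the two periods.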
Conclusions. It has been found that, for the chosen experimental base of over 200 works, the best results by the density criterion are achieved when an article is analyzed without its mandatory front and back matter, such as abstracts and keywords in various languages and the reference list.
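Excluding the mandatory front and back matter before computing the coefficients can be approximated by dropping paragraphs according to their headings. In this sketch the heading markers (`FRONT_MATTER`, `REFERENCES`), the blank-line paragraph splitting, and the assumption that the reference list is the final section are all hypothetical simplifications.

```python
import re

# Hypothetical heading markers in English and Ukrainian; real articles may differ.
FRONT_MATTER = re.compile(r"^(abstract|keywords|анотація|ключові слова)\b", re.IGNORECASE)
REFERENCES = re.compile(r"^(references|література|список літератури)\b", re.IGNORECASE)

def strip_front_and_back_matter(article: str) -> str:
    """Drop abstract and keyword paragraphs, and everything from the reference list on."""
    kept = []
    for paragraph in re.split(r"\n\s*\n", article.strip()):
        head = paragraph.lstrip()
        if REFERENCES.match(head):
            break  # assumed to be the last section of the article
        if FRONT_MATTER.match(head):
            continue
        kept.append(paragraph)
    return "\n\n".join(kept)
```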
License
Copyright (c) 2020 V. Vysotska
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Creative Commons Licensing Notifications in the Copyright Notices
The journal allows the authors to hold the copyright without restrictions and to retain publishing rights without restrictions.
The journal allows readers to read, download, copy, distribute, print, search, or link to the full texts of its articles.
The journal allows reuse and remixing of its content in accordance with the Creative Commons CC BY-SA license.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike License (CC BY-SA) that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.