SYSTEM FOR WEB RESOURCES CONTENT STRUCTURING AND RECOGNIZING WITH THE MACHINE LEARNING ELEMENTS

M. P. Dyvak, A. V. Kovbasistyi, A. M. Melnyk, L. Y. Turchyn, Y. O. Martsenyuk

Abstract


Context. Organizations maintain a large number of web resources whose content must be checked for relevance and correctness, in particular the information about the organization and its staff. This calls for a system of automated content analysis, and hence for a method and software for structuring and recognizing web resource content. Existing parsing systems do not solve this task because they contain no machine learning elements. The object of the research is the process of automated analysis of web resource content.
Objective. The goal of the work is to create a system for structuring and recognizing web resource content.
Method. A system with machine learning elements for structuring and recognizing the text content of web resources is considered. Models of the system's functioning are proposed, and an architecture for implementing the software system is developed. An example implementation is given in which the system structures and recognizes the content of an educational institution's web resource and reveals outdated and incorrect information about personnel.
Results. The developed software can be used by support services to update and correct information content.
Conclusions. A system with machine learning elements for structuring and recognizing web resource content has been considered. Compared with known systems, the proposed system provides automated content structuring and recognition of outdated, non-relevant, or incorrect information. The presented example of structuring content and recognizing outdated and incorrect information on the website of an educational institution confirms the effectiveness of the proposed system.
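
To make the structuring and checking steps concrete, the sketch below shows one possible shape of such a check. It is an illustration only and not the authors' implementation: it uses plain rule-based comparison rather than machine learning, and the page layout (a two-column table of name and position), the sample data, and the names `StaffTableParser`, `structure_staff_page`, and `find_discrepancies` are all assumptions introduced here.

```python
from html.parser import HTMLParser


class StaffTableParser(HTMLParser):
    """Collect the cell text of each <tr> row into a list of strings."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._row is not None and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def structure_staff_page(html):
    """Structuring step: turn the page into a {name: position} mapping."""
    parser = StaffTableParser()
    parser.feed(html)
    parser.close()
    return {row[0]: row[1] for row in parser.rows if len(row) >= 2}


def find_discrepancies(published, reference):
    """Checking step: compare published records with trusted reference records."""
    issues = []
    for name, position in published.items():
        if name not in reference:
            issues.append((name, "outdated: not in reference records"))
        elif reference[name] != position:
            issues.append((name, "incorrect position: expected " + reference[name]))
    return issues


# Hypothetical sample data, not taken from the paper.
SAMPLE_PAGE = """
<table>
  <tr><td>A. Example</td><td>Professor</td></tr>
  <tr><td>B. Sample</td><td>Lecturer</td></tr>
  <tr><td>C. Former</td><td>Assistant</td></tr>
</table>
"""
REFERENCE = {"A. Example": "Professor", "B. Sample": "Associate Professor"}

published = structure_staff_page(SAMPLE_PAGE)
issues = find_discrepancies(published, REFERENCE)
for name, problem in issues:
    print(name, "-", problem)
```

In this sketch the reference records stand in for whatever trusted source a support service would use, such as a personnel database. A learned component, as the abstract describes, would presumably replace the fixed table layout assumption when recognizing which page fragments hold names and positions.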

Keywords


Content analysis; parsing; machine learning.


Copyright (c) 2018 M. P. Dyvak, A. V. Kovbasistyi, A. M. Melnyk, L. Y. Turchyn, Y. O. Martsenyuk

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Address of the journal editorial office:
Editorial office of the journal «Radio Electronics, Computer Science, Control»,
Zaporizhzhya National Technical University, 
Zhukovskiy street, 64, Zaporizhzhya, 69063, Ukraine. 
Telephone: +38-061-769-82-96 – the Editing and Publishing Department.
E-mail: rvv@zntu.edu.ua

A reference to the journal is obligatory in cases of complete or partial use of its materials.