SYSTEM FOR WEB RESOURCES CONTENT STRUCTURING AND RECOGNIZING WITH THE MACHINE LEARNING ELEMENTS
DOI:
https://doi.org/10.15588/1607-3274-2018-3-14
Keywords:
Content analysis, parsing, machine learning.
Abstract
Context. The large number of web resources maintained by different organizations makes it necessary to check the relevance and correctness of their content, in particular information about the organization's characteristics, staff, etc. This calls for a system of automated content analysis, and hence for a method and software for structuring and recognizing the content of web resources. Existing parsing systems do not solve this task because they contain no machine learning elements. The object of the research is the process of automated analysis of web resource content.
Objective. The goal of the work is to create a system for structuring and recognizing the content of web resources.
Method. A system for structuring and recognizing the text content of web resources with machine learning elements is considered. Models of the system's functioning are proposed. An architecture for implementing a software system that structures and recognizes the text content of web resources is developed. An example is given in which the model of the developed system is implemented to structure, recognize and reveal outdated and incorrect personnel information on the web resource of an educational institution.
Results. The developed software may be used by support services to update and correct information content.
Conclusions. A system for structuring and recognizing the content of web resources with machine learning elements has been considered. Compared with known systems, the proposed system provides automated content structuring and recognition of outdated, non-relevant or wrong information. The presented example of structuring and recognizing outdated and incorrect information on the website of an educational institution confirms the effectiveness of the proposed system.
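The kind of check described in the abstract (structure the personnel entries on a page, then flag outdated or incorrect ones) can be illustrated with a minimal sketch. It is not the authors' system: the `<li>` page layout, the "Name, position" entry convention, the reference registry and the 0.8 threshold are all assumptions made for illustration, and plain string similarity from the Python standard library stands in for the machine learning elements.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class StaffListParser(HTMLParser):
    """Collect the text of every <li> element as one staff entry (assumed layout)."""

    def __init__(self):
        super().__init__()
        self.entries = []
        self._in_item = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._in_item = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_item:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "li" and self._in_item:
            # Join the collected text and normalize whitespace.
            self.entries.append(" ".join("".join(self._buffer).split()))
            self._in_item = False


def structure(entry):
    """Split 'Name, position' into a record (hypothetical page convention)."""
    name, _, position = entry.partition(",")
    return {"name": name.strip(), "position": position.strip()}


def similarity(a, b):
    """Case-insensitive similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def check(records, reference, threshold=0.8):
    """Label each record 'ok', 'incorrect' or 'outdated' against the reference."""
    report = []
    for rec in records:
        best = max(reference,
                   key=lambda ref: similarity(rec["name"], ref["name"]),
                   default=None)
        if best is None or similarity(rec["name"], best["name"]) < threshold:
            status = "outdated"   # person not found in the reference registry
        elif rec["position"].lower() != best["position"].lower():
            status = "incorrect"  # person found, but the position differs
        else:
            status = "ok"
        report.append((rec["name"], status))
    return report


def analyze(html, reference):
    """Parse the page, structure its entries and compare them with the reference."""
    parser = StaffListParser()
    parser.feed(html)
    return check([structure(e) for e in parser.entries], reference)
```

A support service could feed `analyze` the staff page of a site together with an up-to-date personnel registry and review only the entries that are not labeled `ok`.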
License
Copyright (c) 2018 M. P. Dyvak, A. V. Kovbasistyi, A. M. Melnyk, L. Y. Turchyn, Y. O. Martsenyuk
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.