METHOD OF IMPERATIVE VARIABLES FOR SEARCH AUTOMATION OF TEXTUAL CONTENT IN UNSTRUCTURED DOCUMENTS

Authors

  • V. O. Boiko, Khmelnytskyi National University, Khmelnytskyi, Ukraine

DOI:

https://doi.org/10.15588/1607-3274-2024-2-12

Keywords:

textual search, unstructured text documents, natural language processing, rule-based search, generative artificial intelligence, imperative variables

Abstract

Context. Many approaches to textual search are in use today. Methods such as pattern matching and optical character recognition are widely used to retrieve the required information from documents, and their effectiveness is proven. However, they rely on a common or predictable document structure, while unstructured documents are neglected. The problem is automating textual search in documents with unstructured content. The object of the study was to develop a method and implement it in an efficient model for searching content in unstructured textual information.

Objective. The goal of the work is to implement a rule-based textual search method and a model for finding and retrieving information from documents with unstructured text content.

Method. To achieve the purpose of the research, a method of rule-based textual search in heterogeneous content was developed and applied in an appropriately designed model. It is based on natural language processing, which has improved in recent years as generative artificial intelligence has become more available.

Results. The method has been implemented in a designed model that serves as a pattern, or framework, of unstructured textual search for software engineers. An application programming interface has been implemented.

Conclusions. The conducted experiments have confirmed that the proposed software is operable and allow it to be recommended for practical use in solving problems of textual search in unstructured documents. Prospects for further research may include improving performance through multithreading or parallelization for large textual documents, along with optimization approaches that minimize the impact of the OpenAI application programming interface's content processing limitations. Furthermore, additional investigation might extend the use of imperative variables in programming and software development.

Author Biography

V. O. Boiko, Khmelnytskyi National University, Khmelnytskyi, Ukraine

Assistant of the Department of Software Engineering

Published

2024-06-27

How to Cite

Boiko, V. O. (2024). METHOD OF IMPERATIVE VARIABLES FOR SEARCH AUTOMATION OF TEXTUAL CONTENT IN UNSTRUCTURED DOCUMENTS. Radio Electronics, Computer Science, Control, (2), 117. https://doi.org/10.15588/1607-3274-2024-2-12

Section

Progressive information technologies