UA-LLM: ADVANCING CONTEXT-BASED QUESTION ANSWERING IN UKRAINIAN THROUGH LARGE LANGUAGE MODELS

Authors

  • M. V. Syromiatnikov, Odesa Polytechnic National University, Odesa, Ukraine
  • V. M. Ruvinskaya, Odesa Polytechnic National University, Odesa, Ukraine

DOI:

https://doi.org/10.15588/1607-3274-2024-1-14

Keywords:

large language model, question-answering, few-shot learning, generative annotation

Abstract

Context. Context-based question answering, a fundamental task in natural language processing, demands a deep understanding of a language’s nuances. Although it is a sophisticated task, it is an essential part of modern search systems, intelligent assistants, chatbots, and the Conversational AI field as a whole. While English, Chinese, and other widely spoken languages have accumulated an extensive number of datasets, algorithms, and benchmarks, the Ukrainian language, with its rich linguistic heritage and intricate syntax, has remained among the low-resource languages in the NLP community, making the question-answering problem even harder.

Objective. The purpose of this work is to establish and benchmark a set of techniques that leverage Large Language Models, combined in a single framework, for solving the low-resource problem of the context-based question-answering task in Ukrainian.

Method. A simple yet flexible framework for leveraging Large Language Models, developed as part of this research, highlights two key methods proposed and evaluated in this paper for dealing with a small amount of training data for the context-based question-answering task. The first utilizes Zero-shot and Few-shot learning – the two major subfields of N-shot learning, where N corresponds to the number of training samples – to build a bilingual instruction-based prompt strategy that makes language models perform inference in an extractive manner (finding an answer span in the context) instead of following their natural generative behavior (summarizing the context according to the question). The second proposed method builds on the first, but instead of just answering the question, the language model annotates the input context by generating question-answer pairs for the given paragraph. This synthetic data is then used to train an extractive model. The paper explores both augmentation-based training, when some annotated data already exists, and fully synthetic training, when no data is available. The key benefit of these two methods is the ability to obtain comparable prediction quality without an expensive and lengthy human annotation process.
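The two prompting modes described above can be sketched as plain prompt-building functions. This is a minimal illustrative sketch only: the function names, prompt wording, the Ukrainian/English mix, and the `|||` output delimiter are all assumptions, not the authors' actual templates.

```python
# Hypothetical sketch of the two methods: (1) a bilingual instruction prompt
# that forces extractive, span-copying answers, and (2) a generative-annotation
# prompt that asks the LLM to produce synthetic question-answer pairs.

def build_extractive_prompt(context, question, examples=()):
    """Bilingual instruction prompt asking the LLM to copy an exact answer
    span from the context instead of generating a free-form summary.
    `examples` holds N (context, question, answer) demonstrations;
    N = 0 yields a zero-shot prompt, N > 0 a few-shot prompt."""
    parts = [
        "You are a question-answering system for Ukrainian text.",
        "Відповідай лише точним фрагментом із контексту (answer only with an "
        "exact span copied from the context). If there is no answer, reply 'N/A'.",
    ]
    for ctx, q, a in examples:  # few-shot demonstrations, if any
        parts.append(f"Контекст: {ctx}\nПитання: {q}\nВідповідь: {a}")
    parts.append(f"Контекст: {context}\nПитання: {question}\nВідповідь:")
    return "\n\n".join(parts)


def build_annotation_prompt(paragraph, n_pairs=3):
    """Generative-annotation prompt: the LLM annotates a paragraph by
    generating question-answer pairs, which become synthetic training
    data for an extractive model."""
    return (
        f"Прочитай абзац і згенеруй {n_pairs} пари питання-відповідь "
        "(read the paragraph and generate question-answer pairs), where each "
        "answer is an exact span of the paragraph. Output one pair per line "
        "as 'Питання ||| Відповідь'.\n\n"
        f"Абзац: {paragraph}"
    )
```

In a pipeline, the output of `build_annotation_prompt` would be sent to an LLM (GPT-3.5/4, Command, or LLaMa-2), the returned lines split on `|||`, and the resulting pairs used either to augment an existing training set or as a fully synthetic one.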

Results. The two proposed methods for solving the problem of little-to-no training data for the context-based question-answering task in Ukrainian were implemented and combined into the flexible LLM experimentation framework.

Conclusions. This research comprehensively studied OpenAI GPT-3.5, OpenAI GPT-4, Cohere Command, and Meta LLaMa-2 language understanding capabilities applied to context-based question answering in low-resource Ukrainian. The thorough evaluation of proposed methods on a diverse set of metrics proves their efficiency, unveiling the possibility of building components of search engines, chatbot applications, and standalone general-domain CBQA systems with Ukrainian language support while having almost zero annotated data. The prospect for further research is to extend the scope from the CBQA task evaluated in this paper to all major NLU tasks with the final goal of establishing a complete benchmark for LLMs’ capabilities evaluation in the Ukrainian language.

Author Biographies

M. V. Syromiatnikov, Odesa Polytechnic National University, Odesa, Ukraine

Post-graduate student of the Department of Software Engineering

V. M. Ruvinskaya, Odesa Polytechnic National University, Odesa, Ukraine

PhD, Professor of the Department of Software Engineering


Published

2024-04-02

How to Cite

Syromiatnikov, M. V., & Ruvinskaya, V. M. (2024). UA-LLM: ADVANCING CONTEXT-BASED QUESTION ANSWERING IN UKRAINIAN THROUGH LARGE LANGUAGE MODELS. Radio Electronics, Computer Science, Control, (1), 147. https://doi.org/10.15588/1607-3274-2024-1-14

Issue

Section

Neuroinformatics and intelligent systems