UA-LLM: ADVANCING CONTEXT-BASED QUESTION ANSWERING IN UKRAINIAN THROUGH LARGE LANGUAGE MODELS
DOI: https://doi.org/10.15588/1607-3274-2024-1-14

Keywords: large language model, question-answering, few-shot learning, generative annotation

Abstract
Context. Context-based question answering, a fundamental task in natural language processing, demands a deep understanding of a language's nuances. Although sophisticated, it is an essential part of modern search systems, intelligent assistants, chatbots, and the Conversational AI field as a whole. While English, Chinese, and other widely spoken languages have accumulated extensive datasets, algorithms, and benchmarks, the Ukrainian language, with its rich linguistic heritage and intricate syntax, remains a low-resource language in the NLP community, which makes the question answering problem even harder.
Objective. The purpose of this work is to establish and benchmark a set of techniques that leverage Large Language Models, combined into a single framework, for solving the low-resource problem of the context-based question answering task in Ukrainian.
Method. A simple yet flexible framework for leveraging Large Language Models, developed as part of this research, highlights two key methods proposed and evaluated in this paper for dealing with a small amount of training data for the context-based question answering task. The first method utilizes Zero-shot and Few-shot learning, the two major subfields of N-shot learning, where N corresponds to the number of training samples, to build a bilingual instruction-based prompting strategy that drives language model inference in an extractive manner (finding an answer span in the context) instead of the models' natural generative behavior (summarizing the context according to the question). The second method builds on the first, but instead of just answering the question, the language model annotates the input context by generating question-answer pairs for the given paragraph. This synthetic data is then used to train an extractive model. The paper explores both augmentation-based training, when some annotated data already exists, and completely synthetic training, when no data is available. The key benefit of these two methods is the ability to obtain comparable prediction quality without an expensive and lengthy human annotation process.
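To make the two methods concrete, the sketch below shows one possible shape of the bilingual extractive prompt and of the generative annotation prompt. It is a minimal illustration assuming the OpenAI Python SDK (v1) with a chat model; the prompt wording, the `gpt-4` model choice, and the function names are illustrative assumptions, not the authors' exact templates.

```python
# Minimal sketch of the two proposed prompting methods, assuming the
# OpenAI Python SDK (>= 1.0). Prompt wording and the model choice are
# illustrative assumptions, not the paper's exact bilingual templates.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Method 1: a bilingual instruction that forces extractive behavior --
# the model must copy an exact answer span from the context.
EXTRACTIVE_INSTRUCTION = (
    "You are a question answering system for Ukrainian. "
    "Answer with the exact span copied from the context, or 'N/A' "
    "if the context does not contain the answer. "
    "Відповідай точним фрагментом тексту з контексту."
)

def answer_extractively(context, question, few_shot=None):
    """N-shot extractive answering: few_shot holds (context, question,
    answer) demonstrations; pass an empty list (or None) for zero-shot."""
    messages = [{"role": "system", "content": EXTRACTIVE_INSTRUCTION}]
    for ctx, q, a in (few_shot or []):
        messages.append({"role": "user",
                         "content": f"Контекст: {ctx}\nПитання: {q}"})
        messages.append({"role": "assistant", "content": a})
    messages.append({"role": "user",
                     "content": f"Контекст: {context}\nПитання: {question}"})
    resp = client.chat.completions.create(
        model="gpt-4", messages=messages, temperature=0)
    return resp.choices[0].message.content.strip()

# Method 2: generative annotation -- instead of answering a question,
# the model produces synthetic question-answer pairs for a paragraph.
ANNOTATION_INSTRUCTION = (
    "Generate 3 question-answer pairs for the Ukrainian paragraph below. "
    "Each answer must be an exact span from the paragraph. Return one "
    'JSON object per line: {"question": "...", "answer": "..."}.'
)

def annotate(paragraph):
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "system", "content": ANNOTATION_INSTRUCTION},
                  {"role": "user", "content": paragraph}],
        temperature=0.7)  # mild diversity broadens synthetic coverage
    return resp.choices[0].message.content
```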
Results. The two proposed methods for solving the low-to-zero training data problem for the context-based question answering task in Ukrainian were implemented and combined into a flexible LLM experimentation framework.
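As one illustration of how the second method's output can feed extractive training within such a framework, the sketch below filters generated pairs down to verbatim spans and converts them into SQuAD-style records; the `xlm-roberta-base` checkpoint and helper names are illustrative assumptions, not the paper's exact setup.

```python
# Sketch: converting generated QA pairs into SQuAD-style records for
# extractive model training. Checkpoint and helper names are assumptions.
import json
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

def to_squad_record(paragraph, pair_line):
    """Map one generated {"question", "answer"} JSON line to SQuAD
    format, discarding pairs whose answer is not a literal span."""
    pair = json.loads(pair_line)
    start = paragraph.find(pair["answer"])
    if start < 0:  # answer not found verbatim -> likely hallucinated
        return None
    return {"context": paragraph,
            "question": pair["question"],
            "answers": {"text": [pair["answer"]],
                        "answer_start": [start]}}

# From here, a standard span-extraction fine-tuning loop applies to the
# filtered records, whether they augment existing annotations or form a
# completely synthetic training set.
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForQuestionAnswering.from_pretrained("xlm-roberta-base")
```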
Conclusions. This research comprehensively studied the language understanding capabilities of OpenAI GPT-3.5, OpenAI GPT-4, Cohere Command, and Meta LLaMa-2 as applied to context-based question answering in low-resource Ukrainian. A thorough evaluation of the proposed methods on a diverse set of metrics demonstrates their efficiency, unveiling the possibility of building components of search engines, chatbot applications, and standalone general-domain CBQA systems with Ukrainian language support while having almost zero annotated data. A prospect for further research is to extend the scope from the CBQA task evaluated in this paper to all major NLU tasks, with the final goal of establishing a complete benchmark for evaluating LLMs' capabilities in the Ukrainian language.
Copyright (c) 2024 M. V. Syromiatnikov, V. M. Ruvinskaya
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.