International Journal of Advanced Multidisciplinary Application (IJAMA)

Peer-reviewed Journal | Open-access Journal | ISSN: 3048-9350

Natural Language Processing in Low-Resource Languages: Progress and Prospects

Authors: Ritul Phukan¹, Monalisa Daimari², Anupam Kharghoria³, Biman Basumatary⁴

Affiliation: ¹,²,³,⁴Department of Computer Science and Engineering, Assam Down Town University, Guwahati, India

Volume/Issue: Volume 2, Issue 9 (September 2025), Pages: 4–8

Abstract

Low-resource languages (those with limited annotated corpora, lexicons, and other digital resources) pose major challenges for modern natural language processing (NLP). Recent progress in transfer learning, multilingual pretraining, parameter-efficient adaptation, data augmentation, and community-driven dataset creation has substantially improved capabilities for many such languages, yet large performance gaps remain relative to high-resource languages. This article surveys the technical advances that enable NLP for low-resource languages, including unsupervised and weakly supervised methods, multilingual and massively multilingual models, few-shot and in-context learning with large language models, and adapter/LoRA-style parameter-efficient fine-tuning. We examine practical pipelines for tasks such as machine translation, speech recognition, OCR, and information extraction; describe prominent dataset and community projects; summarize typical evaluation strategies and their pitfalls; and outline promising research directions, including community data collection, privacy-preserving methods, on-device adaptation, and ethics-aware deployment. The review highlights approaches that balance performance, compute cost, and data efficiency, and recommends research and deployment practices to accelerate inclusive language technology.
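As a brief illustration of the adapter/LoRA-style parameter-efficient fine-tuning mentioned above, the following minimal sketch adapts a multilingual sequence-to-sequence model for a low-resource translation task using the Hugging Face PEFT library. It is not code from the article; the base model, rank, scaling values, and target module names are illustrative assumptions and differ by architecture.

# A minimal LoRA fine-tuning sketch (illustrative, not the article's code).
# Assumptions: mBART-50 as the multilingual base model; attention projections
# named "q_proj"/"v_proj"; rank and scaling chosen only for illustration.
from transformers import AutoModelForSeq2SeqLM
from peft import LoraConfig, TaskType, get_peft_model

base = "facebook/mbart-large-50"
model = AutoModelForSeq2SeqLM.from_pretrained(base)

# LoRA freezes the pretrained weights and trains small low-rank update
# matrices, so only a tiny fraction of parameters is learned from the
# scarce target-language data.
lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=8,                                  # rank of the low-rank updates
    lora_alpha=16,                        # scaling factor for the updates
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # module names vary by model
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()        # typically well under 1% trainable

Because only the small adapter weights are trained and stored, one shared multilingual base model can serve many languages with a lightweight per-language adapter, which is one way the performance/compute/data-efficiency balance discussed in the abstract can be struck in practice.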

Keywords

Low-resource languages, transfer learning, multilingual pretraining, few-shot learning, LoRA/adapters, data augmentation, machine translation, speech datasets, Masakhane, Common Voice


 
