Abstract
Gujarati is an Indo-Aryan language with more than 55 million speakers, which makes it an important target for machine translation. Progress is hampered by limited parallel corpora, complex morphology, and the difficulty of preserving context. Typical neural machine translation methods tend to fail in such low-resource settings, producing syntactic errors and semantic drift. To overcome these shortcomings, this paper presents Gujarati-Translation with Generative Adversarial Network (G-TransGAN), a new hybrid model that combines conditional Generative Adversarial Networks (cGANs), morphology-sensitive SentencePiece tokenization, multilingual transformer embeddings (XLM-RoBERTa and IndicBERT), and optimization techniques such as Sharpness-Aware Minimization (SAM) and Low-Rank Adaptation (LoRA). The main goal is to maximize fluency, semantic retention, and domain flexibility in low-resource Gujarati-English translation. The workflow has five steps: data augmentation, pre-processing and tokenization, contextual embedding, semantic translation, and optimization. The experimental findings indicate that G-TransGAN outperforms the compared approaches, reaching a BLEU score of 38.4, a METEOR score of 0.76, and a TER of 0.46. These results indicate that the model can produce high-quality, human-like translations while remaining computationally feasible in low-resource, real-world settings.
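To make the reported TER figure easier to interpret, the following is a minimal sketch of a simplified Translation Edit Rate: the word-level edit distance between a hypothesis and a reference, divided by the reference length, so lower values are better. It is an illustration only and not the evaluation code used in the paper. Full TER also counts block shifts, which this sketch omits, and the function names `edit_distance` and `simple_ter` are hypothetical.

```python
def edit_distance(hyp, ref):
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(ref) + 1))  # distance from empty hypothesis to each ref prefix
    for i, h in enumerate(hyp, start=1):
        curr = [i]  # distance from hyp[:i] to the empty reference
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete a hypothesis word
                            curr[j - 1] + 1,      # insert a reference word
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def simple_ter(hypothesis, reference):
    """Simplified TER: edits / reference length (no shift operations)."""
    hyp = hypothesis.split()
    ref = reference.split()
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(hyp, ref) / len(ref)


if __name__ == "__main__":
    # One missing word against a six-word reference.
    print(simple_ter("the cat sat on mat", "the cat sat on the mat"))
```

Under this simplified definition, a score of 0.46 would correspond to roughly 46 word edits per 100 reference words.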
Keywords
Low-Resource Translation, Gujarati-English, GAN-Based Augmentation, Transformer Models, Semantic Preservation, Optimization