Abstract

Gujarati is an Indo-Aryan language with more than 55 million speakers, making it an important language to consider in machine translation. Translating it is difficult because parallel corpora are limited, its morphology is complex, and context is hard to preserve across sentences. Typical neural machine translation methods tend to fail in such low-resource settings, producing syntactic errors and semantic drift. To overcome these shortcomings, this paper presents Gujarati-Translation with Generative Adversarial Network (G-TransGAN), a new hybrid model that combines conditional Generative Adversarial Networks (cGANs), morphology-sensitive SentencePiece tokenization, multilingual transformer embeddings (XLM-RoBERTa and IndicBERT), and optimization techniques such as Sharpness-Aware Minimization (SAM) and Low-Rank Adaptation (LoRA). The main goal is to maximize fluency, semantic retention, and domain flexibility in low-resource Gujarati-English translation. The workflow includes five steps: data augmentation, pre-processing and tokenization, contextual embedding, semantic translation, and optimization. The experimental findings indicate that G-TransGAN achieves improved performance on several metrics, with a BLEU score of 38.4, a METEOR score of 0.76, and a TER of 0.46 (for TER, lower is better). These results suggest that the model can produce high-quality, human-like translations while remaining computationally feasible in low-resource, real-world settings.
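
For readers unfamiliar with TER, the following is a minimal sketch of a simplified edit rate: the word-level edit distance between a hypothesis and a reference, divided by the reference length. It is an illustration only and not the evaluation code used in this paper. Standard TER also counts block shifts, which this sketch omits, and the function names and sample sentences are hypothetical.

```python
def word_edit_distance(hyp, ref):
    """Levenshtein distance over word tokens (insert, delete, substitute)."""
    # prev[j] holds the distance between the hypothesis prefix seen so far
    # and the first j reference tokens.
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # drop a hypothesis token
                            curr[j - 1] + 1,      # add a reference token
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def simplified_ter(hypothesis, reference):
    """Edits per reference word. Simplification: no shift operations."""
    hyp = hypothesis.split()
    ref = reference.split()
    if not ref:
        raise ValueError("reference must contain at least one token")
    return word_edit_distance(hyp, ref) / len(ref)


# Hypothetical example sentences, not taken from the paper's dataset.
score = simplified_ter("the cat sat on mat", "the cat sat on the mat")
```

Under this simplified definition, a score of 0 means the hypothesis matches the reference exactly, and larger values mean more editing is needed to reach the reference.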

Keywords

Low-Resource Translation, Gujarati-English, GAN-Based Augmentation, Transformer Models, Semantic Preservation, Optimization
