GenSL-Trans: Direct Visual-to-Visual Arabic-to-English Sign Language Translation via Mobile-Optimized Unet-Transformers in Immersive Environments

Authors

H. A. Bouarara, K. Benyahia, M. E. Rahmani

DOI:

https://doi.org/10.22399/ijcesen.3821

Keywords:

Mobile Interactive, Direct Sign Language Translation, Metaverse, Transformer, GPT, BERT, Generative AI

Abstract

We propose a real-time, mobile-interactive pipeline for direct Arabic-to-English Sign Language (ArSL-to-ESL) translation in the metaverse, preserving the visual-spatial nature of sign languages without textual intermediaries. Central to this system is a newly created bilingual mapping dataset between Arabic and English sign language, which enables accurate cross-lingual alignment of gestural patterns and forms the foundation for direct, grammar-preserving translation. The system captures gestures via VR headsets or smartphone cameras at 90 fps (1080p, H.264), with on-device preprocessing (OpenCV) accelerated through NNAPI or Core ML. A quantized YOLOv11 (int8) detector with Kalman tracking achieves 92% accuracy on the mapping dataset with <11 ms inference on mobile GPUs. Visual features are encoded via 14×14 patch embedding into 256-D tokens and processed by GenSL-Trans, a lightweight (14M-parameter) vision Transformer (8 attention heads, FFN dimension 1024) that maps sign gestures directly to target ESL representations. Bi-LSTM, BERT, and GPT-2 decoders generate spatiotemporal sequences with adaptive on-device/cloud execution. A CNN-based renderer with transposed-convolution (Conv2DT) layers and U-Net skip connections produces 224×224 px video frames, driving a lightweight 3D avatar that is streamed via glTF and rendered in real time using WebXR, accessible on mobile browsers (iOS/Android) or VR headsets, with end-to-end latency <180 ms. Mobile interactivity provides touch-based control (start/stop, speed, expressions, feedback), ensuring accessibility and personalization. By integrating on-device AI, direct gesture-to-gesture translation, and immersive rendering, our system provides an inclusive communication bridge for Deaf users across Arabic- and English-speaking communities.
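
To illustrate the detection-and-tracking stage, the sketch below shows how per-frame hand detections could be smoothed with a constant-velocity Kalman filter (OpenCV), in the spirit of the YOLOv11-with-Kalman-tracking step described in the abstract. The detector itself is not shown; only the filtering logic is sketched, and the noise settings are assumptions rather than values from the paper.

```python
# Minimal sketch: Kalman smoothing of detected hand-box centres.
# State is (x, y, vx, vy); measurements are the (x, y) centre from the detector.
import numpy as np
import cv2

kf = cv2.KalmanFilter(4, 2)
kf.transitionMatrix = np.array([[1, 0, 1, 0],
                                [0, 1, 0, 1],
                                [0, 0, 1, 0],
                                [0, 0, 0, 1]], dtype=np.float32)
kf.measurementMatrix = np.array([[1, 0, 0, 0],
                                 [0, 1, 0, 0]], dtype=np.float32)
kf.processNoiseCov = np.eye(4, dtype=np.float32) * 1e-3  # assumed noise level

def smooth_center(cx: float, cy: float) -> tuple[float, float]:
    """Predict the next state, then correct it with the detector's box centre."""
    kf.predict()
    est = kf.correct(np.array([[cx], [cy]], dtype=np.float32))
    return float(est[0, 0]), float(est[1, 0])
```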
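The encoder dimensions quoted above (14×14 patch embedding of 224×224 frames into 256-D tokens, 8 attention heads, FFN width 1024) can be expressed as a short PyTorch configuration. This is a minimal sketch, not the authors' implementation: the encoder depth (num_layers=6) and the learned positional embedding are assumptions the abstract does not specify.

```python
# Sketch of a GenSL-Trans-style visual encoder with the stated dimensions.
import torch
import torch.nn as nn

class GenSLTransEncoder(nn.Module):
    def __init__(self, d_model: int = 256, n_heads: int = 8,
                 ffn_dim: int = 1024, num_layers: int = 6):  # depth assumed
        super().__init__()
        # 14x14 non-overlapping patches of a 224x224 frame -> 16x16 = 256 tokens
        self.patch_embed = nn.Conv2d(3, d_model, kernel_size=14, stride=14)
        self.pos_embed = nn.Parameter(torch.zeros(1, 256, d_model))
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dim_feedforward=ffn_dim,
            batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, 3, 224, 224) -> tokens: (batch, 256, 256)
        x = self.patch_embed(frames).flatten(2).transpose(1, 2)
        return self.encoder(x + self.pos_embed)

tokens = GenSLTransEncoder()(torch.randn(1, 3, 224, 224))
print(tokens.shape)  # torch.Size([1, 256, 256])
```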
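Similarly, the renderer's transposed-convolution upsampling with U-Net-style skip connections can be sketched as follows. Channel widths and the number of stages are illustrative assumptions; only the final 224×224 RGB frame size is taken from the abstract.

```python
# Sketch of one U-Net-style upsampling stage of the CNN renderer.
import torch
import torch.nn as nn

class SkipUpBlock(nn.Module):
    def __init__(self, in_ch: int, skip_ch: int, out_ch: int):
        super().__init__()
        # transposed convolution doubles the spatial resolution
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
        self.fuse = nn.Conv2d(out_ch + skip_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor, skip: torch.Tensor) -> torch.Tensor:
        x = self.up(x)
        x = torch.cat([x, skip], dim=1)  # U-Net skip connection
        return torch.relu(self.fuse(x))

# Final 1x1 convolution maps features to an RGB frame (sizes are illustrative).
to_rgb = nn.Conv2d(64, 3, kernel_size=1)
frame = torch.sigmoid(to_rgb(SkipUpBlock(128, 64, 64)(
    torch.randn(1, 128, 112, 112), torch.randn(1, 64, 224, 224))))
print(frame.shape)  # torch.Size([1, 3, 224, 224])
```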

Published

2025-09-30

How to Cite

Bouarara, H. A., Benyahia, K., & Rahmani, M. E. (2025). GenSL-Trans: Direct Visual-to-Visual Arabic-to-English Sign Language Translation via Mobile-Optimized Unet-Transformers in Immersive Environments. International Journal of Computational and Experimental Science and Engineering, 11(4). https://doi.org/10.22399/ijcesen.3821

Issue

Vol. 11 No. 4 (2025)

Section

Research Article