GenSL-Trans: Direct Visual-to-Visual Arabic-to-English Sign Language Translation via Mobile-Optimized Unet-Transformers in Immersive Environments

Authors

H. A. Bouarara, K. Benyahia, M. E. Rahmani

DOI:

https://doi.org/10.22399/ijcesen.3821

Keywords:

Mobile Interactive, Direct Sign Language Translation, Metaverse, Transformer, GPT, BERT, Generative AI

Abstract

We propose a real-time, mobile-interactive pipeline for direct Arabic-to-English Sign Language (ArSL-to-ESL) translation in the metaverse, preserving the visual-spatial nature of sign languages without textual intermediaries. Central to this system is a newly created bilingual mapping dataset between Arabic and English sign language, which enables accurate cross-lingual alignment of gestural patterns and forms the foundation for direct, grammar-preserving translation. The system captures gestures via VR headsets or smartphone cameras at 90 fps (1080p, H.264), with on-device preprocessing (OpenCV) accelerated through NNAPI or Core ML. A quantized YOLOv11 (int8) detector with Kalman tracking achieves 92% accuracy on the mapping dataset with <11 ms inference on mobile GPUs. Visual features are encoded via 14×14 patch embedding into 256-D tokens and processed by GenSL-Trans, a lightweight (14M-parameter) vision Transformer (8 attention heads, FFN dimension 1024) that maps sign gestures directly to target ESL representations. Bi-LSTM, BERT, and GPT-2 decoders generate spatiotemporal sequences with adaptive on-device/cloud execution. A CNN-based renderer with transposed-convolution (Conv2DT) layers and U-Net skip connections produces 224×224 px video frames, driving a lightweight 3D avatar that is streamed via glTF and rendered in real time using WebXR, accessible on mobile browsers (iOS/Android) or VR headsets, with end-to-end latency <180 ms. Mobile interactivity provides touch-based control (start/stop, speed, expressions, feedback), ensuring accessibility and personalization. By integrating on-device AI, direct gesture-to-gesture translation, and immersive rendering, our system provides an inclusive communication bridge for Deaf users across Arabic- and English-speaking communities.
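
To illustrate the detection-and-tracking stage, the sketch below shows how per-frame hand detections could be smoothed with a constant-velocity Kalman filter (OpenCV), in the spirit of the YOLOv11-with-Kalman-tracking step described in the abstract. The detector itself is not shown; only the filtering logic is sketched, and the noise settings are assumptions rather than values from the paper.

```python
# Minimal sketch: Kalman smoothing of detected hand-box centres.
# State is (x, y, vx, vy); measurements are the (x, y) centre from the detector.
import numpy as np
import cv2

kf = cv2.KalmanFilter(4, 2)
kf.transitionMatrix = np.array([[1, 0, 1, 0],
                                [0, 1, 0, 1],
                                [0, 0, 1, 0],
                                [0, 0, 0, 1]], dtype=np.float32)
kf.measurementMatrix = np.array([[1, 0, 0, 0],
                                 [0, 1, 0, 0]], dtype=np.float32)
kf.processNoiseCov = np.eye(4, dtype=np.float32) * 1e-3  # assumed noise level

def smooth_center(cx: float, cy: float) -> tuple[float, float]:
    """Predict the next state, then correct it with the detector's box centre."""
    kf.predict()
    est = kf.correct(np.array([[cx], [cy]], dtype=np.float32))
    return float(est[0, 0]), float(est[1, 0])
```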
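The encoder dimensions quoted above (14×14 patch embedding of 224×224 frames into 256-D tokens, 8 attention heads, FFN width 1024) can be expressed as a short PyTorch configuration. This is a minimal sketch, not the authors' implementation: the encoder depth (num_layers=6) and the learned positional embedding are assumptions the abstract does not specify.

```python
# Sketch of a GenSL-Trans-style visual encoder with the stated dimensions.
import torch
import torch.nn as nn

class GenSLTransEncoder(nn.Module):
    def __init__(self, d_model: int = 256, n_heads: int = 8,
                 ffn_dim: int = 1024, num_layers: int = 6):  # depth assumed
        super().__init__()
        # 14x14 non-overlapping patches of a 224x224 frame -> 16x16 = 256 tokens
        self.patch_embed = nn.Conv2d(3, d_model, kernel_size=14, stride=14)
        self.pos_embed = nn.Parameter(torch.zeros(1, 256, d_model))
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dim_feedforward=ffn_dim,
            batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, 3, 224, 224) -> tokens: (batch, 256, 256)
        x = self.patch_embed(frames).flatten(2).transpose(1, 2)
        return self.encoder(x + self.pos_embed)

tokens = GenSLTransEncoder()(torch.randn(1, 3, 224, 224))
print(tokens.shape)  # torch.Size([1, 256, 256])
```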
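Similarly, the renderer's transposed-convolution upsampling with U-Net-style skip connections can be sketched as follows. Channel widths and the number of stages are illustrative assumptions; only the final 224×224 RGB frame size is taken from the abstract.

```python
# Sketch of one U-Net-style upsampling stage of the CNN renderer.
import torch
import torch.nn as nn

class SkipUpBlock(nn.Module):
    def __init__(self, in_ch: int, skip_ch: int, out_ch: int):
        super().__init__()
        # transposed convolution doubles the spatial resolution
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
        self.fuse = nn.Conv2d(out_ch + skip_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor, skip: torch.Tensor) -> torch.Tensor:
        x = self.up(x)
        x = torch.cat([x, skip], dim=1)  # U-Net skip connection
        return torch.relu(self.fuse(x))

# Final 1x1 convolution maps features to an RGB frame (sizes are illustrative).
to_rgb = nn.Conv2d(64, 3, kernel_size=1)
frame = torch.sigmoid(to_rgb(SkipUpBlock(128, 64, 64)(
    torch.randn(1, 128, 112, 112), torch.randn(1, 64, 224, 224))))
print(frame.shape)  # torch.Size([1, 3, 224, 224])
```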

Published

2025-09-30

How to Cite

Bouarara, H. A., Benyahia, K., & Rahmani, M. E. (2025). GenSL-Trans: Direct Visual-to-Visual Arabic-to-English Sign Language Translation via Mobile-Optimized Unet-Transformers in Immersive Environments. International Journal of Computational and Experimental Science and Engineering, 11(4). https://doi.org/10.22399/ijcesen.3821

Issue

Vol. 11 No. 4 (2025)

Section

Research Article