Chitta, S., Thota, S., Manoj Yellepeddi, S., Kumar Reddy, A., & Venkata, A. K. P. (2020). Multimodal Deep Learning: Integrating Vision and Language for Real-World Applications. Asian Journal of Multidisciplinary Research & Review, 1(2), 262-282. https://ajmrr.org/journal/article/view/211