Harris, Sheetal, Thong Ta, Vinh, Trovati, Marcello (ORCID: 0000-0001-6607-422X), Nakhla, Ghada, Latif, Faiza and Korkontzelos, Ioannis (2026) Multimodal misinformation detection across diverse languages using RAG and LLMs. Journal of Intelligent Information Systems. ISSN 0925-9902
PDF (VOR) - Published Version. Available under License Creative Commons Attribution. 3MB.
Official URL: https://doi.org/10.1007/s10844-026-01042-x
Abstract
The rapid spread of multimodal fake news (FN) on Online Social Networks (OSNs) threatens digital information ecosystems, particularly in low-resource languages. Existing multimodal fake news detection (FND) methods are largely limited to high-resource settings, restricting their global applicability. We propose M&M-RAG, a Multilingual & Multimodal Retrieval-Augmented Generation framework that leverages Large Vision-Language Models (LVLMs) and Large Language Models (LLMs) to verify news claims across English, Chinese and Urdu. M&M-RAG integrates real-time multilingual evidence retrieval, language-aware prompting, and cross-modal reasoning for fact verification. We further propose Multi-Ax-to-Grind Urdu, the first large-scale, multi-domain multimodal benchmark for FND in Urdu. Experiments on typologically diverse monolingual multimodal datasets demonstrate that M&M-RAG achieves state-of-the-art (SOTA) performance, with 94.6% accuracy and a 94.2% F1 score, surpassing models such as SpotFake, MPFN, MMCFND, and Semi-FND. The proposed framework remains robust in zero-shot and cross-lingual scenarios under frozen-model inference without task-specific fine-tuning. The results underscore the scalability and interpretability of LVLM-based approaches for combating multimodal misinformation, particularly in under-represented and typologically diverse languages.
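The abstract describes a pipeline of multilingual evidence retrieval, language-aware prompting, and cross-modal reasoning under frozen-model inference. The sketch below illustrates that overall flow only; every function name, prompt wording, and data structure is an illustrative assumption, not the authors' released code or the paper's actual prompts.

```python
# Minimal, hypothetical sketch of a retrieval-augmented verification loop in the
# spirit of M&M-RAG: retrieve evidence, build a language-aware prompt, query a
# frozen vision-language model. Stubs stand in for the retriever and the LVLM.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Claim:
    text: str        # news claim in its source language
    image_path: str  # path to the accompanying image, if any
    language: str    # e.g. "en", "zh", "ur"


def build_prompt(claim: Claim, evidence: List[str]) -> str:
    """Assemble a language-aware prompt from the claim and retrieved evidence."""
    evidence_block = "\n".join(f"- {snippet}" for snippet in evidence)
    return (
        f"Language: {claim.language}\n"
        f"Claim: {claim.text}\n"
        f"Evidence:\n{evidence_block}\n"
        "Question: Considering the evidence and the attached image, is the claim "
        "'real' or 'fake'? Answer with a verdict and a brief rationale."
    )


def verify_claim(
    claim: Claim,
    retrieve: Callable[[str, str], List[str]],  # (query, language) -> evidence snippets
    query_lvlm: Callable[[str, str], str],      # (prompt, image_path) -> model answer
) -> str:
    """Frozen-model inference: no fine-tuning, only retrieval plus prompting."""
    evidence = retrieve(claim.text, claim.language)
    prompt = build_prompt(claim, evidence)
    return query_lvlm(prompt, claim.image_path)


if __name__ == "__main__":
    # Stub components so the sketch runs end to end without any external service.
    stub_retriever = lambda query, lang: ["No reputable outlet reports this event."]
    stub_lvlm = lambda prompt, image: "fake - the retrieved evidence contradicts the claim."
    claim = Claim(
        text="City X was hit by a magnitude 9 earthquake today.",
        image_path="quake.jpg",
        language="en",
    )
    print(verify_claim(claim, stub_retriever, stub_lvlm))
```

In a real deployment the retriever would query multilingual news or web sources at inference time and the LVLM call would pass both the prompt and the image to a frozen model; the point of the sketch is only the division of labour between retrieval, prompting, and cross-modal reasoning.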