A new milestone in ophthalmology and artificial intelligence has arrived with the release of LMOD+, a large-scale, publicly available dataset and benchmark designed to help train and evaluate multimodal AI systems for eye care [1].
The work, led by an international team of researchers, aims to accelerate the development of tools that can assist clinicians in diagnosing, staging, and understanding eye diseases from medical images combined with related clinical data.
LMOD+ comprises 32,633 annotated instances across five ophthalmic imaging modalities, including color fundus photographs, optical coherence tomography (OCT) scans, and lens photographs. Each instance includes not only image data but also structured information such as anatomical annotations, disease labels, severity staging, and demographic metadata. The dataset covers 12 common eye conditions, including diabetic retinopathy, age-related macular degeneration, and retinal vein occlusion.
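To make the structure of a single instance more concrete, the sketch below models one annotated record as a Python dataclass. The class name `OphthalmicInstance`, its field names, and the example values are hypothetical illustrations of the kinds of information described above; they are not the official LMOD+ file format, which may organize these annotations differently.

```python
from dataclasses import dataclass, field
from typing import Optional


# Hypothetical record layout: names and types are illustrative assumptions,
# not the schema used in the actual LMOD+ release.
@dataclass
class OphthalmicInstance:
    image_path: str                      # path to the image file
    modality: str                        # e.g. "color_fundus", "oct", "lens_photo"
    anatomical_structures: list[str] = field(default_factory=list)  # annotated anatomy
    disease_labels: list[str] = field(default_factory=list)         # diagnosis labels
    severity_stage: Optional[str] = None  # staging grade, if annotated
    age: Optional[int] = None             # demographic metadata, if available
    sex: Optional[str] = None

    def has_staging(self) -> bool:
        """Return True if this instance carries a severity-staging label."""
        return self.severity_stage is not None


# Hypothetical example record.
example = OphthalmicInstance(
    image_path="images/fundus_0001.png",
    modality="color_fundus",
    anatomical_structures=["optic_disc", "macula"],
    disease_labels=["diabetic_retinopathy"],
    severity_stage="moderate NPDR",
    age=61,
    sex="F",
)
print(example.has_staging())  # True
```

Keeping staging and demographics optional reflects that not every task needs every annotation; a screening model would read only `disease_labels`, while a staging model would filter on `has_staging()`.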
Compared with its predecessor (LMOD), the new version expands the dataset by nearly 50%, with particular growth in color fundus photography, a modality that is widely available even in lower-resource settings. The expanded dataset supports a wider variety of tasks, including:
- Disease screening (binary and multi-class diagnosis)
- Disease severity classification (staging using internationally adopted grading schemes)
- Anatomical structure recognition
- Demographic attribute prediction (e.g. age and sex), enabling assessment of potential model bias.
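One common way to use demographic labels when checking for bias is to compare a model's accuracy across subgroups. The following is a minimal sketch assuming parallel lists of predictions, ground-truth labels, and group tags; the function name `subgroup_accuracy` and the example data are hypothetical, and this is not necessarily the procedure used by the LMOD+ authors.

```python
from collections import defaultdict


def subgroup_accuracy(predictions, labels, groups):
    """Return accuracy per demographic group and the largest gap between groups."""
    if not (len(predictions) == len(labels) == len(groups)):
        raise ValueError("predictions, labels and groups must have equal length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for pred, label, group in zip(predictions, labels, groups):
        total[group] += 1
        correct[group] += int(pred == label)
    per_group = {g: correct[g] / total[g] for g in total}
    # Gap between the best- and worst-served groups; 0.0 when there is no data.
    gap = max(per_group.values()) - min(per_group.values()) if per_group else 0.0
    return per_group, gap


# Hypothetical screening outputs for four patients tagged by sex.
preds = ["DR", "normal", "DR", "AMD"]
truth = ["DR", "DR", "DR", "AMD"]
sexes = ["F", "F", "M", "M"]
print(subgroup_accuracy(preds, truth, sexes))  # ({'F': 0.5, 'M': 1.0}, 0.5)
```

A large gap flags a disparity worth investigating, though with small subgroups it can also reflect sampling noise rather than genuine bias.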
To test whether existing multimodal large language models (MLLMs), which combine image analysis with language-based reasoning, can handle ophthalmology tasks, the authors evaluated 24 state-of-the-art models. Under “zero-shot” conditions (i.e. without disease-specific fine-tuning), the top-performing models reached approximately 58% accuracy in disease screening. While promising, this remains far below the level required for clinical deployment. On more complex tasks such as disease staging, performance dropped considerably, often approaching chance level.
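Whether an accuracy figure counts as "near random" depends on how many answer options a task has and how its labels are distributed. A minimal sketch of two common reference points: uniform guessing among k options, which averages 1/k accuracy regardless of the label distribution, and always predicting the most frequent label. The function name `chance_baselines` and the example staging labels are hypothetical, chosen only to illustrate the arithmetic; note that for a binary question, uniform guessing alone already averages 50%.

```python
from collections import Counter


def chance_baselines(labels, num_options):
    """Return (uniform, majority) reference accuracies for a k-way task.

    uniform:  expected accuracy of guessing uniformly among num_options answers.
    majority: accuracy of always predicting the most frequent label in `labels`.
    """
    if not labels or num_options < 1:
        raise ValueError("need at least one label and one answer option")
    uniform = 1.0 / num_options
    majority = Counter(labels).most_common(1)[0][1] / len(labels)
    return uniform, majority


# Hypothetical 5-level staging labels (0 = no disease ... 4 = most severe).
stages = [0, 0, 0, 1, 2, 2, 3, 4]
print(chance_baselines(stages, num_options=5))  # (0.2, 0.375)
```

Reporting both baselines alongside model accuracy makes it easier to tell whether a model has learned anything beyond the label frequencies of the benchmark.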
The researchers note that, while general-purpose AI systems are advancing rapidly, medical specialization, especially in a complex field such as ophthalmology, still poses significant challenges. Domain-specific datasets and careful validation remain essential.
Reference
1. Qin, Z. et al. “LMOD+: A Comprehensive Multimodal Dataset and Benchmark for Developing and Evaluating Multimodal Large Language Models in Ophthalmology.” arXiv (submitted 30 Sep 2025).