A screening gap, not a modeling contest
Diabetic retinopathy is a leading cause of preventable blindness, and it is detectable early from a retinal photograph. The bottleneck is not imaging — it is the shortage of ophthalmologists to read those images, especially in high-volume, underserved clinics where referable disease is caught late.
Our goal was a decision-support tool that brings specialist-grade triage to the point of care: grade a fundus image in seconds, flag patients who need referral, and show clinicians why.
The model
We fine-tune an EfficientNet-B3 backbone (ImageNet-pretrained) on the APTOS 2019 dataset of clinically graded fundus images. The task is a five-class ordinal grading aligned to the clinical scale:
- 0 — No DR
- 1 — Mild
- 2 — Moderate
- 3 — Severe
- 4 — Proliferative DR
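Because the scale is ordinal, it maps naturally onto an integer enumeration that preserves the ordering between grades. The following is a minimal Python sketch. `DRGrade`, `GRADE_LABELS`, and `grade_distance` are hypothetical names chosen for illustration and do not come from the project's code.

```python
from enum import IntEnum


class DRGrade(IntEnum):
    """Five-point clinical DR scale (hypothetical encoding of the 0-4 labels)."""

    NO_DR = 0
    MILD = 1
    MODERATE = 2
    SEVERE = 3
    PROLIFERATIVE = 4


# Human-readable names matching the clinical scale.
GRADE_LABELS = {
    DRGrade.NO_DR: "No DR",
    DRGrade.MILD: "Mild",
    DRGrade.MODERATE: "Moderate",
    DRGrade.SEVERE: "Severe",
    DRGrade.PROLIFERATIVE: "Proliferative DR",
}


def grade_distance(true_grade: int, predicted_grade: int) -> int:
    """Number of steps between two grades; DRGrade() rejects values outside 0-4."""
    return abs(DRGrade(true_grade) - DRGrade(predicted_grade))
```

Because `IntEnum` members compare as integers, `DRGrade.SEVERE < DRGrade.PROLIFERATIVE` holds. `grade_distance(3, 4)` is 1 and `grade_distance(0, 4)` is 4, which is the distinction a plain accuracy count ignores.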
Optimizing the right metric
Accuracy is the wrong target here. The grades are ordinal (predicting grade 3 when the truth is grade 4 is a smaller error than predicting grade 0), and the classes are imbalanced. So we optimize and report Quadratic Weighted Kappa (QWK). QWK corrects for agreement expected by chance, and it penalizes each disagreement in proportion to the square of its distance from the true grade, so a four-grade miss costs sixteen times as much as a one-grade miss.
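QWK starts from two matrices. The first is the observed confusion matrix O. The second is the matrix E expected by chance, which is the outer product of the true and predicted grade histograms divided by the number of images. Using quadratic weights w_ij = (i - j)^2 / (K - 1)^2 for K grades, the score is kappa = 1 - sum(w_ij * O_ij) / sum(w_ij * E_ij). Below is a dependency-free sketch of that definition. The function name is hypothetical. On non-degenerate inputs, `sklearn.metrics.cohen_kappa_score(y_true, y_pred, labels=[0, 1, 2, 3, 4], weights="quadratic")` should give the same value.

```python
def quadratic_weighted_kappa(y_true, y_pred, num_classes=5):
    """Cohen's kappa with quadratic weights for integer grades 0..num_classes-1."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    # Observed confusion matrix: rows = true grade, columns = predicted grade.
    observed = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        if not (0 <= t < num_classes and 0 <= p < num_classes):
            raise ValueError(f"grade out of range: true={t}, predicted={p}")
        observed[t][p] += 1
    true_hist = [sum(row) for row in observed]
    pred_hist = [sum(observed[i][j] for i in range(num_classes)) for j in range(num_classes)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(num_classes):
        for j in range(num_classes):
            # Penalty grows with the square of the distance between grades.
            w = (i - j) ** 2 / (num_classes - 1) ** 2
            weighted_observed += w * observed[i][j]
            # Count expected if true and predicted grades were independent (chance).
            weighted_expected += w * true_hist[i] * pred_hist[j] / n
    if weighted_expected == 0.0:
        # Degenerate case: every true and predicted grade is the same single class.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected
```

Take five images graded 0 through 4. Calling the single Proliferative case Severe scores about 0.94. Calling it No DR drops the score to about 0.20, even though both predictions have the same accuracy.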
We also deliberately bias toward sensitivity. In screening, a missed sick patient is far costlier than a false alarm that gets a second look, so the operating point is tuned to minimize false negatives on referable disease.
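One way to set a sensitivity-first operating point is to collapse the grade into a binary referable versus non-referable label. You then pick the highest score threshold that still reaches a target sensitivity on a validation set. This is a minimal sketch that rests on several assumptions:

- The model exposes a per-image referable score, for example the summed probability of grades 2 to 4.
- "Referable" means grade 2 (Moderate) or worse. This is a common screening convention but is assumed here.
- The 0.95 target is a hypothetical placeholder, not the project's setting.
- Both function names are hypothetical.

```python
import math


def threshold_for_sensitivity(scores, grades, target_sensitivity=0.95, min_referable_grade=2):
    """Highest threshold for which 'flag if score >= threshold' meets the target sensitivity.

    Assumes referable = grade >= min_referable_grade (hypothetical convention).
    """
    if not 0.0 < target_sensitivity <= 1.0:
        raise ValueError("target_sensitivity must be in (0, 1]")
    positive_scores = sorted(
        (s for s, g in zip(scores, grades) if g >= min_referable_grade), reverse=True
    )
    if not positive_scores:
        raise ValueError("validation set contains no referable cases")
    # Smallest number of referable cases that must be flagged (epsilon guards float error).
    needed = math.ceil(target_sensitivity * len(positive_scores) - 1e-9)
    # Flagging every score >= the needed-th highest referable score catches at least `needed`.
    return positive_scores[needed - 1]


def sensitivity_specificity(scores, grades, threshold, min_referable_grade=2):
    """Sensitivity and specificity of the rule 'flag if score >= threshold'."""
    tp = fn = tn = fp = 0
    for s, g in zip(scores, grades):
        flagged = s >= threshold
        if g >= min_referable_grade:
            tp += int(flagged)
            fn += int(not flagged)
        else:
            fp += int(flagged)
            tn += int(not flagged)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity
```

Fixing sensitivity first and reading specificity off second matches the cost asymmetry described above. Every extra false alarm the threshold admits becomes a second look, not a missed patient.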
Making it trustworthy and explainable
A grade with no justification is hard for a clinician to act on. Every prediction ships with a Grad-CAM overlay that highlights the retinal regions driving the decision, so a reviewer can confirm the model is attending to genuine pathology rather than artifacts.
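Grad-CAM works in three steps. It weights each feature map of a late convolutional layer by the spatially averaged gradient of the target class score. It then sums the weighted maps and keeps only the positive evidence. The sketch below covers only that combination step, in NumPy. It assumes the activations and gradients for one image have already been captured from the network, shaped (channels, height, width). In PyTorch this is typically done with forward and backward hooks on the last convolutional block. `grad_cam_map` is a hypothetical helper name.

```python
import numpy as np


def grad_cam_map(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Combine captured feature maps and gradients into a [0, 1] Grad-CAM heatmap.

    activations, gradients: (channels, height, width) arrays for one image from the
    same convolutional layer; gradients are of the target class score.
    """
    if activations.shape != gradients.shape or activations.ndim != 3:
        raise ValueError("expected matching (channels, height, width) arrays")
    # Channel importance: global-average-pool each channel's gradient.
    weights = gradients.mean(axis=(1, 2))
    # Weighted sum of feature maps -> (height, width), then ReLU keeps positive evidence.
    cam = np.maximum(np.tensordot(weights, activations, axes=1), 0.0)
    peak = cam.max()
    # Normalize for display; an all-zero map means no positive evidence was found.
    return cam / peak if peak > 0 else cam
```

The resulting map has the layer's low spatial resolution. To produce the overlay a reviewer sees, it would then be resized to the input image size and alpha-blended over the fundus photograph.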
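We standardize inputs with Ben Graham's preprocessing, which subtracts a Gaussian-blurred copy of each image to even out lighting and color differences between photographs. We also apply test-time augmentation, averaging predictions over transformed copies of each image, for more stable results. The model runs on commodity CPU hardware, so it can be deployed in clinics without specialized infrastructure.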
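Ben Graham used this preprocessing in the 2015 Kaggle Diabetic Retinopathy Detection competition. It is commonly written as 4 * image - 4 * GaussianBlur(image) + 128. Test-time augmentation averages the model's class probabilities over transformed copies of the input. Below is a NumPy-only sketch of both, with these assumptions:

- Sigma = 10 pixels is a value often seen in public APTOS notebooks, not the project's documented setting.
- The original method also rescales each eye to a fixed radius and masks the border. Both steps are omitted here.
- The flip set is a hypothetical choice.
- `predict_proba` stands in for the trained model and can be any callable that returns a probability vector.

In practice, `cv2.GaussianBlur` and `cv2.addWeighted` would be the faster route.

```python
import numpy as np


def gaussian_blur(image: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur of an (H, W, C) float image with edge padding."""
    radius = max(1, int(round(3 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    kernel = np.exp(-(x ** 2) / (2 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(image, ((radius, radius), (radius, radius), (0, 0)), mode="edge")

    def convolve_1d(v: np.ndarray) -> np.ndarray:
        # "valid" mode trims the padding back off, restoring the original length.
        return np.convolve(v, kernel, mode="valid")

    # Blur down the columns (axis 0), then along the rows (axis 1).
    blurred = np.apply_along_axis(convolve_1d, 0, padded)
    return np.apply_along_axis(convolve_1d, 1, blurred)


def ben_graham(image: np.ndarray, sigma: float = 10.0) -> np.ndarray:
    """Subtract the local average color: 4*img - 4*blur(img) + 128, clipped to uint8."""
    img = image.astype(np.float64)
    out = 4.0 * img - 4.0 * gaussian_blur(img, sigma) + 128.0
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)


def predict_with_tta(predict_proba, image: np.ndarray) -> np.ndarray:
    """Average class probabilities over the image and its flips (hypothetical flip set)."""
    views = [image, image[:, ::-1], image[::-1, :], image[::-1, ::-1]]
    return np.mean([predict_proba(v) for v in views], axis=0)
```

A hypothetical end-to-end call would be `int(np.argmax(predict_with_tta(model_fn, ben_graham(fundus_rgb))))`. Flips are a natural choice for fundus photographs because left and right eyes are roughly mirror images. A flat image maps to uniform mid-gray (128), which shows that the transform keeps only local deviations from the surrounding average.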
Responsible deployment
This is a screening and decision-support tool, not a diagnostic device. It is designed to operate under clinician oversight and would require appropriate regulatory clearance (such as FDA clearance or CE marking) before clinical use. The aim is to extend the reach of scarce specialist time, not to replace the specialist.