The convergence of major developments in artificial intelligence (AI) for image analysis with advances in clinical imaging technologies has major implications for the practice of medicine. Gains in AI system performance have been driven by improvements in computing hardware and progress in algorithm design, such that large volumes of data can now be processed with great accuracy at extraordinary speed. As Hogarty et al. illustrate in this issue of the Journal, ophthalmology is at the forefront of the AI revolution, with a growing body of research indicating that AI systems can be applied to many ophthalmic imaging modalities across a broad range of disease categories with remarkable performance.

As a testament to the rapid pace of progress in this field, several important studies have been published since Hogarty et al. undertook their literature search. Grassmann et al. developed a deep learning algorithm using 86 770 manually graded colour fundus images from the Age-Related Eye Disease Study (AREDS) database and achieved high overall accuracy for the classification of age-related macular degeneration (AMD) in an independent validation dataset (overall accuracy for early or late AMD = 84.2%). In addition, Li et al. reported the successful application of a deep learning algorithm to distinguish colour fundus photographs showing referable glaucomatous optic neuropathy (suspect or certain) from non-referable (unlikely) photographs. Another recent publication with profound implications for the clinical implementation of AI systems describes the evaluation of a deep learning system for the detection of referable diabetic retinopathy (>mild NPDR and/or DME). The trial included 900 participants with diabetes who were recruited from 10 primary care practices in the United States.
Participants first underwent conventional two-field retinal photography (non-mydriatic unless an AI image quality assessment system deemed mydriasis necessary), and these images were graded by the AI system. Participants then underwent mydriatic wide-field stereoscopic fundus photography and macular OCT imaging, which were graded by expert human graders to establish the ground truth for each participant against which AI system performance was measured. The AI system exceeded its predetermined performance thresholds (sensitivity = 87.2% [threshold >85%]; specificity = 90.7% [threshold >82.5%]). On the basis of these findings, the U.S. Food & Drug Administration (FDA) has authorized the system for use by health professionals for the autonomous detection of diabetic retinopathy (>mild NPDR) and DME. With this landmark decision, it became the first ever FDA-authorized AI diagnostic system in any field of medicine.

Arguably the most significant breakthrough in the field to date has arisen from a collaboration between DeepMind, a leader in the development of AI systems, and a team of clinicians and academics from Moorfields Eye Hospital and University College London. An AI system was trained on 14 884 OCT scans to detect more than 50 common retinal diagnoses, representing a wide variety of conditions affecting patients attending a tertiary referral eye hospital. The validation dataset comprised an independent sample of images from 997 patients with gold-standard labels assigned by an expert panel using actual clinical outcome data. AI system performance on the validation dataset was compared against that of four retinal specialists and four optometrists from Moorfields Eye Hospital. Of note, the AI classification was based on the OCT scan alone, whereas the human graders had access to OCT scans, fundus images and medical history data. The AI system matched or exceeded the performance of the human graders (error rate: AI = 5.5% vs. experts, range = 5.5–13.1%) on a challenging multiclass decision problem (four referral categories: urgent, semi-urgent, routine and observation only). Importantly, system performance remained robust when images from two different OCT devices were used. This study indicates sub-specialist expert-level performance of an OCT AI system, with clinically relevant stratification of referral urgency.

Despite this enormous promise, the potential of this technology must be viewed in light of its application in real-world clinical settings. Until recently, the basis for an AI system's classification of a given image had been inscrutable to clinicians. Several