Vision transformer-based stratification of pre/diabetic and pre/hypertensive patients from retinal photographs for 3PM applications

In this study, we developed and validated ViT-based models to classify the presence and control statuses of diabetes and hypertension, leveraging 145,353 macula-centred retinal photographs from 78,156 participants across four diverse Asian and European cohorts. The models demonstrated promising performance in detecting both conditions and stratifying their control statuses, supporting the hypothesis that ViT-based models applied to retinal imaging can potentially aid diabetes and hypertension management, in line with the 3PM framework. However, the models showed limited ability to predict continuous clinical parameters such as HbA1c and blood pressure. These findings partially support our hypotheses, affirming the potential of ViT-based retinal imaging for classifying diabetes and hypertension and stratifying their control statuses, while emphasising the need for further refinement to enhance its predictive power for continuous biomarkers and its broader clinical applicability.

Unmet patient needs in reactive medicine

The traditional reactive approach to medicine often fails to address the needs of individuals with undiagnosed or poorly controlled diabetes and hypertension, as interventions typically occur only after onset, often when complications have already developed [8, 9]. Many affected individuals remain unaware of their condition, due to the asymptomatic nature of the early stages of these conditions, exacerbating long-term health risks [3, 4]. This gap in early detection highlights a critical unmet patient need in reactive medicine.

Additionally, therapeutic inertia remains a major barrier in reactive medicine. Many patients with diabetes or hypertension continue with suboptimal treatment regimens due to a lack of timely reassessments or clinical uncertainty, leading to prolonged periods of poor disease control [42, 43]. This inertia contributes to avoidable disease progression and increased risk of complications, jeopardising patient outcomes.

3PM innovation to improve individual outcomes

The study highlights the potential for these models to assist clinical decision-making and address unmet patient needs in the reactive approach to medicine. Their predictive utility lies in individualised patient profiling [9]: identifying individuals with undiagnosed or poorly controlled conditions, as well as those at an elevated risk of progression, such as those with pre-diabetes and pre-hypertension. By facilitating earlier detection of these individuals, the models can support timely referrals and targeted interventions, which are integral to the preventive approach within 3PM. This proactive management can potentially help to curb diabetes and hypertension progression and reduce the risk of complications, ultimately improving patient outcomes.

At the same time, by providing insights into an individual’s control status, the models enable healthcare providers to personalise treatment adjustments. This tailored care helps align treatment plans with the patient’s unique needs, which may lead to better outcomes and sustained health improvements. Furthermore, recognising well-controlled cases helps optimise resource allocation, preventing over-diagnosis and ensuring that healthcare efforts are directed towards individuals who require urgent care [8].

In primary care settings, where retinal imaging is increasingly available, these models further facilitate the opportunistic detection of diabetes and hypertension, along with their control statuses, addressing a key unmet need in reactive medicine. Unlike the traditional approach, in which interventions typically occur only after disease onset, and frequently after complications have developed because these conditions are asymptomatic in their early stages [8, 9], these models can enable earlier intervention. By integrating these models into routine clinical practice, healthcare delivery can be made more efficient, patient management can be streamlined, and outcomes can potentially be improved. This shift represents a substantial move away from reactive approaches, paving the way for more proactive and effective healthcare strategies.

Comparative performance of proposed ViT-based models against existing literature

Our study explores ViT-based models as a promising approach for retinal image-based detection of diabetes and hypertension, demonstrating performance competitive with established deep learning architectures. For diabetes, our model achieved an internal AUROC of 0.820 (0.805–0.835), outperforming a ResNet-18 model (AUROC: 0.731 [0.707–0.756]) that lacked external validation [20]. However, it fell short of a ResNet-50 model, which reported AUROCs ranging from 0.788 to 0.932 across internal and two external test sets [21]. Notably, the latter study was conducted exclusively on Chinese cohorts, which may have favoured its reported performance and limits its generalisability to other populations. To our knowledge, no prior studies have reported transformer-based models for diabetes detection.

For hypertension detection, our ViT-based model achieved an internal AUROC of 0.781 (0.772–0.790), outperforming both an Inception-v3 model (AUROC, 0.766) [22] and a transformer-based RETFound model (AUROC, 0.690 [0.657–0.724]) [26]. Neither of these studies conducted external validation, highlighting the added value of our approach in testing across diverse datasets.
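To make the reported metric concrete, the following is a minimal sketch of how an AUROC and a 95% confidence interval of the kind quoted above could be computed. The percentile bootstrap, the function names `auroc` and `bootstrap_ci`, and the parameters `n_boot`, `alpha`, and `seed` are illustrative assumptions, not the procedure used in this study.

```python
import numpy as np


def auroc(labels, scores):
    """AUROC as the probability that a diseased case scores higher than a
    non-diseased one (ties count as half). Pairwise form, fine for a sketch."""
    labels = np.asarray(labels).astype(bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels]
    neg = scores[~labels]
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUROC (assumed method)."""
    labels = np.asarray(labels).astype(bool)
    scores = np.asarray(scores, dtype=float)
    rng = np.random.default_rng(seed)
    n = labels.size
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample cases with replacement
        resampled = labels[idx]
        if resampled.all() or not resampled.any():
            continue  # AUROC is undefined when only one class is present
        stats.append(auroc(resampled, scores[idx]))
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return float(lo), float(hi)
```

Smaller groups yield wider bootstrap intervals, which is one reason interval width should be read alongside the point estimate when comparing models across studies.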

Across both disease classifications, the models consistently demonstrated the highest AUROC in detecting poorly controlled cases (compared to non-diseased individuals). This could be attributed to the more pronounced retinal features associated with poorly controlled disease states, which facilitate stronger model discrimination. This stratification is particularly valuable for identifying high-risk patients who require urgent intervention and supports targeted clinical interventions to mitigate disease progression.

In regression analysis for the relevant systemic biomarkers (HbA1c, SBP, and DBP), our models consistently yielded low R² values (Supplementary Table 4). This contrasts with the relatively promising AUROC values observed in some classification tasks for diabetes and hypertension outcomes, suggesting that the models may lack critical predictors or interactions influencing biomarker levels. Moreover, these biomarkers are inherently variable due to factors such as acute stress or illness [44] and abrupt dietary changes (e.g. high sugar or salt intake), further complicating accurate regression modelling. Nonetheless, our findings are partly consistent with previous studies; for instance, HbA1c has been reported with MAEs of 0.33 and 1.39 and corresponding R² values of 0.13 and 0.09 [17, 45], which show some overlap with our observed MAE range of 0.386–1.856 and R² range of 0.067–0.264 across internal and external test sets. However, for SBP, our MAEs (14.119–18.533) and R² values (0.071–0.294) were less favourable than those reported in prior studies (MAEs of 9.29 and 11.35, R² of 0.31 and 0.36). Similarly, for DBP, our MAEs (8.092–14.591) and R² values (0.086–0.126) fell short of the reported benchmarks (MAEs of 6.42 and 7.20, R² of 0.32 and 0.35) [17, 45]. The discrepancies between our study’s findings and those of previous studies may stem from factors such as population differences, imaging protocols, feature extraction methodologies, or proportional bias, as suggested by Bland–Altman plots (Supplementary Fig. 2).
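For reference, the two regression metrics compared above can be written out as a short sketch using their standard definitions. The function names `mae` and `r_squared` are illustrative and are not taken from the study's code.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error, in the same units as the biomarker."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    A value of 0 means the model does no better than always predicting
    the mean of the observed values.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)
```

Because MAE depends on the spread of the biomarker in each cohort while R² is normalised by that spread, the two metrics need not move together across test sets, which is worth keeping in mind when comparing against studies in different populations.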

The Bland–Altman plots (Supplementary Fig. 2) reveal proportional bias, with prediction errors increasing with the mean of the predicted and actual values [46]. This trend suggests that our models tend to overestimate higher values while predicting lower values more accurately across all three biomarkers. For HbA1c, this bias may be attributable to the right-skewed distribution of the training data (skewness = 4.06), which includes a higher density of observations in the lower ranges and sparse representation in the higher ranges (Supplementary Fig. 2). However, for SBP and DBP, where the skewness is less pronounced, this effect may reflect the inherent variability of these biomarkers.
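The quantities behind a Bland–Altman analysis can be sketched as follows. Taking the difference as predicted minus actual, using 1.96 standard deviations for the limits of agreement, and checking proportional bias by regressing the difference on the pairwise mean are common conventions assumed here for illustration; the function name `bland_altman` is hypothetical and this is not necessarily how the supplementary analysis was performed.

```python
import numpy as np


def bland_altman(y_true, y_pred):
    """Return the mean bias, 95% limits of agreement, and the slope of the
    difference regressed on the pairwise mean (a non-zero slope points to
    proportional bias)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    pair_mean = (y_true + y_pred) / 2.0
    diff = y_pred - y_true  # assumed sign convention: predicted - actual
    bias = diff.mean()
    sd = diff.std(ddof=1)
    # np.polyfit returns coefficients from the highest degree down
    slope, intercept = np.polyfit(pair_mean, diff, 1)
    return {
        "bias": float(bias),
        "loa": (float(bias - 1.96 * sd), float(bias + 1.96 * sd)),
        "slope": float(slope),
        "intercept": float(intercept),
    }
```

A slope close to zero indicates that the error is roughly constant across the measurement range, whereas a clearly positive or negative slope means the error grows or shrinks with the magnitude of the biomarker.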

The saliency maps show that our models focus on major retinal vessels for diabetes and hypertension classification, which may explain their limited sensitivity to the microvascular changes characteristic of these conditions (Supplementary Figs. 3–5). In contrast to the scattered regions of interest reported by Zhang et al. (2021) for diabetes detection [21], our findings emphasise the significance of the retinal vasculature, suggesting potential differences in model training mechanisms. Conversely, our models’ attention to vascular features and the optic disc for hypertension detection aligns with existing literature by Rim et al. (2020) [17] and Poplin et al. (2018) [45]. Future research could explore incorporating annotations of key signs to guide the models’ focus towards the relevant microvascular areas, potentially improving their sensitivity to the smaller blood vessels affected by diabetes and hypertension.

Challenges in model performance and generalisability for personalised and preventive care

While the models showed promise in detecting the presence and control statuses (compared to non-diseased individuals) of diabetes and hypertension, they faced limitations in differentiating between poorly controlled and well-controlled cases, as well as in identifying pre-hypertension or pre-diabetes relative to healthy individuals. These challenges likely stem from the subtle retinal differences between these groups, which make nuanced distinctions difficult, particularly for individuals whose clinical parameters lie near the diagnostic thresholds. Moreover, the smaller sample sizes in the poorly controlled and well-controlled categories contributed to the widest 95% confidence intervals observed, complicating reliable predictions. Refining these models is crucial to enhancing risk stratification and facilitating timely, targeted interventions.

Furthermore, the models exhibited varying degrees of generalisability across datasets. Stronger performance was observed on the SP2 dataset, likely due to its similarities with the SEED cohort in imaging protocol and demographics, as both are Singapore-based multiethnic population studies. In contrast, the BES dataset consistently achieved lower AUROC values, potentially due to differences in camera models and its exclusive focus on a Chinese population, which lacks the ethnic diversity present in the UKBB and SEED training datasets (Table 1). Notably, generalisation was more consistent for hypertension than for diabetes outcomes, suggesting that retinal biomarkers related to hypertension may be more stable across populations, whereas diabetes-related retinal changes exhibit greater variability.

Limitations and outlook in the context of 3PM

Our study has several limitations. First, the exclusion of “ungradable” images through our quality assessment algorithm (Supplementary Material 1) was intended to minimise the risk of misclassification. However, this decision should be considered carefully when deploying these models in clinical settings, where lower-quality images may be encountered. Second, the ethnic imbalance between Asian and European cohorts in the training set may have influenced performance on the external test sets, which predominantly comprised Asian participants. Future research should aim for a more balanced cohort representation in external test sets to facilitate a fair assessment of the models’ generalisability.

Beyond these limitations, an important avenue for future research lies in enhancing the predictive utility of these models to encompass incidence prediction. While our models can classify the presence and control statuses of diabetes and hypertension, they do not yet forecast which individuals with pre-diabetes or pre-hypertension are at the highest risk of progression to disease onset. In this context, integrating our models with emerging non-invasive biomarker profiling, such as cell-free nucleic acids [47], metabolomics [48], and tear-based diagnostics [49,50,51,52], is a promising route to multi-modal risk stratification. These technologies may provide molecular-level insights into individual disease trajectories and could potentially enhance the sensitivity and specificity of pre-disease detection. Advancing this predictive capability is crucial for strengthening the 3PM paradigm, enabling a shift from early detection to proactive, personalised prevention through tailored risk-based interventions.
