Our proposed IMCA-PPG framework leverages DL and MHCA fusion to extract richer feature representations from PPG signals. By transforming PPG signals into structured image representations and processing them with a ResNet-50 [30] backbone, we obtain a more accurate approach to BP estimation. This section examines the framework's performance, the impact of MHCA fusion, generalizability across datasets, compliance with clinical standards, and comparison with existing literature, offering a holistic perspective on the efficacy of our approach.
Table 1 Comparison of SBP and DBP estimation accuracy across features and datasets, benchmarked against AAMI standards
Table 2 Comparison of the proposed IMCA-PPG framework with other literature
Table 3 Evaluation of SBP and DBP estimation compliance with BHS standards
A critical aspect of our evaluation is validation across three heterogeneous datasets, PTT PPG [31], Cabrini [12], and MIMIC-II [32], each presenting distinct challenges. The PTT PPG dataset, collected under controlled conditions, offers high signal fidelity, whereas the Cabrini dataset introduces real-world variability through posture changes, movement artifacts, and externally induced BP fluctuations. The MIMIC-II dataset, comprising ICU-admitted patients, exhibits smoother PPG waveforms due to reduced peripheral resistance and diminished diastolic features, as evident from the SBP and DBP distributions in Fig. 3, which makes BP estimation more challenging. Despite these variations, our framework maintains consistently high performance across datasets, as shown in Table 1. To ensure unbiased evaluation, ResNet-50 was trained separately for each dataset, preventing cross-dataset feature leakage so that the reported metrics reflect dataset-specific learning.
The integration of MHCA further enhances BP estimation by capturing interdependencies among PPG, vPPG, and aPPG images. Unlike conventional feature-based methods, MHCA adaptively emphasizes critical physiological regions within each modality, notably the diastolic peak and notch, which are strongly correlated with BP variations. As presented in Table 1, individual modalities perform suboptimally when analyzed separately. However, MHCA fusion enables the framework to extract complementary patterns from all three signal representations, outperforming PPG, vPPG, and aPPG across multiple evaluation metrics.
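To make the fusion idea concrete, the sketch below shows a parameter-free form of multi-head cross-attention over three token sets. It rests on several assumptions that are not stated in this section: each modality (PPG, vPPG, aPPG) is taken to be a list of equal-length embedding vectors (for example, spatial positions of a ResNet-50 feature map), each modality queries the concatenated tokens of the other two, and the learned query, key, value, and output projections of a real MHCA layer are replaced by the identity. The names `multi_head_cross_attention` and `fuse_modalities` are hypothetical and do not describe the exact IMCA-PPG module.

```python
import math
from typing import Dict, List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    # Subtract the maximum score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_head_cross_attention(queries: List[Vector], context: List[Vector], num_heads: int) -> List[Vector]:
    """Scaled dot-product cross-attention split across heads.

    Queries come from one modality; keys and values come from the context tokens.
    Learned Q/K/V and output projections are omitted (identity) in this sketch.
    """
    dim = len(queries[0])
    if dim % num_heads != 0:
        raise ValueError("embedding dimension must be divisible by num_heads")
    head_dim = dim // num_heads
    scale = 1.0 / math.sqrt(head_dim)
    outputs: List[Vector] = []
    for q in queries:
        fused: Vector = []
        for h in range(num_heads):
            lo, hi = h * head_dim, (h + 1) * head_dim
            # Attention scores of this head's query slice against every context token.
            scores = [scale * sum(a * b for a, b in zip(q[lo:hi], k[lo:hi])) for k in context]
            weights = softmax(scores)
            # Attention-weighted sum of the context values within the same head slice.
            fused.extend(sum(w * v[lo + i] for w, v in zip(weights, context)) for i in range(head_dim))
        outputs.append(fused)
    return outputs


def mean_pool(tokens: List[Vector]) -> Vector:
    # Average each embedding dimension over the token axis.
    return [sum(col) / len(tokens) for col in zip(*tokens)]


def fuse_modalities(streams: Dict[str, List[Vector]], num_heads: int = 2) -> Vector:
    """Each modality queries the tokens of the other two; pooled outputs are concatenated."""
    fused: Vector = []
    for name, tokens in streams.items():
        context = [t for other, toks in streams.items() if other != name for t in toks]
        fused.extend(mean_pool(multi_head_cross_attention(tokens, context, num_heads)))
    return fused
```

With three modalities of 4-dimensional tokens, the fused vector has 12 entries and would feed a regression head for SBP and DBP. Letting every modality attend to the other two is one plausible design that allows, for instance, PPG tokens to weight the vPPG and aPPG regions around the diastolic notch.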
For the PTT PPG dataset, MHCA improves upon PPG by 152.63%, 74.02%, and 75.57% in \(R^2\), RMSE, and MAE for SBP, respectively (for RMSE and MAE, improvements are relative reductions), and upon aPPG by 140.00%, 73.63%, and 75.09%. Although vPPG attains lower RMSE and MAE for SBP, MHCA provides the best overall performance, achieving the highest \(R^2\) and balanced accuracy across both SBP and DBP estimation. For DBP, MHCA improves upon PPG by 118.60%, 67.99%, and 70.46%, and upon aPPG by 154.05%, 69.52%, and 71.76% in \(R^2\), RMSE, and MAE, respectively. On the Cabrini dataset, MHCA improves upon PPG by 195.83%, 40.52%, and 52.61% for SBP in \(R^2\), RMSE, and MAE, respectively, and by 64.86%, 26.05%, and 34.73% for DBP. For the MIMIC-II dataset, despite its smoother PPG morphology and lower BP variability, MHCA improves upon PPG by 253.57%, 90.67%, and 92.88% for SBP, and by 216.28%, 70.75%, and 85.75% for DBP in \(R^2\), RMSE, and MAE, respectively. Notably, vPPG outperforms PPG and aPPG on the PTT PPG and Cabrini datasets, where BP variability is externally induced through exercise and posture changes; the first derivative effectively captures the resulting systolic and diastolic fluctuations. Conversely, on MIMIC-II, aPPG performs better by accentuating subtle changes in the diastolic notch and peak of the smoother PPG waveforms, consistent with Koparır et al. [28], where aPPG achieved the best performance among PPG derivatives using ResNet-50. All results in Table 1 are reported as mean ± 95% confidence interval, obtained via bootstrap resampling.
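The bootstrap confidence intervals mentioned above can be illustrated with a percentile bootstrap over paired reference and estimated BP values. The sketch below is written under stated assumptions: the section does not give the number of resamples, the CI construction, or the random seed, so `n_boot=1000`, the percentile method, and the helper names `mae`, `rmse`, and `bootstrap_ci` are illustrative choices and not the authors' exact procedure.

```python
import math
import random
from statistics import mean
from typing import Callable, Sequence, Tuple

Metric = Callable[[Sequence[float], Sequence[float]], float]


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    # Mean absolute error in mmHg.
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    # Root-mean-square error in mmHg.
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def bootstrap_ci(
    y_true: Sequence[float],
    y_pred: Sequence[float],
    metric: Metric,
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (point estimate, lower bound, upper bound) via the percentile bootstrap."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        # Resample (reference, estimate) pairs with replacement so pairing is preserved.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return metric(y_true, y_pred), lower, upper
```

Resampling pairs, not residuals alone, keeps each estimate tied to its reference reading, which matters when errors depend on the BP level.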
Fig. 7 Bland–Altman plots illustrating the agreement between estimated and reference SBP and DBP values for each feature representation (PPG, vPPG, aPPG, and MHCA) on all datasets. The green dashed line represents the mean bias, and the grey dashed lines indicate the upper and lower LoAs. Shaded areas represent the 95% confidence intervals for the mean bias and the LoAs (grey)
These results not only establish the effectiveness of MHCA but also align with findings from Koparır et al. [28], where vPPG outperformed PPG and aPPG. However, whereas Koparır et al. [28] evaluated their model on the MIMIC-II [32] dataset, whose ICU patient data show limited BP variability, our framework generalizes robustly across datasets with externally induced BP fluctuations (e.g., exercise and posture variations). Compared to Koparır et al. [28], IMCA-PPG achieves a 47.76% higher \(R^2\) and an 87.57% lower MAE for SBP estimation on MIMIC-II, a substantial improvement in both accuracy and reliability. Furthermore, Heydari et al. [12] proposed a chest-based cuffless BP monitoring system that uses five distinct PAT definitions derived from PPG and bio-impedance signals under controlled physical activity. While effective, their approach requires additional bio-impedance hardware and multiple signal channels, which complicates real-time deployment and increases hardware dependency. In contrast, IMCA-PPG operates solely on single-site PPG measurements and generalizes across datasets and acquisition settings, making it hardware-agnostic and better suited to scalable, real-world healthcare applications (Table 2).
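Percentage gains such as "47.76% higher \(R^2\)" and "87.57% lower MAE" are relative changes with respect to a baseline. The section does not spell out the formula, so the sketch below assumes the usual convention: a relative increase for metrics where higher is better (\(R^2\)) and a relative reduction for error metrics (RMSE, MAE). The function name `relative_improvement` is hypothetical, and the values in the usage note are illustrative, not taken from Table 1 or Table 2.

```python
def relative_improvement(baseline: float, proposed: float, higher_is_better: bool) -> float:
    """Percentage improvement of `proposed` over `baseline`.

    For metrics where higher is better (e.g. R^2) this is the relative increase;
    for error metrics (e.g. RMSE, MAE) it is the relative reduction.
    """
    if baseline <= 0:
        # A relative percentage is ill-defined for zero or negative baselines.
        raise ValueError("baseline must be positive for a relative percentage")
    change = proposed - baseline if higher_is_better else baseline - proposed
    return 100.0 * change / baseline
```

For example, an \(R^2\) rising from 0.40 to 0.90 is a 125% improvement, and an MAE falling from 10.0 to 2.5 mmHg is a 75% improvement. This convention also explains why \(R^2\) gains can exceed 100% while error reductions cannot.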
For any BP monitoring framework to be clinically viable, compliance with established medical standards is essential. As shown in Tables 1 and 3, IMCA-PPG meets the AAMI [38] criteria on all datasets and attains BHS [39] Grade ‘A’ or ‘B’ throughout. For the PTT PPG dataset, the model achieves an ME of -0.01 ± 0.12 mmHg for SBP and 0.11 ± 0.09 mmHg for DBP, both well within the AAMI requirement of \(|\mathrm{ME}| \le 5\) mmHg. The corresponding SD values of 2.80 ± 0.13 mmHg for SBP and 2.14 ± 0.11 mmHg for DBP comfortably meet the \(\mathrm{SD} \le 8\) mmHg threshold. On the Cabrini dataset, ME values of 0.01 ± 0.24 mmHg (SBP) and 0.35 ± 0.12 mmHg (DBP), with SD values of 6.48 ± 0.25 mmHg and 4.09 ± 0.21 mmHg, respectively, further confirm compliance. For the larger and clinically diverse MIMIC-II cohort, IMCA-PPG achieves ME values of -0.05 ± 0.04 mmHg for SBP and 0.04 ± 0.06 mmHg for DBP, with corresponding SDs of 0.82 ± 0.05 mmHg and 1.06 ± 0.06 mmHg, well within the AAMI thresholds. Under the BHS standard, which grades a method by the cumulative percentage of absolute errors within 5, 10, and 15 mmHg, the model achieves Grade ‘A’ in every case except SBP on Cabrini. For the PTT PPG dataset, 91.08% of SBP and 96.08% of DBP predictions fall within 5 mmHg, well above the 60% required at that threshold for Grade ‘A’. The Cabrini dataset attains 71.57% and 78.33% within 5 mmHg for SBP and DBP, respectively, yielding Grade ‘B’ for SBP and Grade ‘A’ for DBP; because the 5 mmHg criterion for Grade ‘A’ is met, the SBP grade reflects the wider 10 and 15 mmHg criteria. On the MIMIC-II dataset, 99.35% of SBP and 99.93% of DBP predictions fall within 5 mmHg, securing Grade ‘A’ for both.
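The two compliance checks can be expressed directly in code. The sketch below assumes that errors are estimated minus reference BP in mmHg and that SD is the sample standard deviation; it encodes the AAMI criterion (\(|\mathrm{ME}| \le 5\) mmHg and \(\mathrm{SD} \le 8\) mmHg, omitting the standard's minimum-subject requirement) and the BHS cumulative-percentage grades (A: 60/85/95%, B: 50/75/90%, C: 40/65/85% within 5/10/15 mmHg, otherwise D). The helper names are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, Sequence, Tuple

# Minimum cumulative percentages of |error| within 5, 10 and 15 mmHg for each BHS grade.
BHS_THRESHOLDS: Dict[str, Tuple[float, float, float]] = {
    "A": (60.0, 85.0, 95.0),
    "B": (50.0, 75.0, 90.0),
    "C": (40.0, 65.0, 85.0),
}


def aami_compliant(errors: Sequence[float]) -> bool:
    """AAMI accuracy criterion: |mean error| <= 5 mmHg and SD <= 8 mmHg (subject count not checked)."""
    return abs(mean(errors)) <= 5.0 and stdev(errors) <= 8.0


def cumulative_percentages(errors: Sequence[float]) -> Tuple[float, float, float]:
    # Percentage of absolute errors within 5, 10 and 15 mmHg.
    n = len(errors)
    p5, p10, p15 = (100.0 * sum(abs(e) <= t for e in errors) / n for t in (5, 10, 15))
    return p5, p10, p15


def bhs_grade(errors: Sequence[float]) -> str:
    """Return the best BHS grade whose three thresholds are all met, else 'D'."""
    pct = cumulative_percentages(errors)
    for grade, limits in BHS_THRESHOLDS.items():  # dicts preserve insertion order: A, B, C
        if all(p >= lim for p, lim in zip(pct, limits)):
            return grade
    return "D"
```

A grade is awarded only when all three thresholds are met. This is why a method can exceed 60% within 5 mmHg and still receive Grade ‘B’.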
These results highlight the clinical reliability of our method, making it a strong candidate for integration into practical healthcare applications. By addressing real-world challenges and ensuring generalizability across diverse physiological conditions, IMCA-PPG advances the state of cuffless BP estimation, paving the way for its adoption in everyday health monitoring.
A comparative analysis with prior studies further underscores the advantages of our framework. Whereas earlier approaches employ various machine learning and feature-based techniques, our model outperforms these methods, as shown in Table 2. Prior work often relies on multi-sensor setups, such as ECG and PPG combinations [13, 14, 44], or on PPG acquired at two different sites, both of which increase hardware complexity. In contrast, our single-site PPG model outperforms feature-based methods and recent image-based frameworks in \(R^2\), RMSE, and MAE for both SBP and DBP estimation on the MIMIC-II dataset, while maintaining a streamlined, hardware-independent configuration. Maintaining high accuracy while simplifying hardware requirements is a significant step forward for non-invasive BP monitoring.
Fig. 8 Linear fit plots showing the correlation between estimated and reference BP values, highlighting predictive accuracy for each feature representation (PPG, vPPG, aPPG, and MHCA) on all datasets
Deep learning models are often criticized for their black-box nature; our framework, however, provides a degree of interpretability through ResNet-50 feature visualization. Fig. 5 illustrates the key regions identified in the PPG, vPPG, and aPPG images that contribute to BP estimation. These feature maps reveal that the model focuses on physiologically relevant variations, in line with known BP-related signal characteristics. Because these features are extracted automatically, manual feature engineering is unnecessary, which demonstrates that DL can autonomously learn meaningful representations that improve predictive accuracy.
To further validate our framework, we conducted Bland–Altman [46] and linear fit analyses. The Bland–Altman plots (Fig. 7) show minimal bias and narrow LoAs, with most prediction errors lying within the 95% limits of agreement. The cumulative error distributions further show that the majority of SBP and DBP predictions fall within clinically acceptable thresholds. Similarly, the linear fit analysis (Fig. 8) exhibits strong correlation, with high \(R^2\) values across heterogeneous datasets and modalities. Together, these quantitative and visual analyses support the clinical robustness and generalizability of IMCA-PPG.
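Both analyses reduce to a few summary statistics. The sketch below assumes that differences are taken as estimated minus reference, that the limits of agreement are bias ± 1.96 SD of the differences (the conventional Bland–Altman definition), and that the linear fit is an ordinary least-squares regression of estimated on reference BP. The \(R^2\) returned here is the coefficient of determination of that regression; it is not necessarily the same quantity as the \(R^2\) reported in Table 1. Function names are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def bland_altman(reference: Sequence[float], estimated: Sequence[float], z: float = 1.96) -> Tuple[float, float, float]:
    """Return (mean bias, lower LoA, upper LoA) of estimated minus reference."""
    diffs = [e - r for r, e in zip(reference, estimated)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD of the differences
    return bias, bias - z * sd, bias + z * sd


def linear_fit(reference: Sequence[float], estimated: Sequence[float]) -> Tuple[float, float, float]:
    """OLS fit estimated = slope * reference + intercept; returns (slope, intercept, R^2)."""
    mx, my = mean(reference), mean(estimated)
    sxx = sum((x - mx) ** 2 for x in reference)
    sxy = sum((x - mx) * (y - my) for x, y in zip(reference, estimated))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual and total sums of squares of the fitted line.
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(reference, estimated))
    ss_tot = sum((y - my) ** 2 for y in estimated)
    return slope, intercept, 1.0 - ss_res / ss_tot
```

An ideal estimator gives a bias near 0 mmHg, narrow limits of agreement, and a fit with slope near 1, intercept near 0, and \(R^2\) near 1.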
To assess the computational efficiency of the framework, all experiments were conducted using an NVIDIA A40 GPU with 48 GB VRAM and PyTorch 2.0.1 with CUDA acceleration. The ResNet-50 models used for feature extraction from PPG, vPPG, and aPPG representations each have an on-disk size of 102.8 MB, while the MHCA module introduces an additional 21 MB. On average, the training time for a single ResNet-50 model is approximately 52 minutes (3120 seconds), with an average inference time of 5192 milliseconds per sample. For the complete MHCA module, the average inference time per sample is 30.85 milliseconds, with a total training time of approximately 105 minutes (6300 seconds).
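Per-sample inference times such as those above depend on how latency is measured. This section does not describe the timing protocol, so the sketch below is a generic harness under stated assumptions: it excludes a few warm-up calls, uses `time.perf_counter`, and averages the wall-clock time per call. Because CUDA kernels launch asynchronously, a GPU `predict` callable would need to block until the result is ready (for example, by calling `torch.cuda.synchronize()` before returning) for the measurement to be meaningful. The name `mean_latency_ms` is hypothetical.

```python
import time
from statistics import mean
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def mean_latency_ms(predict: Callable[[T], object], samples: Sequence[T], warmup: int = 3) -> float:
    """Average wall-clock latency per sample in milliseconds.

    Warm-up calls are excluded so one-off costs (lazy initialisation, caching)
    do not inflate the average. For asynchronous GPU execution, `predict`
    must block until its result is ready before returning.
    """
    for sample in samples[:warmup]:
        predict(sample)
    timings_ms = []
    for sample in samples:
        start = time.perf_counter()
        predict(sample)
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return mean(timings_ms)
```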
Although the current configuration delivers state-of-the-art performance, model size and computational overhead can be reduced further through post-training optimizations such as pruning and quantization. These techniques can largely preserve predictive accuracy while significantly lowering memory and compute requirements, enabling real-time deployment in resource-constrained environments such as mobile applications and web servers. Based on existing benchmarks, models compressed to under 50 MB can achieve inference latencies below 30 ms and memory usage under 10 MB on standard CPUs and mobile processors [47]. Although our current implementation uses PyTorch, the models can be exported to ONNX format for cross-platform compatibility across Android, iOS, Windows, Linux, macOS, and embedded systems. Further compression with ONNX Runtime quantization [48] can improve deployment feasibility for clinical or wearable applications.
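The basic arithmetic of post-training quantization is shown below using symmetric, per-tensor int8 weight quantization. This is one common scheme, chosen here for illustration. The section does not say which scheme IMCA-PPG would use, and toolkits such as ONNX Runtime (`onnxruntime.quantization.quantize_dynamic`) apply their own variants. The helper names are hypothetical. The size estimate assumes that all parameters are stored as float32 and that per-tensor scale overhead is negligible.

```python
from typing import List, Tuple


def quantize_symmetric_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Map float weights to int8 in [-127, 127] with a single per-tensor scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer level and clamp to the symmetric int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    # Reconstruction error per weight is at most scale / 2.
    return [v * scale for v in q]


def int8_size_mb(float32_size_mb: float) -> float:
    # float32 stores 4 bytes per weight and int8 stores 1, so weights shrink about 4x.
    return float32_size_mb / 4.0
```

Under these assumptions, a 102.8 MB float32 ResNet-50 checkpoint would shrink to roughly 25.7 MB of int8 weights. The accuracy cost depends on how well each tensor's value range is covered by 255 levels.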
Beyond computational efficiency, the proposed framework addresses key limitations of conventional BP monitoring methods. Traditional cuff-based devices provide only intermittent measurements and may cause discomfort during long-term use, particularly in ambulatory or home settings. In contrast, our framework leverages a single-sensor PPG signal, simplifying data acquisition and reducing hardware complexity. Unlike multi-sensor approaches that require synchronized ECG and PPG signals [13, 14, 24, 44], our system enables scalable, low-cost deployment without compromising accuracy. The incorporation of multiple PPG signal derivatives—PPG, vPPG, and aPPG—allows the extraction of complementary physiological features linked to arterial compliance, vascular aging, and peripheral resistance [49]. Feature maps generated by the ResNet-50 models, as shown in Fig. 5, consistently emphasize key waveform regions such as the systolic peak and diastolic notch. These regions are clinically recognized markers of cardiovascular health and arterial stiffness, enhancing the physiological interpretability and trustworthiness of the model’s predictions.
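The vPPG and aPPG inputs are the first and second time derivatives of the PPG waveform. The sketch below computes them with central differences, using one-sided differences at the endpoints. The section does not specify the differentiation scheme, sampling rate, or any pre-filtering, so these are assumptions. In practice the PPG is usually low-pass filtered first, because differentiation amplifies high-frequency noise. The helper names are hypothetical, and the subsequent conversion of each signal into an image is not shown.

```python
from typing import List, Sequence, Tuple


def derivative(signal: Sequence[float], fs: float) -> List[float]:
    """Central-difference time derivative; one-sided differences at both ends."""
    n = len(signal)
    if n < 2:
        raise ValueError("need at least two samples")
    dt = 1.0 / fs
    out = [0.0] * n
    out[0] = (signal[1] - signal[0]) / dt
    out[-1] = (signal[-1] - signal[-2]) / dt
    for i in range(1, n - 1):
        out[i] = (signal[i + 1] - signal[i - 1]) / (2.0 * dt)
    return out


def ppg_derivatives(ppg: Sequence[float], fs: float) -> Tuple[List[float], List[float]]:
    # vPPG is the first derivative (velocity); aPPG differentiates vPPG again (acceleration).
    vppg = derivative(ppg, fs)
    appg = derivative(vppg, fs)
    return vppg, appg
```

Because aPPG is the derivative of a derivative, it emphasizes rapid changes in waveform curvature. This is consistent with the role attributed to it above in highlighting the diastolic notch and peak of smoother waveforms.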
Transitioning from raw waveform-based to image-based BP estimation offers significant advantages. Traditional feature engineering approaches demand extensive domain expertise and signal preprocessing, limiting scalability and adaptability. In contrast, the deep learning framework developed in this study autonomously extracts clinically meaningful features, ensuring robustness across diverse sensor configurations. By combining single-site PPG acquisition with multi-modal image representations (PPG, vPPG, and aPPG), the system improves estimation accuracy without increasing hardware complexity, paving the way for scalable, real-time BP monitoring solutions suitable for integration into commercial wearable devices and smartphone applications. With advances in smartphone technology, specifically improved camera quality, higher resolution, and increased frame rates, the acquisition of remote PPG signals has become more reliable, potentially enabling accurate, contactless BP estimation. Given its lightweight architecture and robust performance, the proposed framework can be adapted for mobile or web-based platforms, facilitating continuous, real-time BP monitoring in non-clinical environments. Such deployment would support preventive healthcare initiatives and chronic disease management. Future work will implement and validate the framework in these real-world settings to assess its usability, accuracy, and clinical applicability.