Deep Learning and Noninvasive Sensors for Detecting Physiological Dysregulation: A Scoping Review

A total of 27 studies were included in this scoping review. These studies were published between 2019 and 2025 and evaluated the application of deep learning algorithms to noninvasive physiological signals for early detection of pain, physiological stress, or hemodynamic deterioration.

China and Australia contributed the highest number of studies (n = 3 each), followed by South Korea, the United States, and Greece (n = 2 each). Countries represented by a single study included India, Taiwan, Egypt, Japan, Colombia, Ecuador, and the United Kingdom. The geographical distribution of the included studies is shown in Fig. 2.

Fig. 2 Geographical distribution of the included studies. China and Australia had the highest number of included studies (n = 3 each), followed by South Korea, the United States, and Greece (n = 2 each). Countries represented by a single study were India, Taiwan, Egypt, Japan, Colombia, Ecuador, and the United Kingdom. Several articles also involved international collaborations, particularly between the United States, Saudi Arabia, and Italy.

Regarding study design, experimental studies were the most frequent (n = 11), followed by retrospective studies (n = 9), systematic reviews and meta-analyses (n = 3), observational studies (n = 2), one prospective study, and one Delphi study. This distribution reflects a balance between exploratory approaches and clinical validation efforts. The most commonly used architectures were convolutional neural networks (CNNs), long short-term memory (LSTM) networks, bidirectional LSTM networks, transformer-based models, and hybrid deep learning systems.

The physiological signals analyzed across studies included electroencephalography (EEG), electrocardiography (ECG), photoplethysmography (PPG), capnography, acoustic signals, facial images, and multimodal combinations integrating two or more physiological or behavioral sources.

As summarized in Table 2, several studies reported high predictive performance. Jeong et al. [2] reported an area under the receiver operating characteristic curve (AUROC) of 0.917 in development and 0.833 in external validation using noninvasive inputs such as noninvasive blood pressure (NIBP), ECG, PPG, and bispectral index (BIS). Wu et al. [3] developed a pain classifier for critically ill patients with an accuracy of 85.9%. Bargshady et al. [7] achieved accuracies above 89% in multiclass pain classification using convolutional and recurrent architectures applied to facial images.
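AUROC can be read as the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case. The sketch below computes it directly from that definition. It is a minimal illustration only: the labels and scores are hypothetical toy values, not data from Jeong et al. [2] or any other included study, and the function name `auroc` is our own.

```python
def auroc(labels, scores):
    """Pairwise (rank-based) AUROC.

    Counts the fraction of positive/negative pairs in which the positive
    case has the higher score; ties count as half a correct ordering.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = event occurred, 0 = no event.
labels = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.10, 0.35, 0.40, 0.45, 0.60, 0.80, 0.20, 0.90]

# 15 of the 16 positive/negative pairs are ordered correctly.
print(auroc(labels, scores))  # 0.9375
```

A drop between development and external validation, such as the one reported above, corresponds to fewer correctly ordered pairs when the same model scores patients from a different cohort.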

Multimodal approaches showed important advantages over single-signal models. Gutiérrez et al. [10] and Gkikas et al. [6] combined facial and acoustic information and reported improved diagnostic sensitivity for pain and stress detection, with accuracies between 83% and 90% depending on architecture and signal combination. Jian et al. [9] identified physiologic hypotension endotypes using autoencoders and Gaussian mixture modeling, and these endotypes were reproducible across independent cohorts. Jeddah et al. [4] achieved 91.3% sensitivity and 84.9% specificity for detecting respiratory and hemodynamic deterioration through automated analysis of electronic clinical records. Kim et al. [17] and Park et al. [26] demonstrated early warning capabilities for cardiac arrest or delirium hours before clinical detection, supporting more timely intervention in intensive care units.
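Sensitivity and specificity, as quoted above, are both derived from the confusion matrix of a binary classifier. The following sketch shows the arithmetic under simple assumptions: the reference labels and predictions are hypothetical, and the helper name `sensitivity_specificity` is ours rather than anything used in the included studies.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # share of true events that were detected
    specificity = tn / (tn + fp)  # share of non-events correctly left alone
    return sensitivity, specificity


# Hypothetical example: 4 deterioration events, 6 stable patients.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this toy case three of four events are detected and one of six stable patients triggers a false alarm, which is the trade-off that the two reported percentages summarize.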

Most studies focused on adult patients in contexts where subjective evaluation is difficult or impossible, including sedated, anesthetized, or intubated individuals. Signals derived from neural and cardiovascular activity enabled the identification of patterns not detectable through traditional assessments. Adaptive deep learning methods improved accuracy compared with systems based on fixed physiological thresholds, allowing the detection of subtle changes in clinical status.
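The contrast between a fixed threshold and a patient-adaptive rule can be illustrated without any deep learning at all. The sketch below is a deliberately simplified stand-in, not a method from the included studies: it compares a fixed cutoff with a rolling z-score against the patient's own recent baseline. The signal values, the 65 mmHg cutoff, the window length, and the z-score limit are all illustrative assumptions.

```python
from statistics import mean, stdev


def fixed_threshold_alarms(values, low):
    """Indices where the signal falls below a fixed cutoff."""
    return [i for i, v in enumerate(values) if v < low]


def adaptive_alarms(values, window=5, z_cut=3.0):
    """Indices where the signal deviates strongly from its own recent baseline.

    The baseline is the previous `window` samples; an alarm fires when the
    current sample lies more than `z_cut` standard deviations from their mean.
    """
    alarms = []
    for i in range(window, len(values)):
        baseline = values[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma > 0 and abs(values[i] - mu) / sigma > z_cut:
            alarms.append(i)
    return alarms


# Hypothetical mean arterial pressure samples (mmHg), one per time step.
map_mmhg = [92, 93, 91, 92, 94, 93, 92, 80, 78, 77, 64]

fixed = fixed_threshold_alarms(map_mmhg, low=65)  # illustrative cutoff
adaptive = adaptive_alarms(map_mmhg)

print("fixed threshold alarms at:", fixed)
print("adaptive alarms at:", adaptive)
```

On this toy series the fixed rule fires only at the last sample, once the value is already below the cutoff, whereas the baseline-relative rule fires at the first abrupt drop. Learned models generalize this idea by conditioning on richer context than a rolling mean.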

However, the included studies also highlighted important contextual limitations. In controlled clinical environments, confounding factors such as sedation level, autonomic tone alterations, and mechanical ventilation can modify physiological patterns, potentially reducing the generalizability of trained models. In semi-ambulatory or outpatient settings, signals are influenced by physical activity, environmental noise, and emotional state, requiring different preprocessing and modeling strategies. Recent studies using wearable sensors combined with deep learning demonstrated promising performance in real-world monitoring of pain and stress [10, 17, 20].

The narrative synthesis identified seven thematic categories that organize the included studies: type of sensors used, model architecture, physiological signal analyzed, clinical purpose, implementation environment, methodological quality, and ethical or regulatory considerations. These categories helped map methodological approaches, identify common applications, and highlight the main evidence gaps. The categories and their associated references are summarized in Table 3.

Table 3 Thematic categories of the included studies

Several methodological limitations were frequently reported. Many studies used small sample sizes, often fewer than 30 participants, particularly in anesthesia or exploratory settings. Small cohorts limit statistical power and increase the risk of overfitting. The absence of external validation datasets further restricts generalizability across diverse populations and clinical conditions [5, 14, 23]. Improvements in dataset size, population diversity, and multicenter designs are necessary to strengthen reproducibility [2, 9, 11, 15].
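The effect of cohort size on the certainty of a reported accuracy can be made concrete with a confidence interval. The sketch below uses the Wilson score interval for a binomial proportion; the counts (26 of 30 and 260 of 300 correct classifications) are hypothetical and chosen only so that both cohorts have the same observed accuracy.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 gives about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half


# Hypothetical cohorts with the same observed accuracy (about 86.7%).
lo_small, hi_small = wilson_interval(26, 30)
lo_large, hi_large = wilson_interval(260, 300)

print(f"n=30:  {lo_small:.3f} to {hi_small:.3f}")
print(f"n=300: {lo_large:.3f} to {hi_large:.3f}")
```

With 30 participants the interval spans roughly 0.70 to 0.95, while with 300 it narrows to roughly 0.82 to 0.90, so the same headline accuracy supports much weaker conclusions in the smaller cohort.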

Semi-supervised learning strategies were increasingly used to improve generalization by combining small labeled datasets with larger pools of unlabeled data. Nonetheless, interpretability remains limited. Although some studies integrated explainability methods, their adoption was relatively low [14, 23]. Transparency in deep learning is essential for clinician trust and safe implementation in clinical workflows.
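One common semi-supervised strategy is pseudo-labeling, in which a model trained on the labeled data assigns labels to unlabeled samples it is confident about and is then refit on the enlarged set. The review does not specify which strategies the included studies used, so the sketch below is only an assumption-laden illustration: a nearest-centroid classifier stands in for a deep model, and the two-dimensional features, class names, and margin threshold are hypothetical.

```python
def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_centroids(X, y):
    """One centroid per class label."""
    return {c: centroid([x for x, lab in zip(X, y) if lab == c]) for c in set(y)}


def predict_with_margin(centroids, x):
    """Nearest class and how much closer it is than the runner-up."""
    ranked = sorted(centroids, key=lambda c: sq_dist(centroids[c], x))
    best, second = ranked[0], ranked[1]
    margin = sq_dist(centroids[second], x) - sq_dist(centroids[best], x)
    return best, margin


def pseudo_label(X_lab, y_lab, X_unlab, min_margin=1.0, rounds=3):
    """Iteratively absorb confidently classified unlabeled samples."""
    X, y, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(rounds):
        cents = fit_centroids(X, y)
        remaining = []
        for x in pool:
            label, margin = predict_with_margin(cents, x)
            if margin >= min_margin:
                X.append(x)
                y.append(label)
            else:
                remaining.append(x)  # too ambiguous, keep unlabeled
        if len(remaining) == len(pool):
            break  # nothing new was absorbed in this round
        pool = remaining
    return fit_centroids(X, y), pool


# Hypothetical 2-D features for two states; four labeled, five unlabeled.
X_lab = [(0.0, 0.0), (1.0, 0.5), (5.0, 5.0), (6.0, 5.5)]
y_lab = ["rest", "rest", "stress", "stress"]
X_unlab = [(0.5, 0.2), (5.5, 5.2), (0.8, 1.0), (5.2, 4.6), (3.0, 2.8)]

centroids, still_unlabeled = pseudo_label(X_lab, y_lab, X_unlab)
print("left unlabeled:", still_unlabeled)
print("prediction for (0.2, 0.1):", predict_with_margin(centroids, (0.2, 0.1))[0])
```

The margin threshold is the key design choice: samples near the class boundary are left unlabeled rather than risk reinforcing an early mistake, which matters most when the labeled set is small.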

Future research should focus on building multimodal datasets that integrate cardiovascular, neural, and behavioral signals, developing standardized data collection protocols, and expanding open-access repositories. The integration of explainable artificial intelligence methods is essential for transparency and clinical acceptance [3, 9, 15, 18, 20].

Only a small number of studies assessed the clinical impact of these technologies on patient-centered outcomes such as mortality, length of stay, or quality of life. Ethical and regulatory considerations were also seldom addressed. Huo et al. [5], Kim et al. [17], and Fernández Rojas et al. [22] emphasized the need for regulatory frameworks to ensure the safe and equitable adoption of deep learning–based monitoring systems in clinical environments.
