Automatic surgical workflow recognition (SWR) is an integral part of surgical assessment. A surgical procedure can be decomposed into activities at different levels of granularity, including phases, steps, tasks, and actions [18]. Phases represent the overarching stages of a surgical procedure (e.g., access, execution of surgical objectives, and closure). Steps break down each phase into specific segments that contribute to the overall procedure (e.g., the nerve-sparing step of radical prostatectomy). Tasks are sub-components of a step (e.g., dissect and clip prostatic pedicles). Actions are the individual motions carried out by a surgeon during each task (e.g., a single cold cut). There has been growing interest in developing techniques to recognize activities at each of these levels of granularity from video data.
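For readers who prefer a concrete view of this decomposition, the sketch below encodes the phase/step/task/action hierarchy as nested Python data classes; the specific labels are hypothetical examples drawn from the text, not a standardized taxonomy.

```python
# Illustrative sketch of the phase/step/task/action hierarchy described above.
# The structure, not the specific entries, is the point.
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    actions: list[str] = field(default_factory=list)   # e.g., individual cuts

@dataclass
class Step:
    name: str
    tasks: list[Task] = field(default_factory=list)

@dataclass
class Phase:
    name: str
    steps: list[Step] = field(default_factory=list)

procedure = [
    Phase("execution of surgical objectives", steps=[
        Step("nerve-sparing", tasks=[
            Task("dissect and clip prostatic pedicles",
                 actions=["single cold cut", "apply clip"]),
        ]),
    ]),
]
```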
Early work on surgical procedure decomposition using classical machine-learning pipelines had limited success. CNNs and recurrent neural networks (RNNs) have been pivotal in enhancing workflow recognition from surgical videos by modeling spatio-temporal features. Huaulmé et al. [19] applied CNNs, RNNs, or both to surgical workflow recognition on the Micro-Surgical Anastomose Workflow (MISAW) dataset and reported accuracies above 95%, 80%, and 60% for phases, steps, and activities, respectively. Ramesh et al. [20] proposed a multi-task, multi-stage temporal convolutional network for SWR, which demonstrated improved results compared with single-task models. More recently, Goodman et al. [21] developed a multitask neural network for simultaneous spatiotemporal analysis of hands, tools, and actions in open surgical videos.
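As a rough illustration of this CNN-plus-RNN pattern (not a reimplementation of any cited model), the sketch below uses a ResNet-18 backbone for per-frame features and an LSTM for temporal context; the three-phase label set, clip length, and dimensions are assumptions.

```python
# Minimal sketch: a CNN backbone extracts per-frame features and an LSTM
# models temporal context for surgical phase recognition.
import torch
import torch.nn as nn
from torchvision.models import resnet18

PHASES = ["access", "execution", "closure"]  # hypothetical label set

class PhaseRecognizer(nn.Module):
    def __init__(self, num_phases=len(PHASES), hidden=256):
        super().__init__()
        backbone = resnet18(weights=None)
        backbone.fc = nn.Identity()          # 512-d per-frame features
        self.backbone = backbone
        self.lstm = nn.LSTM(512, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_phases)

    def forward(self, clips):                # clips: (B, T, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.backbone(clips.flatten(0, 1)).view(b, t, -1)
        temporal, _ = self.lstm(feats)
        return self.head(temporal)           # per-frame phase logits (B, T, P)

model = PhaseRecognizer()
logits = model(torch.randn(2, 8, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 8, 3])
```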
Tool usage information, primarily obtained by manual labeling, is another data source for understanding surgical workflow. Sahu et al. [22] developed an RNN to recognize tools in videos and estimate surgical phases. While existing deep learning-based approaches for SWR have shown remarkable results, they rely heavily on large-scale labeled datasets, which can be time consuming and costly to produce and depend on the availability of annotators with deep surgical knowledge. To address this, Shi et al. [23] validated a long-range temporal dependency-based active learning approach on the Cholec80 video dataset, which outperformed other active learning methods for SWR.
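To make the active learning idea concrete, the sketch below ranks unlabeled clips by predictive entropy so the most uncertain ones are sent to annotators first; this is a generic uncertainty criterion, not Shi et al.'s long-range temporal method, and it assumes a model such as the PhaseRecognizer sketched above.

```python
# Illustrative uncertainty-based active learning loop for SWR annotation.
import torch

def select_for_annotation(model, unlabeled_clips, budget=10):
    """Rank unlabeled clips by mean per-frame predictive entropy."""
    model.eval()
    scores = []
    with torch.no_grad():
        for idx, clip in enumerate(unlabeled_clips):       # clip: (T, 3, H, W)
            probs = model(clip.unsqueeze(0)).softmax(-1)   # (1, T, P)
            entropy = -(probs * probs.clamp_min(1e-8).log()).sum(-1).mean()
            scores.append((entropy.item(), idx))
    # Send the most uncertain clips to expert annotators first.
    return [idx for _, idx in sorted(scores, reverse=True)[:budget]]
```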
Gesture recognition
Surgical gestures or “surgemes” represent the fundamental units of surgical interaction involving instruments and human tissue, such as inserting a needle, pulling a suture, or a single cut of tissue. Automatically recognizing gestures is an important element of automated activity recognition, surgeon skills assessment, surgical training, and autonomous robotic surgery systems. These gestures can serve as objective measures of surgical performance and have been found to impact surgical outcomes [24]. However, the development of automatic gesture recognition poses several challenges due to the intricacy and multi-step nature of gestures.
Gesture recognition methods can be classified by input modality: video, kinematic data, or both. Classical approaches to automated gesture recognition involved unsupervised learning methods such as the hidden Markov model, and combined Markov/semi-Markov random field models employed both kinematic and video data. However, these approaches were limited by the subjectivity of manual feature extraction. Deep neural networks have since proven to be a powerful tool for fine-grained surgical gesture recognition.
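As a concrete example of the classical approach, the sketch below fits an unsupervised Gaussian hidden Markov model over kinematic features to segment a trial into candidate gesture states; the feature layout and number of states are assumptions.

```python
# Classical baseline sketch: an unsupervised Gaussian HMM over kinematic
# features (e.g., tool-tip positions/velocities) segments a trial into
# candidate gesture states.
import numpy as np
from hmmlearn import hmm

kinematics = np.random.randn(500, 12)    # placeholder: 500 timesteps, 12 features

model = hmm.GaussianHMM(n_components=6, covariance_type="diag", n_iter=50)
model.fit(kinematics)                    # unsupervised fit on one trial
states = model.predict(kinematics)       # per-timestep hidden-state labels
print(states[:20])                       # candidate "surgeme" segmentation
```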
DiPietro et al. [25] studied gesture and maneuver recognition using RNNs and found low error rates for both. In patients undergoing robot-assisted radical prostatectomy (RARP), deep learning computer vision has also demonstrated impressive accuracy for the identification (AUC = 0.88) and classification (AUC = 0.87) of suturing gestures in needle-driving attempts [26].
Leveraging both kinematic and video data is an essential part of accurate gesture recognition. Kiyasseh et al. [27] reported AUCs of 65–97% for surgical dissection gesture classification across institutions and procedures.
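A minimal sketch of such multimodal fusion is shown below (it is not Kiyasseh et al.'s architecture): a pooled video embedding and a GRU-encoded kinematic stream are concatenated and classified; the feature dimensions and eight-gesture vocabulary are assumptions.

```python
# Hedged sketch of kinematics + video feature fusion for gesture classification.
import torch
import torch.nn as nn

class FusionGestureClassifier(nn.Module):
    def __init__(self, video_dim=512, kin_dim=26, num_gestures=8, hidden=128):
        super().__init__()
        self.video_proj = nn.Linear(video_dim, hidden)
        self.kin_encoder = nn.GRU(kin_dim, hidden, batch_first=True)
        self.classifier = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, num_gestures)
        )

    def forward(self, video_feats, kinematics):
        # video_feats: (B, video_dim) pooled clip embedding from a CNN backbone
        # kinematics:  (B, T, kin_dim) synchronized robot joint/tool streams
        v = self.video_proj(video_feats)
        _, h = self.kin_encoder(kinematics)          # final hidden state (1, B, H)
        fused = torch.cat([v, h.squeeze(0)], dim=-1)
        return self.classifier(fused)                # gesture logits (B, num_gestures)

logits = FusionGestureClassifier()(torch.randn(4, 512), torch.randn(4, 30, 26))
```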
Intraoperative assessment
While robotic surgery has achieved remarkable results across various specialties, the skill of the operating surgeon plays a crucial role in surgical success. An unbiased and accurate evaluation of surgical performance is increasingly necessary in the era of AI. Traditionally, surgeon performance has been gauged through prior surgical experience or manual evaluation by experienced peers. While widely used, this approach is limited by subjectivity and labor intensity.
The advent of automated performance metrics (APMs) has revolutionized the evaluation of surgical performance. APMs rely on kinematic and video data and serve as objective, actionable, real-time assessment tools, and combining them with ML can produce objective metrics of surgeon performance. Preliminary studies have demonstrated that APMs can differentiate expert from novice surgeon performance in clinical settings [28]. Juarez-Villalobos et al. [29] accurately classified expert (operative experience > 100 h) and non-expert (operative experience < 10 h) surgeons from time intervals recorded during training in suturing, knot-tying, and needle-passing, three crucial surgical tasks. Wang et al. investigated neural networks for predicting surgical proficiency scores from video clips and achieved scores closely matching manual assessments [29]. Moglia et al. utilized ensemble deep learning neural network models to identify, at an early stage, trainees’ rates of acquiring surgical technical proficiency [29].
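The sketch below illustrates the general APM-to-classifier recipe (not any cited study's pipeline): per-trial kinematic metrics are fed to a gradient-boosted classifier and evaluated with cross-validated AUC; the feature set and placeholder data are assumptions.

```python
# Illustrative sketch: classifying expert vs. novice surgeons from automated
# performance metrics (APMs).
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Each row: one trial's APMs, e.g. task time (s), path length (mm),
# camera movement count, dominant-instrument idle time (s).
apms = np.random.rand(120, 4)
labels = np.random.randint(0, 2, 120)     # 1 = expert, 0 = novice (placeholder)

clf = GradientBoostingClassifier()
auc = cross_val_score(clf, apms, labels, cv=5, scoring="roc_auc")
print(f"cross-validated AUC: {auc.mean():.2f}")
```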
Hung et al. [31] previously used robotic surgical APMs recorded during RARP together with clinicopathological data to predict post-operative continence, achieving a C-index of 0.6 with a DL model (DeepSurv). The study demonstrated that surgeons with more efficient APMs achieved higher continence rates at 3 and 6 months post-RARP, and APMs ranked higher than clinicopathological features in predicting continence. Hung et al. [32] have also demonstrated the role of ML-based APM assessment in predicting short-term clinical outcomes.
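As a rough sketch of the DeepSurv idea (a simplified stand-in, not the cited implementation), the code below maps APM and clinicopathological features to a risk score and trains it with the Cox partial likelihood; the feature dimension, network size, and placeholder data are assumptions.

```python
# Minimal DeepSurv-style sketch: a small network maps APMs and
# clinicopathological features to a risk score, trained with the Cox partial
# likelihood; the C-index would be computed on held-out data.
import torch
import torch.nn as nn

class RiskNet(nn.Module):
    def __init__(self, in_dim=20):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, x):
        return self.net(x).squeeze(-1)     # one risk score per patient

def cox_ph_loss(risk, time, event):
    """Negative Cox partial log-likelihood (no tie handling)."""
    order = torch.argsort(time, descending=True)
    risk, event = risk[order], event[order]
    log_risk_set = torch.logcumsumexp(risk, dim=0)   # log-sum over the risk set
    return -((risk - log_risk_set) * event).sum() / event.sum().clamp_min(1)

x = torch.randn(64, 20)                    # APMs + clinicopathological features
time = torch.rand(64) * 12                 # months to continence recovery
event = torch.randint(0, 2, (64,)).float() # 1 = recovered, 0 = censored
loss = cox_ph_loss(RiskNet()(x), time, event)
loss.backward()
```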
Schuler et al. [33] combined robotic kinematic data, surgical gesture data collected from video review, and model-integrated force sensor data in a standardized, simulation-based environment to predict surgical experience, discriminating between surgeons with low and high RARP caseloads with very high AUC.
Surgical difficulty measurement
Surgical difficulty is a multifaceted concept in robotic-assisted surgery that encompasses not only the complexity of the tasks involved but also the cognitive workload placed on surgeons. This workload is influenced by factors such as the lack of tactile feedback, the need for precise communication with assistants, and the operation of multiple instruments within a limited visual field. Despite the benefits of minimally invasive surgery to patients, such as less postoperative pain and faster surgical wound healing, surgeons face significant physical demands (e.g., limited surgical field space, difficulty reaching anatomical structures, and operating the surgical robot for extended periods) and cognitive demands. AI models can help assess surgical difficulty. Lim et al. measured physiological response patterns driven by changes in workload from primary surgical tasks and additional multitasking requirements, then built classification models on these multimodal physiological signals to distinguish primary-task from multitasking demands, reaching accuracy of up to 79% [34].
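A hedged sketch of this kind of multimodal workload classification is shown below (it is not Lim et al.'s exact pipeline): window-level features from physiological signals such as heart rate and electrodermal activity are classified as primary-task versus multitasking segments; the feature choices and placeholder data are assumptions.

```python
# Hedged sketch of multimodal workload classification from physiological signals.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

features = np.random.rand(300, 9)          # per-window physiological features
condition = np.random.randint(0, 2, 300)   # 0 = primary task, 1 = multitasking

acc = cross_val_score(RandomForestClassifier(), features, condition,
                      cv=5, scoring="accuracy")
print(f"mean accuracy: {acc.mean():.2f}")
```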
Realism in simulation
Practicing on simulators before operating on patients is a crucial step for surgeons and can also provide valuable data. However, the effectiveness of this practice is currently limited by the capabilities of the simulators. Many use basic physics, which hinders their ability to model large deformations accurately. As a result, these simulators focus on training surgeons in simplified dexterity tasks rather than replicating the complexities of full surgeries. While surgeons can usually generalize the skills learned from these tasks to real clinical settings, the algorithms that aid them are only as effective as the data they receive from these simulators, so there is a significant need for more realistic simulation. The finite element method (FEM) is currently the benchmark for simulating soft-tissue deformation, but its application in patient modeling is restricted by the difficulty of accurately estimating material parameters and by its computational intensity. Wu et al. investigated how live data acquired during robotic endoscopic surgery could be used to correct inaccurate FEM simulation results, using an open-source da Vinci Surgical System to probe a soft tissue phantom and replay the interaction in simulation. They trained a neural network to correct the difference between the predicted mesh position and the measured point cloud, improving FEM results by 15–30% across various simulation parameters [35].
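In the spirit of that approach (though not Wu et al.'s implementation), the sketch below learns a per-node correction to FEM-predicted mesh positions supervised by observed point-cloud targets; the input dimensions, probe features, and placeholder data are assumptions.

```python
# Illustrative sketch of learning a correction to FEM-predicted mesh node
# positions: a network maps each simulated node position (plus probe context)
# to a displacement that brings it closer to the observed point cloud.
import torch
import torch.nn as nn

class FEMCorrector(nn.Module):
    def __init__(self, in_dim=6, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),              # per-node 3D correction
        )

    def forward(self, node_pos, probe_state):
        # node_pos: (N, 3) FEM-predicted positions; probe_state: (N, 3) context
        return node_pos + self.net(torch.cat([node_pos, probe_state], dim=-1))

sim_nodes = torch.randn(1000, 3)            # FEM prediction for one frame
probe = torch.randn(1000, 3)                # e.g., relative probe pose features
observed = torch.randn(1000, 3)             # registered point-cloud targets

corrected = FEMCorrector()(sim_nodes, probe)
loss = nn.functional.mse_loss(corrected, observed)   # supervision from depth data
loss.backward()
```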
Feedback optimization
While accurate, automated assessment during surgery is extremely important, a further step toward enhancing surgical education is the generation and delivery of targeted, high-quality intraoperative feedback. Ma et al. [36] first presented a dry-lab surgical feedback exercise showing that audio and visual feedback tailored to a trainee’s specific weaknesses improves robotic suturing skill acquisition. Building on that study, Laca et al. [37] presented a robotic dissection study that used statistical modeling to categorize participants as under- or over-performers; real-time audio and visual feedback was then shown to improve dissection skills in underperformers. Wong et al. [38] have also developed a classification system for feedback delivered to trainees intraoperatively, which lays the groundwork for determining which types of feedback are most effective for surgical training. Together, these studies are setting the stage for AI modeling that can understand a trainee’s weak points and deliver high-quality feedback matched to each trainee’s specific learning stage (Fig. 4).
Fig. 4Workflow demonstrating ingestion of surgical video and kinematics data, the AI-based generation of intraoperative performance metrics and automated, tailored feedback delivery
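As a toy illustration of the under-/over-performer idea (not Laca et al.'s statistical model), the sketch below compares a trainee's running dissection metric against a reference cohort and triggers feedback when performance drops below a chosen percentile; the metric, cohort, and threshold are all assumptions.

```python
# Toy sketch: trigger targeted feedback when a trainee's running metric falls
# below a chosen percentile of a reference distribution.
import numpy as np

reference_scores = np.random.normal(70, 10, 200)   # historical cohort metric
threshold = np.percentile(reference_scores, 25)    # hypothetical cutoff

def maybe_give_feedback(current_score: float) -> str:
    if current_score < threshold:
        return "Underperforming: deliver targeted audio/visual feedback"
    return "On track: continue monitoring"

print(maybe_give_feedback(62.0))
```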