In this systematic review, we aimed to examine studies that have attempted to classify OCD against healthy controls or other psychiatric and neurological conditions using EEG combined with machine learning and deep learning techniques. In our review, we observed that studies usually do not focus on one or two disorders but on large categories of disorders. We suggest that journals, as topic planners for special issues, should encourage topics with more clinical use while respecting basic research, such as studies that are more interpretable, rather than only research with higher accuracy or more disorders. Reports that provide only limited information lead to misunderstanding and misinterpretation, given the complexity of the disorder and its associated comorbidities. In recent years, with the advancement of AI, more researchers have become interested in distinguishing neuropsychiatric conditions from healthy conditions and in further developing precision medicine and individualized treatments. Although the small number of studies and the high heterogeneity of the extracted data limit quantitative analysis, this variability highlights a critical issue: the lack of internationally standardized research designs and methodological perspectives to efficiently investigate differences in the detection of mental disorders and to predict onset or monitor treatments such as pharmacotherapy. The absence of such standards hampers the reproducibility of findings. In this section, we first discuss the current state of the field based on the reviewed studies, followed by future directions in which we propose a practical framework. This framework aims to address existing challenges, improve study repeatability, and ultimately enable meaningful meta-analyses.
Current status

Accuracy paradox and high heterogeneity among the included studies

While high accuracy in a computational model might at first be regarded as an achievement, extremely high accuracy can result from overfitting or from an overly small sample. In the study by Mukherjee et al. (2024), the PsyNet model with an Autoencoder + PLSTM architecture was 100% accurate, but since the sample size was very low (only 46 participants) and there was no external validation, the generalisability of the results must be treated with caution. In studies using external validation, for example Farhad et al. (2024), accuracy decreased from 90.88% in cross-validation to 75% in external validation, clearly showing that how accuracy was evaluated had a great impact on the percentage reported. Future work should therefore compare various models and evaluate the extent to which higher accuracies are observed, while reproducibility, uniform reporting of accuracy, representative sample sizes, and transparency of methods should also be considered in order to provide the solid groundwork for developing EEG-ML into useful clinical tools. More specifically, given that OCD is a heterogeneous neuropsychiatric disorder with comorbidities and a wide range of symptom severity, more in-depth investigations are required that include the demographic information of patients and age-matched healthy individuals, medication status, years of illness, types of treatment, and the behavioural and cognitive components associated with both healthy individuals and OCD patients. These considerations would better contribute to our understanding of the disorder and of the neurobehavioural and cognitive dimensions impaired by the symptoms and the history of individuals. This would further enable clinicians to develop evidence-based therapeutic models for OCD patients, depending on their age, with higher precision.
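The gap between cross-validated and externally validated accuracy described above can be made explicit in the analysis code itself. The sketch below, using synthetic data and scikit-learn, shows one way to report both figures side by side; the sample size, feature count, and model choice are illustrative assumptions, not drawn from the reviewed studies.

```python
# Sketch: reporting cross-validated and external-set accuracy side by side.
# Data are synthetic stand-ins for EEG feature matrices.
import numpy as np
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 50))        # 120 "subjects" x 50 "EEG features"
y = rng.integers(0, 2, size=120)      # binary labels (OCD vs. healthy)

# Hold out an "external" set before any model selection takes place.
X_dev, X_ext, y_dev, y_ext = train_test_split(X, y, test_size=0.25, random_state=0)

model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
cv_acc = cross_val_score(model, X_dev, y_dev, cv=5).mean()   # internal estimate
ext_acc = model.fit(X_dev, y_dev).score(X_ext, y_ext)        # external estimate
```

With random labels both numbers hover near chance; the point is only that reporting them together makes optimistic cross-validation figures immediately visible.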
A key contributor to the high heterogeneity observed across EEG-based machine learning studies in OCD is the significant variation in clinical, methodological, and technical approaches. Clinically, participant characteristics differ widely between studies. These include variations in symptom severity, diagnostic procedures, comorbidity profiles, and treatment history. While some studies used structured diagnostic interviews such as the MINI or Y-BOCS, others relied on broader psychiatric evaluations or failed to specify their diagnostic criteria. Treatment status also varied: some studies included drug-naïve participants, whereas others examined individuals undergoing pharmacological interventions such as SSRIs or antipsychotics—factors known to influence neural activity and EEG signatures.
From a technical perspective, there is a lack of standardization in EEG acquisition and processing protocols. Studies differed in their choice of electrode montages, frequency bands of interest, filtering techniques, and resting-state conditions (e.g., eyes open vs. eyes closed). These discrepancies inevitably affect the recorded signal characteristics and complicate cross-study comparisons. Additionally, preprocessing pipelines and feature extraction strategies were inconsistent, ranging from handcrafted spectral features to deep learning-based representations. Machine learning models also varied substantially—not only in algorithmic choice (e.g., SVM, ANN, LSTM, CNN-GRU, XGBoost), but also in how model performance was evaluated (e.g., accuracy, F1 score, AUC, balanced accuracy), and whether the evaluation used cross-validation, external validation, or subgroup analysis.
Performance outcomes reflect this diversity. For instance, recent studies such as Mukherjee et al. (2024) and Ahmed et al. (2024) reported remarkably high classification accuracies (up to 100% and 96.83%, respectively), while others like Gour et al. (2023) and Erguzel et al. (2015) reported more modest results (e.g., 63.21% and 56.96%). These discrepancies cannot be solely attributed to model quality but also reflect underlying variability in dataset composition, preprocessing rigor, and evaluation standards. Studies with external validation, such as Farhad et al. (2024), often report lower accuracies (e.g., 75.00%), underscoring the challenges of generalizability in real-world settings.
Overall, the primary source of heterogeneity seems to lie in the lack of standardization in both clinical characterization and EEG processing workflows. Without consistent approaches to defining OCD subtypes, accounting for comorbidities, or applying harmonized preprocessing and modeling techniques, it remains difficult to make meaningful comparisons or draw robust conclusions across studies. Future research should prioritize the adoption of standardized protocols, greater transparency in reporting methodological details, and inclusion of metadata related to clinical profiles and signal processing steps. These measures are critical for improving reproducibility and advancing the clinical utility of EEG-based models in OCD classification.
Continued preference for traditional models and trends in deep learning and hybrid architectures

Despite the increasing attention toward modern machine learning techniques, traditional models still dominate EEG-based classification studies. This ongoing reliance can be attributed to several practical reasons: classical models such as SVMs and tree-based algorithms are computationally efficient, require less training data, and are more familiar to researchers. SVMs, in particular, have remained widely used due to their ability to handle high-dimensional feature spaces and employ nonlinear kernels for detecting complex patterns. Similarly, tree-based models have remained popular because they offer built-in dimensionality reduction through ensemble strategies like bagging, while also providing interpretable outputs through feature importance rankings.
While deep learning models were less frequently used in earlier studies, there has been a noticeable increase in their adoption in more recent literature (2023–2025). This shift reflects a gradual movement toward more sophisticated modeling techniques that can better capture the dynamic and high-dimensional nature of EEG signals. Among these, LSTM networks and their variants such as Bi-directional LSTM (Bi-LSTM) have emerged as the most commonly applied architectures, representing 30.8% of all deep learning approaches observed. These models are particularly well-suited for EEG data, as they are designed to model sequential dependencies and preserve important temporal information over time.
The suitability of LSTM-based models is especially relevant in the context of OCD research, where datasets tend to be relatively small. Unlike many deep learning architectures that demand large-scale data, LSTM models have shown greater resilience on smaller datasets, making them a practical and powerful tool for identifying temporal dynamics and subtle neurophysiological patterns often missed by traditional approaches. Their increasing usage signals a growing confidence in deep learning's potential to enhance precision and interpretability in EEG-based OCD classification. In machine learning, a common rule of thumb suggests that at least 10 samples per feature are needed to avoid overfitting. Given that EEG studies typically extract hundreds of features (e.g., from 19 electrodes across 5 frequency bands), a dataset with fewer than 100 participants is at high risk of overfitting. Yet only seven OCD studies in this domain explicitly report sample sizes, and none include more than 95 OCD patients. This further justifies the use of LSTM-based models, which tend to perform better under such constraints than other deep learning architectures. As datasets grow and become more standardized, this trend toward deep learning is likely to accelerate.
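The 10-samples-per-feature rule of thumb can be expressed as a small sanity check that researchers might run before training. The heuristic is approximate, not a formal bound, and the electrode and band counts below follow the example in the text.

```python
# Heuristic sanity check for the "10 samples per feature" rule of thumb.
def min_samples_rule_of_thumb(n_electrodes, n_bands, samples_per_feature=10):
    """Rough minimum sample size, assuming one feature per electrode-band pair."""
    n_features = n_electrodes * n_bands
    return n_features, n_features * samples_per_feature

# 19 electrodes x 5 bands -> 95 features -> roughly 950 samples suggested,
# far beyond the largest reviewed OCD cohort.
n_features, n_min = min_samples_rule_of_thumb(19, 5)
```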
Additionally, choosing between raw signals and extracted features for EEG-ML in OCD is a complex problem. To our knowledge, it is not possible to say definitively whether raw signals or extracted features are better. Raw signals, especially when fed to LSTM or Transformer models, may help models capture time series, sequences, and nonlinear features, but may also include more noise. Using extracted features can reduce noise but obscure temporal information. We consider both views valuable, but we recommend that, when extracted features are used, the high beta and gamma bands be included, because research has shown these bands are valuable for OCD diagnosis [10, 37,38,39]. Studies that do not analyse these bands may miss key information, especially changes associated with clinical symptoms, and opportunities to monitor medication effects and associated improvement.
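If extracted features are used, the high beta and gamma bands can be included with a standard spectral estimate. A minimal sketch using Welch's method from SciPy is shown below; the sampling rate, band boundaries, and the synthetic 40 Hz test signal are illustrative assumptions, and published band definitions vary.

```python
# Band power via Welch's PSD, including the high beta and gamma ranges.
import numpy as np
from scipy.signal import welch

def band_power(x, fs, fmin, fmax):
    """Approximate power of x within [fmin, fmax] Hz."""
    freqs, psd = welch(x, fs=fs, nperseg=2 * fs)
    mask = (freqs >= fmin) & (freqs <= fmax)
    return np.sum(psd[mask]) * (freqs[1] - freqs[0])

fs = 250                                  # assumed sampling rate (Hz)
t = np.arange(0, 10, 1 / fs)
x = np.sin(2 * np.pi * 40 * t)            # synthetic 40 Hz (gamma-range) signal

bands = {"high_beta": (20, 30), "gamma": (30, 80)}   # illustrative boundaries
powers = {name: band_power(x, fs, lo, hi) for name, (lo, hi) in bands.items()}
```

For the pure 40 Hz test signal, virtually all power falls in the gamma band, confirming that the band masks behave as intended.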
Future directions: building toward reproducibility and clinical translation

To move EEG-based OCD research closer to clinical applications, future efforts must focus on consistency, transparency, and broader population relevance. One major challenge in this field is the lack of standardization, which makes it difficult to compare findings across studies or replicate results. Without consistency in methods and clarity in interpretation, promising discoveries may struggle to translate into clinical practice.
A notable observation from current literature is the geographic concentration of studies, with a majority carried out in Asian populations. While valuable, this regional focus may limit how well the findings apply to more diverse global populations. Expanding the use of publicly accessible EEG datasets that include a broader cultural and demographic range could help overcome this limitation.
Establishing a large-scale, open-access EEG repository with standardized protocols and data from varied populations would be a key step forward. Such a resource would not only strengthen reproducibility but also support more robust cross-study comparisons. This approach mirrors the success of the Human Connectome Project in the domain of fMRI, and serves as a model worth adapting for EEG research. Additionally, open datasets can help reduce the financial burden on individual research teams, foster collaboration, and accelerate the pace of innovation. They also promote transparency and allow the community to test and validate findings on independent data, an essential requirement for clinical acceptance.
In sum, creating shared frameworks and accessible resources will be crucial for ensuring that findings in this field are not only scientifically sound but also relevant and usable in real-world clinical settings. Alongside this need, an ethical framework needs to be proposed by the scientific community, given that the shared neurobiological datasets come from human patients. A broader standardized framework, continent-wide if not global, is important. This would enable researchers to collaboratively share and investigate various dimensions of the data based on their expertise and to advance the field in a more efficient and constructive manner while respecting ethical concerns and regional and international data privacy laws.
Data preprocessing

The reviewed literature demonstrates considerable variability in EEG preprocessing methods. We recommend following the COBIDAS-MEEG guidelines, which include the following standardized steps: (1) detection and removal of bad channels, (2) artifact cleaning using techniques such as Independent Component Analysis (ICA) to remove eye and cardiac artifacts, (3) high-pass and low-pass filtering, (4) signal segmentation, (5) baseline correction, (6) transformation of electrode references, and (7) spectral transformation for frequency analysis. We also recommend that future research test the differences across the same datasets when bad channels are and are not interpolated, as in our opinion noise might be part of the clinical symptoms. To ensure reproducibility, each step should be performed using consistent parameters and reported in exhaustive detail. Strict adherence to this standardized protocol will enhance methodological comparability and repeatability across studies.
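Several of the listed steps (re-referencing, segmentation, baseline correction) can be sketched in a few lines. The fragment below is a simplified NumPy illustration under assumed parameters (250 Hz sampling, 2 s epochs, 0.2 s baseline), not a full COBIDAS-compliant pipeline; bad-channel handling, ICA, and filtering are omitted.

```python
# Simplified sketches of three preprocessing steps (NumPy only).
import numpy as np

def average_reference(data):
    """Re-reference: subtract the mean across channels at each time point."""
    return data - data.mean(axis=0, keepdims=True)

def segment(data, fs, epoch_s):
    """Cut continuous (channels x samples) data into fixed-length epochs."""
    n = int(fs * epoch_s)
    n_epochs = data.shape[1] // n
    return data[:, : n_epochs * n].reshape(data.shape[0], n_epochs, n).swapaxes(0, 1)

def baseline_correct(epochs, fs, baseline_s):
    """Subtract each epoch's mean over an initial baseline window."""
    b = int(fs * baseline_s)
    return epochs - epochs[:, :, :b].mean(axis=2, keepdims=True)

fs = 250                                                    # assumed rate (Hz)
raw = np.random.default_rng(1).normal(size=(19, fs * 60))   # 19 channels, 60 s
epochs = baseline_correct(segment(average_reference(raw), fs, 2.0), fs, 0.2)
```

The point of such small, explicit functions is that each parameter (epoch length, baseline window, reference scheme) is visible and therefore reportable.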
Reporting accuracy

The reviewed studies exhibit considerable variability in how accuracy is reported. To address this issue, we recommend adhering to the TRIPOD guidelines (Transparent Reporting of a Multivariable Prediction Model) and encouraging the publication of full model training code, either within the paper or on platforms such as GitHub. Combining TRIPOD with the COBIDAS-MEEG standards for EEG preprocessing can establish a robust scientific foundation for the standardization of EEG-based machine learning research. This harmonization will facilitate more meaningful comparisons across studies and accelerate clinical adoption of predictive models. We further propose that international consortia develop a specialized TRIPOD-EEG extension to better suit the unique aspects of EEG-based machine learning studies.
Model interpretation

Many models have been treated as “black boxes” without sufficient interpretation. Identifying which biomarkers, electrodes, or EEG waveforms are critical for OCD diagnosis is essential not only for accurate diagnosis but also for guiding treatment and precision medicine. Currently, interpretability is lacking, which undermines clinical trust in these models and limits the development of applications such as tES electrode placement and neurobiofeedback using feedback loops. As a result, most studies have focused solely on binary diagnosis, restricting their potential impact on treatment strategies. Successful examples, such as FDA-approved neurofeedback devices for anxiety, demonstrate that EEG biomarkers can underpin non-invasive diagnostic and therapeutic tools.
While we encourage the integration of SHAP and LIME to enhance model transparency, it is important to recognize the practical considerations when applying these methods to EEG data. First, the effectiveness of these techniques is heavily dependent on the quality and relevance of input features. EEG data, often high-dimensional and noisy, can pose challenges in feature extraction, potentially reducing the interpretability of model outputs. Second, SHAP and LIME assume some degree of model agnosticism or structural compatibility—certain deep learning architectures, especially those working directly on raw EEG time-series data, may require preprocessing steps such as dimensionality reduction or segmentation to produce meaningful explanations. Third, interpreting the resulting importance scores in a neurophysiological context is non-trivial, as EEG features may not map directly to intuitive biomarkers without further domain knowledge or validation. Thus, while SHAP and LIME hold promise for improving transparency and guiding clinical applications (e.g., neurofeedback electrode placement or TES targeting), their implementation must be carefully tailored to the data type and clinical goals. Future studies should aim to validate the interpretability outputs with domain experts and explore hybrid strategies combining model-specific interpretability with domain-relevant visualization tools.
Such interpretability tools could serve as a bridge between machine learning model outputs and actionable clinical insights, such as identifying relevant biomarkers for individualized tES targeting or optimizing neurofeedback protocols. Their integration could significantly advance trust and usability in clinical environments, provided their methodological constraints are transparently addressed.
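As a concrete illustration of the model-agnostic idea behind SHAP and LIME, the sketch below uses scikit-learn's permutation importance on synthetic data (a simpler relative of those methods, chosen here because it requires no additional packages); the feature layout and labels are invented for demonstration.

```python
# Model-agnostic feature attribution via permutation importance (synthetic data).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))          # six invented "EEG features"
y = (X[:, 2] > 0).astype(int)          # only feature 2 carries the label

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
result = permutation_importance(clf, X, y, n_repeats=10, random_state=0)

# Shuffling the informative feature should hurt accuracy the most.
ranking = np.argsort(result.importances_mean)[::-1]
```

In an EEG setting, the same procedure applied to electrode-band features would yield an importance ranking that can be checked against known OCD-related spectral findings rather than accepted at face value.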
Overall, the lack of a clear conceptual and methodological connection between the reviewed findings and the specific symptoms of OCD makes it challenging to interpret the results in a way that is clinically useful. This gap seems to reflect the nature of the research questions often asked in these studies, which are frequently shaped by researchers from technical or engineering backgrounds. As a result, many of the studies tend to focus heavily on algorithmic or signal processing aspects, without anchoring their analyses in established clinical knowledge about OCD or in its underlying neurobiology. For example, few studies link their EEG findings to known patterns in OCD, such as altered theta or alpha activity, or consider how these patterns might differ across symptom types like contamination fears or compulsive checking. Similarly, the possibility of using EEG markers to predict responses to treatments such as SSRIs or cognitive behavioural therapy is not addressed. Without incorporating these clinical aspects, the research risks making broad, generalized claims that may not translate meaningfully to OCD. This lack of integration reduces the potential of EEG-ML methods to inform clinical practice or deepen understanding of OCD specifically. Going forward, studies would benefit from incorporating clinical variables such as symptom subtype, severity, or treatment history into their models and analyses, to the benefit of the clinicians and patients involved. Aligning methodological development with clinical and theoretical knowledge about OCD could help the field move toward more targeted and informative findings, rather than remaining at the level of general critiques or technical demonstrations.
From challenge to vision

The greatest challenge identified in this review is the significant heterogeneity across the current literature, which complicates the identification of common features. To address this, we propose a three-stage framework (cf., Fig. 6) designed to guide researchers toward greater standardization and consistency by adhering to established protocols and addressing existing research gaps.
Fig. 6
Three-stage standardization framework to reduce methodological heterogeneity and research gaps in EEG-based machine learning studies for OCD
Towards clinical translation

To move EEG-based machine learning from research to clinical settings, three translation gaps must be filled:
1. Regulatory readiness: Models must meet the repeatability criteria in Fig. 6 to be homogeneous and FDA-reviewable.
2. Interpretability integration: Combine SHAP/LIME with models for better electrode placement in neurofeedback and tES, and better analysis of gamma and high beta waves.
3. Comorbidity management: Develop hybrid EEG-fMRI biomarkers to differentiate OCD from conditions such as depression or ADHD. Pilot studies using portable EEG headsets can test real-world feasibility.
Limitations

Despite conducting a thorough search across four major databases (PubMed, Scopus, IEEE, and Web of Science) and adhering to PRISMA guidelines, only 11 studies met our inclusion criteria. Considerable heterogeneity was observed across multiple aspects, including EEG recording protocols, preprocessing techniques, feature extraction methods, machine learning algorithms, and accuracy reporting. These challenges led to two primary conclusions: first, there is a critical need for standardized protocols and frameworks for EEG data acquisition and machine learning reporting; second, significant research gaps exist regarding model interpretability and the differentiation of OCD subtypes. Recognizing these limitations provides valuable insights that can guide future research toward greater reproducibility, enable more robust meta-analyses, and accelerate the translation of EEG-based machine learning approaches from experimental studies to clinical applications in OCD.
Future computational and clinical directions

While the reviewed studies suggested promising avenues for using EEG and machine learning models to classify OCD, the evidence base remains preliminary. Most included studies suffer from small sample sizes, a lack of consistent external validation, and limited or absent interpretability methods. Therefore, translating these tools into clinical practice, including regulatory milestones such as FDA approval or application in neurofeedback and tES targeting, remains a long-term goal rather than an immediate prospect. Clinical utility is contingent on the development of standardized protocols, consistent reporting practices, and rigorous validation frameworks. Until these conditions are met, findings should be interpreted cautiously, especially when reported in terms of diagnostic or therapeutic utility such as tES and neuro-biofeedback interventions.
We suggest that future studies further consider the following points during their investigations:
As outlined in clinical diagnostic guidelines, OCD frequently co-occurs with a range of other neurodevelopmental and psychiatric conditions. When building computational models for OCD diagnosis, it is important to account for this overlap. Brain activity patterns in individuals with OCD can be influenced by comorbidities such as anxiety disorders, ADHD, PTSD, tic disorders, and mood disturbances. These variations, shaped by the diverse clinical profiles seen in OCD, must be carefully considered to ensure that diagnostic models are accurate and generalizable.
Variations in OCD symptoms, treatment type, timing, and response to the clinical efforts also present key considerations for modeling. Pharmaceutical interventions, the stage of illness (e.g., early onset, chronic OCD), timing of symptom presentation, and the trajectory of therapeutic response all influence brain dynamics in meaningful ways. In particular, individuals with treatment-resistant OCD—an underrepresented group in many studies—may show distinct neurophysiological patterns. Including data from such cases can enhance predictive models and help tailor interventions such as cognitive-behavioral therapy, pharmacotherapy, neurofeedback, or neuromodulation techniques like transcranial magnetic stimulation.
Demographic variables, such as age and gender, also play a significant role in shaping OCD’s neural profile. For example, pediatric-onset OCD may present differently from adult-onset cases, and symptom expression often varies between males and females. Diagnostic models should be refined by integrating such demographic distinctions along with clinical factors like symptom severity, compulsivity levels, insight, and the presence of comorbid behaviors such as avoidance or sleep disruption. These parameters help create a more nuanced and individualized diagnostic approach.
To better differentiate OCD-related brain activity from healthy patterns, statistical analysis of extracted features should be complemented by more advanced signal processing techniques. Methods such as power spectral density (PSD) analysis and functional connectivity mapping can uncover subtle but meaningful distinctions in resting-state activity. Focusing on specific frequency bands and on relevant brain regions, particularly those found to show altered communication in OCD, can improve classification accuracy and strengthen the diagnostic relevance of these computational tools.
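A minimal example of a functional connectivity estimate is a channel-by-channel correlation matrix. The sketch below uses Pearson correlation on synthetic data as a stand-in for the more specialized measures (e.g., phase-locking value or coherence) used in the EEG literature; channel count and injected coupling are illustrative.

```python
# Channel-by-channel Pearson correlation as a simple connectivity proxy.
import numpy as np

def connectivity_matrix(data):
    """data: (n_channels, n_samples) array; returns an n x n correlation matrix."""
    return np.corrcoef(data)

rng = np.random.default_rng(2)
data = rng.normal(size=(19, 5000))          # synthetic 19-channel recording
data[1] = 0.7 * data[0] + 0.3 * data[1]     # inject coupling between ch 0 and 1

C = connectivity_matrix(data)               # C[0, 1] should now be elevated
```

In practice the correlation would be computed per frequency band after filtering, so that altered communication in, e.g., the high beta or gamma range can be examined separately.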
In addition, some studies in this review reported very high accuracies or made claims regarding clinical applicability without adequate supporting evidence. For instance, the use of terms like “diagnostic tool” or “clinical prediction” was sometimes made in the absence of external validation, interpretability analysis, or outcome-based evaluation. To improve scientific transparency and prevent overinterpretation:
Future authors should clearly state whether their findings are exploratory or validated.
Claims about clinical utility should be qualified unless supported by prospective trials, larger sample sizes, or real-world evaluations.
Interpretability tools (e.g., SHAP, Grad-CAM) should accompany performance metrics to support clinical translation.
Studies should avoid equating high cross-validation accuracy with diagnostic readiness.
Finally, dimensionality reduction and advanced modeling techniques offer promising pathways to uncover deeper patterns within the data. Techniques like t-distributed stochastic neighbor embedding (t-SNE) can be used to explore whether neural data clusters by OCD symptom severity or subtype. Similarly, using architectures like autoencoders, transformers, or unsupervised learning methods can help reveal hidden structure in EEG data, enhance model interpretability, and support the development of more robust, data-driven diagnostic frameworks for OCD.
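As a sketch of the t-SNE idea described above, the fragment below embeds two synthetic subject groups in two dimensions with scikit-learn; the group structure, feature dimensionality, and perplexity are illustrative choices, and a real analysis would use clinically labelled EEG features.

```python
# t-SNE embedding of two synthetic "subgroups" of subjects.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 1, size=(20, 50)),    # subgroup A
               rng.normal(4, 1, size=(20, 50))])   # subgroup B, shifted mean

# Perplexity must stay well below the number of samples.
emb = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(X)
```

Coloring such an embedding by symptom severity or subtype labels would show at a glance whether the learned features carry clinically meaningful structure.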