CNN-based framework for Alzheimer's disease detection from EEG via dynamic mode decomposition

Abstract

Alzheimer's disease (AD) and frontotemporal dementia (FTD) are major neurodegenerative disorders with characteristic EEG alterations. While most prior studies have focused on eyes-closed (EC) EEG, where stable alpha rhythms support relatively high classification performance, eyes-open (EO) EEG has proven particularly challenging for AD, as low-frequency instability obscures the typical spectral alterations. In contrast, FTD often remains more discriminable under EO conditions, reflecting distinct neurophysiological dynamics between the two disorders. To address this challenge, we propose a CNN-based framework that segments EO EEG into short temporal windows, applies Dynamic Mode Decomposition (DMD) to each window, and employs a 3D CNN to capture spatio-temporal-spectral representations. This approach outperformed not only the conventional short-epoch spectral ML pipeline but also the same CNN architecture trained on FFT-based features, with particularly pronounced improvements observed in AD classification. Excluding delta yielded small gains in AD-involving contrasts, whereas FTD/CN was unchanged or slightly better with delta retained, suggesting that delta activity is more perturbative in AD under EO conditions.

1 Introduction

Alzheimer's disease (AD) and frontotemporal dementia (FTD) are two of the most common progressive neurodegenerative disorders, predominantly affecting older adults and leading to memory loss, cognitive decline, and behavioral impairments (Levy et al., 1996; Ntetska et al., 2025; Perry and Hodges, 2000; Zuin et al., 2024). While AD is primarily characterized by memory deterioration, language deficits, and visuospatial dysfunction, FTD manifests early through behavioral changes such as disinhibition, apathy, compulsivity, and language impairment, with relative preservation of memory in the early course (Perry and Hodges, 2000; Nishida et al., 2011). Despite these differences, the two disorders exhibit overlapping symptoms, which complicates diagnosis. Currently, there are no curative treatments for either condition, making early and accurate diagnosis of paramount importance.

Diagnosis of AD and FTD typically involves neuropsychological testing, magnetic resonance imaging (MRI), and fluorodeoxyglucose positron emission tomography (FDG-PET). Although effective, these methods are costly, not universally accessible, and limited in their sensitivity during the early stages (Davatzikos et al., 2008; Jack et al., 2018; Jiang, 2023). Electroencephalography (EEG), on the other hand, is a low-cost, non-invasive, and widely accessible tool that captures neural activity with millisecond-level temporal resolution. Characteristic EEG changes, such as posterior alpha rhythm slowing and increased theta/delta power, have been consistently reported in AD and FTD (Jeong, 2004; Musaeus et al., 2018; Nishida et al., 2011).

In parallel, recent advances in machine learning (ML) and deep learning (DL) have enabled automatic classification of EEG signals, uncovering spatial and temporal patterns indicative of neurodegenerative disorders (AlSharabi et al., 2023; Cansiz et al., 2025; Khan et al., 2025; Zhang et al., 2025). Studies have reported abnormalities in oscillatory dynamics and functional connectivity in AD and FTD patients using ML/DL models (Adebisi et al., 2023; Afshari and Jalili, 2017).

In particular, a publicly available dataset of resting-state EEG recordings under eyes-closed (EC) conditions (Miltiadous et al., 2023) and its extension with CNN-based classification (Stefanou et al., 2025) have demonstrated the potential of EEG-driven computational models for dementia detection.

Stefanou et al. (2025) proposed a CNN-based framework for Alzheimer's disease detection that employed EEG spectrogram representations under eyes-closed (EC) conditions. Specifically, they transformed EEG recordings into time–frequency spectrograms (using FFT) and used these as inputs to a convolutional neural network. Validated with a leave-N-subjects-out (LNSO) scheme, their approach achieved robust classification performance in distinguishing AD, FTD, and healthy controls (CN): 79.45% for AD/CN, 72.85% for FTD/CN, and 80.69% for AD+FTD/CN. These results underscore the utility of eyes-closed spectrogram-based CNNs for dementia EEG.

Most EEG-based biomarkers of dementia have historically been derived from eyes-closed (EC) recordings, where stable alpha rhythms provide reliable spectral features (Babiloni et al., 2017; Rossini et al., 2020). In contrast, eyes-open (EO) EEG has been far less studied due to reduced alpha activity, greater variability, and susceptibility to ocular and attentional artifacts, making it more challenging to analyze (Ntetska et al., 2025). The recent release of an EO EEG dataset under photostimulation by Ntetska et al. (2025) provides a timely opportunity to explore this condition, which captures neural dynamics distinct from EC.

Whereas EC recordings are typically dominated by posterior alpha rhythms reflecting a relaxed resting state, EO EEG exhibits marked alpha suppression alongside increased theta and beta activity, reflecting attentional and cognitive engagement. Importantly, Ntetska et al. reported that AD and FTD patients showed reduced alpha suppression compared to controls, indicating impaired neural reactivity to visual stimulation. These findings highlight that EO EEG provides distinct and clinically relevant neural dynamics, underscoring the importance of developing tailored analytic approaches.

However, conventional FFT-based spectrograms, while effective in EC conditions, tend to overemphasize low-frequency power in EO recordings, potentially obscuring other relevant patterns. To address this limitation, we propose a CNN-based framework that incorporates novel features derived from Dynamic Mode Decomposition (DMD). Unlike FFT, DMD captures spatio-temporal coherent modes and thus provides a representation that emphasizes dynamic neural characteristics beyond static frequency-domain power. This approach is expected to yield complementary biomarkers for dementia classification by better characterizing the unique dynamics of EO EEG.

In previous CNN studies on the EC dataset, spectrograms derived from longer windows (e.g., 30-s epochs) outperformed shorter windows (Stefanou et al., 2025), consistent with findings that longer epochs improve spectral reliability (Ng et al., 2022). By contrast, our EO recordings contain sequential photic stimulation, so long windows would mix multiple stimulus conditions and blur nonstationary dynamics: steady-state visually evoked potential (SSVEP) responses exhibit time-ordered, frequency-dependent changes in phase synchrony and propagation (Norcia et al., 2015; Tsoneva et al., 2021). Accordingly, we deliberately avoid long-window aggregation and instead apply Dynamic Mode Decomposition (DMD) to short, non-overlapping 2-s slices, obtaining coherent spatiotemporal modes with identifiable oscillation frequencies beyond per-channel power (Schmid, 2010; Tu et al., 2014). The slice-wise DMD mode-magnitude maps are then stacked in temporal order to form a 3D tensor for the CNN, preserving segment-to-segment evolution while preventing stimulus mixing and reducing the computational burden of long-horizon DMD.

Because SSVEP responses are sensitive to stimulus paradigm and frequency (Norcia et al., 2015), we treat subject-specific stimulus heterogeneity as a nuisance and adopt a stimulus-agnostic, uniform epoching strategy. Within a single slice-based pipeline, recordings are partitioned uniformly within stimulus-on periods into non-overlapping short slices for feature extraction, so that stimulus composition is not explicitly stratified or encoded as a predictive cue. For a like-for-like comparison, we also derive an FFT-based spectrotemporal representation using the same epoching and slicing scheme in place of DMD; aside from the feature extractor (DMD vs. FFT), the classifier configuration, input shape, and training protocol are identical, and matched-dimensionality tensors are fed to the same CNN. Consistent with this design choice, we verified on the dataset—via class–category distribution analysis—that stimulus composition did not materially bias between-group comparisons.

While DMD provides a representation that captures spatio-temporal modes beyond conventional frequency power, a further consideration is the role of low-frequency activity in EO EEG. In previous EC-based CNN studies (Stefanou et al., 2025), spectrogram features yielded high classification accuracy; however, when the same methodology was applied to the EO dataset (Ntetska et al., 2025), the performance for AD was substantially lower. We attribute this reduction to the disproportionate influence of delta activity (0.5–4 Hz, as defined in Ntetska et al., 2025) in EO recordings, which may obscure disease-relevant dynamics.

To test this hypothesis, we conducted controlled comparisons using both FFT- and DMD-based representations, evaluating conditions with and without the delta-band (0.5–40 Hz vs. 4–40 Hz). By applying the same CNN framework across all feature sets, we were able to directly assess the extent to which excluding delta activity improves classification performance in EO EEG.
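The with/without-delta comparison for the FFT branch can be illustrated with a minimal sketch that computes a per-channel power spectrum for one 2-s slice and keeps only the frequency bins inside a chosen band. The function name `band_limited_power`, the Hann taper, and the toy signal are assumptions made for illustration; they are not taken from the pipeline used in this study.

```python
import numpy as np

def band_limited_power(segment, fs=500.0, fmin=0.5, fmax=40.0):
    """Return (freqs, power) restricted to fmin <= f <= fmax.

    segment: array of shape (n_channels, n_samples) for one short slice.
    """
    segment = np.asarray(segment, dtype=float)
    n_samples = segment.shape[1]
    window = np.hanning(n_samples)                  # assumed taper, not from the paper
    spectrum = np.fft.rfft(segment * window, axis=1)
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    power = np.abs(spectrum) ** 2
    keep = (freqs >= fmin) & (freqs <= fmax)        # band selection
    return freqs[keep], power[:, keep]

# Toy slice: 19 channels, 2 s at 500 Hz, a 2 Hz (delta) plus a weaker 10 Hz sinusoid.
t = np.arange(1000) / 500.0
toy = np.tile(np.sin(2 * np.pi * 2 * t) + 0.5 * np.sin(2 * np.pi * 10 * t), (19, 1))

f_full, p_full = band_limited_power(toy, fmin=0.5, fmax=40.0)        # delta retained
f_nodelta, p_nodelta = band_limited_power(toy, fmin=4.0, fmax=40.0)  # delta excluded
```

In this toy case the dominant bin moves from the delta component to the 10 Hz component once the band starts at 4 Hz, which is the kind of shift the delta-exclusion comparison is meant to probe.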

A critical issue in EEG-based deep learning for dementia classification lies in the choice of evaluation methodology. Several earlier studies employed segment-based or random cross-validation procedures, in which EEG segments from the same subject could appear in both the training and test sets, leading to data leakage and overly optimistic performance estimates (Brookshire et al., 2024). Brookshire et al. explicitly demonstrated that such leakage can dramatically inflate classification accuracy in Alzheimer's studies, and strongly recommended subject-wise validation strategies to avoid identity confounding. Following this recommendation, the EC-based CNN spectrogram study adopted a subject-wise scheme, namely leave-N-subjects-out (LNSO) validation, in which entire subjects are excluded from the training set whenever they are used for testing (Stefanou et al., 2025).

The importance of subject-wise partitioning has been widely emphasized in the EEG literature. For example, Zanola et al. (2025) showed that nested leave-N-subjects-out (LNSO) validation provides more reliable performance estimates than non-nested approaches that are prone to overfitting. Similarly, Kunjan et al. (2021) demonstrated that subject-level cross-validation (e.g., LOSO) yields substantially more robust generalization estimates than random k-fold validation in EEG-based disease diagnosis. Building on this evidence, we adopted an LNSO validation framework to ensure fair and reliable evaluation across participants, thereby avoiding inflated metrics and enhancing the credibility of the reported results.
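The subject-wise principle can be sketched in a few lines: folds are built over subject IDs rather than over epochs, so no subject contributes data to both sides of a split. The helper `lnso_folds`, the value of N, and the random shuffling are hypothetical choices for illustration; this non-nested sketch does not reproduce the exact fold construction used in the study.

```python
import numpy as np

def lnso_folds(subject_ids, n_test_subjects, seed=0):
    """Yield (train_idx, test_idx) epoch indices with no subject in both sets."""
    subject_ids = np.asarray(subject_ids)
    unique = np.unique(subject_ids)
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(unique)              # assumed: random subject order
    for start in range(0, len(shuffled), n_test_subjects):
        held_out = shuffled[start:start + n_test_subjects]
        test_mask = np.isin(subject_ids, held_out)  # all epochs of held-out subjects
        yield np.flatnonzero(~test_mask), np.flatnonzero(test_mask)

# Toy example: 6 hypothetical subjects contributing 10 epochs each.
subject_of_epoch = np.repeat(np.arange(6), 10)
folds = list(lnso_folds(subject_of_epoch, n_test_subjects=2))
```

Splitting on `subject_of_epoch` rather than on epoch indices is what prevents the identity leakage described above.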

Therefore, in this study, we introduce a CNN-based framework that incorporates DMD-derived features for analyzing EO EEG in dementia classification. By addressing the limitations of FFT-based representations and ensuring rigorous subject-wise evaluation, our work contributes a novel perspective on the role of EO EEG as a complementary biomarker for AD and FTD.

2 Materials and methods

This section provides a detailed description of the dataset, feature construction process, and classification framework adopted in this study. We first introduce the dataset of stimulus-related EEG recordings, including its characteristics and the criteria used for subject inclusion and exclusion. Next, we describe the Dynamic Mode Decomposition (DMD) procedure applied to the segmented EEG data, which transforms each epoch into a set of spatio-temporal modes. The subsequent feature extraction step outlines how DMD-derived representations were converted into fixed-size images, together with the construction of alternative FFT-based features for comparative analysis. We then present the CNN model architecture used for classification, along with the baseline algorithms against which our approach was evaluated. Finally, we detail the validation methodology, emphasizing the use of subject-wise partitions to ensure a fair and reliable assessment of classification performance.

2.1 Dataset

This study used scalp EEG recordings from a publicly available dataset (OpenNeuro, dataset ID: ds006036, version 1.0.4; DOI: 10.18112/openneuro.ds006036.v1.0.4), which was updated in April 2025. EEG signals were collected using 19 Ag/AgCl electrodes (Fp1, Fp2, F7, F3, Fz, F4, F8, T3, C3, Cz, C4, T4, T5, P3, Pz, P4, T6, O1, and O2) placed according to the international 10–20 system and sampled at 500 Hz with a resolution of 10 μV/mm. A total of 88 participants were included in this study, comprising 36 patients with Alzheimer's disease (AD), 23 patients with frontotemporal dementia (FTD), and 29 healthy controls (CN).

The international Mini-Mental State Examination (MMSE) was used to evaluate the cognitive and neuropsychological status of subjects, with scores ranging from 0 to 30 (lower scores indicating more severe impairment). The AD group (12 males, 24 females) had a mean age of 66.4 years (SD = 7.9) and an average MMSE score of 17.75 (SD = 4.5). The FTD group (14 males, 9 females) had a mean age of 63.7 years (SD = 8.2) with an average MMSE score of 22.17 (SD = 2.6). The CN group (18 males, 11 females) had a mean age of 67.9 years (SD = 5.4), all scoring 30 on the MMSE. Data acquisition was ethically approved by the Scientific and Ethical Committee of the Aristotle University of Thessaloniki and AHEPA University Hospital (protocol number: 142/12-04-2023).

Note that in our experiments, four recordings with insufficient usable duration (≤ 30 s; IDs 15, 64, 65, 78) were excluded; accordingly, all analyses use the post-exclusion sample with the following demographics: AD (11M/24F), age 66.5 ± 8.0 y, MMSE 17.7 ± 4.6; FTD (13M/9F), age 63.7 ± 8.4 y, MMSE 22.2 ± 2.7; CN (17M/10F), age 67.9 ± 5.6 y, MMSE 30.

While this dataset provides both raw and preprocessed EEG signals, many previous studies have utilized the preprocessed version, which includes noise filtering and artifact removal (see Ntetska et al., 2025). In this study, we also employed the preprocessed EEG signals. Consequently, no additional preprocessing steps were applied during the feature extraction stage, as the provided data were already cleaned and ready for analysis.

Furthermore, since our focus was on evaluating brain responses to visual stimulation, only EEG segments corresponding to visual stimulus events were selected for analysis. Specifically, for each subject, we identified periods during which visual stimuli were presented and extracted EEG data exclusively within these intervals. The detailed onset/offset boundaries for each subject and stimulus frequency are reported in the Photic stimulation intervals by frequencies table (Appendix Table 5). As shown in Table 1, the resulting segments do not have a uniform duration across subjects. Consequently, some recordings with relatively short durations could not be included in subsequent analyses, as our methodology required sufficiently long segments to ensure reliable evaluation of stimulus-related brain dynamics.

| AD: ID | AD: [Onset, Offset] (s) | AD: Duration (s) | CN: ID | CN: [Onset, Offset] (s) | CN: Duration (s) | FTD: ID | FTD: [Onset, Offset] (s) | FTD: Duration (s) |
|---|---|---|---|---|---|---|---|---|
| 1 | [3.80, 73.79] | 69.99 | 37 | [6.48, 76.47] | 69.99 | 66 | [1.26, 91.25] | 89.99 |
| 2 | [17.39, 107.38] | 89.99 | 38 | [14.82, 84.82] | 69.99 | 67 | [16.01, 106.00] | 89.99 |
| 3 | [0.03, 45.49] | 45.46 | 39 | [14.94, 165.01] | 150.07 | 68 | [0.30, 90.29] | 89.99 |
| 4 | [15.35, 105.34] | 89.99 | 40 | [0.52, 70.51] | 69.99 | 69 | [14.25, 104.24] | 89.99 |
| 5 | [5.37, 94.87] | 89.50 | 41 | [19.09, 109.08] | 89.99 | 70 | [7.11, 97.10] | 89.99 |
| 6 | [14.37, 123.38] | 109.01 | 42 | [16.49, 106.49] | 89.99 | 71 | [3.88, 93.88] | 89.99 |
| 7 | [17.39, 107.39] | 89.99 | 43 | [7.25, 157.78] | 150.53 | 72 | [0.95, 90.94] | 89.99 |
| 8 | [15.57, 105.56] | 89.99 | 44 | [9.52, 99.52] | 89.99 | 73 | [18.57, 108.56] | 89.99 |
| 9 | [21.89, 111.88] | 89.99 | 45 | [1.51, 126.46] | 124.96 | 74 | [4.23, 94.22] | 89.99 |
| 10 | [15.08, 105.07] | 89.99 | 46 | [14.29, 104.28] | 89.99 | 75 | [12.63, 122.50] | 109.86 |
| 11 | [13.99, 103.98] | 89.99 | 47 | [14.33, 104.33] | 89.99 | 76 | [10.85, 100.67] | 89.81 |
| 12 | [2.99, 92.98] | 89.99 | 48 | [4.57, 94.56] | 89.99 | 77 | [8.49, 98.48] | 89.99 |
| 13 | [17.18, 107.17] | 89.99 | 49 | [27.09, 117.08] | 89.99 | 78 | [8.11, 38.10] | 30.00 |
| 14 | [10.05, 100.04] | 89.99 | 50 | [9.49, 99.48] | 89.99 | 79 | [14.65, 84.64] | 69.99 |
| 15 | [0.45, 30.45] | 30.00 | 51 | [12.25, 102.24] | 89.99 | 80 | [8.56, 78.55] | 69.99 |
| 16 | [5.05, 95.04] | 89.99 | 52 | [2.08, 92.07] | 89.99 | 81 | [21.90, 91.89] | 69.99 |
| 17 | [19.54, 109.53] | 89.99 | 53 | [10.55, 100.54] | 89.99 | 82 | [3.81, 73.80] | 69.99 |
| 18 | [24.73, 114.72] | 89.99 | 54 | [3.83, 153.64] | 149.81 | 83 | [1.40, 71.39] | 69.99 |
| 19 | [0.68, 90.67] | 89.99 | 55 | [15.39, 105.38] | 89.99 | 84 | [2.29, 72.28] | 69.99 |
| 20 | [0.95, 90.95] | 89.99 | 56 | [17.39, 107.38] | 89.99 | 85 | [29.51, 99.50] | 69.99 |
| 21 | [18.10, 108.09] | 89.99 | 57 | [0.06, 69.45] | 69.39 | 86 | [9.77, 79.76] | 69.99 |
| 22 | [4.05, 94.04] | 89.99 | 58 | [0.05, 61.44] | 61.39 | 87 | [5.51, 75.50] | 69.99 |
| 23 | [2.92, 72.91] | 69.99 | 59 | [0.45, 70.44] | 69.99 | 88 | [58.40, 128.39] | 69.99 |
| 24 | [7.33, 77.32] | 69.99 | 60 | [0.74, 130.71] | 129.97 | | | |
| 25 | [3.81, 73.80] | 69.99 | 61 | [6.05, 76.04] | 69.99 | | | |
| 26 | [12.33, 82.32] | 69.99 | 62 | [16.58, 86.57] | 69.99 | | | |
| 27 | [4.97, 94.96] | 89.99 | 63 | [2.74, 72.73] | 69.99 | | | |
| 28 | [3.89, 73.88] | 69.99 | 64 | [0.03, 23.48] | 23.45 | | | |
| 29 | [34.24, 104.23] | 69.99 | 65 | [0.03, 20.28] | 20.25 | | | |
| 30 | [1.59, 71.58] | 69.99 | | | | | | |
| 31 | [7.27, 77.26] | 69.99 | | | | | | |
| 32 | [2.79, 72.78] | 69.99 | | | | | | |
| 33 | [25.86, 95.85] | 69.99 | | | | | | |
| 34 | [16.08, 86.07] | 69.99 | | | | | | |
| 35 | [18.33, 108.32] | 89.99 | | | | | | |
| 36 | [5.32, 75.32] | 69.99 | | | | | | |

Table 1. Summary of visual stimulus onset and offset times (in seconds) and the corresponding stimulus durations for each subject in the Alzheimer's disease (AD), Normal control (CN), and Frontotemporal dementia (FTD) groups.

As shown in Table 1, usable segment durations vary across subjects. To obtain stable subject-level estimates under LNSO, we required that each recording allow construction of at least one ≥24 s epoch after preprocessing; recordings with a total usable duration ≤ 30 s were excluded. This threshold was motivated by the empirical duration distribution and a brief sensitivity check, which indicated that including very short recordings led to unstable estimates and a noticeable drop in accuracy. Applying this rule excluded four short recordings (Subject IDs: 15, 64, 65, and 78) from subsequent analyses.
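The exclusion rule can be checked directly against the onset/offset values in Table 1. The sketch below applies the "≤ 30 s" criterion stated for the excluded recordings to a handful of rows copied from the table; the function name `excluded_ids` and the rounding to the table's two-decimal precision are assumptions for illustration.

```python
def excluded_ids(intervals, threshold_s=30.0):
    """Return subject IDs whose usable duration (offset - onset) is <= threshold_s."""
    out = []
    for sid, (onset, offset) in sorted(intervals.items()):
        # Round to two decimals (the table's precision) to avoid float artifacts.
        duration = round(offset - onset, 2)
        if duration <= threshold_s:
            out.append(sid)
    return out

# A few (onset, offset) pairs in seconds, copied from Table 1.
intervals = {
    1: (3.80, 73.79),
    3: (0.03, 45.49),
    15: (0.45, 30.45),
    64: (0.03, 23.48),
    65: (0.03, 20.28),
    78: (8.11, 38.10),
}
excluded = excluded_ids(intervals)
```

On these rows the rule flags exactly the four recordings listed above, while the 45.46-s recording of subject 3 is retained.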

2.2 Dynamic mode decomposition

One established use of Dynamic Mode Decomposition (DMD) is to characterize the temporal evolution of high-dimensional signals (Schmid, 2010; Tu et al., 2014), with EEG/ERP applications demonstrating phase-consistent dynamics captured by DMD (Li et al., 2022). DMD decomposes the changing patterns of the signal into fundamental elements known as “dynamic modes.” These modes represent the characteristics of the signal as it varies over time, and each mode represents the movement of a specific frequency within the signal (Schmid, 2010).

Using DMD, the EEG signal is decomposed into a sum of mode signals, and filtering is performed by reconstructing the signal from only those modes that meet the filtering parameters. We briefly explain DMD and the components it produces, which are needed to decompose the EEG signal. In addition, we specify the eigenfrequency of the decomposed mode signals, which is used for filtering.

2.2.1 Mathematical formulation

Dynamic Mode Decomposition (DMD) is a technique that separates the states of a system into dynamic modes. These modes are empirically derived vectors, extracted directly from the data, as elaborated in Tu et al. (2014). Fundamentally, DMD operates as an order-reduction method that distills the intrinsic dynamics of multidimensional complex systems by isolating specific frequencies, as explored in Dang et al. (2018).

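Consider a time series

$$X = [x(t_1), x(t_2), \ldots, x(t_N)] \in \mathbb{R}^{M \times N}, \tag{1}$$

where $x(t_k)$ belongs to $\mathbb{R}^{M}$, and the time interval between sample points, $t_{k+1} - t_k$, is fixed at $\Delta t$. For a given signal $X$ in Equation 1, the $(MS) \times (N_S)$ shift-stack Hankel matrix $Y^{(S)}$ is constructed as:

$$Y^{(S)} = \begin{bmatrix} x(t_1) & x(t_2) & \cdots & x(t_{N_S}) \\ x(t_2) & x(t_3) & \cdots & x(t_{N_S+1}) \\ \vdots & \vdots & \ddots & \vdots \\ x(t_S) & x(t_{S+1}) & \cdots & x(t_N) \end{bmatrix} = [y_1, y_2, \ldots, y_{N_S}], \tag{2}$$

where $N_S := N - S + 1$ and $S$ denotes the predetermined stack size. To capture as much of the spectral and temporal complexity of the original signal as possible, the dimensions $(MS) \times (N_S)$ should be made as large as possible.

The DMD algorithm accomplishes a low-rank eigendecomposition of the matrix $A$ that optimally approximates $y_{k+1}$ by $A y_k$ in the least-squares sense, minimizing the following:

$$\sum_{k=1}^{N_S - 1} \left\| y_{k+1} - A y_k \right\|_2^2. \tag{3}$$

To minimize Equation 3, the $N_S$ column vectors are assembled into two data matrices of size $(MS) \times (N_S - 1)$:

$$Y_1 = [y_1, y_2, \ldots, y_{N_S - 1}], \qquad Y_2 = [y_2, y_3, \ldots, y_{N_S}].$$

Subsequently, the local linear approximation can be written as:

$$Y_2 \approx A Y_1. \tag{4}$$

Solving Equation 4 entails finding the $A$ that minimizes:

$$\left\| Y_2 - A Y_1 \right\|_F^2.$$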

2.2.2 Eigen decomposition and mode calculation

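Rather than conducting the eigendecomposition of $A$ directly, the DMD algorithm employs a low-dimensional surrogate, $\tilde{A}$, via Singular Value Decomposition (SVD) (Dang et al., 2018; Faires and Burden, 2012) of $Y_1$:

$$Y_1 \approx U \Sigma V^{*}, \qquad \tilde{A} = U^{*} Y_2 V \Sigma^{-1}, \qquad \tilde{A} W = W \Lambda, \qquad \Phi = Y_2 V \Sigma^{-1} W, \tag{5}$$

where $U$, $\Sigma$, and $V$ are the rank-$R$ truncated SVD factors of $Y_1$, $W$ contains the eigenvectors of $\tilde{A}$, $\Lambda$ is a diagonal matrix containing $R$ ($\leq N_S$) eigenvalues of $A$, and $\Phi \in \mathbb{C}^{(MS) \times R}$ denotes the DMD modes, so that $A \Phi \approx \Phi \Lambda$.

From Equation 5, each snapshot $y_{k+1}$ can be approximated as:

$$y_{k+1} \approx A y_k \approx \Phi \Lambda \Phi^{\dagger} y_k$$

for $k = 1, 2, \ldots, N_S - 1$, where $\Phi^{\dagger}$ denotes the pseudoinverse of $\Phi$. Hence, the matrix $\hat{Y} = [\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_{N_S}]$ furnishes an approximation of the sample data, decomposing it into a unified space-time matrix:

$$\hat{y}_k = \Phi \Lambda^{k-1} c \tag{6}$$

for $k = 1, 2, \ldots, N_S$, where $c$ is a vector of weights for which $y_1 = \Phi c$. Employing the components $\Phi$, $\Lambda$, and $c$ from Equation 6, we define a vector function

$$\hat{x}(t_k) = [\hat{x}_1(t_k), \hat{x}_2(t_k), \ldots, \hat{x}_M(t_k)]^{\top}$$

for approximating the term $x(t_k)$, given by

$$\hat{x}_i(t_k) = \sum_{j=1}^{R} \Phi(i, j)\, \lambda_j^{k-1}\, c_j \tag{7}$$

for $i = 1, 2, \ldots, M$, where $\lambda_j := \Lambda(j, j)$ and $c_j := c(j)$. An approximation of the original $M$-dimensional time series $X$ in Equation 1 is provided by Equation 7 as follows:

$$\hat{X} = [\hat{x}(t_1), \hat{x}(t_2), \ldots, \hat{x}(t_N)]. \tag{8}$$

Each $\hat{x}(t_k)$ in Equation 8 represents the data point at time $t_k$ reconstructed via DMD, minimizing the influence of noise and capturing the essential characteristics of the underlying dynamics. For additional details on the computational process, refer to Seo et al. (2020).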
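The formulation above can be sketched numerically as follows. This is a minimal illustration using the standard exact-DMD recipe (SVD of the first data matrix, eigendecomposition of the reduced operator, least-squares amplitudes); the function names `hankel_stack`, `dmd`, and `reconstruct`, the rank, the stack size, and the toy signal are all assumptions and do not reproduce the implementation used in this study.

```python
import numpy as np

def hankel_stack(X, S):
    """Shift-stack an (M, N) series into an (M*S, N-S+1) Hankel matrix."""
    M, N = X.shape
    NS = N - S + 1
    return np.vstack([X[:, s:s + NS] for s in range(S)])

def dmd(X, S, rank):
    """Exact DMD of the shift-stacked data; returns (Phi, eigvals, c)."""
    Y = hankel_stack(X, S)
    Y1, Y2 = Y[:, :-1], Y[:, 1:]                    # snapshots k and k+1
    U, s, Vh = np.linalg.svd(Y1, full_matrices=False)
    U, s, Vh = U[:, :rank], s[:rank], Vh[:rank]     # rank-R truncation
    B = Y2 @ Vh.conj().T / s                        # Y2 V Sigma^{-1}
    A_tilde = U.conj().T @ B                        # reduced operator
    eigvals, W = np.linalg.eig(A_tilde)
    Phi = B @ W                                     # DMD modes
    # Amplitudes c solving y_1 ~= Phi c in the least-squares sense.
    c = np.linalg.lstsq(Phi, Y[:, 0].astype(complex), rcond=None)[0]
    return Phi, eigvals, c

def reconstruct(Phi, eigvals, c, M, n_steps):
    """x_hat_i(t_k) = sum_j Phi(i, j) * lambda_j**(k-1) * c_j for the first M rows."""
    powers = eigvals[None, :] ** np.arange(n_steps)[:, None]   # (n_steps, R)
    return ((Phi[:M] * c) @ powers.T).real

# Toy 2-channel signal: a pure 10 Hz oscillation sampled at 500 Hz.
fs = 500.0
t = np.arange(200) / fs
X = np.vstack([np.cos(2 * np.pi * 10 * t), np.sin(2 * np.pi * 10 * t)])
Phi, eigvals, c = dmd(X, S=5, rank=2)
X_hat = reconstruct(Phi, eigvals, c, M=2, n_steps=X.shape[1])
```

For this noise-free toy signal two modes suffice and the reconstruction matches the input closely; on real EEG the rank and stack size would need to be chosen with care.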

2.2.3 DMD components and eigen-frequencies

The signal $x(t_k)$ is approximated by $\hat{x}(t_k)$, given in Equation 7, using the following three components produced by the DMD algorithm: the mode matrix $\Phi \in \mathbb{C}^{(MS) \times R}$, the eigenvalue diagonal matrix $\Lambda \in \mathbb{C}^{R \times R}$, and the initial amplitude vector $c \in \mathbb{C}^{R}$. Here, $\Phi$ represents the dominant spatial structure, $\Lambda^{k-1}$ represents the temporal evolution, and $c$ represents the amplitude of the modes. For convenience, these three components used in the signal approximation are collectively referred to as the “DMD components.” The discrete function $x(t_k)$ in Equation 1, which defines the time series $X$, is extended to a continuous function $x(t)$ using the DMD components, which is approximated by

$$\hat{x}_i(t) = \sum_{j=1}^{R} \Phi(i, j) \exp\!\left( \frac{\ln \lambda_j}{\Delta t} (t - t_1) \right) c_j \tag{9}$$

for $i = 1, 2, \ldots, M$. Then the $j$th “eigenfrequency”, denoted by $\omega_j$, is given by

$$\omega_j = \frac{\operatorname{Im}(\ln \lambda_j)}{2 \pi \Delta t}, \tag{10}$$

where $\omega_j$ represents the frequency, expressed in cycles per second, of the $j$th mode signal corresponding to $\lambda_j$, and “Im(·)” denotes the imaginary part of a complex number (Seo et al., 2020).
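As a small illustration of how eigenfrequencies can drive mode selection, the sketch below converts DMD eigenvalues to frequencies in cycles per second and builds a band mask. The helper names `eigenfrequencies` and `select_band`, and the use of the absolute frequency for band membership, are assumptions for illustration.

```python
import numpy as np

def eigenfrequencies(eigvals, dt):
    """omega_j = Im(ln(lambda_j)) / (2 * pi * dt), in cycles per second."""
    return np.log(np.asarray(eigvals, dtype=complex)).imag / (2.0 * np.pi * dt)

def select_band(eigvals, dt, fmin, fmax):
    """Boolean mask of modes whose |omega_j| lies inside [fmin, fmax]."""
    freqs = np.abs(eigenfrequencies(eigvals, dt))
    return (freqs >= fmin) & (freqs <= fmax)

# Toy eigenvalues on the unit circle at 2 Hz and +/-10 Hz, for a 500 Hz sampling rate.
dt = 1.0 / 500.0
lam = np.exp(2j * np.pi * np.array([2.0, 10.0, -10.0]) * dt)
freqs = eigenfrequencies(lam, dt)
mask = select_band(lam, dt, fmin=4.0, fmax=40.0)   # drops the 2 Hz (delta) mode
```

Reconstructing the signal from only the modes where `mask` is true corresponds to the mode-based filtering described earlier in this section.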

2.3 Feature extraction

To ensure sufficient temporal coverage, only participants with total usable durations longer than 30 s were included. This criterion resulted in the exclusion of four subjects (Subject IDs: 15, 64, 65, and 78) with shorter recordings.

Feature extraction proceeds in three steps: (i) epoch construction, (ii) DMD-based windowed segmentation, and (iii) formation of 3D sequenced mode maps.

2.3.1 Creation of epochs

From the stimulus-related EEG recordings described in Table 1, 10 epochs of length 24 s were constructed for every eligible subject by sliding a time window across the continuous recording between photic stimulus onset and offset. Rather than enforcing strictly non-overlapping intervals, a partially overlapping windowing strategy was adopted. This approach ensured that each participant contributed an equal number of epochs while efficiently utilizing the available data, particularly for subjects with limited recording durations. The procedure is illustrated in the Preprocessed EEG Recording & Epoch Creation panel of Figure 1.
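One way to realize such a windowing scheme is sketched below. It assumes evenly spaced epoch start times between onset and (offset minus 24 s); the spacing rule actually used is not specified in this section, so the helpers `epoch_starts` and `extract_epochs` are hypothetical.

```python
import numpy as np

def epoch_starts(onset, offset, n_epochs=10, epoch_len=24.0):
    """Evenly spaced start times (s) so that every epoch fits in [onset, offset]."""
    if offset - onset < epoch_len:
        raise ValueError("recording too short for one epoch")
    return np.linspace(onset, offset - epoch_len, n_epochs)

def extract_epochs(eeg, fs, onset, offset, n_epochs=10, epoch_len=24.0):
    """eeg: (n_channels, n_samples) -> (n_epochs, n_channels, epoch_len * fs)."""
    n = int(round(epoch_len * fs))
    starts = epoch_starts(onset, offset, n_epochs, epoch_len)
    idx = np.round(starts * fs).astype(int)
    idx = np.minimum(idx, eeg.shape[1] - n)         # guard against rounding past the end
    return np.stack([eeg[:, i:i + n] for i in idx])

# Toy recording: 19 channels, 50 s at 500 Hz, with the interval of subject 3 in Table 1.
rng = np.random.default_rng(0)
eeg = rng.standard_normal((19, 500 * 50))
starts = epoch_starts(0.03, 45.49)
epochs = extract_epochs(eeg, fs=500, onset=0.03, offset=45.49)
```

For this 45.46-s interval, consecutive start times are about 2.4 s apart, so adjacent epochs overlap heavily; for longer recordings the same rule yields less overlap.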

[Figure 1: Diagram of the EEG recording and analysis process. The top section shows preprocessed EEG data with epochs labeled. The middle section illustrates windowed feature extraction using dynamic mode decomposition with two-second windows. The bottom section presents 3D sequenced images representing features, organized in a 50 × 50 grid, stacked twelve times.]

Figure 1. Overview of the preprocessing and feature extraction pipeline. Continuous EEG recordings were segmented into 24-second epochs with overlap. Each epoch was further divided into 2-second windows, from which features were extracted using Dynamic Mode Decomposition (DMD). The resulting spectro-spatial representations were mapped into 50 × 50 grayscale mode maps, and by stacking 12 consecutive segments, each epoch was represented as a 50 × 50 × 12 three-dimensional image for CNN-based classification.

2.3.2 Segmentation of epochs

Each 24-s epoch was further subdivided into 12 segments of length 2 s, and Dynamic Mode Decomposition (DMD) was applied separately to each segment to extract dynamic modes. This process is visualized in the Windowed Feature Extraction using DMD panel of Figure 1.
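At the 500 Hz sampling rate, a 24-s epoch holds 12,000 samples per channel and each 2-s segment holds 1,000. The subdivision can be sketched as a reshape; the function name `split_into_segments` is hypothetical.

```python
import numpy as np

def split_into_segments(epoch, fs=500, seg_len=2.0):
    """(n_channels, n_samples) -> (n_segments, n_channels, seg_samples), non-overlapping."""
    n_channels, n_samples = epoch.shape
    seg = int(round(seg_len * fs))
    n_seg = n_samples // seg
    trimmed = epoch[:, :n_seg * seg]                # drop any incomplete trailing segment
    return trimmed.reshape(n_channels, n_seg, seg).transpose(1, 0, 2)

# Toy epoch: 19 channels x 12000 samples (24 s at 500 Hz).
epoch = np.arange(19 * 12000, dtype=float).reshape(19, 12000)
segments = split_into_segments(epoch)
```

Each entry `segments[k]` is then a 19 × 1000 array that can be passed to DMD independently of the other segments.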

2.3.3 Formation of 3D sequenced images

The resulting mode representations were converted into 50 × 50 grayscale mode maps (resized to 50 × 50 by bilinear interpolation and rescaled to [0, 1]), which were sequentially stacked to form a three-dimensional array of size 50 × 50 × 12 and used directly as numeric tensors for the CNN. When persisted, arrays were stored in a lossless binary format for direct loading into the training pipeline. A visual example of this structured input is provided in 3D Sequenced Images in Figure 1.
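The map-formation step can be sketched as follows, assuming that each segment's map is the element-wise magnitude of its complex mode matrix and that rescaling to [0, 1] is done per map; both points, as well as the helper names, are assumptions, since the exact layout of the mode maps is not detailed here. Bilinear interpolation is implemented as two passes of 1-D linear interpolation.

```python
import numpy as np

def resize_bilinear(img, out_h=50, out_w=50):
    """Bilinear resize via separable linear interpolation (endpoints aligned)."""
    h, w = img.shape
    rows = np.linspace(0, h - 1, out_h)
    cols = np.linspace(0, w - 1, out_w)
    # Interpolate along the width for every row, then along the height.
    tmp = np.stack([np.interp(cols, np.arange(w), img[r]) for r in range(h)])
    return np.stack([np.interp(rows, np.arange(h), tmp[:, c]) for c in range(out_w)], axis=1)

def rescale01(img, eps=1e-12):
    """Min-max rescale to [0, 1]; eps avoids division by zero for constant maps."""
    lo, hi = img.min(), img.max()
    return (img - lo) / max(hi - lo, eps)

def epoch_tensor(mode_matrices, size=50):
    """Stack per-segment magnitude maps into a (size, size, n_segments) array."""
    maps = [rescale01(resize_bilinear(np.abs(Phi), size, size)) for Phi in mode_matrices]
    return np.stack(maps, axis=-1)

# Toy input: 12 hypothetical complex mode matrices (19 channels x 30 modes).
rng = np.random.default_rng(0)
modes = [rng.standard_normal((19, 30)) + 1j * rng.standard_normal((19, 30)) for _ in range(12)]
tensor = epoch_tensor(modes)
```

The resulting 50 × 50 × 12 array can be fed to a 3D CNN directly as a numeric tensor, with the last axis carrying the temporal order of the segments.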

Our uniform epoch selection was designed a priori to minimize the direct encoding of stimulus composition as a feature; we then validated this design by summarizing and testing the class–category distribution at the dataset level (see Supplementary Material Section 2; Figure S1), which showed no material imbalance across groups.

Note that the 24 s epoch length reflects an empirical trade-off: longer epochs reduced the pool of eligible recordings under the LNSO protocol, whereas shorter epochs degraded classification accuracy; accordingly, 24 s was fixed throughout the analyses as a balance between data retention and performance.
