Quantitative investigation on working memory patterns through EEG based on visual attention task for children with learning disability

Abstract

Learning disabilities in children are exhibited through difficulties in reading and writing due to a lack of cognitive skills. They are generally diagnosed by analyzing the behavior and processing capacity of children in light of their academic performance. They can also be evidenced by capturing and analyzing working memory patterns in the brain, which show the effectiveness of therapeutic interventions in children with learning disabilities. This research works with electroencephalography (EEG) signal data from IEEE Dataport consisting of 121 participants in total, of which 61 are children with ADHD and 60 are normal children, all aged 7–12 years. The use of these data has supported ground-truth research by providing reliable data and mitigating the challenge of real-time availability of EEG data. This manuscript focuses on classifying the dataset into two categories of children, viz. normal and attention deficit hyperactivity disorder (ADHD), using brain connectivity parameters, with validation through machine learning (ML) algorithms. Children with learning disabilities undergo therapeutic interventions to manage their disability. Generally, the progress of their intellectual capability is assessed through visual cues and the responses that the children exhibit. Instead, differences in their brain cognition need to be analyzed to realize the outcomes of the therapeutic effect. In this research, brain connectivity parameters such as power spectral density (PSD), Granger causality (GC), phase slope index (PSI), partial directed coherence (PDC), and directed transfer function (DTF) are estimated, quantified, and analyzed. Further, using the measures of the brain connectivity parameters, ML algorithms, namely logistic regression (LR), support vector machine (SVM), decision tree (DT), k-nearest neighbor (KNN), and random forest (RF), along with a deep learning model, viz. deep belief networks (DBN), have been employed to validate this study. Among these models, DBN offered a model accuracy of 89.7%. Hence, this concept emphasizes the validation and effectiveness of therapeutic interventions and can support clinical evaluations in children with learning disability.

Introduction

Cognitive skills are a collection of abilities attained through the efficient functioning of the brain lobes, as displayed in Figure 1a. As shown in Figure 1b, there are four lobes in the brain, namely frontal, parietal, occipital, and temporal. The frontal lobe is responsible for logical thinking and decision making, the parietal lobe for sensory system processing, the temporal lobe for auditory system processing, and the occipital lobe for processing visual information. These lobes communicate and pass information in the form of electrical signals, which are transmitted through millions of neurons in the central nervous system (CNS).

Diagram consisting of four parts: (a) A circle divided into eight sections, listing cognitive skills such as working memory and processing speed. (b) A labeled side view of a brain showing the frontal, parietal, temporal, and occipital lobes. (c) A central circle labeled “Characteristics of ADHD” with arrows pointing to issues like anxiety, behavior problems, and learning disorders. (d) Top view diagram of a head with labeled electrode positions, showing a standard EEG setup.

(a) Representation of cognitive skills. (b) Lobes of the brain. (c) Characteristics of ADHD. (d) Positioning of Emotiv electrodes.

Learning disability (LD) occurs due to the lack of one or more cognitive skills. Learning disabilities are of different types, such as dyslexia, dyscalculia, dysgraphia, and dyspraxia, among which attention deficit hyperactivity disorder (ADHD) is one of the predominant learning disabilities. Children with ADHD have difficulty paying attention, which leads them to lose track of their work. Their characteristics are shown in Figure 1c. In therapeutic procedures, the therapist must be aware of the child's initial level of learning disability to develop a treatment plan. Similarly, during the treatment, it is necessary to keep track of the child's level of learning to know whether the treatment has helped in managing the disability. Major improvements can be recognized and assessed visually, but minor improvements that are not vocalized may go unnoticed, so the therapist continues the treatment unchanged unless progress is measured using analytical techniques. Assessing the impact of therapeutic procedures can therefore be challenging for children with remedial needs: brain improvements may occur in them, but the changes might not always be reflected in the child's behavior or characteristics.

Electroencephalography (EEG) signals are recordings of the electrical signals that neurons transmit during communication between the lobes. The change in electrical activity is observed using EEG. Emotiv, a 19-channel device, is employed, in which electrodes are placed on the scalp to record EEG signals. The electrodes Fp1 and Fp2 correspond to the frontopolar (prefrontal) region, while F7, F3, Fz, F4, and F8 are associated with the frontal region. T3, T5, T4, and T6 are in the temporal region, and C3, Cz, and C4 correspond to the central region of the brain. P3, Pz, and P4 are placed in the parietal region, and O1 and O2 are placed over the occipital region of the brain, as shown in Figure 1d. In the 10–20 naming convention, odd-numbered electrodes lie over the left hemisphere, even-numbered electrodes over the right hemisphere, and the suffix z marks the midline. Working memory patterns are analyzed from the acquired EEG signals.

Therapeutic interventions are supported by analyzing and comparing the working memory patterns of children with learning disabilities with those of normal children using statistical measures of the brain connectivity parameters. This approach emphasizes the application of methods to evaluate and validate the effectiveness of therapeutic interventions such as occupational therapy.

The concepts related to our approach are discussed in the following sections, followed by the methodology of our research, which consists of data collection with a preprocessing phase, exploration of statistical parameter measures, and validation with ML and DL algorithms, along with results and proof. The main contributions of the research are as follows:

Analyze and provide custom-based assessment of brain connectivity parameters for individuals that can help therapists facilitate feature-specific treatment interventions.

Estimate even minor improvements in brain connections for disordered children as an outcome of therapy, which currently may remain unvoiced due to the disabilities in the children.

Identify the optimal ML classifier model that can accurately classify disordered and normal children.

Background and summary

Learning disabilities in children can be diagnosed and classified by interpreting the way in which the brain processes information (Manghirmalani et al., 2011). Cognitive skills in children involve information processing and improve with a change in learning technique, and this can be observed using brain waves (Chrisilla et al., 2021). Children with learning disabilities find it difficult to pay attention and understand. When multiple items are placed for a task, display factors distract the child's attention, the child finds it difficult to stay focused on the task, and working memory is influenced by the distraction; this can be diagnosed by recording and analyzing working memory patterns from brain waves (Lenartowicz and Loo, 2014; Srinivasan et al., 2022; Martínez-Briones et al., 2021; Wang et al., 2022; Hollingworth and Beck, 2016; Hu et al., 2019; Frid and Manevitz, 2018; Martínez-Briones et al., 2020). Electroencephalography (EEG) signals, which represent brain signals, are recorded using brain computer interfaces (BCI) with electrodes placed over the brain area, and they help us acquire the working memory pattern that appears when the brain performs a task (Lenartowicz and Loo, 2014; Motie Nasrabadi et al., 2020; Zakopoulou and Miltiadous, 2022). Learning disabilities can be diagnosed by measuring the power distributed in the underlying frequency bands of EEG signals obtained by analyzing memory patterns. For example, ADHD factors appear in EEG signals and are observed when measuring the band-wise power distribution (Martínez-Briones et al., 2021; Sandhya et al., 2015; Kuc et al., 2017; Mahmoodin et al., 2015; Khalaf et al., 2024). The influence of one channel over another reflects the correlation between brain areas; for children with LD this influence would be poor, as the power of their channel frequency bands while solving the task would be low, which can be statistically analyzed using brain connectivity parameters (Saleh, 2014). Learning disabilities progress according to age and other factors. Early detection of learning disability is a challenging task for which machine learning models are used; once trained, they can detect or diagnose disability by analyzing patterns from a few data samples (Parmar and Paunwala, 2023; Seshadri N. P. G. et al., 2023; Seshadri N. P. et al., 2023; David and Balakrishnan, 2010; Chakraborty, 2020; Agrawal et al., 2024; Ahire et al., 2022, 2025; Tamboer et al., 2016; Atkar and Priyadarshini, 2020; Rezvani and Khorasani, 2019; Parmar et al., 2021). Machine learning models are applied to EEG data to diagnose ADHD as an automated approach (Alim and Imtiaz, 2023). Techniques such as multidimensional abnormalities, variational mode decomposition (VMD) with Hilbert transforms (HT), multi-resolution analysis, and centrality measures are used with machine learning to classify children with ADHD and normal children from EEG signals in an interpretable way (Li et al., 2025; Khare and Acharya, 2023; Merudhula et al., 2024; Attallah, 2024). Furthermore, web-based PPS platforms have been developed for the earlier detection of ADHD using EEG data (Santarrosa-López et al., 2025). As a further advancement, deep learning models such as the graph convolutional network (GCN) are also used to extract multidomain features from EEG data to detect ADHD, and diagnostic interfaces supported with DL techniques are used to diagnose ADHD (Mao et al., 2025; Pappula and Anwar, 2024).

Various research studies that similarly support the diagnosis of learning disabilities have been performed, specifically on dyslexia, using machine learning techniques. However, this research focuses more on statistical analysis of brain connectivity parameters and incorporates state-of-the-art ML/DL algorithms to understand the difference between ADHD and controls. Therefore, the framework implemented in this investigation is intended as a pioneering approach to realizing the benefits of therapeutic interventions in ADHD children.

Methods

Therapeutic interventions are supported by analyzing and comparing the working memory patterns of children with learning disabilities with those of normal children using a statistical measure of the brain connectivity parameters. To quantify the improvement in their brain cognition, certain brain connectivity parameters are analyzed statistically and validated using algorithms such as SVM, LR, KNN, DT, RF, and DBN.

Dataset description

The real-time data required for this research are highly comprehensive, and the availability of such data is limited. IEEE Dataport has positively influenced ground-truth research by providing reliable and well-structured data, thus mitigating the challenges associated with recording Electroencephalography (EEG) signals in real time.

The EEG dataset is obtained from IEEE Dataport; it is publicly available and 32 Mb in size. The dataset is categorized into ADHD and Normal children folders, each of which contains two separate folders. There were 121 participants, of which 61 were children diagnosed with ADHD and 60 were normal children, as shown in Figure 2, all aged between 7 and 12 years; the dataset parameters are summarized in Table 1. For the IEEE EEG dataset collected by Ali Motie Nasrabadi of Shahed University, a psychiatrist diagnosed the ADHD children according to the DSM-IV criteria, and signals were collected from children who had consumed Ritalin for a period of 6 months. Normal children had no history of psychiatric conditions or risk behaviors (Motie Nasrabadi et al., 2020). Their signals were recorded while they performed a visual attention task.

Pie chart titled “Distribution of Children” with two equal halves. The left half is blue, labeled “Normal Children” with a value of sixty. The right half is pink, labeled “ADHD Children” with a value of sixty-one.

Dataset demographics: ADHD vs. normal children.

| Parameters | Values |
| --- | --- |
| Number of ADHD children | 61 |
| Number of control children | 60 |
| Total number of participants | 121 |
| Sampling frequency | 128 Hz |
| Number of channels (features) | 19 |

Data collection and pre-processing

In this study, data collection involves collecting and preparing data. EEG signals are recorded non-invasively by assigning a visual attention task to the children, using a 19-channel Emotiv device placed according to the 10–20 standard at a sampling frequency of 128 Hz. The children were shown a set of 5–16 cartoon images and were instructed to count the number of images provided per iteration (Motie Nasrabadi et al., 2020). While the children processed their memory to count the images, the neuronal activity in the brain lobes, which pass information via electrical signals, was recorded as EEG signals using the Emotiv device and released as an open-source EEG dataset.

The signals in different folders of the same category of the dataset are appended together before pre-processing the data. The raw data might be noisy, and therefore the quality of the signals might also be poor. A Butterworth filter is used to eliminate the noise from the raw EEG signals; passing the EEG signals through a filter minimizes the influence of artifacts.

As the data are derived from IEEE Dataport, information regarding the recording environment is not available. So, general measures are taken to handle unwanted noise and artifacts, as shown in Figure 3. The filter transfer function in the z-domain determines the mathematical relationship between the input and output signals.

Graph (a) shows EEG band power for the ADHD group over 6000 seconds, with fluctuating Delta, Theta, Alpha, and Beta waveforms. Graph (b) depicts the control group over 4000 seconds, showing similar measurements with smoother variations. Both use a 10-second moving average.

Workflow diagram for EEG signal preprocessing.

The terms of the filter transfer function are defined as follows:

H(z) is the filter transfer function.

V(z) is the Z-transform of the output signal. U(z) is the Z-transform of the input signal.

u0 is the weight of the current input sample u(n).

u1z⁻¹ is the weighted input sample u(n−1), delayed by one time step.

u2z⁻² is the weighted input sample u(n−2), delayed by two time steps.
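The transfer function itself is not reproduced in the text. As a sketch only, a second-order filter consistent with the definitions above could be written as follows. The feedback weights v1 and v2 and the time-domain output v(n) are assumed symbols that do not appear in the original.

```latex
H(z) = \frac{V(z)}{U(z)}
     = \frac{u_0 + u_1 z^{-1} + u_2 z^{-2}}{1 + v_1 z^{-1} + v_2 z^{-2}}
\quad\Longleftrightarrow\quad
v(n) = u_0\,u(n) + u_1\,u(n-1) + u_2\,u(n-2) - v_1\,v(n-1) - v_2\,v(n-2)
```

The right-hand form is the difference equation that a filter with this transfer function evaluates sample by sample.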

A Butterworth filter is a signal processing filter designed to provide a maximally flat frequency response in the passband. The passband is the range of frequencies that the filter allows to pass through. Butterworth filters are broadly categorized into low-pass and high-pass filters. The raw EEG signals might contain artifacts, which degrade the signal quality considerably, so a Butterworth filter is used to remove this noise. A low-pass Butterworth filter is chosen in particular, as it passes low-frequency signals such as the brainwave activity in the EEG data and significantly reduces high-frequency noise. Regardless of the signal modality, the acquisition process is bound to introduce artifacts, partly because of the children's limited cooperation, so the signals are filtered to make them uniform before the subsequent phases.

A high-pass Butterworth filter allows signals with frequencies greater than the cut-off frequency to pass through and attenuates signals with frequencies below the cut-off frequency. The cut-off frequency is the frequency beyond which signals are attenuated: below it for a high-pass filter and above it for a low-pass filter. The low-pass Butterworth filter, which has a maximally flat frequency response, is applied to the signals to attenuate high-frequency noise and muscle artifacts. It attenuates signal components with frequencies above the cut-off frequency and preserves those below it. The equation for the low-pass and high-pass filters is given as Equation 1, where b and a are the filter coefficients obtained from the normalized cut-off frequency; for a low-pass filter the numerator coefficients are all positive, and for a high-pass filter they alternate in sign. Normalization of the cut-off frequency is an important step in designing a Butterworth filter, as it ensures the accuracy of the filter.

The sampling frequency is the number of samples taken per second during the conversion of a continuous signal to a discrete signal. The Nyquist frequency is half the sampling frequency and is used to normalize the cut-off frequency. The order determines the steepness of the filter.
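To make the filtering step concrete, the following is a minimal sketch of a low-pass Butterworth filter in plain Python. The filter order (2) and the 30 Hz cut-off are assumptions chosen for illustration, since the text does not state the values used in the study; only the 128 Hz sampling frequency comes from the dataset description. The coefficients follow the standard bilinear-transform design of a second-order section with Q = 1/√2.

```python
import math


def butter2_lowpass(cutoff_hz, fs_hz):
    """Return (b, a) for a 2nd-order Butterworth low-pass (bilinear transform)."""
    nyquist_hz = fs_hz / 2.0
    if not 0.0 < cutoff_hz < nyquist_hz:
        raise ValueError("cut-off must lie between 0 and the Nyquist frequency")
    w0 = math.pi * (cutoff_hz / nyquist_hz)  # normalized cut-off in rad/sample
    alpha = math.sin(w0) / math.sqrt(2.0)    # Q = 1/sqrt(2) gives the Butterworth response
    cos_w0 = math.cos(w0)
    a0 = 1.0 + alpha
    b = [(1.0 - cos_w0) / (2.0 * a0), (1.0 - cos_w0) / a0, (1.0 - cos_w0) / (2.0 * a0)]
    a = [1.0, -2.0 * cos_w0 / a0, (1.0 - alpha) / a0]
    return b, a


def apply_filter(b, a, signal):
    """Evaluate the second-order difference equation sample by sample."""
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x0 in signal:
        y0 = b[0] * x0 + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


FS_HZ = 128.0      # sampling frequency from the dataset description
CUTOFF_HZ = 30.0   # assumed cut-off, not stated in the text

b, a = butter2_lowpass(CUTOFF_HZ, FS_HZ)
t = [n / FS_HZ for n in range(1024)]
slow = [math.sin(2 * math.pi * 5 * ti) for ti in t]    # 5 Hz component, inside the passband
fast = [math.sin(2 * math.pi * 50 * ti) for ti in t]   # 50 Hz component, above the cut-off

# Skip the first 256 samples so the start-up transient does not bias the estimate.
gain_5hz = rms(apply_filter(b, a, slow)[256:]) / rms(slow[256:])
gain_50hz = rms(apply_filter(b, a, fast)[256:]) / rms(fast[256:])
print(f"gain at 5 Hz: {gain_5hz:.3f}, gain at 50 Hz: {gain_50hz:.3f}")
```

In practice a library routine such as SciPy's `scipy.signal.butter` would normally be preferred; the hand-written version is shown only to expose the coefficients and the difference equation.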

Power spectral density (PSD)

PSD is calculated to measure how much power is distributed in each frequency band of the signal. This analysis is derived from the concept of discrete Fourier transform (DFT).

Here, u[n] is the discrete signal with U(k) as its DFT. Np symbolizes the number of points in the DFT, and k is the frequency index.

Delta, Theta, Alpha, Beta, and Gamma are the EEG frequency bands, each covering a specific range of frequencies. These bands are individually responsible for specific functions, as shown in Table 2. Based on the power distribution in each band, the ability of the brain can be measured, i.e., normal children will have a better range of Delta, Theta, and Beta power than children with learning disability while performing a task. So, for the visual attention task, the Delta, Theta, Alpha, and Beta band measures are computed through this technique, and the two classes of children can be distinguished.

| Frequency bands | Range (Hz) | Responsible states |
| --- | --- | --- |
| Delta | 0.1–4 | Attention (rest/awake) |
| Theta | 4–8 | Memory and learning |
| Alpha | 8–14 | Cognitive alertness |
| Beta | 14–32 | Cognitive processing |
| Gamma | 32–100 | Observation and active response |

Bands and responsibilities.

Band spectrum analysis also helps segment a signal into independent frequencies to indicate the significance of the band power.
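The band-wise power measurement can be sketched in plain Python as follows. The sketch assumes the common periodogram estimate, PSD(k) = |U(k)|² / Np with U(k) the DFT of u[n], which may differ in normalization from the expression used in the study. The band edges are taken from Table 2, and the test signal is synthetic.

```python
import cmath
import math

# Band edges in Hz, taken from Table 2. With a 128 Hz sampling rate only
# frequencies up to the 64 Hz Nyquist limit are actually represented.
BANDS_HZ = {
    "Delta": (0.1, 4.0),
    "Theta": (4.0, 8.0),
    "Alpha": (8.0, 14.0),
    "Beta": (14.0, 32.0),
    "Gamma": (32.0, 100.0),
}


def periodogram(signal, fs_hz):
    """Return (frequencies, power) using PSD(k) = |U(k)|^2 / Np (assumed form)."""
    n_points = len(signal)
    freqs, power = [], []
    for k in range(n_points // 2 + 1):
        u_k = sum(
            u_n * cmath.exp(-2j * math.pi * k * n / n_points)
            for n, u_n in enumerate(signal)
        )
        freqs.append(k * fs_hz / n_points)
        power.append(abs(u_k) ** 2 / n_points)
    return freqs, power


def band_powers(signal, fs_hz, bands=BANDS_HZ):
    """Sum the periodogram over each band (lower edge inclusive, upper exclusive)."""
    freqs, power = periodogram(signal, fs_hz)
    return {
        name: sum(p for f, p in zip(freqs, power) if lo <= f < hi)
        for name, (lo, hi) in bands.items()
    }


FS_HZ = 128.0
# Synthetic two-second signal: a strong 10 Hz (Alpha) and a weaker 20 Hz (Beta) component.
signal = [
    math.sin(2 * math.pi * 10 * n / FS_HZ) + 0.3 * math.sin(2 * math.pi * 20 * n / FS_HZ)
    for n in range(256)
]
powers = band_powers(signal, FS_HZ)
dominant = max(powers, key=powers.get)
print(dominant, {name: round(value, 2) for name, value in powers.items()})
```

The direct DFT is quadratic in the number of samples and is used here only for transparency; an FFT-based routine would be the usual choice for full recordings.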

Statistical analysis

Various statistical parameters of the dataset are analyzed and visualized using measures such as coherence, Granger causality, PSI, PDC, and DTF. The efficiency of connectivity between the brain's frontal and occipital lobes can be inferred with the help of these parameters. These lobes are important because they play a major role in the visual attention task: the frontal lobe helps the child count the number of cartoon images, in association with the occipital lobe, which communicates the visual image information to the brain. Accordingly, the signals of the corresponding channels, namely Fp1, Fp2, F3, F4, F7, F8, Fz, O1, and O2, are measured, as shown in Figure 4. The Alpha band of these channels is statistically analyzed to classify and find the difference between children with learning disability and normal children, because the Alpha band is more closely associated with processing the visual attention task, as shown in Table 2.

Channels monitoring the frontal and occipital lobes; the active channels are indicated in green.

Coherence

Coherence quantifies the strength of association between two regions of the brain that are involved in performing actions such as learning, responding, and eating. The coherence is represented as a matrix, a mathematical representation that provides the coherence between multiple regions. In the EEG dataset, the coherence matrix represents the linear dependency between the brain regions by comparing the signals acquired from each channel. Coherence ranges between 0 and 1: a value close to 1 indicates high coherence between two signals, and a value close to 0 indicates low coherence. The diagonal elements represent the coherence of a signal with itself and are thus always 1. The effectiveness of the associative relationship between the frontal and occipital lobes is observed using the coherence matrix.

Granger causality

Granger causality is a statistical measure that determines whether one time-series signal can predict the future values of another time-series signal. This measure helps to understand the direct influence of one brain region on another, that is, it depicts how strongly an activity performed in one region of the brain affects another region. It helps to understand how specific cognitive skills are performed and to identify the direction in which information flows among the brain regions. The range of Granger causality is from 0 to ∞. The lower the value, the lower the influence of one region over the other, and vice versa (Saleh, 2014). The value of the transfer function Huv(freq), which represents the influence of channel v on channel u, must not be zero for the causal difference to be analyzed; if the transfer function is zero, then no causal difference between the channels will be observed. Here, Granger causality is measured to assess the influence of the occipital lobe on the frontal lobe and the flow of visual information used to count the number of images. It is also represented as a matrix to display the causal effect of the occipital lobe on the frontal lobe.

Here, Suu(freq) is the PSD of channel u, Huv is the transfer function from v to u, and Σuu is the noise variance of channel u.
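The idea behind Granger causality can be illustrated with its classical time-domain form, which is a simpler relative of the spectral measure described above and is not necessarily the estimator used in the study. The sketch compares the residual variance of a model that predicts a signal from its own past with that of a model that also uses the past of the other signal. The lag order of 1, the synthetic signals, and the names `occipital` and `frontal` are assumptions made for illustration.

```python
import math
import random


def granger_causality(source, target):
    """Return ln(var_restricted / var_full) for order-1 models, source -> target."""
    y = target[1:]         # present values of the target
    x_own = target[:-1]    # the target's own past
    x_src = source[:-1]    # the source's past
    n = len(y)

    # Restricted model: y[t] = a * x_own[t]
    s_oo = sum(o * o for o in x_own)
    s_oy = sum(o * v for o, v in zip(x_own, y))
    a_r = s_oy / s_oo
    var_restricted = sum((v - a_r * o) ** 2 for v, o in zip(y, x_own)) / n

    # Full model: y[t] = a * x_own[t] + b * x_src[t], solved via 2x2 normal equations
    s_ss = sum(s * s for s in x_src)
    s_os = sum(o * s for o, s in zip(x_own, x_src))
    s_sy = sum(s * v for s, v in zip(x_src, y))
    det = s_oo * s_ss - s_os ** 2
    a_f = (s_oy * s_ss - s_os * s_sy) / det
    b_f = (s_oo * s_sy - s_os * s_oy) / det
    var_full = sum(
        (v - a_f * o - b_f * s) ** 2 for v, o, s in zip(y, x_own, x_src)
    ) / n

    return math.log(var_restricted / var_full)


# Synthetic example: the "occipital" signal drives the "frontal" signal with a
# one-sample delay, and there is no influence in the opposite direction.
rng = random.Random(0)
occipital = [rng.gauss(0.0, 1.0) for _ in range(2000)]
frontal = [0.0]
for t in range(1, 2000):
    frontal.append(0.5 * frontal[t - 1] + 0.8 * occipital[t - 1] + 0.1 * rng.gauss(0.0, 1.0))

gc_occipital_to_frontal = granger_causality(occipital, frontal)
gc_frontal_to_occipital = granger_causality(frontal, occipital)
print(gc_occipital_to_frontal, gc_frontal_to_occipital)
```

A value near zero means the other signal's past adds no predictive power, while larger values indicate a stronger directed influence, matching the 0 to ∞ range described above.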

Phase slope index (PSI)

To estimate both the magnitude and the direction of information flow in the EEG dataset, PSI uses the fast Fourier transform (FFT). It helps in measuring the temporal order between spatially separated signals by evaluating the phase linearity between the two signals. The PSI is calculated as:

$$\Psi_{vu} = \Im\left(\sum_{freq \in F} C_{vu}^{*}(freq)\, C_{vu}(freq + \Delta freq)\right)$$

where Ψvu is a function of the coherency between channels v and u, F is the set of frequencies over which the sum is taken, Δfreq is the frequency resolution, and ℑ denotes the imaginary part. The coherency is defined as:

$$C_{vu}(freq) = \frac{S_{vu}(freq)}{\sqrt{S_{vv}(freq)\, S_{uu}(freq)}}$$

where

Cvu(freq) is the coherency between v and u at frequency freq.

Suu(freq) is the PSD of the u channel at frequency freq.

Svv(freq) is the PSD of the v channel at frequency freq.

Svu(freq) is the cross-spectral density between channels v and u at frequency freq.

The sign of the imaginary part indicates the role of the channels: if Ψvu > 0, the channel with signal v drives and the channel with signal u responds to it. If Ψvu < 0, the channel with signal u drives the channel with signal v, i.e., the information flows in the opposite direction. PSI is measured to trace the direction in which information flows between the frontal and occipital lobes by analyzing the phase relationship between the signals of the corresponding channels that monitor the lobes.

Partial directed coherence (PDC)

PDC measures the intensity of information flow among the brain regions in the EEG data using a Granger causality measure taken in the frequency domain. It is a direct connectivity measure based on the inverse of the transfer function involved in Granger causality. In this technique, directed functional connectivity is measured between regions of the brain, quantifying the influence and strength of the connection between two regions. The range of PDC is between 0 and 1. If the value is close or equal to 0, the influence of the brain region monitored by channel v over the region monitored by channel u is weak. Similarly, if the value is close or equal to 1, the influence of the region monitored by channel v over the region monitored by channel u is strong. In this way, the strength of the connection and the influence from the occipital to the frontal lobe are analyzed to differentiate children with learning disabilities from normal children.

where PDCvu(freq) is the partial directed coherence, and −Huv(freq) denotes the inverse transfer function, which indicates the influence of the brain region monitored by channel v over the region monitored by channel u at frequency freq. The summation over m, starting at m = 1, normalizes the influence across all channels.

Directed transfer function (DTF)

DTF determines the strength and directionality of a linear relationship between signals, and focuses on how much signal power is transferred at a particular frequency from one brain region to another, each monitored via channels, i.e., from the occipital channels to the frontal channels. It enables the construction of directed brain networks, providing insights into functional and structural connectivity. It is derived from Granger causality and the multivariate autoregressive (MVAR) model, given the transfer matrix H(f) of the system.

Granger causality,

MVAR model,

where U(t) is the vector of all signals at time t in the MVAR model, CoeffA denotes the autoregressive coefficients, “o” is the order of the model, and “N” is the noise vector at time t.

The relations among the EEG data channels, including the phase relations between signals, can be analyzed with the DTF information. The directed transfer function is given as

The above equation is the normalized version of DTF, whose value ranges between 0 and 1 and which gives the ratio between the inflow from the channel v region to the channel u region and all the inflows to the channel u region. At frequency freq, the DTF describes the influence of channel v's region on channel u's region, with m as the index over all channels.
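The relationship between the MVAR coefficients, the transfer matrix, DTF, and PDC can be sketched for a two-channel, order-1 model. The sketch assumes the commonly used definitions: Ā(f) = I − A·e^(−i2πf/fs), H(f) = Ā(f)⁻¹, DTF from v to u equal to |H_uv| normalized over all inflows to u, and PDC from v to u equal to |Ā_uv| normalized over all outflows from v. The coefficient values and the channel roles are hypothetical, and the normalization may differ from the one used in the study.

```python
import cmath
import math


def mvar_spectral_matrices(coeff_a, freq_hz, fs_hz):
    """Return (A_bar, H) at one frequency for a 2-channel, order-1 MVAR model."""
    phase = cmath.exp(-2j * math.pi * freq_hz / fs_hz)
    a_bar = [
        [(1.0 if i == j else 0.0) - coeff_a[i][j] * phase for j in range(2)]
        for i in range(2)
    ]
    det = a_bar[0][0] * a_bar[1][1] - a_bar[0][1] * a_bar[1][0]
    # Closed-form inverse of a 2x2 matrix gives the transfer matrix H(f).
    h = [
        [a_bar[1][1] / det, -a_bar[0][1] / det],
        [-a_bar[1][0] / det, a_bar[0][0] / det],
    ]
    return a_bar, h


def dtf(h):
    """DTF[u][v]: influence of channel v on channel u, normalized over inflows to u."""
    out = []
    for u in range(2):
        norm = math.sqrt(sum(abs(h[u][m]) ** 2 for m in range(2)))
        out.append([abs(h[u][v]) / norm for v in range(2)])
    return out


def pdc(a_bar):
    """PDC[u][v]: influence of channel v on channel u, normalized over outflows from v."""
    out = [[0.0, 0.0], [0.0, 0.0]]
    for v in range(2):
        norm = math.sqrt(sum(abs(a_bar[m][v]) ** 2 for m in range(2)))
        for u in range(2):
            out[u][v] = abs(a_bar[u][v]) / norm
    return out


# Hypothetical coefficients: channel 0 plays the frontal role, channel 1 the occipital role.
# COEFF_A[u][v] weights the past of channel v in the present of channel u,
# so information flows from occipital to frontal only.
COEFF_A = [[0.5, 0.4],
           [0.0, 0.6]]

a_bar, h = mvar_spectral_matrices(COEFF_A, freq_hz=10.0, fs_hz=128.0)
dtf_10hz = dtf(h)
pdc_10hz = pdc(a_bar)
print("DTF occipital -> frontal:", round(dtf_10hz[0][1], 3))
print("DTF frontal -> occipital:", round(dtf_10hz[1][0], 3))
```

With real data the coefficient matrices would first be estimated by fitting an MVAR model to the selected channels; that step is omitted here.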

Validation using machine learning algorithms

ML techniques are employed to validate the measures obtained from the analysis of the brain connectivity parameters of the EEG signals. The data are pre-processed so that a particular algorithm can be trained to predict and classify the results. Supervised ML models are of two types: classification and regression. For validation, we use classification algorithms, namely SVM, LR, RF, KNN, and DT.

True positive (TP), true negative (TN), false positive (FP), and false negative (FN) instances are represented as a heat map called the “confusion matrix,” on which the classification metrics are computed. The accuracy of the model's classification is calculated as the proportion of correct predictions out of all predictions made by the model. The model's performance on different subsets (splits) of the dataset can be estimated using cross-validation scores, which helps to evaluate the quality and generalizability of the algorithms on various sets of the data.
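The evaluation quantities described above can be sketched in a few lines of plain Python. The labels below are made-up values for illustration (1 for ADHD, 0 for control), and the contiguous, unshuffled fold layout is an assumption; library implementations usually offer shuffled and stratified variants.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, and FN for a binary classification problem."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["TP" if truth == positive else "FP"] += 1
        else:
            counts["FN" if truth == positive else "TN"] += 1
    return counts


def accuracy(counts):
    """Proportion of correct predictions out of all predictions."""
    return (counts["TP"] + counts["TN"]) / sum(counts.values())


def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, validation
        start = stop


# Made-up labels for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 0, 1, 0, 1, 0]

counts = confusion_counts(y_true, y_pred)
folds = list(k_fold_indices(len(y_true), k=5))
print(counts, accuracy(counts))
```

Averaging the accuracy obtained on each validation fold gives the cross-validation score referred to above.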

The data for both the ADHD and control groups are loaded from CSV files. Each dataset contains PSD features (Alpha, Beta, Gamma, and Theta bands) that represent the brainwave activity of the children in each group. A label column (“Label”) is added to each dataset, where 1 represents ADHD and 0 represents control, marking the target for classification. The datasets are then concatenated into a single dataset, combining the features of both groups. The frequency-band features are selected as the model input. These features are assumed to capture distinct brain activity patterns that can differentiate ADHD and normal children.

Logistic regression (LR)

LR is a supervised ML algorithm widely used for two-class classification; it can also be extended to multinomial LR and ordinal LR. It assumes a linear relationship between the independent features and the logit of the dependent variable. LR is a linear model that estimates the probability of a binary outcome (ADHD or control) based on input characteristics. LR works by identifying the decision boundary that best separates the two classes by minimizing the binary cross-entropy loss. It uses the sigmoid function, a mathematical function with an “S”-shaped curve, to map the predicted values to probabilities. The sigmoid function takes the linear combination of the independent features and produces a probability value between 0 and 1.

The standard scaler is given by,

$$x_{scaled} = \frac{x - \mu}{\sigma}$$

where μ is the mean and σ is the standard deviation of the feature.

For a given input $X = (x_1, x_2, \dots, x_n)$, apply the linear function to the input:

$$z = W^{\top} X + b$$

where W is the weight vector and b is the bias, and then apply the sigmoid function:

$$\hat{y} = \frac{1}{1 + e^{-z}}$$

The algorithm used for the logistic regression is given in Algorithm 1.
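The scaling, linear, and sigmoid steps can be chained as in the following sketch. It assumes the usual forms of each step: standardization by the mean and the population standard deviation, a weighted sum plus bias, and the logistic sigmoid. The feature values, weight, and bias are hypothetical and are not fitted to the dataset.

```python
import math
import statistics


def standard_scale(column):
    """Standardize one feature column to zero mean and unit (population) variance."""
    mu = statistics.fmean(column)
    sigma = statistics.pstdev(column)
    return [(x - mu) / sigma for x in column]


def sigmoid(z):
    """Map a real value to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(features, weights, bias):
    """Probability of the positive class for one scaled feature vector."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


# Hypothetical band-power values for four recordings (illustration only).
alpha_power = [2.0, 4.0, 6.0, 8.0]
scaled = standard_scale(alpha_power)

# Hypothetical weight and bias; a real model would learn these from the training split.
WEIGHTS = [1.5]
BIAS = 0.0
probs = [predict_proba([z], WEIGHTS, BIAS) for z in scaled]
labels = [1 if p >= 0.5 else 0 for p in probs]
print([round(p, 3) for p in probs], labels)
```

Thresholding the probability at 0.5 turns the output into the binary decision (ADHD or control) described above.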

Require: Processed EEG signal dataset, train-test split ratio, regularization strength C, optimization algorithm “solver”, number of iterations “max_iter”
Ensure: Best logistic regression model β*
1: Normalize and preprocess the dataset to get Xscaled
2: Apply an 80-20 dataset split on Xscaled for training and testing
3: Define the hyperparameter search space:
   • C ← (Regularization strength)
   • solver ← (Optimization algorithm)
   • max_iter ← (Number of iterations for convergence)
4: Initialize GridSearchCV (k-fold cross-validation with k = 5):
   • Instantiate the GridSearchCV object:
     grid_search ← GridSearchCV(LogisticRegression(random_state = 42)),
   • Train the logistic regression model with hyperparameter tuning:
     grid_search.fit(Xtrain, ytrain)
   • Get the best model after tuning:
     β* ← grid_search.best_estimator_
   • Predict on the validation set:
     ypred ← β*(Xval)

Logistic regression with hyperparameter tuning using GridSearchCV on processed EEG signal dataset.

K-nearest neighbor (KNN)

KNN is a non-parametric supervised ML algorithm that is used for regression and classification tasks. It relies on the proximity between data points to predict or classify by grouping them. The KNN algorithm identifies the nearest neighbors of a given data point, based on which a class label is assigned. The nearest neighbors are identified by calculating how close they are to the data point using a distance metric. The distance between data points is measured by the Euclidean distance, given as

$$d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$$

where p_i and q_i are the i-th features of the two points and n is the number of features.

The Manhattan distance is also widely used; it measures the distance between two points by summing the absolute differences between their coordinates. The Minkowski distance is a generalization of distance metrics such as the Euclidean and Manhattan distances. The Hamming distance is used for Boolean or string vectors, identifying the positions where the vectors do not match. The Euclidean distance metric, which maintains the geometric relationship between the data points, is used in the algorithm for this study. The formal representation of the methodology used for KNN is given in Algorithm 2.
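The distance metrics and the voting step can be sketched as follows. The training points, labels, and the choice of k = 3 are hypothetical values used only to show the mechanics; in the study the number of neighbors is selected by grid search.

```python
import math
from collections import Counter


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))


def minkowski(p, q, r):
    """Generalized distance: r = 1 gives Manhattan, r = 2 gives Euclidean."""
    return sum(abs(a - b) ** r for a, b in zip(p, q)) ** (1.0 / r)


def knn_predict(train_x, train_y, query, k=3, distance=euclidean):
    """Assign the majority label among the k training points nearest to the query."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: distance(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical two-feature points: label 0 for control, label 1 for ADHD.
TRAIN_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (3.0, 3.2), (3.1, 2.9), (2.8, 3.0)]
TRAIN_Y = [0, 0, 0, 1, 1, 1]

print(knn_predict(TRAIN_X, TRAIN_Y, (1.1, 1.0)))  # query near the first cluster
print(knn_predict(TRAIN_X, TRAIN_Y, (3.0, 3.0)))  # query near the second cluster
```

Because the prediction depends directly on distances, the features are normalized first, as in step 1 of the algorithm.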

Require: EEG signal dataset, train-test split ratio, number of neighbors n, weighting function, and distance metric
Ensure: Best KNN model β*
1: Normalize and preprocess the dataset to get Xscaled
2: Apply an 80-20 dataset split on Xscaled for training and testing
3: Define the hyperparameter search space:
   • n_neighbors ← (Number of neighbors)
   • Weights ← (Weighting function)
   • Metric ← (Distance metric)
4: Initialize GridSearchCV (k-fold cross-validation with k = 5):
   • Initialize GridSearchCV:
     grid_search ← GridSearchCV(KNeighborsClassifier(),
   • Train the KNN model:
     grid_search.fit(Xtrain, ytrain)
   • Obtain the best model:
     β* ← grid_search.best_estimator_
   • Predict on the validation set:
     ypred ← β*(Xval)

KNN classifier with hyperparameter optimization using GridSearchCV on EEG signal dataset.

Support vector machine (SVM)

This works well on small and complex datasets. The ideology behind SVM is find a best fit decision boundary, also called as hyperplane that can separate and add n-dimensional data points into the classes. Support vectors of SVM fits the best hyperplane that are the extreme points that are nearer to the hyperplane which determines the hyperplane. Margin is the distance between hyperplane and support vector, and if the margin is large then it is a good fit of model as it will maximize the decision boundary among classes. The margin is of two categories: soft margin and hard margin. Hard margin finds a boundary that completely divides the data points belonging to different classes, such that it ensures maximum width possible. The equation of hyperplane to be optimized when the margin is hard is given below,

The soft margin fits a hyperplane that tolerates some misclassification or margin violation. This is more suitable when the data contain anomalies. The equation of the hyperplane to be optimized is given as,
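In standard textbook notation, the soft-margin problem adds a slack variable per sample. The symbols are again assumptions: ξ_i is the slack of sample i, and C is the regularization parameter that trades margin width against violations (the same role C plays in the hyperparameter search of Algorithm 3).

```latex
\min_{w,\,b,\,\xi} \; \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{N} \xi_i
\quad \text{subject to} \quad
y_i \left( w^{\top} x_i + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \qquad i = 1, \dots, N
```

A large C penalizes violations heavily and approaches the hard-margin behavior, whereas a small C allows a wider margin at the cost of more misclassified training points.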

There are two types of SVM: linear SVMs (used when the data are linearly separable) and non-linear SVMs (used when they are not). Non-linear SVMs capture the non-linearity through the “kernel trick.” The kernel trick maps a low-dimensional space to a high-dimensional space using mathematical functions called kernels. There are multiple types of kernels,

Linear kernel,

Polynomial kernel,

where U1 and U2 are feature vectors, and d is the degree of the polynomial.

Sigmoid function,

RBF kernel maps the low dimensional data points to a high dimensional space with a non-linear function.
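As a hedged illustration of the kernels listed above, the sketch below implements them in plain Python using the parameterization documented for scikit-learn's SVC. The parameters gamma and coef0 and their default values are assumptions; only the feature vectors U1 and U2 and the degree d are named in the text.

```python
import math


def dot(u1, u2):
    """Inner product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(u1, u2))


def linear_kernel(u1, u2):
    """K(U1, U2) = <U1, U2>."""
    return dot(u1, u2)


def polynomial_kernel(u1, u2, d=3, gamma=1.0, coef0=1.0):
    """K(U1, U2) = (gamma * <U1, U2> + coef0) ** d; gamma and coef0 are assumed."""
    return (gamma * dot(u1, u2) + coef0) ** d


def sigmoid_kernel(u1, u2, gamma=1.0, coef0=0.0):
    """K(U1, U2) = tanh(gamma * <U1, U2> + coef0); gamma and coef0 are assumed."""
    return math.tanh(gamma * dot(u1, u2) + coef0)


def rbf_kernel(u1, u2, gamma=1.0):
    """K(U1, U2) = exp(-gamma * ||U1 - U2||^2); gamma is assumed."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u1, u2))
    return math.exp(-gamma * squared_distance)


# Illustrative feature vectors only.
u1 = [1.0, 2.0]
u2 = [3.0, 0.5]

print(linear_kernel(u1, u2))           # 4.0
print(polynomial_kernel(u1, u2, d=2))  # 25.0
print(sigmoid_kernel(u1, u2))
print(rbf_kernel(u1, u2))
```

The RBF kernel equals 1 for identical vectors and decays toward 0 as the vectors move apart, which is why it behaves like a similarity measure.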

In this study, a linear kernel is employed in the SVM model, which keeps the model interpretable by reducing its complexity and the risk of overfitting. The algorithm used for the SVM is given in Algorithm 3.

Require: EEG signal dataset, train-test split ratio, regularization parameter C, kernel type, and gamma coefficient
Ensure: Best SVM model β*
1: Normalize and preprocess the dataset to obtain Xscaled
2: Apply an 80-20 split on Xscaled for training and testing
3: Define the hyperparameter search space:
   • C: strength of regularization
   • kernel: type of kernel
   • gamma: coefficient of kernel
4: Set up GridSearchCV (k-fold cross-validation with k = 5):
   • Create GridSearchCV: grid_search ← GridSearchCV(SVC(), …)
   • Train the SVM model: grid_search.fit(Xtrain, ytrain)
   • Get the best model: β* ← grid_search.best_estimator_
   • Predict on the validation set: ypred ← β*(Xval)

SVM classifier with hyperparameter tuning using GridSearchCV on EEG signal dataset.

Decision tree classifier (DT)

The decision tree is applied extensively to both classification and regression problems in supervised learning. The dataset is repeatedly divided into subsets based on feature values to construct a model in which every internal node is a decision over a feature, every branch is an outcome of that decision, and every leaf node is a terminal class label or output. The aim is to construct a model that predicts the value of a target variable by learning simple decision rules from the features of the data. Interpretability is one of the significant strengths of decision trees; the rules produced by the tree are easy to visualize and comprehend, which makes them valuable for explaining model choices. They can also handle both numerical and categorical data and need minimal data preprocessing. Nonetheless, decision trees have a number of drawbacks, such as overfitting the training data, which makes them less generalizable to new data. They are also sensitive to small changes in the data, which can cause instability in the model structure. In addition, they can be biased toward features with more levels and can fail to capture complex interactions between features. These shortcomings can be overcome using ensemble techniques such as RF and Gradient Boosting, which merge several decision trees to achieve higher accuracy and generalization.

where Nc is the number of classes, and pi is the proportion of a specific class u at a node.

Here, pi is the probability of class u.

where nv is the number of samples in class v, and n is the total number of samples. The pseudocode used for the Decision Tree is presented in Algorithm 4.
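The definitions above (class proportions computed as nv / n and summed over the classes at a node) match the usual split criteria of a decision tree. Assuming the intended measures are the Gini impurity and the entropy, which is an assumption because the expressions themselves do not appear in this section, they can be sketched as follows; the function names and example labels are illustrative.

```python
import math
from collections import Counter


def class_proportions(labels):
    """Proportion n_v / n of each class v present at a node."""
    n = len(labels)
    return [n_v / n for n_v in Counter(labels).values()]


def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    return 1.0 - sum(p * p for p in class_proportions(labels))


def entropy(labels):
    """Shannon entropy in bits: minus the sum of p * log2(p) over present classes."""
    return -sum(p * math.log2(p) for p in class_proportions(labels))


# Illustrative nodes only; the labels are not drawn from the study's data.
mixed_node = ["ADHD", "ADHD", "ADHD", "Control"]
balanced_node = ["ADHD", "Control", "ADHD", "Control"]
pure_node = ["Control", "Control", "Control"]

print(gini_impurity(mixed_node))                         # 0.375
print(round(entropy(mixed_node), 4))                     # 0.8113
print(gini_impurity(balanced_node), entropy(balanced_node))  # 0.5 1.0
print(gini_impurity(pure_node))                          # 0.0
```

Both measures are zero for a pure node and largest for an evenly mixed one, so a split is preferred when it lowers the weighted impurity of the child nodes.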

Require: EEG signal datasets for ADHD and Control, feature list, train-test split ratio, hyperparameter grid
Ensure: Best Decision Tree model θ*
1: Normalize and preprocess the dataset to get Xscaled
2: Apply an 80-20 split on Xscaled for training and testing
3: Define the hyperparameter search space:
   • Criterion
   • Splitter strategy
   • Maximum tree depth
   • Minimum samples for node split
   • Minimum samples per leaf
   • Feature selection strategy
   • Class weight scheme
4: Train and evaluate the Decision Tree using GridSearchCV:
   • Set up GridSearchCV (k-fold cross-validation with k = 5)
   • Fit the model on the training data: grid_search.fit(Xtrain, ytrain)
   • Extract the best estimator: θ* ← grid_search.best_estimator_
   • Predict on the validation set: ypred ← θ*(Xval)
   • Compute evaluation metrics: classification report, confusion matrix, and accuracy
   • Compute the 5-fold cross-validation score using θ*

Decision tree classifier with hyperparameter optimization using GridSearchCV on EEG signal dataset.
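A possible scikit-learn rendering of Algorithm 4 is sketched below. It is illustrative only: the data are synthetic, the mapping of each search-space entry to a DecisionTreeClassifier parameter is an assumption, the candidate values are assumptions, and computing the final cross-validation score on the training split is also an assumption, because the algorithm does not say which data are used.

```python
import numpy as np
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the ADHD/Control feature matrix (assumption, NOT the study's data).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (60, 6)), rng.normal(1.0, 1.0, (61, 6))])
y = np.array([0] * 60 + [1] * 61)

# Steps 1-2: normalize, then 80-20 split (stratification is an assumption).
X_scaled = StandardScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, stratify=y, random_state=1
)

# Step 3: one grid key per search-space entry (all candidate values are assumptions).
param_grid = {
    "criterion": ["gini", "entropy"],      # criterion
    "splitter": ["best", "random"],        # splitter strategy
    "max_depth": [None, 5],                # maximum tree depth
    "min_samples_split": [2, 5],           # minimum samples for node split
    "min_samples_leaf": [1, 2],            # minimum samples per leaf
    "max_features": [None, "sqrt"],        # feature selection strategy
    "class_weight": [None, "balanced"],    # class weight scheme
}

# Step 4: grid search with 5-fold cross-validation, then evaluation.
grid_search = GridSearchCV(DecisionTreeClassifier(random_state=1), param_grid, cv=5)
grid_search.fit(X_train, y_train)
best_tree = grid_search.best_estimator_
y_pred = best_tree.predict(X_test)

report = classification_report(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
acc = accuracy_score(y_test, y_pred)
cv_scores = cross_val_score(best_tree, X_train, y_train, cv=5)

print(report)
print(cm)
print("accuracy:", acc)
print("mean 5-fold CV score:", cv_scores.mean())
```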

Random forest classifier (RF)

RF is an ML algorithm for supervised learning that is widely employed for classification tasks and is based on decision trees. A decision tree is a classification algorithm that uses a set of rules to separate a heterogeneous population of data points into homogeneous subgroups. Ensemble techniques combine multiple models to increase accuracy. There are multiple types of ensemble techniques for decision trees. In bagging, the average of the predictions made by multiple classifier models is taken as the final prediction. In boosting, the weighted vote of the predictions of multiple classifiers is taken as the final prediction. RF is an ensemble technique in which multiple classifiers are trained on different subsamples of the data. These subsamples are produced by the bootstrap technique, which draws multiple subsamples with replacement so that each resembles the properties of the original dataset. In eve


FRONTIERS IN SYSTEMS NEUROSCIENCE
