Objective. This paper presents a novel domain adaptation (DA) framework to enhance the accuracy of electroencephalography (EEG)-based auditory attention classification, specifically for classifying the direction (left or right) of attended speech. The framework aims to improve performance for subjects with initially low classification accuracy, overcoming challenges posed by instrumental and human factors. Limited dataset size, variations in EEG data quality due to factors such as noise, electrode misplacement or subject-related differences, and the need for generalization across different trials, conditions and subjects necessitate the use of DA methods. By leveraging DA, the framework can learn from one EEG dataset and adapt to another, potentially resulting in more reliable and robust classification models. Approach. This paper investigates a DA method, based on parallel transport, for addressing the auditory attention classification problem. The EEG data utilized in this study originate from an experiment in which subjects were instructed to selectively attend to one of two spatially separated voices presented simultaneously. Main results. A significant improvement in classification accuracy was observed when poor data from one subject was transported to the domain of good data from different subjects, compared to the baseline. The mean classification accuracy for subjects with poor data increased from 45.84% to 67.92%. Specifically, the highest classification accuracy achieved for a single subject reached 83.33%, a substantial increase from the baseline accuracy of 43.33%. Significance. The findings of our study demonstrate the improved classification performance achieved through the implementation of DA methods. This brings us a step closer to leveraging EEG in neuro-steered hearing devices.
Lacking the capacity to select and enhance a specific sound source of choice while suppressing the background, hearing aids generally amplify the volume of everyone in the environment. This presents a significant challenge, as developing computational models that replicate the human brain's ability to effectively suppress unwanted sounds in noisy environments remains extremely difficult. Over the last few years, intelligent hearing aids have become better at suppressing background noise (Andersen et al 2021). However, the problem of knowing which speaker to enhance, famously known as the cocktail party problem (Cherry 1953), remains unsolved, and most people with hearing aids still experience discomfort in noisy environments (Han et al 2019). One solution to this problem is to classify (i.e. decode) attended and unattended sounds from brain signals, using a group of methods referred to as auditory attention decoding (AAD) (Alickovic et al 2019, Geirnaert et al 2021b).
Using AAD methods, it has been reported that there is a significant difference in the cortical speech representation depending on whether the sound source was attended or unattended (Ding and Simon 2012, Mesgarani and Chang 2012, O'Sullivan et al 2015, Mirkovic et al 2016). The cortical activity can be detected by several methods, such as invasive intracranial electroencephalography (iEEG) as in (Mesgarani and Chang 2012, Golumbic et al 2013), or noninvasive magnetoencephalography as in (Ding and Simon 2012, Akram et al 2016) and electroencephalography (EEG; whole-scalp EEG as in O'Sullivan et al 2015, Etard et al 2019, Aroudi and Doclo 2020, and in/around-ear EEG as in Mirkovic et al 2016, Fiedler et al 2017, Nogueira et al 2019, Hölle et al 2021). However, the many advantages of EEG make it the most prevalent method for AAD. EEG is a cheap and widely available technique in which the signals are picked up by several small electrodes placed on the head. Unlike some other methods, an EEG recording captures both the radial and the tangential components of the underlying sources, which makes the method effective and accurate. There are, however, some disadvantages with EEG, mainly its limited spatial resolution and low signal-to-noise ratio. EEG-based AAD has become an important tool in research tackling the cocktail party problem in hearing aids (Alickovic et al 2020, 2021, Lunner et al 2020).
The EEG dataset used in this paper reflects complex, real-life situations. Subjects were instructed to attend to one of two different fictional stories, narrated by one female and one male voice, and presented simultaneously. We considered so-called locus-of-attention (LoA) classification, i.e. decoding from EEG signals whether the attended speech was coming from the listener's left or right side. A common approach for distinguishing between attended and unattended sound sources in EEG data is classification using machine learning (ML). Given a training set, the ML algorithm first learns how to classify the data, and it can thereafter be used on new data. Naturally, a large training dataset increases the accuracy of the model and decreases the risk of overfitting. One major issue with classifiers built from EEG recordings is the data shifts occurring between trials and sessions for the same subject. The measurements are also subject dependent, meaning that data shifts also occur when comparing EEG measurements from different subjects. These shifts are due to factors such as instrumental imperfections (e.g. jitter) or human factors (e.g. misplacement of electrodes), and a classifier trained for subject A might not work well on data from subject B. Hence, the classification algorithm would need to be constructed from scratch for each new subject. This is time-consuming and not realistic in real-time situations.
The overfitting problem is common to all model estimation methods. ML methods are, however, often over-parameterized and thus more sensitive in this regard, and the algorithms tend to learn patterns in the data that could be due to measurement variations or noise. This is common for small training datasets, and it often results in overfitted models that do not generalize well to other datasets (Mutasa et al 2020). When working with EEG measurements, the dataset size is constrained by the number of trials each participant is able to carry out without losing focus or altering the measurement setup. Thus, there is a risk of overfitting the model, which compromises its reliability.
Another aspect of a limited dataset is that good data cannot always be obtained from all subjects. The signals can be noisy, electrodes might be misplaced, or the subjects may lack sufficient focus during the attention task. The main focus of this paper is to investigate whether combining poor data from one subject with good data from other subjects can enhance auditory attention classification performance.
This paper focuses on transfer learning, more specifically a domain adaptation (DA) approach, to answer the research question above. DA is a subfield of ML in which the source data distribution (subject A) differs from the target data distribution (subject B) (Weiss et al 2016). It has previously been applied to EEG data from, among others, motor imagery tasks such as movements of the hands, feet and tongue (Yair et al 2019, 2020), emotion recognition (Bao et al 2021) and working memory (Chen et al 2021). However, DA has not yet been used on EEG data to decode (i.e. classify) auditory attention. We primarily focused on the parallel transport (PT) method (Yair et al 2019), which is based on covariance matrix computations on the Riemannian manifold of positive definite matrices. The full MATLAB code for the results is available on GitHub. The main objective of this paper is to evaluate whether a DA method that relies solely on EEG data can be used for AAD to further improve decoding (i.e. classification) performance. The presented problem is LoA classification, with the aim of improving performance for subjects with initially low classification accuracy, overcoming challenges posed by instrumental and human factors. The findings of our study demonstrate the improved classification performance achieved through the implementation of DA methods.
1.1. Paper outline
This paper is structured as follows: section 2 explains Riemannian geometry, and section 3 describes LoA classification with EEG. DA and PT are explained in section 4. The experimental setup, preprocessing of the EEG data, model evaluation method and statistical analysis are discussed in section 5. Results and discussions are presented in section 6. Lastly, section 7 sums up the paper with conclusions. A pipeline from data to classification accuracy is provided in appendix A, and a more in-depth explanation of PT is presented in appendix B.
The DA method used in this work relies on covariance matrices and Riemannian geometry. By definition, as stated in section 2.1, covariance matrices are symmetric positive definite (SPD) and capture linear relations in the data. The relative ease with which they can be computed has caught researchers' interest when working with complex, high-dimensional datasets such as EEG recordings (Vidyaratne and Iftekharuddin 2017, Yair et al 2019, 2020).
The use of Riemannian geometry can be motivated through a simple example in which two points are plotted in the Euclidean plane. Naturally, the minimal Euclidean distance between these two points is easy to compute as a straight line. Now, let each point instead be a covariance matrix. Computing the minimal distance between two such SPD matrices has been shown to be problematic when utilizing Euclidean geometry (Yair et al 2019). One drawback is referred to as the swelling effect, where the original determinants are smaller than the determinant of the Euclidean average (Arsigny et al 2006, Yger et al 2017, Lin 2019). Another problem is that the space of SPD matrices is not complete under Euclidean geometry (Fletcher et al 2004). A third issue concerns computational approximations, where traditional algorithms for large datasets relying on Euclidean geometry tend to yield unreliable results (Sommer et al 2010).
The mentioned drawbacks can be alleviated using Riemannian geometry, which studies shapes on curved spaces, such as the surface of a cylinder, sphere or cone, and has demonstrated effectiveness when working with SPD matrices such as covariance matrices (Fletcher et al 2004, Arsigny et al 2006, Sommer et al 2010). Returning to the two covariance matrices plotted as points: their positivity constraints span a cone manifold, strictly inside which both points lie (Yger et al 2017, Mahadevan et al 2019, Yair et al 2019). The Riemannian distance between the points, further explained in section 2.2, follows a curved path, which has the benefit of reducing the impact of the swelling effect (Arsigny et al 2006, Yger et al 2017).
The effectiveness of using covariance matrices with Riemannian geometry has been established in EEG analysis (Congedo et al 2017, Kalaganis et al 2022) and has previously been applied with success to classify the directional focus of auditory attention in the LoA classification problem (Geirnaert et al 2021a).
2.1. Covariance matrices
The covariance matrix is defined as

$$P_{s,i} = \frac{1}{t-1} X_{s,i}^{\top} X_{s,i}, \qquad (1)$$

where $X_{s,i} \in \mathbb{R}^{t \times d}$ is the (mean-centered) recorded EEG time series for subject $s$ and trial $i$. Each trial in the EEG dataset used in this study is structured in a $t \times d$ matrix, where $t$ is the number of samples and $d$ is the number of EEG channels. Each element in the $d \times d$ covariance matrix $P_{s,i}$ describes the covariance between the corresponding pair of channels. Preprocessing of the EEG data is further explained in section 5.2.
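As an illustration, the per-trial sample covariance can be computed as follows. This is a minimal NumPy sketch with random data standing in for a preprocessed EEG trial; the helper name `trial_covariance` is our own and this is not the paper's MATLAB code.

```python
import numpy as np

def trial_covariance(X):
    """Sample covariance of one trial.

    X : (t, d) array of t time samples over d EEG channels,
    following the data layout described in the paper.
    """
    Xc = X - X.mean(axis=0)                  # remove per-channel mean
    return Xc.T @ Xc / (X.shape[0] - 1)      # (d, d) symmetric matrix

# Random data standing in for a preprocessed trial.
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 4))           # t = 1000 samples, d = 4 channels
P = trial_covariance(X)
assert np.allclose(P, P.T)                   # symmetric
assert np.all(np.linalg.eigvalsh(P) > 0)     # positive definite (since t >> d)
```

With $t > d$ and no linearly dependent channels the result is positive definite, which is what the Riemannian machinery of this section requires.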
One way to describe the curvature of the Riemannian manifold is through the so-called sectional curvature. The manifold has a tangent space $\mathcal{T}_P\mathcal{M}$ at the point $P$, and the sectional curvature is defined by the point $P$ and a two-dimensional subspace of the tangent space. Hence, the sectional curvature depends on two linearly independent tangent vectors, and it is therefore possible to view the (symmetric) covariance matrices on the Riemannian manifold as vectors in the Euclidean tangent space. These vectors are used as features in the classification, as further explained in section 3. In this paper, the point $P$ is a covariance matrix and $S$ is the corresponding vector in Euclidean space. The shortest path between two covariance matrices $P_1$ and $P_2$ on a Riemannian manifold is called a geodesic curve, and it is given by

$$\gamma(t) = P_1^{1/2} \left(P_1^{-1/2} P_2 P_1^{-1/2}\right)^{t} P_1^{1/2}, \quad t \in [0,1]. \qquad (2)$$

The length of the curve above, referred to as the Riemannian distance, is unique and is given by (Yair et al 2019)

$$d_R(P_1, P_2) = \left\| \log\!\left(P_1^{-1/2} P_2 P_1^{-1/2}\right) \right\|_F = \sqrt{\sum_{i} \log^{2} \lambda_i\!\left(P_1^{-1} P_2\right)}, \qquad (3)$$

where $\|\cdot\|_F$ is the Frobenius norm, $\log(\cdot)$ is the matrix logarithm and $\lambda_i(P_1^{-1} P_2)$ is the $i$th eigenvalue of $P_1^{-1} P_2$.
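The geodesic curve and Riemannian distance above can be sketched in a few lines of NumPy. This is an illustrative implementation under the affine-invariant metric; the function names are our own and this is not the paper's MATLAB code.

```python
import numpy as np

def _spd_apply(P, f):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    w, V = np.linalg.eigh(P)
    return (V * f(w)) @ V.T

def geodesic(P1, P2, t):
    """Point at parameter t in [0, 1] on the geodesic from P1 to P2:
    P1^{1/2} (P1^{-1/2} P2 P1^{-1/2})^t P1^{1/2}."""
    P1h = _spd_apply(P1, np.sqrt)
    P1ih = _spd_apply(P1, lambda w: 1.0 / np.sqrt(w))
    return P1h @ _spd_apply(P1ih @ P2 @ P1ih, lambda w: w ** t) @ P1h

def riemannian_distance(P1, P2):
    """Frobenius norm of log(P1^{-1/2} P2 P1^{-1/2}), i.e. the square
    root of the sum of squared log-eigenvalues of P1^{-1} P2."""
    P1ih = _spd_apply(P1, lambda w: 1.0 / np.sqrt(w))
    lam = np.linalg.eigvalsh(P1ih @ P2 @ P1ih)   # same spectrum as P1^{-1} P2
    return np.sqrt(np.sum(np.log(lam) ** 2))
```

Note that `geodesic(P1, P2, 0.5)` is the Riemannian midpoint of the two matrices, and the distance is symmetric in its arguments because the eigenvalues of $P_2^{-1}P_1$ are the reciprocals of those of $P_1^{-1}P_2$.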
The Riemannian mean $M_s$ for subject $s$, also referred to as the Fréchet mean, is the point on the manifold that minimizes the sum of squared Riemannian distances to all the subject's points (Yger et al 2017, Yair et al 2019):

$$M_s = \arg\min_{M \succ 0} \sum_{i} d_R^{2}\!\left(M, P_{s,i}\right), \qquad (4)$$

where $d_R$ is the Riemannian distance defined above. In this paper, each subject point is a covariance matrix and the Riemannian mean is therefore a symmetric matrix, which can be interpreted as the center of mass of a high-dimensional Riemannian geometric figure. The Riemannian mean of two covariance matrices $P_1$ and $P_2$ is the midpoint of the geodesic in equation (2). The Riemannian mean of more than two covariance matrices can be computed by an iterative algorithm developed by Barachant et al, where the required maps are defined in (B.1) and (B.2).
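The iterative computation can be sketched as follows: a NumPy illustration of tangent-space averaging in the style of Barachant et al, where the Exp/Log maps used below stand in for the maps defined in the appendix; the helper names are our own, not the paper's code.

```python
import numpy as np

def _spd_apply(P, f):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    w, V = np.linalg.eigh(P)
    return (V * f(w)) @ V.T

def riemannian_mean(Ps, n_iter=50, tol=1e-9):
    """Frechet mean of a list of SPD matrices by iterating:
    (1) Log-map all points to the tangent space at the current estimate,
    (2) average them there, (3) Exp-map the average back to the manifold."""
    M = np.mean(Ps, axis=0)                        # Euclidean initialization
    for _ in range(n_iter):
        Mh = _spd_apply(M, np.sqrt)
        Mih = _spd_apply(M, lambda w: 1.0 / np.sqrt(w))
        S = np.mean([Mh @ _spd_apply(Mih @ P @ Mih, np.log) @ Mh
                     for P in Ps], axis=0)         # tangent-space average
        M = Mh @ _spd_apply(Mih @ S @ Mih, np.exp) @ Mh   # map back
        if np.linalg.norm(S) < tol:                # update step was tiny
            break
    return M
```

For commuting (e.g. diagonal) matrices this reduces to the elementwise geometric mean, so the mean of diag(1, 4) and diag(4, 1) is diag(2, 2).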
The main goal of this study is to accurately decode (i.e. classify) the direction (left or right) of attended speech using EEG data, a technique known as LoA classification. Previous research (Geirnaert et al 2020, Cai et al 2021, Li et al 2021, Vandecappelle et al 2021, Su et al 2022, Puffay et al 2023) has shown promising results in this area. Unlike stimulus reconstruction (SR) methods that use regression techniques (Mesgarani and Chang 2012, O'Sullivan et al 2015, Alickovic et al 2019, Geirnaert et al 2021b) to reconstruct and classify sound stimuli, LoA methods focus on classifying the direction of the attended sound. In this study, we propose using DA to align less reliable data with more reliable data by transporting poor data from one subject to the domain of good data from different subjects, aiming to enhance LoA classification performances.
3.1. Classification methods
This paper focuses on EEG-based locus of auditory attention classification (left vs. right). The classification method involves four steps: (1) computing the covariance matrix, (2) projecting it onto the tangent plane of the Riemannian manifold, (3) vectorizing the covariance matrix to create a feature vector, and (4) employing a linear support vector machine (SVM) for classification. Although we also explored four alternative classification methods (k-nearest neighbor, regression tree, decision tree, and a neural network with various configurations), the regression and decision trees yielded unsatisfactory results, and the neural network exhibited a similar accuracy to SVM but with considerably longer training time.
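The four steps can be sketched end-to-end as follows. This is a toy NumPy/scikit-learn illustration on synthetic data; for simplicity the tangent-plane reference point is the Euclidean mean of the covariances rather than the Riemannian mean, and all names and parameters are our own, not the paper's.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
d, t, n = 4, 500, 40                        # channels, samples, trials (toy)

# Synthetic stand-in for EEG: class-1 trials get stronger channel coupling.
def make_trial(label):
    A = np.eye(d) + 0.3 * label * np.ones((d, d))
    return rng.standard_normal((t, d)) @ A

y = np.repeat([0, 1], n // 2)
covs = [np.cov(make_trial(lbl), rowvar=False) for lbl in y]   # step 1

# Reference point for the tangent plane (Euclidean mean, a simplification).
w, V = np.linalg.eigh(np.mean(covs, axis=0))
Mih = (V / np.sqrt(w)) @ V.T                # M^{-1/2}

iu = np.triu_indices(d)
def features(P):
    lw, lV = np.linalg.eigh(Mih @ P @ Mih)
    S = (lV * np.log(lw)) @ lV.T            # step 2: tangent-space projection
    W = np.sqrt(2) * np.ones((d, d))
    np.fill_diagonal(W, 1.0)
    return (W * S)[iu]                      # step 3: weighted upper triangle

X = np.array([features(P) for P in covs])
clf = LinearSVC().fit(X, y)                 # step 4: linear SVM
acc = clf.score(X, y)                       # training accuracy on toy data
```

The $\sqrt{2}$ weighting of the off-diagonal entries in step 3 preserves the Frobenius norm of the symmetric tangent matrix in the flattened feature vector.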
While conventional methods like temporal response functions (TRFs), canonical correlation analysis (CCA) or match-mismatch are commonly used for classification, our paper explores an alternative approach using vectorized covariance matrices as the data features in the SVM classifier. This choice offers several benefits. First, our approach is audio-free, addressing challenges when audio data is not available. Second, computing covariance matrices is relatively straightforward, as they capture linear relations in the data. Third, past research (Yair et al 2019, 2020) demonstrates the effectiveness of SPD matrices (such as covariance matrices), particularly with simpler classifiers like SVMs, which are less complex and computationally more efficient compared to deep neural networks. Covariance matrices have been successfully used as features in diverse fields such as medical imaging, ML and computer vision (Tuzel et al 2008, Sra and Hosseini 2013, Freifeld et al 2014, Bergmann et al 2018). In the context of physiological signal analysis and medical imaging, the Riemannian geometry of covariance matrices has been utilized (Pennec et al 2004, Barachant et al 2013).
In this section, we shortly introduce transfer learning, define domain adaptation (DA) and explain parallel transport (PT).
4.1. Notations
The training data comes from a source domain $\mathcal{D}_S$, and its predictive learning task is denoted $\mathcal{T}_S$. The testing data, as well as all other data used after training, comes from a target domain $\mathcal{D}_T$, and its predictive learning task is denoted $\mathcal{T}_T$. All notations are presented in table 1 (Weiss et al 2016).
Table 1. Notations used in this section (Weiss et al 2016).

Notation           Description
$\mathcal{D}_S$    Source domain
$\mathcal{T}_S$    Source predictive learning task
$\mathcal{D}_T$    Target domain
$\mathcal{T}_T$    Target predictive learning task