Balancing Between Privacy and Utility for Affect Recognition Using Multitask Learning in Differential Privacy–Added Federated Learning Settings: Quantitative Study


Introduction

Background

Imagine awakening in the morning to find yourself dealing with stress, perhaps stemming from a disagreement with a friend, a pressing financial concern, or an unresolved work-related matter. In this scenario, an advanced digital assistant, akin to Siri or Alexa, seamlessly detects the user's stress level through the analysis of physiological signals captured by wearable technology. Leveraging this data, the digital assistant dynamically adjusts its language and tone to suit the user's mood. Moreover, it proactively recommends personalized relaxation techniques, drawing from past successful interventions such as yoga, mindfulness practices, listening to favorite music, and watching uplifting videos. Such a sophisticated stress recognition and intervention system holds promise for enhancing the well-being and emotional resilience of individuals. Although most studies use facial expressions [] and speech [] for recognizing stress and emotions (ie, affects as an umbrella term), physiology-based methodologies have also emerged as an alternative, as they offer promise for seamless, continuous monitoring within everyday contexts. Notably, 330 million smartwatches, fitness trackers, and other wearables capable of gathering quantitative physiological data were sold in 2020 []. They are promising tools for affect recognition due to their unobtrusive data collection capabilities.

Multitask learning (MTL) is a method proposed to train multiple related tasks at the same time. It works by sharing knowledge between these tasks to make each model perform better []. Essentially, MTL acts like a behind-the-scenes helper, enhancing the ability of machine learning (ML) models to generalize across different types of data []. This technique has been particularly useful in improving ML models across various fields, including affective computing []. For instance, if we want to improve how a system recognizes emotions, we can also train it to recognize gender []. By doing this, the system can learn from both tasks simultaneously, making it more effective at recognizing emotions. The motivation for combining identity and emotion also has a psychological basis. Connolly et al [] examined whether facial responses are shared between identity and emotion. They showed that there is a strong positive correlation between face emotion recognition and face identity recognition tasks and that the 2 recognition tasks share a common processing mechanism. Sheng and Li [] also confirmed the interdependence of identity and emotion tasks using gait signals. Inspired by the MTL idea and these previous studies, we used MTL to perform 2 tasks, stress recognition and identity recognition, from physiological signals. However, there is a concern when third parties gain access to additional information, like gender, identity, or age, which could be misused for targeted advertising or even discrimination in areas such as job opportunities []. Moreover, if sensitive information like biometric data gets leaked, it could lead to cybercrimes such as identity theft. Therefore, it is crucial for MTL systems to incorporate privacy-preserving methods.

Researchers have introduced federated learning (FL) as a way to address privacy issues. FL allows users to train their data locally while sharing only the trained model parameters rather than the original data itself. The philosophy behind FL is to bring code to the data instead of moving the data to the code []. This method complies with privacy regulations such as the European Union’s General Data Protection Regulation [] and the California Consumer Privacy Act []. FL has shown promise in safeguarding user privacy in Internet of Things networks.

Although FL is a significant step forward in protecting user privacy, it is not invincible. There is still a risk of sensitive information being uncovered by reverse engineering the local model parameters. To further enhance privacy, researchers have integrated privacy-preserving techniques into ML for physiological data. One such method is differential privacy (DP), which adds random noise to each model in the client or server, disrupting updates and limiting the leakage of sensitive information between nodes []. However, it is worth noting that DP can reduce the performance of ML models.

In practice, protecting the privacy of users without degradation in model utility is still an open problem. In this study, we first implemented an MTL architecture for recognizing stress and identity from multimodal physiological signals. We further added FL and DP mechanisms to preserve privacy. To obtain a robust performance, we separated the 2 tasks and added noise to only the privacy task, which is biometric identity recognition. In this way, we were able to improve the stress recognition performance with the help of MTL and hide identity information by adding noise to the identity task model by using DP. To the best of our knowledge, this study is the first MTL-based affect recognition study using FL and DP to preserve privacy at the same time.

In the next section, we review related work using MTL and FL for affect recognition. We then present our approach in the Methods section. In the Results section, we compare multitask centralized, decentralized FL, and decentralized FL with DP approaches on selected public datasets for recognizing users' stress levels and identities. We conclude with lessons learned, limitations, and future work in the Discussion section.

Related Works

To develop a robust affective computing system that can be used in practical applications, researchers have tested various modalities with state-of-the-art deep learning techniques. MTL has also attracted significant attention across various domains over the past few years, including affective computing. Chen et al [] applied it to audiovisual signals in the Audio/Visual Emotion Challenge and achieved a significant performance increase over baseline ML algorithms by detecting arousal and valence levels simultaneously. Since the human face can serve several tasks, such as gender recognition or age estimation, Sang et al [] used MTL with a convolutional neural network (CNN) for smile detection, emotion recognition, and gender classification.

They outperformed the state-of-the-art techniques on 3 selected benchmark datasets (the Internet Movie Database and Wiki dataset, the GENKI-4K dataset [a dataset containing images labeled with facial expressions, commonly used in emotion recognition research], and the Facial Expression Recognition Challenge-2013 dataset). MTL-based centralized learning (CL) for affective computing is beneficial for simultaneously sharing features and learning auxiliary tasks. However, when private tasks (ie, face, gender, and person detection) are included in MTL to improve affect recognition performance, the models risk revealing this sensitive information to possibly malicious parties, especially when the features for both utility and auxiliary tasks are uploaded to a central server for learning.

Data privacy has become an issue of great concern in affect recognition using either verbal or nonverbal data, as the gender, age, and identity of the user can be revealed in the process. FL was proposed to preserve privacy while taking advantage of ML and has attracted significant attention across various domains over the past few years, yet affective computing research and applications on emotion-related tasks are rarely discussed. Most existing works are conducted on private datasets or in limited scenarios, making it difficult for researchers to compare their methods fairly [].

FL has been widely applied to facial features for affect recognition. Somandepalli et al [] investigated FL for 2 affective computing tasks: classifying self-reports and perception ratings in audio, video, and text datasets. Using the speech modality, Latif et al [] investigated an FL approach to emotion recognition tasks while sharing only the model among clients. They implemented a long short-term memory classifier and tested it on the interactive emotional dyadic motion capture dataset with 4 emotions: happy, sad, angry, and neutral. Face and speech modalities have also often been combined to obtain more robust performance []. Chhikara et al [] combined emotion recognition from facial expressions and speech with an FL approach. For the face modality, they used a combination of CNN and support vector machine models, whereas for the audio modality, they applied a 2D CNN model to extracted spectrogram images. Their proposed framework was validated and tested on 2 datasets: Facial Expression Recognition 2013 for facial emotion recognition and the Ryerson Audio-Visual Database of Emotional Speech and Song for speech emotion recognition.

On the contrary, FL has been seldom used in the context of affect recognition from physiological signals, as indicated in . Can and Ersoy [] implemented an FL model to forecast perceived stress using physiological data. Each subclient uses a multilayer perceptron classifier to locally train on its data at the edge, and the individual multilayer perceptron parameter updates are shared using the federated averaging (FedAvg) algorithm. FL has also been extended to handle multimodal physiological signals. Nandi and Xhafa [] proposed the Federated Representation Learning for Multi-Modal Emotion Classification System, an FL framework–based ML model for emotion recognition. They applied wavelet feature extraction and a neural network to electrodermal activity (EDA) and respiration data from the Dataset for Emotion Analysis using Physiological signals to recognize valence and arousal levels, validating their approach. These studies demonstrate the successful application of FL without compromising affect recognition performance.

Table 1. Studies using MTL^a or privacy-preserving approaches for various applications.

| Study | Application | Signal | MTL^a | FL^b | DP^c | Multimodality |
| --- | --- | --- | --- | --- | --- | --- |
| Zhao et al [] | Network anomaly detection | Network signals | | ✓ | ✓ | |
| Smith et al [] | Activity recognition | Acceleration, gyroscope | ✓ | ✓ | ✓ | |
| Somandepalli et al [] | Affect recognition | Speech, video, text | | ✓ | | ✓ |
| Sang et al [] | Affect recognition | Face images | ✓ | | | |
| Shome and Kar [] | Affect recognition | Face images | | ✓ | | |
| Chhikara et al [] | Affect recognition | Face and speech | | ✓ | | ✓ |
| Feng et al [] | Affect recognition | Speech | | ✓ | ✓ | |
| Can and Ersoy [] | Affect recognition | PPG^d | | ✓ | | |
| Nandi and Xhafa [] | Affect recognition | EEG^e, EDA^f | | ✓ | | ✓ |
| Wang et al [] | Biometric | Face, speaker | | ✓ | ✓ | ✓ |
| This study | Affect recognition | PPG, EDA, ACC^g, ST^h | ✓ | ✓ | ✓ | ✓ |

^a MTL: multitask learning.

^b FL: federated learning.

^c DP: differential privacy.

^d PPG: photoplethysmography.

^e EEG: electroencephalography.

^f EDA: electrodermal activity.

^g ACC: acceleration.

^h ST: skin temperature.

While FL has been introduced to enhance model training in terms of privacy, the privacy vulnerabilities inherent in the stochastic gradient descent (SGD) algorithm remain unresolved. DP mechanisms have primarily been discussed in FL settings, involving the injection of noise into each client or server model to perturb updates and limit the gradient leakage shared among nodes (ie, client and server) []. In one of the initial applications, the authors introduced a new private training method termed differentially private SGD, which reduces local and global gradient information leakage between the client and server. Instead of using the standard composition theorem to calculate the final distribution of overall noise across clients, they used a moments accountant to adaptively monitor the overall privacy loss. Recognizing that servers are often curious or untrustworthy, Wei et al [] proposed a local DP mechanism by introducing Gaussian noise into user models before uploading them to servers. To address the communication overhead required for an optimal convergence upper bound under DP, they introduced a novel approach known as the communication rounds discounting method, which achieves a better trade-off between the computational complexity of searching and convergence performance. DP has also been leveraged for affect recognition. Feng et al [] used user-level DP to mitigate privacy leaks in FL for speech emotion recognition. Recently, Smith et al [] showcased promising performance in preserving privacy via multitask federated learning (MFL) for activity recognition.
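To make the mechanism concrete, the following is a minimal sketch of the per-example clipping and noising step at the heart of differentially private SGD; the function name, tensor layout, and hyperparameter values are illustrative, not taken from the cited works.

```python
import torch

def dp_sgd_step(per_sample_grads, clip_norm=1.0, noise_multiplier=1.1):
    """One differentially private SGD aggregation step: clip each example's
    gradient to a bounded L2 norm, sum, then add Gaussian noise calibrated
    to the clipping bound before averaging.

    per_sample_grads: tensor of shape (batch_size, num_params).
    """
    # Per-example clipping bounds any single example's influence (sensitivity).
    norms = per_sample_grads.norm(dim=1, keepdim=True)
    clipped = per_sample_grads * (clip_norm / norms).clamp(max=1.0)
    # Gaussian noise hides the contribution of any individual example.
    noise = torch.normal(0.0, noise_multiplier * clip_norm,
                         size=(per_sample_grads.shape[1],))
    return (clipped.sum(dim=0) + noise) / per_sample_grads.shape[0]
```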

When examining the affective computing literature, we find studies using MTL to improve performance and others using FL or DP to preserve privacy. However, there are no studies that leverage MTL for performance enhancement while also preserving privacy. The overall idea of our approach is to separate the utility and privacy tasks, introducing noise exclusively to the privacy task (biometric identity recognition) and thereby preventing DP from compromising the performance of the utility task (affect recognition). Consequently, our work represents the first MTL-based approach to affect recognition using FL and DP to preserve privacy at the same time.


Methods

Overview

Our proposed framework is divided into 3 main substeps: feature extraction, FL model, and FL with DP settings (see for the block diagram).

Figure 1. Block diagram of the proposed MTL system. CNN: convolutional neural network; MTL: multitask learning.

Ethical Considerations

This study did not require formal ethics approval, as it exclusively used publicly available datasets, which were obtained from open-access repositories with no personally identifiable information. According to the General Data Protection Regulation, studies involving the use of anonymized, publicly accessible data that do not engage human participants directly are exempt from ethics review. Therefore, no ethics approval was sought or required.

Data Description and Feature Extraction

For our experiments, we use 2 datasets, wearable stress and affect detection (WESAD) [] and virtual environment for real-time biometric interaction and observation (VERBIO) [], which have been created for affective state monitoring.

WESAD Dataset

In the WESAD dataset, each participant recorded physiological signals such as blood volume pulse, electrocardiogram (ECG), EDA, electromyogram, respiration, body temperature, and 3-axis acceleration measured from the chest and wrist using Plux RespiBAN and Empatica E4 devices. In total, 15 (12 male and 3 female) people participated in this experiment. There were 4 states: baseline, amusement, stress, and meditation. More details can be found in Schmidt et al [].

VERBIO Dataset

The VERBIO dataset [] consists of biobehavioral responses and self-reported data during public speaking presentations in front of both real-life and virtual audiences. In total, 55 participants conducted 10 distinct presentations, each lasting approximately 5 minutes, across the 3 segments of the study, spanning 4 days: PRE (1 session, day 1), TEST (8 sessions, days 2-3), and POST (1 session, day 4). The PRE and POST segments involved real-life audiences, while the TEST segment featured various virtual audiences. To prevent participant exhaustion, the TEST segment was divided into 2 days, each comprising 4 sessions. In total, the study yielded 10,800 minutes of acoustic and physiological data from 82 real and 216 virtual reality presentations, from which 35,174 segments were created. We solely used the physiological data collected with the Empatica E4 device, which has identical sampling rates and modalities to those used in the WESAD dataset.

Preprocessing

Each modality of the signal is segmented using a 700-sample window size with 50% overlap, as suggested in the literature []. To obtain features that are consistent within participants and discriminative across participants, these segments were further processed to extract features [] such as mean, variance, root mean square, frequency-domain features, average first and second amplitude differences, skewness, kurtosis, and entropy as a nonlinear feature.
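The segmentation and time-domain feature computation described above can be sketched as follows; the helper name is ours, frequency-domain features are omitted for brevity, and the histogram-based entropy estimate is an assumption, as the text does not specify the estimator.

```python
import numpy as np
from scipy.stats import entropy, kurtosis, skew

def extract_features(signal, win=700, overlap=0.5):
    """Segment one modality into 700-sample windows with 50% overlap and
    compute the time-domain features listed above for each window."""
    step = int(win * (1 - overlap))
    feats = []
    for start in range(0, len(signal) - win + 1, step):
        w = signal[start:start + win]
        d1, d2 = np.diff(w), np.diff(w, n=2)   # 1st/2nd amplitude differences
        hist, _ = np.histogram(w, bins=32, density=True)
        feats.append([
            w.mean(), w.var(), np.sqrt(np.mean(w ** 2)),  # mean, variance, RMS
            np.abs(d1).mean(), np.abs(d2).mean(),
            skew(w), kurtosis(w),
            entropy(hist + 1e-12),             # entropy as a nonlinear feature
        ])
    return np.asarray(feats)                   # frequency features omitted here
```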

Decentralized MFL Model

To privatize user identity while preserving stress recognition accuracy, we adopted an MFL approach that can effectively improve stress recognition performance while limiting the risk of inferring sensitive information from the training model, since clients do not want their data exposed to the cloud service provider. The MFL architecture–based stress recognition is developed as follows:

1. The dataset is partitioned into K clients. The data size of all clients is the same. The client distribution is assumed to be either independent and identically distributed (IID) or non-IID [].

2. For the local training process, there is only 1 iteration of SGD local training per client. In particular, the local model parameter $w$ is given by []:

$$w_k^{D_i} = \arg\min_{w_k} \left( F_k(w_k) + \frac{\mu}{2} \left\| w_k - w^{(D_i - 1)} \right\|^2 \right) \quad (1)$$

where $\mu$ is the learning rate and $F_k$ is the local loss function of the kth client.

3. The local data of different clients cannot be communicated; only the models can be shared.

4. Following the FedAvg algorithm [], the server uses a global averaging approach to aggregate all local training models into the final global model (a FedAvg sketch follows this list). Formally [], the server aggregates the weights sent from the K clients as follows:

$$w = \sum_{i=1}^{K} p_i \, w_i^{D_i} \quad (2)$$

where $w_i$ is the parameter vector trained at the ith client, $w$ is the parameter vector after aggregation at the server, K is the number of participating clients, $D_i$ is the dataset of the ith participating client, $D = \cup_i D_i$ is the whole distributed dataset, and $p_i = |D_i| / |D|$.

5. The global training epoch is set to M rounds (aggregations). The server solves the optimization problem []:

$$w^* = \arg\min_{w} \sum_{i=1}^{K} p_i F_i(w_i, D_i) \quad (3)$$

where $F_i$ is the local loss function of the ith client. Generally, the local loss function is given by local empirical risks.
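As a usage illustration, here is a compact sketch of the dataset-size-weighted aggregation in equation 2, assuming each client returns its trained PyTorch state dict; the interface is hypothetical, not the study's code.

```python
import torch

def fedavg(client_states, client_sizes):
    """Aggregate local models by dataset-size-weighted averaging (eq. 2).

    client_states: list of state_dicts from the K participating clients.
    client_sizes:  list of local dataset sizes |D_i|.
    """
    total = float(sum(client_sizes))
    global_state = {}
    for name in client_states[0]:
        # p_i = |D_i| / |D| weights each client's parameters.
        global_state[name] = sum(
            (size / total) * state[name].float()
            for state, size in zip(client_states, client_sizes)
        )
    return global_state
```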

Decentralized FL With DP

In conventional FL, the global model is computed by averaging over the participating client models, which performs well in homogeneous FL settings. However, through inference or adversarial attacks, this shared model may leak sensitive and private information such as the user's gender, age, and biometric template. In such cases, the MFL framework is required to reduce the leakage from the exchanged black-box gradient model. To overcome this limitation, researchers have used DP schemes to protect either the local or the global FL training model. However, a gradient perturbed using DP with a low budget has high variance, which leads to worse performance and slower convergence. We therefore compared adding noise to the whole model, to the task-specific last layers, and to the shared layers, and demonstrate the performance of these 3 approaches (). Motivated by personalized FL [], our work focuses on client-level privacy, which perturbs a specific layer of the client model rather than the entire local model. This is because the base layers carry mostly redundant information, while the most important information, holding both private and public content, is located in the upper layers. To meet the utility-privacy trade-off, the DP mechanism perturbs the gradients using Gaussian noise at a specific layer or task. Here, we use all steps of the FL model except step 4; that is, before uploading the local SGD client model to the global server, we inject noise into the updated local parameters. In that sense, we perturb the local gradient training inference with 2 kinds of noise distributions:

1. Additive Gaussian noise $\eta \sim \mathcal{N}(0, \sigma^2)$ applied to each weight (see the sketch after this list). This operation can be described as:

$$w_{t+1} = w_t + \eta \quad (4)$$

2. A set of noise distributions sampled from the DP mechanism. A randomized mechanism $M$ on the training set with domain $X$ and range $R$ satisfies $(\epsilon, \delta)$-DP for 2 small positive numbers $\epsilon$ and $\delta$ if the following inequality holds []:

$$\Pr[M(x) \in S] \le e^{\epsilon} \Pr[M(x') \in S] + \delta \quad (5)$$

where $x, x' \in X$ are 2 neighboring input datasets; $S \subseteq R$, where $R$ is the set of all possible outputs; $\delta$ is the privacy loss or failure probability; and $\epsilon$ is the privacy budget.
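A minimal sketch of the client-level perturbation in equation 4, restricted to the identity-task layers as proposed in the text; the layer-name prefix is hypothetical.

```python
import torch

def perturb_before_upload(state_dict, sigma, private_prefixes=("identity_head",)):
    """Add Gaussian noise eta ~ N(0, sigma^2) to the selected task layers only
    (eq. 4), leaving shared and utility-task parameters untouched."""
    noisy = {}
    for name, w in state_dict.items():
        if w.is_floating_point() and any(name.startswith(p)
                                         for p in private_prefixes):
            noisy[name] = w + torch.normal(0.0, sigma, size=w.shape)
        else:
            noisy[name] = w.clone()
    return noisy
```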

An ideal DP mechanism provides a lower value of $\delta$ and a smaller value of $\epsilon$. Unfortunately, these lower values also decrease the utility of the function (eg, the accuracy metric), so the main question is how much noise we must add to the output while guaranteeing the privacy-utility trade-off. Intuitively, an output perturbation mechanism takes an input $x$ and returns a random variable $s(x)$. This operation can be modeled by:

$$M(x) = s(x) + N_\sigma \quad (6)$$

where $N_\sigma$ is scaled noise sampled from a specific distribution. In this work, we chose the Laplace and Gaussian mechanisms [], which use L1 and L2 norm sensitivity, respectively. The sensitivity function can be expressed as:

$$\Delta f = \max_{D, D'} \left\| s(x) - s(x') \right\|_{1,2} \quad (7)$$

Scaling noise can be computed as:

$$\sigma = \frac{\Delta f}{\epsilon} \quad (8)$$

Output perturbation satisfies $(\epsilon, \delta)$-DP when we properly select the noise scale. Thus, the noise is sampled from the Laplace and Gaussian distributions [] as:

$$M_{\mathrm{Laplace}}(x, f, \epsilon, \delta) = s(x) + \mathrm{Lap}(\mu = 0, b) \quad (9)$$

$$M_{\mathrm{Gaussian}}(x, f, \epsilon, \delta) = s(x) + \mathcal{N}(\mu = 0, \sigma^2) \quad (10)$$
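Equations 9 and 10 translate directly into code; the following sketch uses NumPy's Laplace and normal samplers, with the Gaussian scale passed in so that any calibration of σ to (ϵ, δ) can be plugged in.

```python
import numpy as np

def laplace_mechanism(value, sensitivity, epsilon):
    """Output perturbation with Laplace noise (eq. 9): scale b = Delta f / epsilon."""
    return value + np.random.laplace(loc=0.0, scale=sensitivity / epsilon,
                                     size=np.shape(value))

def gaussian_mechanism(value, sigma):
    """Output perturbation with Gaussian noise (eq. 10), given a noise scale
    sigma calibrated to (epsilon, delta)."""
    return value + np.random.normal(loc=0.0, scale=sigma, size=np.shape(value))
```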

The gradient information leakage can be reduced by applying a gradient thresholding or clipping algorithm. As explained by Abadi et al [], gradient clipping is crucial in ensuring the DP of FL algorithms. Each provider or client model update therefore needs to have a bounded norm, which is ensured by applying an operation that shrinks individual model updates when their norm exceeds a given threshold. The impact of clipping on an FL algorithm's convergence should be understood when designing FL algorithms that guarantee DP.
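A minimal sketch of this norm-bounding operation, applied to a dict of floating-point update tensors; the default threshold is illustrative.

```python
import torch

def clip_update(update, threshold=1.0):
    """Shrink a client's model update when its global L2 norm exceeds the
    threshold, bounding each client's contribution as DP requires."""
    norm = torch.norm(torch.cat([u.flatten() for u in update.values()]))
    factor = min(1.0, threshold / (norm.item() + 1e-12))
    return {name: u * factor for name, u in update.items()}
```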


Results

Overview

Three scenarios are created to tackle the aforementioned challenges with the DP learning approaches: centralized, decentralized FL, and decentralized FL with DP. Their performances are evaluated on the WESAD and VERBIO datasets on 2 different tasks. The first task is identifying users from a set of registered and recorded users. The second task is perceived binary stress recognition, which distinguishes the user's stress level (stress vs nonstress). We train a multitask deep learning model to handle these tasks simultaneously. In addition, for better training and to avoid overfitting, an early stopping regularization technique is used during gradient descent training. The accuracy metric is used to measure identification and stress recognition performance. In each simulation scenario, we run 5-fold cross-validation, where each fold is tested based on training on the other 4. Inspired by a successful architecture [] and as described in , the multitask 1D-CNN model is based on 3 convolutional layers, pooling layers, 2 fully connected layers, and 2 linear classifiers for the studied tasks. The multitask model uses the cross-entropy loss function and the SGD optimizer with learning rate β=.0005.
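The early stopping regularizer mentioned above can be implemented with a simple patience counter; the patience and tolerance values below are assumptions, as the text does not specify them.

```python
class EarlyStopping:
    """Stop training when validation loss has not improved for `patience`
    epochs; used as a regularizer during gradient descent training."""

    def __init__(self, patience=5, min_delta=1e-4):
        self.patience, self.min_delta = patience, min_delta
        self.best, self.bad_epochs = float("inf"), 0

    def step(self, val_loss):
        if val_loss < self.best - self.min_delta:
            self.best, self.bad_epochs = val_loss, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience  # True -> stop training
```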

Table 2. Hyperparameters of the multitask 1D-convolutional neural network–based architecture.

| Layer | Type | Hyperparameters |
| --- | --- | --- |
| Input | Input | Feature size |
| Conv1D | Convolution | Input=1, output=20, K=8, stride=1, padding |
| ReLU | Activation function | ReLU |
| Pooling | Pooling | Stride=2, max pooling |
| Conv | Convolution | Input=20, output=40, K=8, stride=1, padding |
| ReLU | Activation function | ReLU |
| Pooling | Pooling | Stride=2, max pooling |
| Conv | Convolution | Input=40, output=60, K=8, stride=1, padding |
| ReLU | Activation function | ReLU |
| Pooling | Pooling | Stride=2, max pooling |
| Fully connected 1 | Fully connected layer | Input=360, output=100 |
| Fully connected 2 | Fully connected layer | Input=100, output=300 |
| Linear 1 | Linear classifier | Input=100, output=2 (affect classes), linear activation |
| Linear 2 | Linear classifier | Input=300, output=N (identity classes), linear activation |

For each target task, the individual loss is determined by cross-entropy for both the stress recognition (Loss1) and identification (Loss2) tasks. The individual losses are summed to form the total cost function (LossT).
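Putting Table 2 and the loss definition together, the following is a sketch of the multitask 1D-CNN in PyTorch. The input feature length (48, which yields the 360-dimensional flattened vector in Table 2 under "same" padding) and the absence of activations between the fully connected layers are our assumptions.

```python
import torch
import torch.nn as nn

class MultitaskCNN(nn.Module):
    """Multitask 1D-CNN following Table 2: three shared convolution-pooling
    blocks, then separate fully connected branches for stress and identity."""

    def __init__(self, n_identities):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Conv1d(1, 20, kernel_size=8, stride=1, padding="same"),
            nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(20, 40, kernel_size=8, stride=1, padding="same"),
            nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(40, 60, kernel_size=8, stride=1, padding="same"),
            nn.ReLU(), nn.MaxPool1d(2),
            nn.Flatten(),                          # 60 channels x 6 = 360
        )
        self.fc1 = nn.Linear(360, 100)             # shared by both heads
        self.fc2 = nn.Linear(100, 300)             # identity branch
        self.stress_head = nn.Linear(100, 2)       # 2 affect classes
        self.identity_head = nn.Linear(300, n_identities)

    def forward(self, x):
        h = self.fc1(self.shared(x))
        return self.stress_head(h), self.identity_head(self.fc2(h))

# Total cost Loss_T = Loss_1 (stress) + Loss_2 (identity), both cross-entropy.
criterion = nn.CrossEntropyLoss()
model = MultitaskCNN(n_identities=15)              # 15 WESAD participants
x = torch.randn(4, 1, 48)                          # assumed feature length 48
stress_logits, id_logits = model(x)
loss_t = (criterion(stress_logits, torch.randint(0, 2, (4,)))
          + criterion(id_logits, torch.randint(0, 15, (4,))))
```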

CL Approach

Most existing stress recognition studies in conventional centralized ML settings use WESAD to position their work against the state of the art [-] and achieve over 95% accuracy for binary stress recognition. Similarly, we used the CL approach on the WESAD dataset as a baseline experiment. We measured the stress recognition performance for each participant independently and then computed the average accuracy. Each individual participant model is trained using 5-fold cross-validation.

shows the results of our CL approach using a 1D-CNN multitask model for stress and identity recognition tasks on 2 datasets: WESAD and VERBIO. After training, the output layers are used to infer the stress level and the identity of the user, resulting in quite similar average scores of 99%. The results reveal a potential information leakage in this case, wherein model accuracy is preserved at the expense of the user’s privacy. As a result, this approach fails to ensure the users’ privacy since their data are transmitted to the server for training purposes.

Multitask Decentralized FL Approach (MFL)

Here, K (ie, participating clients) training models were created to train on the whole dataset, with the local sample size Di=1000. We set the number of global training epochs (communication rounds) to T=40 and the number of local training epochs to 1. Overall, the average accuracy achieved is 97% and 95% for stress recognition and 93% and 90% for user identification on WESAD and VERBIO, respectively.

To examine the effect of client participation within the MFL model, we tested different numbers of clients, that is, K=5, K=10, and K=20. As reported by Wang et al [] and confirmed in , an increasing number of clients and more client participation provide better performance for MFL training. The client distribution differs when assessing the MFL model under real-world conditions. We compare the convergence performance of the MFL model under IID and non-IID distributions for both tasks: stress recognition and subject identification (). We note that the data distribution dramatically affects the quality of FL training and, consequently, MFL's convergence performance. IID is the more idealistic case, whereas non-IID is the more realistic one; therefore, we expect lower results in non-IID cases. The performance difference confirms the importance of data distribution.
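A common way to simulate the IID and non-IID client distributions compared here is uniform sampling versus label-sorted shards; the following sketch uses that recipe, which is an assumption rather than the study's exact partitioning.

```python
import numpy as np

def partition(labels, n_clients, iid=True, shards_per_client=2, seed=0):
    """Split sample indices across clients: IID draws uniformly at random;
    non-IID deals each client a few label-sorted shards, producing the
    skewed local distributions typical of real-world clients."""
    rng = np.random.default_rng(seed)
    idx = np.arange(len(labels))
    if iid:
        rng.shuffle(idx)
        return np.array_split(idx, n_clients)
    # Non-IID: sort by label, cut into shards, deal shards to clients.
    shards = np.array_split(idx[np.argsort(labels)],
                            n_clients * shards_per_client)
    order = rng.permutation(len(shards))
    return [np.concatenate([shards[j] for j in
            order[i * shards_per_client:(i + 1) * shards_per_client]])
            for i in range(n_clients)]
```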

Adjusting the FL hyperparameter settings can achieve better performance than the CL approach. However, it may lead to a lower privacy level. As a result, SGD training may still reveal sensitive information about the client while exchanging the ML model with the global server.

Figure 2. The impact of the user participant size (various numbers of clients) on the multitask federated learning performance (wearable stress and affect detection).

Figure 3. The impact of IID versus non-IID distribution on the multitask federated learning performance. IID: independent and identically distributed.

MFL With DP Approach

To highlight the benefits of our proposed approach, we examine the impact of injecting noise into the local client training network according to these 3 scenarios: full layers, shared layers, and task-specific layers (). The noise is sampled via the following mechanisms:

1. DP technique–based Laplace and Gaussian mechanisms, where the noise scale is drawn from the output perturbation mechanism. DP parameters are computed at each local training round to generate the appropriate noise injected from specific distributions (ie, Laplace and Gaussian). Besides an appropriate (ϵ, δ)-DP initialization, there are a few hyperparameters to be tuned, such as the number of clients N, the maximum number of communication rounds T, and the number of chosen clients K.

2. The Laplace noise scale [] is computed as:

$$\sigma = \frac{\Delta f}{\epsilon} \quad (11)$$

3. For the Gaussian distribution, 2 noise scales are given by Abadi et al [] and Schmidt et al [], respectively:

$$\sigma_1 = \frac{\sqrt{2 \log(1.25/\delta)}}{\epsilon} \quad (12)$$

$$\sigma_2 = \frac{\Delta f \sqrt{2 q T \log(1/\delta)}}{\epsilon} \quad (13)$$

where $\Delta f = 2C/|D_i|$, $q = K/N$, and C is the clipping threshold. We set the clipping factor to 1 and δ to 0.00001 (a noise-scale sketch follows).
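The reconstructed noise scales in equations 11-13 translate into a few lines of code; since the source rendering of these formulas is ambiguous, the forms below follow the standard Laplace and Gaussian calibrations and should be read as assumptions.

```python
import numpy as np

def laplace_scale(sensitivity, epsilon):
    """Eq. 11: sigma = Delta f / epsilon."""
    return sensitivity / epsilon

def gaussian_scale_1(epsilon, delta):
    """Eq. 12 (Abadi et al): sigma_1 = sqrt(2 log(1.25 / delta)) / epsilon."""
    return np.sqrt(2 * np.log(1.25 / delta)) / epsilon

def gaussian_scale_2(sensitivity, q, T, epsilon, delta):
    """Eq. 13: sigma_2 = Delta f * sqrt(2 q T log(1 / delta)) / epsilon,
    with q = K / N the client sampling ratio and T the number of rounds."""
    return sensitivity * np.sqrt(2 * q * T * np.log(1 / delta)) / epsilon

# Settings from the text: clipping factor C = 1 and delta = 1e-5.
print(gaussian_scale_1(epsilon=15.0, delta=1e-5))
```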

- show the accuracy comparison when adding Gaussian noise at different levels to the local training according to the 3 scenarios on the WESAD dataset. Compared to the baseline scenario, that is, no privacy mechanism, the shared-layer perturbation scheme with σ=0.1 and σ=0.3 provides better results only for the utility task; the identification task reached an accuracy of around 86%.

Figure 4. Decentralized multitask federated learning approach results under different Gaussian noise levels (σ=0.1).

Figure 5. Decentralized multitask federated learning approach results under different Gaussian noise levels (σ=0.3).

Figure 6. Decentralized multitask federated learning approach results under different Gaussian noise levels (σ=0.6).

In this case, the amount of noise drawn from the Gaussian distribution is used to balance utility and privacy and does not take the FL setting parameters into account. The Gaussian noise levels are set to σ=0.1, σ=0.3, and σ=0.6. Increasing the Gaussian noise leads to poor performance, as depicted in .

To examine the DP impact on the utility-privacy trade-off, we assessed the performance of MFL with a DP mechanism on the WESAD and VERBIO datasets under the aforementioned scenarios. The DP budgets are set as follows: ϵ=1, ϵ=15, ϵ=50, and ϵ=2000. As depicted in -, compared to the baseline scenario, that is, no privacy-enhancing mechanism, adding DP noise into both shared and task-specific layers provides better results for utility performance; however, in terms of privacy, perturbing the task-specific layer provides better results than perturbing the shared layer. Results show that FL with perturbation of all layers slows convergence compared to the others, although it provides better privacy (ie, lower identification accuracy).

Figure 7. Multitask federated learning performance under the differential privacy mechanism "different layer perturbations" (wearable stress and affect detection, ϵ=1).

Figure 8. Multitask federated learning performance under the differential privacy mechanism "different layer perturbations" (virtual environment for real-time biometric interaction and observation, ϵ=1).

Figure 9. Multitask federated learning performance under the differential privacy mechanism "different layer perturbations" (wearable stress and affect detection, ϵ=5).

Figure 10. Multitask federated learning performance under the differential privacy mechanism "different layer perturbations" (virtual environment for real-time biometric interaction and observation, ϵ=5).

Figure 11. Multitask federated learning performance under the differential privacy mechanism "different layer perturbations" (wearable stress and affect detection, ϵ=15).

Figure 12. Multitask federated learning performance under the differential privacy mechanism "different layer perturbations" (virtual environment for real-time biometric interaction and observation, ϵ=15).

Figure 13. The evaluation results of the multitask federated learning approach with 3 differential privacy (DP) distributions (various layer task perturbation, ϵ=15).

Figure 14. The evaluation results of the multitask federated learning approach with 3 differential privacy (DP) distributions (full perturbation, ϵ=15).

Intuitively, our results demonstrate that adding noise to the upper layers (identity recognition layers) effectively achieves a better privacy-utility trade-off. This advantage comes at the expense of a formal quantification of the relationship between the learned features, that is, what we aim to share, and the private variables, that is, what we aim to protect, which is rarely available in practice. We also evaluated the impact of the noise distribution type on the proposed framework's performance. The results from the WESAD dataset demonstrate that adding Laplace noise to our local training model preserves the stress recognition accuracy better than the Gaussian noise types ( and ). However, it also maintains the identity recognition task performance, that is, it hides identity less effectively.

Nevertheless, using the Gaussian mechanism (ie, Gaussian 2) increases the privacy level of the local training model, because increasing the number of global iterations also negatively affects global convergence performance; that is, a larger T increases the noise variance, dramatically decreasing the global model accuracy (). In addition, we found that a larger K helps avoid the vanishing local SGD gradient problem, whereas a larger N scales down the variance of the noise injected into the model parameters and misleads the SGD training inference.

When DP is also used in FL settings, our experiments suggest that it achieves encouraging performance even at lower budget values (ie, stronger privacy requirements). The compromise of physiological data privacy can have significant consequences for individuals' lives. It provides an opening for attackers and puts users at risk of data breaches, which can result in several threats []. For instance, Shen et al [] provided an analytic study to estimate how much the global FL training model leaks about users' information, and they built a membership attack system on the global model to check whether a target user participated in training the FL model. The experiments were validated and tested on health data from the University of Michigan Intern Health Study. The authors suggested using DP to protect user-level privacy while maintaining utility task performance. In a similar work [], the authors applied personalized FL to recognize stress from mobile health data (ie, WESAD) to decrease the information leakage from the global FL model.

There are also other risks, including the possibility of revealing soft biometrics (eg, gender, location, and authenticity) and hard biometrics (ie, identity). So far, most of the research using physiological modalities in both biometric domains has focused on ECG, electroencephalography, EDA, and photoplethysmography [,]. We found that the model achieves a better balance between the stress and identification classification tasks when multimodal signals are combined as input compared to a single modality. However, it is difficult to provide privacy guarantees, since no study has investigated which modality can reveal sensitive information about the subject. In this experiment, we provide a comparative study among the used modalities for both tasks, subject identification and emotion recognition. According to and , we found that the ECG and acceleration modalities always increased the performance of the identification task. This fact might have contributed to their popularity in the biometric field []. This observation hints that discarding model parameters containing sensitive information learned from modalities associated with identity might help achieve a good privacy-utility trade-off to a certain extent, that is, decrease information leakage. Although for these modalities (eg, ACC and ECG for WESAD) the identification accuracy is higher than the binary stress recognition accuracy, identification accuracy is in general lower than stress recognition performance. We can interpret this by the number of labels and the complexity of the task: WESAD has 15 participants and VERBIO has 55, corresponding to 15 and 55 labels for the biometric identification ML systems. If the signals of some people are similar under the same experimental conditions, this can confuse the ML algorithm and decrease performance. For stress, on the other hand, we used binary labels, and although the stress response varies from individual to individual, it is a relatively simpler task with 2 labels.

Table 3. The effect of modality on the stress and subject identification performance (wearable stress and affect detection).

| Modality and task | Centralized ACC | Centralized F1 | Centralized AUC | FL ACC | FL F1 | FL AUC | FL+DP ACC | FL+DP F1 | FL+DP AUC |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ACC (acceleration) | | | | | | | | | |
| Stress | 91.60 | 86.09 | 92.01 | 89.90 | 84.81 | 90.35 | 84.46 | 79.23 | 85.00 |
