Early Attrition Prediction for Web-Based Interpretation Bias Modification to Reduce Anxious Thinking: A Machine Learning Study

IntroductionBackground

Approximately half of the US population experience a mental illness during their lifetime [,]. During the early stage of the COVID-19 pandemic, researchers estimated an increase of 25.6% in new cases of anxiety disorders per 100,000 people globally []. Mental illness is associated with impaired daily functioning, more frequent use of health care resources, and increased risk of suicide []. However, more than two-thirds of individuals with a mental illness do not receive treatment []. A multitude of barriers impede the initiation and sustained use of face-to-face (ie, traditionally delivered) treatment, including stigma; cost; lack of insurance coverage; and limited availability of support services, especially trained clinicians [,-]. Given these challenges, there is an urgent need to help people manage their mental health in new ways [,].

Digital mental health interventions (DMHIs), which harness digital technologies to promote behavior change and maintain health [], provide an appealing alternative for much-needed treatment outside a clinician’s office []. DMHIs may help individuals overcome obstacles to treatment, such as geographic or financial constraints, and may thus reduce the treatment gap among the broader population. Given the limited resources for health care service delivery, low-cost mobile health and eHealth interventions could be key to supporting symptom monitoring and long-term self-management of patients with mental disorders []. With an increasing demand for mental health care amid a shortage of mental health professionals, the use of eHealth and mobile health apps is expanding [-]. While these solutions have the potential to play an important role in increasing access to mental health services, especially for underserved communities, the clinical community is still determining how to best leverage these solutions [].

Poor adherence and substantial dropout are common challenges in DMHIs []. Adherence, the extent to which users complete a DMHI’s tasks as intended [,], is likely to be associated with better treatment outcomes []. Although these tasks can vary widely (given the varied designs of DMHIs []), it is through engaging with such tasks that DMHIs are thought to achieve their outcomes []. However, sustained engagement with these platforms remains a significant issue [,-]. Digital health interventions suffer from rates of dropout ranging from 30% to as high as 90% [,,,,,]. Dropout occurs when a participant prematurely discontinues an intervention (due to various potential reasons, such as technical issues, lack of time or energy, and lack of perceived benefit []). Even a modest dropout rate can limit the generalizability of digital intervention findings to only those who completed the study; thus, effective evaluation of treatments becomes a challenge [,,,-]. This likely contributes to the uncertainties among clinicians and patients regarding the efficacy, usability, and quality of DMHIs []. There are many reasons clinicians tend not to integrate DMHIs into their clinical practice (eg, insufficient knowledge about DMHIs and lack of training about how to integrate them [,]). An additional reason is that if patients’ sustained engagement with DMHIs is low and they stop participating in the intervention before achieving meaningful gains, then clinicians have little incentive to view DMHIs as a helpful tool to increase the efficiency and impact of care.

One approach to reducing attrition in DMHIs is to identify participants at a high risk of dropping out at the early stages of the intervention, which would permit the intervention to be adapted to these users’ needs []. For example, more support (eg, minimal human contact with a telecoach) could be offered specifically to such users (thereby maintaining scalability []). Although increasing attention has recently been dedicated to attrition in various eHealth interventions [-], relatively few advances within DMHIs have predicted dropout through streamlined quantitative approaches considering both passive and self-reported data. Testing the effectiveness of interventions on treatment outcomes [] often takes priority rather than identifying and predicting users at high risk of attrition. Consequently, methodological advancements in attrition prediction have largely taken place outside clinically relevant settings, such as in the eCommerce and social gaming industries [-]. This paper develops a data-driven algorithm that includes both passive indicators of user behavior and self-reported measures to identify individuals at a high risk of early attrition in 3 DMHIs; as such, it provides a framework that helps in the personalization of DMHIs to suit individual users based on each individual’s attrition risk.

To predict attrition in DMHIs, there are 2 main considerations []. First, we need to define the prediction horizon; that is, researchers should determine the point in an intervention’s timeline at which it would be beneficial to predict which participants are at a high risk of dropping out. This decision may be influenced by an analysis of when in the timeline most participants are actually dropping out; such an analysis may allow the identification and strengthening of weak parts of an intervention. Given that low engagement has been consistently cited as the construct underlying attrition, this decision may also be informed by considering typical patterns of engagement [,,,-]. However, engagement is a very broad construct with many components [], and empirical evidence suggests that engagement fluctuates with time []. Thus, carefully defining the feature space and predicting participants who are at a high risk of attrition at meaningful time points in a program can provide valuable information. For example, participants may initially stay in the intervention out of curiosity, which relates to the novelty effect—the human tendency to engage with a novel phenomenon [], but then lose interest. If a researcher wants to mitigate the impact of the novelty effect, then understanding early-stage dropout (ie, early in the program but after it is no longer brand new and unknown) is critical.

Second, we must consider which factors cause users to drop out of a given DMHI. Answering this question can help researchers and designers tailor the intervention to particular user groups. Demographic variables such as gender, age, income, and educational background have been related to higher attrition rates in digital health interventions [,-]. With respect to participants’ mental health (eg, lifetime symptoms assessed at baseline or current symptoms assessed during the course of the intervention), the presence of mental health symptoms may increase interest toward the use of a digital intervention in an effort to reduce such symptoms []. However, certain symptoms (eg, hopelessness) may reduce the participants’ motivation or ability to sustain engagement with an intervention [,,,,,]. In addition to these baseline user characteristics, user clinical functioning (ie, current symptoms and psychological processes that lead to the maintenance of these symptoms), self-reported user context and reactions to interventions (eg, perceived credibility of DMHIs, which is associated with increased engagement and reduced dropout []), and passively detected user behavior influence attrition rates in digital platforms [,]. This behavior includes time spent using an intervention [,,], the passively detected context (eg, time of the day and day of the week) [], and type of technology (eg, web, smartphone, computer based, or wearable) [,].

Prior studies, mainly in psychology, have predicted attrition primarily with statistical techniques such as ANOVA and regression [,,-]. In addition, other research has used macrolevel approaches, such as contrasting one intervention’s attrition rate against another’s [] and examining participant and psychotherapy trial factors that predict dropout rates []. Researchers in computer and data science and the mobile gaming industry more commonly leverage passively collected behavioral data from users and have found success in predicting attrition (“churn”) using more advanced techniques, such as linear mixed modeling [], survival analysis [], and probabilistic latent variable modeling []. More recently, advanced machine learning models, such as deep neural networks, have also been useful for modeling and predicting attrition in mobile gaming [,,,] and in digital health care applications [,]. Our approach builds on work predicting attrition in DMHIs [,,,,-] and incorporates both passively collected behavioral data and self-reported data [,,,-].

An attractive DMHI for anxiety is cognitive bias modification for interpretation (CBM-I [,]), a web-based program with potential to reach large, geographically diverse samples of adults with anxiety symptoms. CBM-I aims to shift threat-focused interpretation biases in which people with anxiety symptoms tend to assign a negative or catastrophic meaning to situations that are ambiguous. Cognitive models of anxiety suggest that training people with anxiety symptoms to consider benign interpretations of ambiguous situations, as opposed to only rigidly negative interpretations, may reduce anxiety [-]. To shift interpretation biases, CBM-I training sessions prompt users to imagine themselves in ambiguous, threat-relevant scenarios (presented in a set of short sentences) and to practice disambiguating each scenario by filling in its final word (typically presented as a word fragment) []. Active CBM-I conditions encourage more positive and flexible interpretation of scenarios by providing a final word that assigns a benign or a positive meaning to the ambiguous situation (consider this example: “As you are walking down a crowded street, you see your neighbor on the other side. You call out, but she does not answer you. Standing there in the street, you think that this must be because she was distracted.”). By presenting benign or positive endings for most scenarios (eg, 90%), positive CBM-I conditions train a positive contingency in which users learn to expect that ambiguous potentially threatening situations usually work out fine.

The greatest degree of improvement is expected in positive conditions relative to other active conditions (eg, 50% positive and 50% negative conditions that present positive and negative endings in equal proportions, thereby training flexible interpretation but no contingency) and to control conditions (eg, no training or a neutral condition with emotionally unambiguous scenarios and neutral endings). Thus, this paper focuses on attrition in positive conditions. Despite some mixed results [,], a number of studies have shown the effectiveness of positive CBM-I conditions in shifting interpretation biases and reducing anxiety symptoms [,,,-]. To benefit from CBM-I programs, people must be able to use them effectively during a sustained period. However, similar to many DMHIs, web-based CBM-I programs face substantial attrition rates [,].

Objective

This paper has 3 aims. The first aim is to determine a practical attrition prediction horizon (ie, to determine the session at which it would be beneficial to identify individuals at a high risk of dropping out). The second aim is to identify participants at a high risk of dropping out by leveraging baseline user characteristics, self-reported user context and reactions to the program, passively detected user behavior, and clinical functioning of users within our analysis. The third aim is to explore which of these feature sets are most important for the identification of participants at high risk. To achieve these aims, we propose a multistage pipeline to identify participants who are at a high risk of dropout from the early stages of 3 different DMHI studies. These interventions use web-based CBM-I [,] to help individuals change their thinking in response to situations that make them feel anxious or upset [,,]. Note that our proposed pipeline is expected to apply broadly to DMHIs; however, in this paper, we focus on CBM-I programs as a useful starting point and look for important features of attrition in such programs.

MethodsData Source and Interventions

MindTrails [] is a multisession, internet-delivered CBM-I training program. To date, >6000 people across >80 countries have enrolled in MindTrails, pointing to participant interest in accessing a technology-delivered, highly scalable intervention that can shift anxious thinking in a targeted and efficient way.

In this paper, we focus on 3 MindTrails studies: Managing Anxiety, Future Thinking, and Calm Thinking. We provide a brief overview of these studies, which were approved by the University of Virginia Institutional Review Board (IRB). We analyzed data from 1277 participants who took part in these studies. Details of the studies are provided in .

Table 1. Overview of MindTrails studies.Study nameDurationTarget populationNumber of CBM-Ia training sessionsValid participants in parent study, nPositive CBM-I participantsb, nEngagement strategy

CompensationSession reminderManaging AnxietyJun 8, 2016, to January 20, 2019Adults with anxiety8807252NoneEmailsFuture ThinkingMay 3, 2017, to October 16, 2019Adults with negative expectations about the future41221326NoneEmails, text messagesCalm ThinkingMay 18, 2019, to November 13, 2020Adults with anxiety51748699US $25cEmails, text messages

aCBM-I: cognitive bias modification for interpretation.

bCondition of interest for this paper’s analyses.

cUS $5 per assessment at baseline, after Session 3, and after Session 5; US $10 for follow-up assessment.

Participants and ProcedureStudy 1: Managing Anxiety

The Managing Anxiety study focused on the development of an infrastructure to assess the feasibility, target engagement, and outcomes of a free, multisession, web-based CBM-I program for anxiety symptoms. A large sample of community adults with at least moderate trait anxiety based on an anxiety screener (Anxiety Scale of the 21-item Depression Anxiety Stress Scales, DASS-21 []) was randomly assigned to (1) positive CBM-I training (90% positive and 10% negative), (2) 50% positive and 50% negative CBM-I training, or (3) a no-training control condition. Toward the start of CBM-I training, participants also underwent an imagery prime manipulation, an imagination exercise designed to activate the participants’ anxious thinking about a situation in their life. After consenting and enrolling, the participants completed a battery of baseline measures, including demographic information, mental health history, and treatment history. For details about the Managing Anxiety study protocol, including the aims and the outcome measures of the study, refer to the main outcomes paper by Ji et al [].

The program involved up to 8 web-based training sessions, delivered at least 48 hours apart, with assessments immediately after each session and a follow-up assessment 2 months after the last session. During each session, CBM-I training was provided. This training involved 40 training scenarios, which were designed to take approximately 15 minutes to complete. Study contact, in the form of automated reminder emails sent to all participants, was equivalent in content and schedule regardless of training condition. If participants completed only part of an assessment task, they continued the assessment the next time they returned. If they completed only part of a training task, they restarted the task upon returning. Participants received no monetary compensation. A total of 3960 participants completed the eligibility screener, out of which 807 (20.38%) eligible participants enrolled and completed the baseline assessment. In this paper, only data from the positive intervention arm (ie, positive CBM-I condition) were used (n=252, 31.23% of participants who enrolled and completed the baseline assessment), given our interest in testing predictors of attrition in positive CBM-I across all 3 studies.

Study 2: Future Thinking

The Future Thinking study, a hybrid efficacy-effectiveness trial, focused on testing a multisession, scalable, web-based adaptation of CBM-I to encourage healthier, more positive future thinking in community adults with negative expectations about the future based on the Expectancy Bias Task (shortened from the version used by Namaky et al []). After completing the screener, eligible participants provided consent; were enrolled; and were randomly assigned to (1) positive conditions with ambiguous future scenarios that ended positively, (2) 50-50 conditions that ended positively or negatively, or (3) a control condition with neutral scenarios. For details about the aims and outcome measures of the Future Thinking study, refer to the main outcomes paper by Eberle et al [].

The participants were asked to complete 4 training sessions (40 scenarios each). Assessments were given at baseline, immediately after each session, and during the follow-up assessment 1 month after the last session. Participants had to wait for 2 days before starting the next training session; they had to wait for 30 days before starting the follow-up assessment. Participants had the option of receiving an email or SMS text message reminder when the next session or follow-up assessment was due. If they completed only part of a training or assessment task, they continued the task the next time they returned. The participants received no monetary compensation. A total of 4751 participants completed the eligibility screener, out of which 1221 (25.70%) were eligible and were enrolled. In this paper, only data from the positive CBM-I intervention arm (ie, the positive condition and the positive + negation condition) were used (n=326, 26.70% of enrolled participants).

Study 3: Calm Thinking

The Calm Thinking study, a sequential, multiple assignment, randomized trial, tested the effectiveness of positive CBM-I relative to a psychoeducation comparison condition (randomly assigned at Stage 1). It also tested the addition of minimal human contact (ie, supplemental telecoaching randomly assigned at Stage 2 []) for CBM-I participants classified as having a higher risk of dropout early in the study. Additional details can be found in the main outcomes paper by Eberle et al [].

After completing the anxiety screener (DASS-21-Anxiety Scale), eligible participants provided consent and were enrolled. The participants were asked to complete a baseline assessment and 1 training session per week for 5 weeks (5 sessions total, 40 scenarios each in CBM-I), with an assessment immediately after each session and a follow-up assessment 2 months after the last session. If the participants completed only part of a training or assessment task, they continued the task the next time they returned. They were compensated via e-gift cards (refer to for details). A total of 5267 participants completed the eligibility screener, out of which 1748 (33.19%) were eligible and were enrolled. To allow a clean analysis of attrition during positive CBM-I, data [] from the CBM-I–only intervention arm (n=699, 39.99% of enrolled participants; ie, CBM-I condition excluding participants at high risk who were randomized to receive supplemental coaching) were used in this paper.

In total, 252 Managing Anxiety participants, 326 Future Thinking participants, and 699 Calm Thinking participants were in the positive CBM-I intervention arm of these studies.

Definition of Attrition

In this paper, we predict attrition in multisession DMHIs. A paper by Eysenbach [] defined two types of attrition: (1) nonuse attrition, which refers to participants who stopped using the intervention (ie, who did not complete the training sessions), and (2) dropout attrition, which refers to participants who were lost to follow-up because they stopped completing research assessments (eg, who did not complete follow-up assessment). In MindTrails studies, training and assessment tasks are intermixed and must be completed in series. For example, the participants cannot complete Session 1 assessment until they complete Session 1 training, they cannot complete Session 2 training until they complete Session 1 assessment, and so on. Due to this sequential design, nonuse and dropout attrition are conflated in our studies. As it is impossible to skip any training or assessment tasks, we simply use the term attrition in this paper.

Ethical Considerations

All 3 studies were reviewed and approved by the IRB of the University of Virginia (Managing Anxiety: IRB #2703; Future Thinking: IRB #2690; and Calm Thinking: IRB #2220). After screening, the eligible participants provided informed consent for “a new internet-based program.” Data were stored in accordance with University of Virginia Information Security policies, and deidentified data were analyzed. In the Calm Thinking study, the participants were compensated with e-gift cards worth up to US $25: US $5 for each assessment at pretreatment and after Sessions 3 and 5, and US $10 for the follow-up assessment. Compensation is detailed by study in .

Attrition Prediction Pipeline

DMHIs are often divided into multiple phases, sometimes called modules. In this paper, we refer to modules as sessions to mirror the language used by mental health specialists for in-person treatment (eg, holding sessions with a client). We proposed a pipeline that is built to handle multisession DMHI datasets with a diverse set of features. As our focus is on multisession studies, we assumed that the study contained ≥1 assessment or training session to achieve the study goals. Therefore, we required at least 1 observation from each participant for the selected features.

Predicting early-stage dropout in DMHIs is challenging and requires several key tasks. We first determined the prediction horizon of the selected CBM-I interventions (Aim 1). We then organized the remaining tasks into four main steps from the data science and engineering literature: (1) data preprocessing, (2) feature generation, (3) predictive modeling, and (4) feature importance. We outline these steps in the context of attrition prediction in DMHI in and describe each step below.

‎

Figure 1. Overview of the pipeline predicting early-stage attrition in web-based, multisession cognitive bias modification for interpretation (CBM-I) interventions. Prediction Horizon

To analyze when users stopped using the intervention (Aim 1), the proportions of participants who completed each training session (out of the number of participants who started Session 1 training) were visualized (). In this figure, each session was considered complete if participants completed the last questionnaire (ie, had an entry in the Task Log for the questionnaire) in the assessment that immediately followed a given training session. For the following reasons, we decided to focus on participants who had started Session 1 training and to predict which of these participants were at a high risk of dropping out before starting Session 2 training (Aim 2). First, our goal is to make inferences about user dropout during DMHIs (and not to simply use baseline assessments to predict which users will fail to even start the program). We restricted the sample to participants who had started Session 1 training because we consider these participants as part of the intent-to-treat sample. Second, the highest rate of attrition was observed between the start of the first training session and the end of the second session’s assessment, with most dropout occurring between the sessions (vs during Session 1 or Session 2). Therefore, we wanted to predict participants at a high risk of dropping out before starting Session 2 training. Notably, the identification of participants who are at a high risk of dropout early in the program might decrease the attrition rate at the end of the intervention. This is because detecting participants at high risk sooner rather than later permits targeted supports to be added to increase retention at pivotal times.

‎

Figure 2. Proportion of completion per training session (out of participants who started Session 1 training) by study. The session was deemed completed if participants completed the last questionnaire in the assessment that immediately followed the training session. Dashed lines show the last training session for each study. Data PreprocessingOverview

All data must be preprocessed before analysis, especially data collected outside a controlled laboratory environment. In the following paragraphs, we describe our methods for addressing issues such as invalid participant data, outliers, and missingness during preprocessing.

Invalid Participants

One of the main challenges in web-based digital mental health studies is to distinguish spam and bot-generated responses from real responses [,]. Malicious actors often use bots to complete questionnaires when they learn of an appealing incentive, such as monetary compensation for participating in a study. To increase the validity of the input data, we removed suspicious responses such as those that were submitted quickly (eg, <5 s for half of all questions in a given measure) or contained submissions that violated the required wait time (eg, 48 h) between sessions.

Outliers

To reduce the likelihood of identifying coincidental events, we first normalized the data using the z score metric. We then identified and removed outliers; as we did not expect to have very large or small data values [], we excluded outliers at least 3 SDs from the mean value [] for numerical variables. For categorical variables, we excluded outliers based on visual inspection of a frequency distribution (a histogram with the Freedman-Diaconis rule to determine the bin width).

Missing Values

Real-world data collection is often messy; technical issues, dropout, and loss of network connection are all common issues that arise and can lead to missing values for some or all items of a given questionnaire. In addition, participants in DMHIs are often given the option to decline to answer items when responding to a self-reported questionnaire. This may be done either implicitly (in which the question is not required) or explicitly (in which the participant is given a set of options, where one of the options is “prefer not to answer” or a similar response). The challenge associated with empty or “prefer not to answer” values is that they both function as missing values.

Missing values are a fundamental issue in digital health interventions for several reasons []. Most machine learning techniques are not well prepared to deal with missing data and require that the data be modified through imputation or deletion of the missing records. In addition, missing data may significantly impact the predictive analysis as well as descriptive and inferential statistics []. To address these issues, we used several imputation approaches to handle the challenge of missing data in some or all items in the required features and time points for different types of variables. Without imputation, these missing data could lead to more bias, decreased statistical power, and lack of generalizability.

We handled missing data for all features, for each unique time point, using the following methods: out of the initial set of features (221 for Managing Anxiety, 109 for Future Thinking, and 241 for Calm Thinking), we first removed features or variables at a given time point that have missing values in >80% of all valid participants. The percentages of features removed for this reason in Managing Anxiety, Future Thinking, and Calm Thinking studies were 33.94% (75/221), 20.18% (22/109), and 47.30% (114/241), respectively, yielding a final set of 146, 87, and 127 features, respectively. Next, we imputed categorical variables at a given time point with the most frequent answers at that time point of participants with the same demographics. To do so, we grouped participants based on 2 of the demographic characteristics (ie, education and gender, which were the most complete). To impute the numerical individual item variables at a given time point, we used the k-nearest neighbors method [] to replace the missing values in the same demographic group with the mean value at that time point from the 5 nearest neighbors found in the training set. We used a Euclidean distance metric [] to impute the missing values.

Unexpected Multiple Observations

Unexpected multiple observations may be present within a DMHI dataset for several reasons. Participants might complete the eligibility screener multiple times to gain access to the intervention if they were previously screened or to achieve a more desirable score. Technical issues can also cause duplicate values. For example, a brief server error may cause a questionnaire to be submitted more than once. We used one of the following two strategies to handle unexpected multiple observations: (1) calculate the average values of each item across the observations or (2) keep the latest observation. We selected one of the abovementioned strategies based on the temporal latency between unexpected multiple observations. If the temporal latency between unexpected multiple observations was less than the mean latency across all participants, we applied the first strategy. Otherwise, the second strategy was selected.

Feature GenerationBaseline User Characteristics

(Note: Measures without citations in this section and the sections below were developed by the MindTrails research team.) At the baseline assessment of the 3 studies, the following demographic variables were assessed: age, gender, race, ethnicity, education, employment status, marital status, income, and country. History of mental health disorders and treatment were also assessed. In the Managing Anxiety and Calm Thinking studies, participants were also asked about the situations that make them anxious; these situations are called anxiety triggers. We included these measures in our baseline user characteristics feature set ().

Table 2. Selected features by set extracted from cognitive bias modification for interpretation studies.Set and task (from Task Loga)DescriptionStudySessionBaseline user characteristics
DemographicsAssesses age, gender, race, ethnicity, education, employment status, marital status, income, and countryMAb, FTc, and CTdBaseline
Mental health historyAssesses mental health disorders and treatmentsMA, FT, and CTBaseline
Anxiety triggersAssesses situations that prompt anxietyMA and CTBaselineSelf-reported context and reactions to program
CredibilityAssesses importance of reducing anxiety or changing thinking (Importance Ruler) and confidence in intervention []MA, FT, and CTBaseline
Return intentionAssesses days until returningMA, FT, and CTSession 1
AffectAssesses state anxiety (Subjective Units of Distress; in MA and CT) or current positive and negative feelings (in FT)MA, FT, and CTSession 1
Impact of anxious imagery primeAssesses peak anxiety during imagery primeMA and CTSession 1
Session reviewAssesses location, level of distraction, and ease of use of programCTSession 1Passive detection of user behavior
All assessment and training tasksComputed time on a page, time of the day, and day of the weekMA, FT, and CTBaseline and Session 1
All assessment and training tasksComputed cumulative time elapsed to complete all components of a given task and latency between completing one task and starting the nextCTBaseline and Session 1
Training task (for FT) and all assessment and training tasks (for CT)Device (from Training table for FT, from Task Log for CT)FT and CTBaseline and Session 1User clinical functioning
Interpretation bias (Recognition Ratings)Assesses positive and negative interpretations of ambiguous situations (each valence scored separately, including both threat-related and threat-unrelated itemse)MA and CTBaseline
Interpretation bias (Brief Body Sensations Interpretation Questionnaire)Assesses positive and negative interpretations of ambiguous situations (each valence scored separately, including items for both internal and external events and excluding neutral items)MA and CTBaseline
Expectancy biasAssesses positive and negative expectations for ambiguous future situations (Expectancy Bias Task; each valence scored separately)FTBaseline and Session 1
Anxiety (OASISf)Assesses anxiety symptoms (OASIS)MA and CTBaseline and Session 1
Anxiety (DASS21-ASg)Assesses anxiety symptomsMA and CTBaseline
Anxiety and depression (PHQ-4h)Assesses anxiety (Generalized Anxiety Disorder-2 scale) and depression (2-item PHQ) symptoms (each measure scored separately)FTBaseline
Depression (DASS21-DSi)Assesses depression symptomsMABaseline
Daily drinkingAssesses alcohol use (Daily Drinking Questionnaire)MABaseline
Anxiety identityAssesses centrality of anxiety to identity (Anxiety and Identity Circles)CTBaseline
MechanismsAssesses cognitive flexibility (Cognitive Flexibility Inventory), experiential avoidance (Comprehensive Assessment of Acceptance and Commitment Therapy Processes), cognitive reappraisal (Emotion Regulation Questionnaire), and intolerance of uncertainty (Intolerance of Uncertainty Scale-12; each measure scored separately)CTBaseline
Wellness (What I Believe)Assesses self-efficacy (NGSESj), growth mindset (PBSk), and optimism (LOT-Rl; each measure scored separately)FTBaseline
WellnessAssesses self-efficacy (NGSES), growth mindset (PBS), optimism (LOT-R), and life satisfaction ([]; each measure scored separately)CTBaseline
Wellness (QOLm)Assesses life satisfactionMABaseline

aTask Log is a log table that tracks the completion of each assessment and training task for each participant in a given study; when the task’s content is not evident in the task’s name, the content is listed and the name is in parentheses.

bMA: Managing Anxiety.

cFT: Future Thinking.

dCT: Calm Thinking.

ePositive and negative interpretation bias assessed using Recognition Ratings are typically scored using only the threat-related items, but given that these are only 2 features, we do not expect this to markedly impact the algorithm.

fOASIS: Overall Anxiety Severity and Impairment Scale.

gDASS-21-AS: 21-item Depression Anxiety Stress Scales-Anxiety Scale.

hPHQ-4: 4-item Patient Health Questionnaire.

iDASS-21-DS: 21-item Depression Anxiety Stress Scales-Depression Scale.

jNGSES: New General Self-Efficacy Scale.

kPBS: Personal Beliefs Survey.

lLOT-R: Life Orientation Test-Revised.

mQOL: Quality of Life Scale.

Self-Reported User Context and Reactions to Program

The importance of reducing anxiety or changing thinking (Importance Ruler, modified from Case Western Reserve University []) and confidence in the intervention (modified from Borkovec and Nau []) were assessed at the baseline assessment of every study. In addition, after completing a given session’s assessment, participants were asked for the date they would return for the next session. State anxiety (in Managing Anxiety and Calm Thinking; Subjective Units of Distress, SUDS, modified from Wolpe []) or current positive and negative feelings (in Future Thinking) were assessed before and after participants completed each session’s training. The Managing Anxiety and Calm Thinking studies also assessed participants’ peak anxiety when imagining an anxiety-provoking situation in their lives as part of the anxious imagery prime completed toward the start of training. At the end of each session in the Calm Thinking study, the participant’s location, level of distraction, and ease of use of the program were assessed. All of these measures were included in the self-reported user context and reactions to the program feature set (see details in ).

Passive Detection of User Behavior

To further understand participants’ context and behavior when interacting with the platform, the following variables were calculated: time spent on a page, time of day, day of the week, and latency of completing assessments. The type of device (ie, desktop, tablet, smartphone) was also included as a feature given that multiple devices could be used to access the program, each with different characteristics (eg, screen size, input methods, and mobility) that could influence user behavior. In most cases, these variables were extracted for each assessment and training task for each session. For details about which features were extracted for which studies, see .

User Clinical Functioning

Primary and secondary outcome measures used to evaluate the effectiveness of the intervention were included in the clinical functioning feature set. These measures assessed interpretation bias (Recognition Ratings, RR, modified from Matthews and Mackintosh []; and Brief Body Sensations Interpretation Questionnaire, BBSIQ, modified from Clark et al []), expectancy bias (Expectancy Bias Task, modified from Namaky et al []), anxiety symptoms (Overall Anxiety Severity and Impairment Scale, OASIS, adapted from Norman et al []; DASS-21-Anxiety Scale; and Generalized Anxiety Disorder-2 scale, GAD-2, modified from Kroenke et al []), comorbid depression symptoms (DASS-21-Depression Scale; and Patient Health Questionnaire-2, PHQ-2, modified from Kroenke et al []), and alcohol use (Daily Drinking Questionnaire, DDQ []). They also assessed the centrality of anxiety to identity (Anxiety and Identity Circles, modified from Ersner-Hershfield et al []) and other cognitive mechanisms, including cognitive flexibility (Cognitive Flexibility Inventory, CFI, adapted from Dennis and Vander Wal []), experiential avoidance (Comprehensive Assessment of Acceptance and Commitment Therapy Processes, CompACT, modified from Francis et al []), cognitive reappraisal (Emotion Regulation Questionnaire, ERQ, modified from Gross and John []), and intolerance of uncertainty (Intolerance of Uncertainty Scale-Short Form, IUS-12, modified from Carleton et al []). Finally, they assessed self-efficacy (New General Self-Efficacy Scale, NGSES, modified from Chen et al []), growth mindset (Personal Beliefs Survey, PBS, modified from Dweck []), optimism (Life Orientation Test-Revised, LOT-R, modified from Scheier et al []), and life satisfaction ([]; Quality of Life Scale, QOL []). For details about which features were extracted for which studies, see .

Predictive ModelingOverview

For each study, predictors of attrition were investigated after participants started Session 1 training, imputing any missing values for features collected during Session 1 training or assessment.

To identify participants at high risk of dropping out before starting the second training session, the following predictors of attrition were investigated: baseline user characteristics (at the pretest assessment), self-reported user context and reactions to the program, passively detected user behavior, and clinical functioning of users. We used data from the pretest, the first training session, and the assessment following the first training session.

Dropout Label

For each participant, we calculated a binary ground truth label for their actual dropout status before starting the second training session, where 0 indicates the participant started training for the second session and 1 indicates the participant did not start training for the second session (ie, dropped out). A participant was deemed as having started a given session’s training if they had an entry in the Task Log for the Affect task, which was administered immediately before the first page of training materials for each session.

Class Imbalance

Class imbalance is a common problem for supervised learning tasks such as attrition prediction. Such datasets have 1 or more classes (eg, “did not dropout” in the case of Calm Thinking) that have a greater number of observations than other classes (eg, “dropped out” in Calm Thinking). Class imbalance can worsen the performance of machine learning models by biasing them toward learning the more commonly occurring classes. We used the synthetic minority oversampling technique [] to help rectify the class imbalance.

The synthetic minority oversampling technique resolves this challenge by generating synthetic samples for the minority class, with the aim of balancing the distribution of samples between the 2 classes. The technique operates by selecting 2 or more samples from the minority class and computing the difference between their features. This difference is then added to the feature values of one of the selected samples to create a new synthetic sample. This process is repeated to generate a sufficient number of synthetic samples, which are then added to the original dataset to achieve an optimal balance between the majority and minority classes. It has proven to be very effective in dealing with class imbalance problems for tabular datasets [] ().

Classification

Binary classification is a well-studied problem in the machine learning literature [,], and a plethora of models and approaches exist for predicting attrition. We selected leading machine learning models, beginning with simpler, more interpretable models and progressing to more expressive models for identifying the best predictors of early-stage dropout. We trained and validated a range of models, described in detail below and listed in . Models that learn a linear decision boundary are typically the first approach for binary classification problems. These models separate participants into 2 classes defined by the estimated decision boundary, in our case participants who drop out and those who remain. The logistic regression model estimates this decision boundary by minimizing the mean squared error of predictions in the training set []. Similarly, the support vector machine (SVM) estimates this boundary by maximizing the distance from the edge of each class. Some nonlinearity is also introduced into the SVM by projecting its feature space with the radial basis function (RBF) kernel [].

Other models estimate a nonlinear decision boundary. A decision tree model estimates a continuous piecewise boundary, with each piece indicating a different set of conditions that leads to a particular leaf node of the tree []. We further evaluated several tree-based ensemble models. In ensemble models, multiple submodels are composed to form a prediction. The random forest model uses decision trees as its submodel, creating a “forest” (set) of such trees. The random forest estimates the best feature subset to give to each tree while maximizing the average prediction accuracy over all trees []. Similarly, AdaBoost comprises multiple shallow decision trees, giving a weighting to each tree according to the overall prediction accuracy [].

Finally, gradient boosting algorithms (and the related extreme gradient boosting [XGBoost] method []) were used to train ensembles of decision trees. Gradient boosting minimizes an objective function that is differentiable with respect to all submodel parameters, and the submodel parameters are adjusted via gradient descent. XGBoost [] is based on the same concept, but also includes parameter regularization to prevent overfitting and second-order derivatives to control gradient descent. The regularized greedy forest (RGF) model was also evaluated. RGF not only includes tree-structured regularization learning, but also employs a fully corrective regularized greedy algorithm []. Finally, a multilayer perceptron model was used. This neural network model implements a feed-forward architecture that backpropagates error with stochastic gradient descent [].

We employed 10-fold cross-validation stratified by dropout label (ie, dropout vs not dropout) across 100 iterations. Hyperparameter tuning was performed using group 5-fold cross-validation on the training set. Hyperopt [] was used to optimize hyperparameters including the number of estimators, learning rate, maximum tree depths, C parameter, and γ. We evaluated up to 30 combinations of these parameters to maximize the model’s average macro–F1-score across 5 folds. The set of hyperparameters that achieved the highest average macro–F1-score across the 5 folds was chosen to train the model on the entire training set during the outer split.

Table 3. Performance of attrition prediction models within a given study based on macro–F1-scores, area under curve, and accuracy scores. The models were trained on the Managing Anxiety (MA) [], Future Thinking (FT) [], and Calm Thinking (CT) [,] studies and were tested on their respective test sets.Data and modelEvaluation metrica
Macro–F1-score↑bArea under the curve↑Accuracy↑Training and test data: Managing Anxiety with 146 features
Logistic regression.698.774.717
Support vector machine.723.802.760
Decision tree.555.610.644
Random forest.819.827.843
Gradient boosting.802.808.808
Extreme gradient boosting.832c.848.858
Regularized greedy forest.794.853.823
Multilayer perceptron.690.772.723Training and test data: Future Thinking with 87 features
Logistic regression.682.752.689
Support vector machine.719.787.728
Decision tree.688.745.693
Random forest.767.840.768
Gradient boosting.758.823.759
Extreme gradient boosting.770.844.771
Regularized greedy forest.728.817.735
Multilayer perceptron.694.778.703Training and test data: Calm Thinking with 127 features
Logistic regression.878.874.878
Support vector machine.869.861.869
Decision tree.786.895.788
Random forest.914.917.910
Gradient boosting.901.908.901
Extreme gradient boosting.917.926.918
Regularized greedy forest.902.908.918
Multilayer perceptron.878.879.878

aEach metric can range from 0 to 1, with macro–F1-score and area under curve values >.5 and accuracy values >.7 generally considered reasonable; refer to the Evaluation Metrics section for details.

b↑ indicates that higher values are more desirable for a given metric.

cThe highest values for each metric are italicized.

Model Optimization

To enhance model performance and efficiency, optimization techniques were applied. For instance, in the SVM model, we selected the RBF kernel with γ determined as 1/(number of features × X.var ()) to control the influence of training examples. In decision tree models, all features were considered for finding the best splits, while feature subsampling was employed to reduce model correlation and variance.

Our selected criterion for the decision model is entropy, which measures the degree of disorder of the features in relation to the target. The optimum split is chosen by the feature with the lowest entropy. It gets its maximum value when the probability of the classes is the same. A node is pure when the entropy has its minimum value, which is zero. For the random forest model, we take all the features that make sense in every tree.

In the XGBoost model, we set the subsample ratio of columns for each level equal to 0.4. Sampling occurs once for every new tree. The γ parameter in XGBoost is used as a threshold for creating new splits in the tree; it represents the minimum loss reduction required to make a further partition on a leaf node of the tree. We set γ=8. To control the balance of positive and negative weights in a binary classification problem, we set the parameter scale_pos_weight = sum(negative instances) / sum(positive instances). This parameter allows adjustment of the relative weight of

View original article

JMIR MENTAL HEALTH

Share Bookmark

0 0 0 0 0 0 0

More from this channel

Early Attrition Prediction for Web-Based Interpretation Bias Modification to Reduce Anxious Thinking: A Machine Learning Study

Comments (0)