Optimising supervised machine learning algorithms predicting cigarette cravings and lapses for a smoking cessation just-in-time adaptive intervention (JITAI)

Abstract

This study aimed to optimise the balance between participant burden and performance of algorithms predicting high-risk moments for a smoking cessation just-in-time adaptive intervention (JITAI) by systematically varying ecological momentary assessment (EMA) prompt frequency, predictor count, and training data requirements.

Thirty-seven participants completed 16 EMAs per day for the first 10 days of their smoking cessation attempt, reporting their mood, context, behaviour, cravings, and smoking lapses. Random forest algorithms were used to predict lapses and cravings. Ordered logistic mixed models were used to examine how algorithm performance, indexed by the F1-score (the harmonic mean of precision and recall; reported as median [interquartile range]), was affected by EMA prompt frequency, predictor count (reduced using recursive feature elimination with cross-validation [RFE-CV]), and the inclusion of the test participant's own data in the training set.
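The modelling pipeline described above can be sketched as follows. This is a minimal illustration, not the authors' code: the data are synthetic, and the sample sizes, feature count, and hyperparameters are illustrative assumptions. It shows a random forest classifier wrapped in RFE-CV to select predictors, with performance evaluated by the F1-score.

```python
# Sketch (assumed, not the study's code) of a random forest lapse classifier
# with recursive feature elimination and cross-validation (RFE-CV),
# evaluated with the F1-score on held-out data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic EMA-style data: rows = prompts, columns = self-report predictors
# (e.g., mood, context, craving intensity). Sizes are illustrative.
n_prompts, n_predictors = 400, 12
X = rng.normal(size=(n_prompts, n_predictors))
# Binary outcome (e.g., lapse vs. no lapse) driven by a few predictors only,
# so RFE-CV has redundant predictors to eliminate.
y = (X[:, 0] + 0.8 * X[:, 1] + rng.normal(scale=0.5, size=n_prompts) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# RFE-CV recursively drops the least important predictors, retaining the
# predictor count that maximises cross-validated F1.
selector = RFECV(
    estimator=RandomForestClassifier(n_estimators=200, random_state=0),
    step=1,
    cv=5,
    scoring="f1",
)
selector.fit(X_train, y_train)

y_pred = selector.predict(X_test)
print("predictors retained:", selector.n_features_)
print("test F1:", round(f1_score(y_test, y_pred), 3))
```

In the study itself, evaluation was across out-of-sample individuals rather than a random train/test split; a leave-one-participant-out loop around this pipeline would approximate that design.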

Performance across out-of-sample individuals ranged from poor to excellent. Nevertheless, several general trends emerged. Reducing EMA prompt frequency modestly but significantly decreased model performance (e.g., 16 EMAs: 0.843 [0.480-0.946]; 5 EMAs: 0.791 [0.390-0.912]; 3 EMAs: 0.729 [0.286-0.897]; p <.001-.037). Predictor reduction had a negligible impact on performance and led to the selection of similar predictors across algorithms (all predictors: 0.773 [0.384-0.918]; with RFE-CV: 0.773 [0.286-0.922]; p =.313). Algorithms predicting lapses outperformed those predicting cravings (lapses: 0.821 [0.323-0.943]; cravings: 0.736 [0.340-0.904]; p =.011). Algorithms trained without any of the test participant's own data outperformed those whose training set included varying proportions of it (none: 0.795 [0.521-0.923]; 10%: 0.765 [0.300-0.914]; 20%: 0.736 [0.235-0.914]; 30%: 0.763 [0.203-0.923]; p <.001-.012).

Although reducing EMA prompt frequency modestly decreases algorithm performance in predicting high-risk moments, this may be an acceptable trade-off for implementing a smoking cessation JITAI. Algorithm performance was not reduced by omitting user-specific training data or by using fewer predictors. However, variable performance across individuals may limit scalability.

Author summary

This study investigated how to make digital support tools that help people quit smoking both effective and user-friendly. The goal is to develop a tool that offers help exactly when people are most at risk of smoking again. In this study, 37 people trying to quit smoking answered short surveys 16 times a day for 10 days about their behaviour, cravings, mood, and surroundings. These data were used to train computer models to predict when someone was likely to crave a cigarette or have a lapse. The study tested how fewer daily surveys, fewer questions, or different prediction methods would affect how well the prediction models work. We found that reducing the number of daily surveys reduced accuracy a bit, but asking fewer questions had little effect. This suggests simpler tools could still work well. Surprisingly, models that were not trained on a person's own data performed as well as or better than personalised ones, making them more practical for real-world use. However, we also saw that the models worked much better for some individuals than others, which suggests these tools might not be equally effective for everyone.

Competing Interest Statement

OP and JB act as unpaid scientific advisors to the Smoke Free app. CG, DK, CL, TO, and DS have no competing interests to declare.

Clinical Protocols

https://osf.io/tnu72/

Funding Statement

Yes

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

Ethical approval was obtained from the UCL Research Ethics Committee (Project ID: 15297.004). Participants were asked for their informed consent to share their anonymised research data with other researchers via an open science platform.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes
