Multi-Task Learning and Soft-Label Supervision for Psychosocial Burden Profiling in Cancer Peer-Support Text

Abstract

Objective: Online cancer peer-support text contains signals of psychosocial burden beyond emotional tone, including treatment burden, financial strain, uncertainty, and unmet support needs. We evaluated 2 modeling extensions: multi-task learning (MTL) for joint prediction of health economics and outcomes research (HEOR) burden dimensions, and soft-label supervision using large language model (LLM)-derived probability distributions. Materials and Methods: We analyzed 10,392 cancer peer-support posts. GPT-4o-mini generated proxy annotations for HEOR burden subscales, composite burden, high-need status, speaker role, cancer type, and emotion probabilities. Study 1 trained a shared ALBERT encoder under 4 MTL conditions: composite and subscale burden targets, each with and without auxiliary heads, using Kendall uncertainty weighting. Study 2 compared soft-label training on LLM emotion distributions with hard-label baselines under regular and token-augmented inputs, evaluating performance against both human labels and AI distributions. Results: Composite-only MTL achieved R2=0.446 for burden regression and weighted F1=0.810 for high-need screening; subscale classification achieved mean weighted F1=0.646. Adding auxiliary role and cancer-type heads reduced regression performance (∆R2 = −0.209). Soft-label training reduced weighted F1 by 0.16 versus hard-label baselines (0.68 vs. 0.86), and token augmentation did not improve performance under soft supervision. Discussion: Composite-only MTL supported modeling of multidimensional burden-related signals from forum text, whereas auxiliary prediction heads appeared to compete with primary tasks. Soft-label training aligned poorly with human-labeled emotion categories, suggesting that uncalibrated LLM distributions may propagate bias rather than improve supervision. Conclusion: Composite-only MTL was the strongest burden-modeling approach, and hard-label supervision remained preferable for emotion classification.

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

This study did not receive any funding

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

The study used (or will use) ONLY openly available human data that were originally located at: the Mendeley Data repository (doi:10.17632/69dcnv2gzd.1) under a CC BY 4.0 license and is mirrored on Kaggle and GitHub.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Data Availability

The dataset analyzed in this study is publicly available. The Mental Health Insights: Vulnerable Cancer Survivors and Caregivers dataset is available in the Mendeley Data repository (doi:10.17632/69dcnv2gzd.1) under a CC BY 4.0 license and is mirrored on Kaggle and GitHub. LLM-derived annotations, train/validation/test split assignments, and analysis code will be made publicly available upon publication.

https://data.mendeley.com/datasets/69dcnv2gzd/1

Comments (0)

No login
gif