Development of a novel musculoskeletal hypothesis using sparse Group Factor Analysis: the ADVANCE cohort

Abstract

Musculoskeletal conditions are a leading global cause of disability, yet the factors influencing long-term musculoskeletal health, particularly following trauma, remain incompletely understood. This study applies sparse Group Factor Analysis, a hierarchical unsupervised machine learning method, to the ADVANCE cohort, a longitudinal dataset of 1145 UK Afghanistan War servicemen, to identify latent structures in multimodal clinical data. Study 1 validated the approach by rediscovering known group-level patterns between combat-injured and non-injured participants, including poorer outcomes in pain, mobility, and bone health among those with lower limb loss. Study 2 explored the injured, non-amputee subgroup without prespecified labels to identify new hypothesis-generating clusters that could subsequently be tested using standard hypothesis testing methods. It uncovered a subgroup of 125 individuals with worse musculoskeletal outcomes. This group had greater body mass, higher injury severity, and a higher prevalence of head injury. These findings led to a novel hypothesis: that head injury, including potential traumatic brain injury, is associated with long-term musculoskeletal deterioration. This hypothesis is supported by literature in both athletic and military populations and will be tested in follow-up analyses. Our findings demonstrate how sparse Group Factor Analysis, combined with clinical insight, can uncover hidden patterns in large-scale datasets and generate testable, clinically relevant hypotheses that inform prevention, treatment, and rehabilitation strategies.

Author Summary

Musculoskeletal conditions such as osteoarthritis and low back pain are the second largest contributor to global disability. They can be caused by a variety of factors such as ageing, genetics, lifestyle, and injury. Understanding how long-term musculoskeletal outcomes following injury are interconnected could help improve prevention, intervention, and rehabilitation initiatives and so reduce the resulting disability. In this study, we describe a new machine learning methodology called sparse Group Factor Analysis, which we apply to a complex dataset from a military cohort study to generate new research hypotheses. The first study (n=1145) validated our approach by generating hypotheses that we had already investigated via traditional methods. The second study explored a subset of the cohort (injured participants without amputation) and identified 125 participants with poor musculoskeletal outcomes. This showed a link between poor musculoskeletal outcomes and head injury, resulting in a new hypothesis that a head injury or traumatic brain injury may contribute to poor musculoskeletal outcomes. We will test this hypothesis using traditional methods in follow-up analyses. We have demonstrated how sparse Group Factor Analysis can be used alongside clinical knowledge to find hidden patterns in large, complex datasets, providing information that could improve the prevention of future musculoskeletal injury as well as intervention and rehabilitation strategies.

Competing Interest Statement

NF has received grants from the UK Ministry of Defence and the Office for Veterans' Affairs, and consultation fees and support for attending meetings from Gallipoli Medical Research. NF is also a member of the Academic Advisory Board for the Office for Veterans' Affairs, a specialist advisor on the release of patient data for research for NHS England, the Director of the Forces in Mind Trust research centre, the Director of the King's Centre for Military Health Research at King's College London, and a Trustee for Help for Heroes. AAF is funded by a UKRI Turing AI Fellowship. All other authors declare no conflicts of interest.

Clinical Protocols

https://bmjopen.bmj.com/content/10/10/e037850

Funding Statement

Yes

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

Ethical approval was granted by the UK Ministry of Defence Research Ethics Committee (357/PPE/12). Informed written consent was given by all participants.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Data Availability

Given their unusually sensitive nature, data have not been made widely available. Requests for data will be considered on a case-by-case basis and subject to UK Ministry of Defence clearance via adv_data_team@imperial.ac.uk.
