Machine learning models for tuberculous pleural effusion diagnosis in an African setting

ABSTRACT

Background Traditional diagnostic methods for tuberculous pleural effusion (TPE) are often limited by their invasiveness, low sensitivity, and poor accessibility. This study aims to develop and evaluate machine learning (ML) models for diagnosing TPE in African contexts, using readily available clinical and laboratory data.

Methods A cross-sectional study carried out in Yaoundé, Cameroon (2018-2023) included patients with non-purulent exudative pleural effusion. Pleural fluid was analysed for total protein, lactate dehydrogenase (LDH), glucose, C-reactive protein (CRP) and cytology. TPE diagnosis relied on detection of tuberculous bacilli or tuberculous granuloma. Five ML models, namely Random Forest (RF), XGBoost, Logistic Regression (LR), Support Vector Machine (SVM), and Multilayer Perceptron (MLP), were tested for binary classification (TPE vs non-TPE) in Python. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC), F1 score, accuracy, sensitivity, and precision.
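The evaluation metrics named in the Methods can be illustrated with a minimal sketch. It is not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and scores are assumptions made for illustration, and none of the numbers come from the study data. The label 1 stands for TPE and 0 for non-TPE.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn), treating label 1 (TPE) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute sensitivity, precision, accuracy and F1 from hard predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # true positive rate (recall)
    precision = tp / (tp + fp) if (tp + fp) else 0.0    # positive predictive value
    accuracy = (tp + tn) / len(y_true)
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0  # harmonic mean
    return {
        "sensitivity": sensitivity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case scores higher
    than a random negative case, with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data for illustration only (not study data).
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]  # assumed 0.5 threshold

toy_metrics = classification_metrics(toy_labels, toy_preds)
toy_auc = roc_auc(toy_labels, toy_scores)
```

Sensitivity, precision, accuracy and F1 depend on the chosen threshold, whereas the AUC is computed from the ranking of the scores alone, which is one reason the two kinds of metric can favour different models.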

Results Of the 302 participants included, 175 (57.9%) were male and the median age (interquartile range) was 46 (34-61) years. Overall, 58.9% of participants had TPE, 15.9% had pleural metastasis and 25.2% had other causes of pleural effusion. Patients with TPE were younger, more often male and had a higher prevalence of HIV infection. They also had higher pleural protein and CRP levels. The RF model showed the best performance, with an AUC of 0.846 and an F1 score of 0.811 in the testing sample. Sensitivity was highest for MLP (0.944) and precision was highest for LR (0.806). Key predictors identified by the RF model were pleural CRP levels, age, pleural LDH levels, body mass index, and pleural protein levels.

Conclusion The RF model had the best overall performance. MLP and LR had the best sensitivity and precision, respectively. The models can be used to improve diagnosis of TPE in African settings.

What is already known on this topic Tuberculous pleural effusion (TPE) diagnosis is invasive and lacks sensitivity in resource-limited African settings, and few machine learning (ML) models are available.

What this study adds This study developed and evaluated five ML models for TPE diagnosis using clinical and pleural fluid data from an African setting. The Random Forest model outperformed the others in diagnosing TPE, with pleural CRP levels, age, pleural LDH levels, body mass index, and pleural protein levels as key predictors.

How this study might affect research, practice, or policy Machine learning models can enhance TPE diagnosis in Africa using accessible and reliable biomarkers.

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

This study did not receive any funding.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

This study involves human participants. The research was approved by the institutional ethical committee of the Faculty of Medicine and Biomedical Sciences, University of Yaounde 1 (reference 0043/2018).

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Data Availability

All data produced in the present study are available upon reasonable request to the authors.
