In 2024, approximately 30% of U.S. adolescents reported having consumed alcohol at least once in their lifetime, and about 25% of these individuals engaged in binge drinking. Adolescent alcohol use is associated with neurodevelopmental impairments, elevated risk of later alcohol use, and mental health disorders. These findings underscore the importance of identifying the variables that drive adolescent alcohol use and leveraging them for early identification and targeted intervention. Previous studies have typically developed machine-learning classification models that combine neuroimaging data with a limited set of clinical measurements. Neuroimaging data are expensive and difficult to obtain at scale, whereas clinical measures are more practical for large-scale screening because of their low cost and widespread accessibility. However, clinical-only approaches to classifying alcohol drinking remain largely underexplored. Furthermore, prior studies have often focused on adults, limiting generalizability to the broader adolescent population. Additionally, confounding factors such as age and substance use, which are strongly correlated with alcohol consumption, have often been inadequately addressed, potentially inflating classification performance. Finally, class imbalance remains a persistent challenge, with prior attempts yielding only limited improvements. To address these limitations, we propose FocalTab, a framework that integrates TabPFN with focal loss for robust generalization and effective mitigation of class imbalance. The approach also incorporates an initial preprocessing step that removes confounding factors to account for age and substance use. We compare FocalTab against state-of-the-art methods across different variable selections and dataset settings.
FocalTab achieves the highest accuracy (84.3%) and specificity (80.0%) in the most stringent setting, in which both age and substance-use variables are excluded, whereas the specificity of competing models falls to 12-24%. We further apply SHapley Additive exPlanations (SHAP) analysis to identify key clinical predictors of drinker classification, supporting enhanced screening and early intervention.
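The abstract names focal loss as the mechanism for handling class imbalance but does not state the exact formulation used in FocalTab. As a minimal sketch, assuming the standard binary focal loss, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), the idea can be illustrated as follows. The function name `binary_focal_loss` and the defaults `gamma=2.0` and `alpha=0.25` are illustrative assumptions, not values taken from this study.

```python
import math


def binary_focal_loss(y_true, p_pred, gamma=2.0, alpha=0.25, eps=1e-12):
    """Mean binary focal loss over a batch (illustrative sketch only).

    y_true : iterable of 0/1 labels
    p_pred : iterable of predicted probabilities for class 1
    gamma  : focusing parameter; gamma = 0 reduces to weighted cross-entropy
    alpha  : weight on the positive class (assumed value, not from the paper)
    """
    total = 0.0
    count = 0
    for y, p in zip(y_true, p_pred):
        # Clip probabilities so that log() never receives 0.
        p = min(max(p, eps), 1.0 - eps)
        # p_t is the probability the model assigns to the true class.
        p_t = p if y == 1 else 1.0 - p
        alpha_t = alpha if y == 1 else 1.0 - alpha
        # (1 - p_t)^gamma shrinks the contribution of well-classified examples.
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
        count += 1
    return total / count
```

With `gamma` greater than zero, confidently correct predictions contribute little to the loss, so training signal concentrates on hard examples, which in an imbalanced dataset are often from the minority class.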
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
Research reported in this publication was supported by the U.S. National Science Foundation under Award Number 2500836, and the Office of the Director, National Institutes of Health of the National Institutes of Health under Award Number R03OD038391. This work was also partially supported by the National Institute of General Medical Sciences (NIGMS) under Award Numbers P20GM103427 and U54GM115458, and by the National Institute on Alcohol Abuse and Alcoholism (NIAAA) under Award Numbers R01 AA029127, P60 AA031124, F32 AA032170, L30 AA032656, and P50AA030407 5126 (Pilot Core grant). This study was financially supported in part by the Child Health Research Institute at UNMC/Children's Nebraska. This work was also partially supported by a Nebraska EPSCoR FIRST Award (OIA 2044049). The content is solely the responsibility of the authors and does not necessarily represent the official views of the funding organizations.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The dataset is a publicly accessible dataset from NIH. All individual-level information had been de-identified before we obtained access to the dataset.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
All data produced in the present study are available upon reasonable request to the authors.