Objective This systematic review aimed to assess the diagnostic accuracy of algorithms used to identify rheumatoid arthritis (RA) and juvenile idiopathic arthritis (JIA) in electronic health records (EHRs).
Methods We searched MEDLINE, Embase, and CENTRAL databases and included studies that validated case definitions against a reference standard such as rheumatologist-confirmed diagnosis or ACR/EULAR classification criteria. Title/abstract screening, full-text review, data extraction and quality assessment were all completed in duplicate. Results were synthesised narratively and using a bivariate random-effects meta-analysis of sensitivity and specificity.
Results A total of 35 studies were included. Algorithms varied widely in complexity, ranging from single International Classification of Diseases (ICD) codes to combinations including disease-modifying antirheumatic drugs (DMARDs), hospitalisation records, and specialist diagnosis. Two algorithm types showed the highest accuracy, with balanced sensitivity, specificity, and positive predictive value (PPV): those combining ICD codes with DMARD prescriptions (pooled sensitivity 0.79, 95% CI 0.61-0.90; specificity 0.96, 95% CI 0.72-1.00; PPV 0.78, 95% CI 0.63-0.88) and those requiring an ICD code assigned by a rheumatologist (pooled sensitivity 0.91, 95% CI 0.70-0.98; specificity 0.94, 95% CI 0.49-1.00; PPV 0.70, 95% CI 0.64-0.75). Less restrictive algorithms demonstrated high sensitivity but lower PPV. Substantial heterogeneity was observed across studies, likely due to differences in algorithm structure, data sources, and validation methods. Despite this variability, we grouped algorithms into conceptually coherent categories to allow for meaningful synthesis, prioritising clinical interpretability.
Conclusions These findings support the use of more specific algorithms when diagnostic certainty is essential and highlight the need for further validation of high-performing algorithms across diverse healthcare systems.
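As an illustration of the accuracy measures reported above, the following minimal Python sketch validates two case definitions against a reference standard in a single cohort. It is not the review's analysis code: the `Patient` fields, the `icd_only` and `icd_plus_dmard` functions, and the ten-patient cohort are all hypothetical and chosen only to show how sensitivity, specificity, and PPV are derived from a 2x2 table.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional


@dataclass(frozen=True)
class Patient:
    """Hypothetical patient record; field names are illustrative assumptions."""
    has_ra_icd_code: bool          # at least one RA ICD code in the record
    has_dmard_prescription: bool   # at least one DMARD prescription
    reference_ra: bool             # reference standard, e.g. rheumatologist-confirmed RA


def icd_only(p: Patient) -> bool:
    """Less restrictive case definition: any RA ICD code."""
    return p.has_ra_icd_code


def icd_plus_dmard(p: Patient) -> bool:
    """More restrictive case definition: RA ICD code and a DMARD prescription."""
    return p.has_ra_icd_code and p.has_dmard_prescription


def ratio(numerator: int, denominator: int) -> Optional[float]:
    """Return numerator/denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def validate(
    patients: Iterable[Patient], algorithm: Callable[[Patient], bool]
) -> Dict[str, Optional[float]]:
    """Cross-tabulate algorithm output against the reference standard."""
    tp = fp = fn = tn = 0
    for p in patients:
        flagged = algorithm(p)
        if flagged and p.reference_ra:
            tp += 1
        elif flagged:
            fp += 1
        elif p.reference_ra:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
    }


# Entirely made-up cohort of 10 patients (not data from any included study).
cohort = (
    [Patient(True, True, True)] * 3        # true RA, ICD code and DMARD
    + [Patient(True, False, True)] * 1     # true RA, ICD code only
    + [Patient(True, True, False)] * 1     # not RA, ICD code and DMARD
    + [Patient(True, False, False)] * 2    # not RA, ICD code only
    + [Patient(False, False, False)] * 3   # not RA, no code, no DMARD
)

for name, algorithm in [("ICD only", icd_only), ("ICD + DMARD", icd_plus_dmard)]:
    print(name, validate(cohort, algorithm))
```

In this toy cohort the stricter definition gives up some sensitivity in exchange for higher specificity and PPV, which mirrors the direction of the trade-off described in the Results. These are single-cohort point estimates; the pooled values in the review come from a bivariate random-effects meta-analysis, which this sketch does not implement.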
Significance and Innovations
▪ This is the first comprehensive systematic review to evaluate and synthesize the accuracy of algorithms used to identify rheumatoid arthritis and juvenile idiopathic arthritis in electronic health records (EHRs), addressing a growing need as real-world data become increasingly central in rheumatology research.
▪ The findings provide critical guidance for researchers and clinicians on the strengths and limitations of commonly used case definitions, helping to improve the validity of studies using administrative or EHR data.
▪ By categorizing algorithms based on their components and reference standards, this review offers a practical framework for selecting the most appropriate algorithm depending on the study purpose and data source.
▪ The review highlights gaps in validation efforts and emphasizes the need to validate high-performing algorithms across diverse healthcare settings and evolving coding systems, ensuring accurate disease identification in current and future research.
Competing Interest Statement
The authors have declared no competing interest.
Clinical Protocols
https://www.crd.york.ac.uk/PROSPERO/view/CRD420251056943
Funding Statement
This research was funded by the Health Research Board Applied Programme Grant Awards (APRO-2023-028) scheme.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Footnotes
Grant/Financial support: This research was funded by the Health Research Board Applied Programme Grant Awards (APRO-2023-028) scheme.
Declaration of interests: The authors declare no potential conflicts of interest.
Data Availability
This research is based on published literature. All data used in this research are already included in the article or the supplementary material. The data extraction template (Data Extraction Template.xlsx) is available at the Open Science Framework (OSF).