Personalized Synthetic Electrocardiograms with Outcomes

Abstract

Background Synthetic data can help meet privacy requirements, enrich datasets in which certain subgroups or minorities are underrepresented, alleviate data shortage, and reduce annotation costs, thereby facilitating the development of data-hungry machine-learning applications. An important limitation of current synthetic data is the missing link to patient characteristics and outcomes. This metadata is essential for synthetic data to be useful for solving real-world problems. We therefore aimed to generate novel synthetic data with linked clinical characteristics and outcomes.

Methods We designed a novel generative method for generating 1:1 personalized synthetic electrocardiograms (ECGs) with associated patient characteristics and outcomes. The architecture of the model is a U-net neural network, which creates patient-specific ECGs to allow attachment of patient characteristics, comorbidities, and outcomes. We developed the model on the General Suburban Population Study and the Lolland-Falster Study cohorts and validated the model on the Danish Inter99 cohort. We compared original and synthetic ECGs using Bland-Altman plots, linear models of the associations with sex, age, and body mass index, and Cox models of the associations with all-cause mortality. We prespecified that at most 5% of synthetic ECGs should have their paired original ECG as nearest neighbor in Euclidean space.
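The nearest-neighbor criterion can be illustrated with a minimal sketch. It assumes that each ECG has been flattened to a fixed-length numeric vector, that row i of the synthetic array is paired with row i of the original array, and that neighbors are searched among the original ECGs. The function name and these choices are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def paired_nearest_neighbor_fraction(original, synthetic):
    """Fraction of synthetic rows whose nearest original row is their own pair.

    Hypothetical helper: both inputs have shape (n_ecgs, n_features) and are
    paired by row index. Distances are Euclidean.
    """
    original = np.asarray(original, dtype=float)
    synthetic = np.asarray(synthetic, dtype=float)

    # Squared Euclidean distances via ||s||^2 + ||o||^2 - 2 * s.o,
    # which avoids building an (n, n, n_features) array.
    sq_dist = (
        (synthetic ** 2).sum(axis=1)[:, None]
        + (original ** 2).sum(axis=1)[None, :]
        - 2.0 * synthetic @ original.T
    )

    # Index of the closest original ECG for every synthetic ECG.
    nearest = sq_dist.argmin(axis=1)

    # A synthetic ECG "matches" when its closest original is its paired one.
    return float((nearest == np.arange(len(synthetic))).mean())


# Toy data (made up for illustration): only the first pair is a nearest neighbor.
toy_original = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]
toy_synthetic = [[0.1, 0.0], [0.2, 9.0], [9.0, 0.5]]
toy_fraction = paired_nearest_neighbor_fraction(toy_original, toy_synthetic)
```

Under the stated threshold, a returned fraction of at most 0.05 would satisfy the criterion.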

Results We generated 6,612 novel, personalized electrocardiograms. Although Bland-Altman plots showed a high level of agreement between synthetic and original ECGs, only 3.7% of synthetic ECGs had their original paired ECG as their nearest neighbor. Synthetic ECGs had preserved relations between heart rate, R-wave amplitude, and T-wave amplitude and age, sex, and body mass index. Corrected QT interval was about 9.5 ms shorter in men compared to women in both the original and synthetic cohorts. The associations between PR interval and clinical characteristics were attenuated in the synthetic cohort. Heart rate and corrected QT interval were each associated with increased mortality with similar hazard ratios in the original and synthetic populations.
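For readers unfamiliar with Bland-Altman agreement, the following minimal sketch computes the quantities such a plot displays: the mean difference (bias) and the 95% limits of agreement. It assumes paired measurements of a single ECG parameter (for example heart rate) from original and synthetic ECGs; the function name and toy values are hypothetical and do not come from the study.

```python
import numpy as np


def bland_altman_limits(original_values, synthetic_values):
    """Return (bias, lower_limit, upper_limit) for paired measurements.

    Hypothetical helper: bias is the mean of the paired differences, and the
    limits of agreement are bias +/- 1.96 sample standard deviations.
    """
    a = np.asarray(original_values, dtype=float)
    b = np.asarray(synthetic_values, dtype=float)
    diff = a - b
    bias = diff.mean()
    sd = diff.std(ddof=1)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Toy values (made up for illustration only).
toy_bias, toy_lower, toy_upper = bland_altman_limits([1, 2, 3, 4], [1, 1, 3, 3])
```

A bias near zero with narrow limits indicates close agreement between the two sets of measurements.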

Conclusions We demonstrated the ability of a novel neural network method to generate personalized synthetic ECGs with preserved associations with many patient characteristics and with all-cause mortality. This method may facilitate data sharing for the development of better risk prediction models.

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

JLI is supported by a grant from the Danish Cardiovascular Academy (PD2Y-2023004-DCA). The Danish Cardiovascular Academy is funded by the Novo Nordisk Foundation and the Danish Heart Foundation, grant number NNF20SA0067242. MN is supported by a grant from the Danish Data Science Academy and the Danish Cardiovascular Academy (PhD2024015-DCA-DDSA). The Danish Data Science Academy is funded by the Novo Nordisk Foundation (NNF21SA0069429) and the VILLUM FONDEN (40516). DL is supported by a Novo Nordisk Foundation Young Investigator Award 2021 (NNF21OC0066480). CE is partly funded by the Laboratory Endowment Fund at Boston Children's Hospital, USA. GESUS was funded by the Region Zealand Research Foundation, Naestved Hospital Foundation, Naestved Municipality, Johan and Lise Boserup Foundation, TrygFonden, Johannes Fog's Foundation, Region Zealand, Naestved Hospital, The National Board of Health, and The Local Government Denmark Foundation.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

The present study used already collected data from population studies, which had each been approved by the relevant IRB and the use of the data was approved by the Danish Data Protection Agency. The data used in our study was de-identified individual-level data.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Data Availability

We are prohibited by law from sharing the underlying patient data.
