Temporal dynamics of gut microbiota and virome in preterm infants: insights from longitudinal metagenomic analysis

Abstract

Introduction:

Preterm infants exhibit heightened vulnerability to morbidity and mortality due to their underdeveloped immune systems and immature gastrointestinal tract. The gut microbiota plays a pivotal role in neonatal health, yet its establishment is influenced by multiple factors, including prematurity, antibiotic exposure, and feeding modalities. This study aimed to examine the interactions among gut bacteriophages, bacterial communities, and clinical variables in preterm infants to identify potential microbial biomarkers associated with health outcomes.

Methods:

We employed metagenomic shotgun sequencing and co-occurrence network analysis to characterize the virome and bacterial communities in 12 preterm neonates at 14 and 28 days post-birth. This approach enabled the identification of dynamic microbial colonization patterns and key bacterial species and bacteriophages associated with clinical parameters.

Results:

Staphylococcus epidermidis exhibited a significant decline over time, whereas Enterococcus faecalis and its associated bacteriophages showed progressive enrichment, becoming predominant by day 28. In contrast, the relative abundances of Clostridioides difficile and Klebsiella pneumoniae remained statistically stable between the two time points (14 vs. 28 days).

Discussion:

These findings suggest that microbial changes during the first month of life may reflect a combination of host developmental processes and external influences, such as antibiotic exposure or delivery mode. The observed microbial signatures provide preliminary insights into early gut microbiota and virome development in preterm infants. However, their functional relevance and long-term stability require confirmation in larger, well-powered longitudinal studies with denser temporal sampling. The enrichment of Enterococcus faecalis may indicate its opportunistic colonization potential in the preterm gut and warrants further investigation regarding its role in gut homeostasis and immune system maturation.

Introduction

Preterm birth, defined as delivery before 37 completed weeks of gestation, accounts for approximately 11% of live births worldwide (Bradley et al., 2025). These infants are highly vulnerable to health complications because of immature organ systems, particularly underdeveloped immune and gastrointestinal (GI) functions, which predispose them to infectious diseases and other morbidities. The gut microbiota, a complex ecosystem of symbiotic microorganisms, plays critical regulatory roles in neonatal development by supporting nutrient absorption, immune function, and pathogen resistance (Rutayisire et al., 2016; Saturio et al., 2021). Despite growing evidence for the crucial role of the gut microbiota in the health and development of preterm infants (Beghetti et al., 2023; Groer et al., 2014), a significant gap remains in our understanding of how both the bacterial and viral components of these communities change during the early postnatal period. The viral component, known as the virome, together with the bacterial microbiota, plays an essential role in modulating the gut microbial ecosystem (Zhang et al., 2023). Elucidating how these components evolve over time is critical for understanding how prematurity affects health outcomes. Recent advances in metagenomic sequencing technologies have transformed microbiome research (Inglis and Edwards, 2022; Yen and Johnson, 2021). These techniques allow detailed and comprehensive profiling of microbial communities, providing insight into their composition, structure, and dynamics (Morniroli et al., 2023; Chen et al., 2022). By leveraging these technologies, researchers can now explore the complex interactions between bacterial and viral populations within the gut microbiota, shedding light on how these interactions influence the development and health of preterm infants.

In this study, we conducted a longitudinal investigation of the gut microbiota and virome in a cohort of 12 preterm infants, with samples collected at 14 and 28 days post-birth. Using high-throughput sequencing, we characterized the shifts in microbial colonization patterns during this critical early developmental window. The analysis aimed to identify key factors governing the assembly and trajectories of microbial communities over time. Specifically, we sought to determine how perinatal and neonatal variables, such as antibiotic exposure, feeding modalities, and delivery mode, modulate the composition and succession of bacterial and viral populations in the gut. The findings may inform the development of precision interventions that modulate the early-life gut microbiota. By elucidating core determinants of microbial community assembly and stability, strategies can be designed to foster a resilient, health-promoting microbiota and thereby improve clinical outcomes in this high-risk population. Such interventions could include precision probiotic therapies, personalized nutritional approaches, and novel therapeutic modalities aimed at enhancing gut microbiota function and resilience in preterm infants. Finally, our work contributes to the broader objective of refining healthcare for preterm infants by deepening insight into the microbial determinants of development. Through translational application of these findings, we aim to refine evidence-based practices and improve short-term and long-term health and developmental trajectories for these infants, who face elevated risks of complications due to premature birth.

Results

Data summary

We integrated 24 metagenomic and viral metagenomic sequencing datasets from preterm infants collected at two postnatal time points (12 samples at 14 days post-birth [Group A] and 12 samples at 28 days post-birth [Group B]; Table 1). Sequencing metadata for the metagenomic and viral metagenomic datasets are summarized in Supplementary Tables S1 and S2, respectively.

Table 1. Individuals situation. Values in each row follow the sample order of the first row.

Sample: HJYMJXHJNLJYZhKZZhKRLYYZhHLZhZhZHXZHPYL

Maternal characteristics
Mother age (years): 30, 36, 36, 39, 28, 28, 33, 34, 37, 26, 33, 30
Antenatal corticosteroid: yes, yes, yes, yes, yes, yes, no, yes, yes, yes, yes, no
Antenatal magnesium sulfate: yes, yes, yes, yes, yes, yes, no, yes, yes, yes, yes, yes
Antibiotic use within 24 hours prior to delivery: yes, yes, yes, yes, yes, yes, yes, yes, yes, yes, yes, yes
Chorioamnionitis: no, no, no, definite, subclinical, subclinical, definite, no, definite, definite, subclinical, definite
Membrane breaking time (h): 00029880000011
Assisted reproduction: yes, no, yes, yes, no, no, yes, no, no, no, yes, no
Multicellular condition: dichorionic diamniotic, no, dichorionic diamniotic, dichorionic diamniotic, monochorionic diamniotic, monochorionic diamniotic, no, no, no, dichorionic diamniotic, dichorionic diamniotic, no
Delivery way: cesarean, cesarean, cesarean, cesarean, cesarean, cesarean, cesarean, cesarean, cesarean, vaginal, cesarean, cesarean

Neonatal characteristics
Gender: Female, Female, Female, Female, Male, Male, Female, Male, Male, Male, Female, Female
Gestational age (weeks+days): 30, 27+1, 30+3, 30+2, 30+4, 30+4, 27+6, 30+2, 29+2, 24+5, 27+2, 31+6
Birth weight (g): 1210, 700, 790, 1440, 1540, 1420, 950, 1320, 1350, 660, 1080, 1010
Admission temperature (°C): 35, 34.8, 34.5, 35.4, 35.8, 35.5, 35.7, 36.2, 36.8, 36.2, 36.9, 36.3
Auxiliary scores for Apgar scoring, 1 min: 5, 2, 2, 3, 5, 5, 4, 1, 3, 3, 2, 3
Auxiliary scores for Apgar scoring, 5 min: 5, 3, 5, 5, 5, 5, 3, 3, 2, 2, 3, 2
Umbilical artery blood pH: 7.33, 7.17, 7.21, 7.14, 7.27, 7.31, 7, 7, 7.135, 7, 7, 7
Apgar score, 1 min: 10, 8, 7, 9, 10, 9, 4, 8, 8, 4, 10, 10
Apgar score, 5 min: 10, 10, 10, 10, 10, 10, 8, 10, 10, 9, 10, 10
Neonatal asphyxia: no, no, mild, no, no, no, mild, no, no, mild, no, no
Smaller than gestational age: no, no, yes, no, no, no, no, no, no, no, no, yes
Respiratory support (days): 10.297136.922015.257.5437.382122.6767.4238.233.71
Number of days of intubation: 4859603226316934404221
Umbilical vein cannulation: no, no, no, no, no, no, yes, yes, yes, yes, yes, no
Duration of microfeeding: 261032214561032
Initiation time of oral feeding (days post-birth): 31.019456.133360.951422.602119.672228.538956.414630.00970.0618179.144446.204927.716
Type of initiation of oral feeding: formula milk, formula milk, formula milk, formula milk, formula milk, formula milk, formula milk, formula milk, formula milk, mother's milk, deeply hydrolyzed formula milk, mother's milk
Types of antibiotics after birth (4 weeks): 2, 4, 2, 3, 2, 1, 1, 2, 2, 4, 1, 1
Number of days on antibiotics after birth (4 weeks): 131611141512261392474
Frequency of defecation in the first three days: 5223222631212

Neonatal major morbidity and discharge outcome
BPD: no, yes, yes, no, no, no, no, no, no, yes, yes, no
Septicaemia: no, no, no, yes, yes, no, no, no, no, no, no, no
ROP: yes, yes, yes, no, yes, yes, yes, no, no, yes, yes, no
IVH: no, no, no, no, no, no, yes, no, no, no, no, yes
NEC: no, no, no, no, no, no, no, yes, no, no, no, no
Survival without major comorbidities: yes, yes, yes, yes, yes, yes, no, yes, yes, no, yes, yes
Outcome: Survival, Survival, Survival, Survival, Survival, Survival, Survival, Survival, Survival, Survival, Survival, Survival

After removal of low-quality reads (see Methods), we obtained a total of 4.1 billion high-quality reads, with an average retention rate of 97.50% for metagenomic data and 89.52% for viral metagenomic data. Potential human DNA contamination was controlled by mapping reads to the human reference genome and discarding host-derived sequences. Following an additional round of host-filtration with BMTagger (Rotmistrovsky and Agarwala, 2011), the metagenomic datasets retained 1.7 billion host-clean reads (average 74 million reads per sample), whereas viral metagenomic datasets yielded 240 million host-clean reads (average 1 million reads per sample). Detailed statistics of host sequence removal for both datasets are provided in Supplementary Table S3.

Each sample was assembled individually to generate sample-specific contigs (see Methods). Assembly statistics for viral metagenomes, including contig number, total assembly length, N50, and maximum contig size, are summarized in Supplementary Table S4.

Analysis of microbial community structure

To characterize the taxonomic composition of the intestinal microbiome in preterm infants, we annotated metagenomic reads using Kraken and Bracken (Lu et al., 2017). In addition, to thoroughly examine the impact of viruses on premature infants, we analyzed the composition of the viral populations in both groups using CAT (von Meijenfeldt et al., 2019) based on sequencing reads (Figure 1B). The top 150 viruses, ranked by community abundance, revealed that the annotated virome was overwhelmingly composed of viruses belonging to the phylum Uroviricota, which encompasses tailed bacteriophages of the order Caudovirales. Viral relative abundances reported in this study are expressed as proportions of the total quality-controlled sequencing reads per sample, ensuring that estimates reflect the true compositional context of the gut virome.

Composite scientific figure with two circular cladograms at the top (A, B), each displaying phylogenetic relationships among microbial taxa or phages, and four stacked bar charts (C-F) below, showing relative abundances of the top thirty species across multiple sample groups with labeled legends.

Figure 1. Microbial composition of preterm infant gut communities. (A) Species-level hierarchical composition of the bacterial communities by group. (B) Species-level hierarchical composition of the viral communities by group. Each circular cladogram represents the phylogenetic tree of the taxa. Colors represent phyla, and each segment within a colored area indicates a lower taxonomic level such as genus or species. Circle size corresponds to the relative abundance of each taxon. Rings depict hierarchical taxonomic levels, moving from phylum (outer ring) to genus/species (inner rings). (C) Bar chart of species-level bacterial community composition in each sample. (D) Bar chart of species-level viral community composition in each sample. (E) Bar chart of species-level bacterial community composition in each group. (F) Bar chart of species-level viral community composition in each group.

Because current viral reference databases are incomplete, a variable fraction of viral reads could not be taxonomically assigned in each sample. Exact per-sample unclassified proportions were not retained in the final analysis tables, but our intermediate processing logs indicate that annotation success ranged widely, with approximately 40% to over 90% of viral reads classified depending on the individual. This implies that roughly 10% to 60% of viral sequences remained uncharacterized, likely representing novel or underrepresented viral lineages. Despite this variability, all identified viruses belonged to the phylum Uroviricota (100% prevalence; median relative abundance: 0.14%), which encompasses tailed bacteriophages of the order Caudovirales. All taxonomically resolved viral sequences corresponded to bacteriophages infecting bacterial hosts such as Staphylococcus, Streptococcus, Enterococcus, and Enterobacter. Key species identified included Staphylococcus phage StB20, Staphylococcus phage StB20-like, Streptococcus phage EJ-1, Enterococcus virus FL3, and Streptococcus phage YMC-2011. The phylum-level prevalence and median relative abundances reported here were computed from the taxonomic profiles in Supplementary Tables S5 (bacteria) and S6 (viruses). The unassigned fraction highlights the need for further expansion of viral reference databases, which will be addressed in future studies.
These findings highlight the rich and varied microbial landscape within the intestinal microbiota of preterm infants, providing a foundation for future investigations into potential relationships between gut microbiota composition and clinical outcomes such as lung health.
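
The abundance summaries reported above (relative abundance as a share of quality-controlled reads, prevalence across samples, and the median across samples) can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the read counts, the sample names, and the helper names `relative_abundance` and `summarize` are all hypothetical.

```python
from statistics import median


def relative_abundance(taxon_reads, total_qc_reads):
    """Share of a sample's quality-controlled reads assigned to one taxon."""
    if total_qc_reads <= 0:
        raise ValueError("total_qc_reads must be positive")
    return taxon_reads / total_qc_reads


def summarize(taxon_counts, totals):
    """Return (prevalence, median relative abundance) across samples.

    taxon_counts: sample -> reads assigned to the taxon (hypothetical values)
    totals:       sample -> total quality-controlled reads in that sample
    """
    abundances = [
        relative_abundance(taxon_counts.get(sample, 0), totals[sample])
        for sample in totals
    ]
    # Prevalence: fraction of samples in which the taxon was detected at all.
    prevalence = sum(a > 0 for a in abundances) / len(abundances)
    return prevalence, median(abundances)


# Made-up counts for three samples, for illustration only.
totals = {"s1": 1_000_000, "s2": 2_000_000, "s3": 500_000}
phylum_reads = {"s1": 1_000, "s2": 4_000, "s3": 500}
prevalence, med = summarize(phylum_reads, totals)
print(f"prevalence={prevalence:.0%}, median relative abundance={med:.2%}")
```

Because the denominator is the total number of quality-controlled reads rather than the number of classified reads, unclassified reads lower every taxon's relative abundance, which is consistent with a small median value even for a taxon present in every sample.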

Comparison of bacterial species composition between the 14- and 28-day groups revealed distinct distributions. The relative abundance of Staphylococcus epidermidis was significantly higher at 14 days (paired t-test, P < 0.05), whereas that of Enterococcus faecalis was significantly higher at 28 days (paired t-test, P < 0.05; Figure 1C; Supplementary Table S5). To give a more transparent view of individual microbiome profiles, relative abundance profiles are displayed separately for each individual (Figures 1C, D). These profiles show how microbial composition varied across infants: the abundance of species such as Staphylococcus epidermidis ranged from negligible to over 50%, and some infants exhibited more marked shifts in bacterial populations between the two time points. In the viral community, the relative abundance of Staphylococcus phages decreased notably from 14 to 28 days, whereas Enterococcus phages showed a general increase (Figures 1E, F; Supplementary Table S6). Streptococcus phages, which infect Streptococcus bacteria, may help maintain a balanced gut microbiota by limiting bacterial overgrowth. Together, these profiles provide a foundation for further investigation into the role of specific microbial taxa in the health and disease states of preterm infants, and for understanding how these microbial ecosystems may affect clinical outcomes in this vulnerable population.
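
A paired t-test of the kind cited above compares each infant's day-14 value with the same infant's day-28 value. The sketch below computes the t statistic from the paired differences and compares it with the two-sided 5% critical value for 11 degrees of freedom (12 pairs). It is an illustration only: the abundance values are made up, the function name `paired_t_statistic` is hypothetical, and the study's actual statistical software is not stated in this section.

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided 5% critical value of Student's t with 11 degrees of freedom
# (12 paired observations), taken from standard t tables.
T_CRIT_DF11 = 2.201


def paired_t_statistic(before, after):
    """t statistic for the mean of the paired differences (after - before)."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for b, a in zip(before, after)]
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all paired differences are identical")
    return mean(diffs) / (sd / sqrt(len(diffs)))


# Made-up relative abundances (fractions) for 12 infants, illustration only.
day14 = [0.52, 0.40, 0.35, 0.10, 0.25, 0.30, 0.45, 0.05, 0.20, 0.38, 0.15, 0.28]
day28 = [0.20, 0.22, 0.30, 0.02, 0.10, 0.12, 0.25, 0.04, 0.11, 0.15, 0.09, 0.10]

t = paired_t_statistic(day14, day28)
significant = abs(t) > T_CRIT_DF11
print(f"t = {t:.2f}, significant at the 5% level: {significant}")
```

Relative abundances are compositional and often skewed, so a paired non-parametric test is a common alternative when the differences are far from normally distributed; that choice is a design consideration, not something reported in the text.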

Alterations in the diversity of communities of microbes

Bacterial and viral community composition differed between the 14- and 28-day postnatal time points. Alpha diversity of bacterial communities, assessed using the Shannon and Simpson indices, was significantly higher at 28 days than at 14 days (Shannon index: paired t-test, P < 0.05; Figure 2A; Simpson index: paired t-test, P < 0.05; Supplementary Figure 1A). Viral communities exhibited a non-significant increasing trend in alpha diversity at the family and species levels (Shannon index: paired t-test, P > 0.05; Figure 2B; Simpson index: paired t-test, P > 0.05; Supplementary Figure 1B). Beta diversity analyses based on Bray-Curtis dissimilarity and weighted UniFrac distances did not reveal statistically significant differences between the two time points for bacterial communities (Supplementary Figure 2) or viral communities (Supplementary Figure 3). Principal component analysis (PCA) of species-level relative abundances nevertheless suggested structural differences in both bacterial and viral communities between groups, as indicated by the clustering patterns of samples (Figures 2C, D). Overall, microbial community composition shifted from 14 to 28 days post-birth, although the beta-diversity differences did not reach statistical significance (Supplementary Tables S6, S7).
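
For readers unfamiliar with the diversity measures named above, the following sketch gives textbook definitions of the Shannon index, the Simpson index, and Bray-Curtis dissimilarity. It rests on assumptions: the natural logarithm is used for Shannon and the 1 − Σp² form for Simpson, whereas the conventions actually used in the study are not stated here, and the function names and example vectors are hypothetical.

```python
from math import log


def shannon(abundances):
    """Shannon index H = -sum(p * ln p) over taxa with non-zero abundance."""
    total = sum(abundances)
    return -sum((x / total) * log(x / total) for x in abundances if x > 0)


def simpson(abundances):
    """Simpson diversity in the 1 - sum(p^2) form (one of several conventions)."""
    total = sum(abundances)
    return 1 - sum((x / total) ** 2 for x in abundances)


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors over the same taxa."""
    if len(u) != len(v):
        raise ValueError("vectors must cover the same taxa in the same order")
    return sum(abs(a - b) for a, b in zip(u, v)) / (sum(u) + sum(v))


# Hypothetical communities: one perfectly even, one dominated by a single taxon.
even = [25, 25, 25, 25]
dominated = [97, 1, 1, 1]
print(shannon(even), shannon(dominated))   # evenness raises the Shannon index
print(simpson(even), simpson(dominated))   # and the Simpson index
print(bray_curtis(even, dominated))        # 0 = identical, 1 = no shared taxa
```

Alpha diversity is computed within one sample, whereas Bray-Curtis compares two samples, so a rise in alpha diversity can coexist with a non-significant beta-diversity test between time points.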

Figure contains four panels comparing microbial diversity between two groups labeled A and B. Panels A and B show boxplots for alpha diversity metrics at six taxonomic levels, with p-values above each comparison; red represents group A and blue group B. Panels C and D are PCA scatter plots of samples from groups A and B, with ellipses indicating distribution, axes labeled with principal component percentages, and points connected to group centroids.

Figure 2. Alterations in the diversity of communities of microbes. (A) Differences in bacterial alpha diversity between groups. (B) Differences in viral alpha diversity between groups. Differences in the Shannon index were assessed using a paired t-test. (C) PCA of bacterial community composition at the species level. (D) PCA of viral community composition at the species level. For all boxplots, the horizontal line inside each box shows the median. Box bounds show the lower quartile (Q1, the 25th percentile) and the upper quartile (Q3, the 75th percentile). Whiskers extend to the minimum (Q1 − 1.5 × IQR) and maximum (Q3 + 1.5 × IQR), where IQR is the interquartile range (Q3 − Q1). Error bars show the standard deviation (± SD).
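
The boxplot quantities defined in the caption can be computed as in the sketch below. It is a hedged illustration: the helper name `box_stats` and the input values are hypothetical, and the quartile interpolation method (`method="inclusive"`) is an assumption, since plotting tools differ in how they estimate quartiles.

```python
from statistics import quantiles


def box_stats(values):
    """Median, quartiles, and the Q1 - 1.5*IQR / Q3 + 1.5*IQR whisker limits."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "lower_limit": q1 - 1.5 * iqr,
        "upper_limit": q3 + 1.5 * iqr,
    }


# Hypothetical values, for illustration only.
stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(stats)
```

Many plotting libraries draw each whisker to the most extreme observation that still lies inside these limits, so drawn whiskers can be shorter than the limits computed here.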

Analysis of microbial composition relative abundance between groups

To examine the distribution patterns and relative abundance of microbial species, we quantified shared and unique bacterial and viral species across inter-group and intra-group comparisons in this study (Figure 3). The bacterial species flower plot revealed that Group A contained 22 shared species, with approximately 41.67% of samples lacking unique species, 33.33% harboring fewer than 5 unique species, and only 25% exhibiting more than 5 unique species. In contrast, Group B exhibited 43 shared species, wh
