Identification of bacteriophage DNA in human umbilical cord blood

Bacteriophage DNA is present in human umbilical cord blood. To test the hypothesis that the fetus is exposed to bacteriophage signals, we first searched for 474,031 bacteriophage sequences from 5 phage databases in the cfDNA of 10 paired samples of maternal and umbilical cord plasma (Figure 1A), which were part of the Placental Origins of Preeclampsia (POPE) study. Among the 10 pairs, six were from normotensive pregnancies and the remaining four were affected by preeclampsia. The clinical characteristics of these samples are presented in Table 1.

Figure 1. Bacteriophage DNA in umbilical cord blood in POPE dataset. (A) Study design, data processing, and analysis. (B) Bar graph with the number of phage sequences identified from each phage database among all POPE samples. (C) Bar graph of 10 maternal-fetal pairs plotting the number of phage sequences identified only in maternal samples (red), only in fetal samples (blue), or in both samples from each pair (purple). Numbers below represent the number of phage sequences in each category. (D) As in C, with prevalent phage sequences removed. (E) Histogram depicting the number of samples in which each phage sequence was present. Red number indicates the number of phage sequences in each category. C–E represent the combined results from the 5 phage databases.

Table 1

Clinical characteristics of POPE dataset (N = 10)

To identify bacteriophage sequences within the metagenomic sequencing data, we applied a phage annotation pipeline using previously published methods, as outlined in Figure 1A. In brief, raw data were quality controlled and trimmed using standard bioinformatics tools, including FastQC and Trimmomatic. Human reads were subtracted by mapping to the human reference genome GRCh38 via Bowtie2. We then performed an NCBI BLAST search using a Curated Phage Database (CPD) of 26,159 annotated bacteriophage sequences from the NCBI collection (48) and four additional databases of largely unknown and uncharacterized bacteriophage sequences derived from studies of the human gut (Supplemental Table 1; supplemental material available online with this article; https://doi.org/10.1172/jci.insight.183123DS1) (42, 52–54). These include 142,809 bacteriophage sequences identified in human gut metagenomes from across the globe (GPD) (42); 33,242 viral sequences pooled from 1,986 individuals across 33 datasets of human gut studies (GVD) (52); 82,141 viral sequences from infant gut samples (ELGV) (53); and 189,680 viral sequences from 11,810 stool samples (MGV) (54). Reads with significant hits were subjected to a secondary and more stringent human sequence removal, in which all reads with significant BLAST hits to any human sequence deposited in the NCBI nuccore were removed. This was followed by a final BLAST alignment to each of the five databases, allowing reads to map to up to 20 phage genomes per database. Reads that mapped to multiple phage sequences were assigned to the phage that had the highest total number of mapping reads within a database. We counted only those phage sequences with at least 10 unique reads covering at least 500 bp of their genome.
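
The last two steps of this pipeline (assigning multi-mapped reads and applying the detection threshold) can be illustrated with a minimal Python sketch. It is not the pipeline used in the study. The function names, the input layout (one tuple per BLAST hit with half-open genome coordinates), the reading of "unique reads" as distinct read identifiers, and the tie-breaking rule are all assumptions made for illustration.

```python
from collections import defaultdict

MIN_READS = 10        # "at least 10 unique reads" (from the text)
MIN_COVERED_BP = 500  # "covering at least 500 bp" (from the text)


def assign_reads(hits):
    """Assign each read to one phage within a single database.

    hits: iterable of (read_id, phage_id, start, end), with half-open
    coordinates [start, end) on the phage genome (assumed layout).
    A multi-mapped read goes to the phage with the most mapping reads;
    ties are broken by phage_id, which is an arbitrary assumption.
    """
    hits = list(hits)
    reads_per_phage = defaultdict(set)
    for read_id, phage_id, _start, _end in hits:
        reads_per_phage[phage_id].add(read_id)
    totals = {p: len(r) for p, r in reads_per_phage.items()}

    best = {}
    for read_id, phage_id, start, end in hits:
        key = (totals[phage_id], phage_id)
        if read_id not in best or key > best[read_id][0]:
            best[read_id] = (key, (phage_id, start, end))
    return {read_id: hit for read_id, (_key, hit) in best.items()}


def covered_bp(intervals):
    """Number of bases covered by the union of half-open intervals."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def detected_phages(hits):
    """Phages with >= MIN_READS assigned reads covering >= MIN_COVERED_BP."""
    per_phage = defaultdict(list)
    for phage_id, start, end in assign_reads(hits).values():
        per_phage[phage_id].append((start, end))
    return {
        phage_id
        for phage_id, intervals in per_phage.items()
        if len(intervals) >= MIN_READS
        and covered_bp(intervals) >= MIN_COVERED_BP
    }
```

Coverage is computed on the union of alignment intervals, so ten reads stacked on the same 100 bp region would not pass the 500 bp requirement in this sketch.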

Using these criteria, phage sequences were identified from each of the 5 databases in the POPE samples (Figure 1B and Supplemental Table 2). We identified bacteriophage sequences in all samples from the POPE dataset, with 94 sequences identified across all samples (Figure 1C). In the cord blood samples, 55 phage sequences were identified, and 41 of these were also identified in maternal samples. Seven phage sequences were present in all 20 samples, which could represent phage sequences commonly circulating in blood or contaminants from sample collection or processing (Supplemental Table 3); however, we did not identify any phage sequences meeting our bioinformatic criteria in nuclease-free water or PBS samples that were processed via the same methods used here (48). All 7 phage sequences were originally identified in human gut samples and were part of the GPD or GVD databases. To determine the impact of these prevalent phage sequences on the phage signal in our samples, we reanalyzed the data excluding all phage sequences present in more than 75% of our samples. After this exclusion, bacteriophage sequences were still identified in all samples, and 16 of 20 samples had 4 or more phage sequences (Figure 1D). No phage sequences were specifically enriched or depleted in preeclamptic samples (Supplemental Table 3).
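
The prevalence filter described above can be sketched as follows. This is an illustrative Python example, not the code used in the study; the function name and the input layout (a mapping from sample identifier to the set of phage sequences detected in that sample) are assumptions.

```python
from collections import Counter


def drop_prevalent(presence, max_fraction=0.75):
    """Remove phage sequences present in more than max_fraction of samples.

    presence: dict mapping sample_id -> set of phage ids (assumed layout).
    Returns the filtered mapping and the set of removed phage ids.
    """
    n_samples = len(presence)
    counts = Counter(p for phages in presence.values() for p in phages)
    prevalent = {p for p, c in counts.items() if c / n_samples > max_fraction}
    filtered = {s: phages - prevalent for s, phages in presence.items()}
    return filtered, prevalent
```

The comparison is strict, so a sequence found in exactly 75% of samples is retained, matching the "more than 75%" wording.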

These data indicate that phage DNA, which is transmitted prenatally in humans from mother to fetus, is a common constituent of both maternal and fetal circulations.

Phage sequences overlap in maternal-infant dyads from the POPE dataset. We then asked whether phage sequences were present across maternal-infant dyads (samples from the same pregnancy). We observed that 44 of the 94 phage sequences identified in the POPE dataset were present in more than 1 sample and 29 of these were present across at least 1 maternal-infant dyad (Supplemental Table 3).

Across all dyads, we identified phage sequences that were present in a maternal sample but absent in the paired cord blood sample or present in the cord blood but absent in the paired maternal sample (Figure 1, C and D). We found that 9 of 10 dyads had at least 1 shared phage sequence, even after excluding the potential contaminants (Figure 1, C and D). The specific phage sequences that were found in 1 dyad were often found in other dyads (Supplemental Table 4).
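
The three categories plotted for each dyad (maternal only, fetal only, shared) correspond to simple set operations. The Python sketch below is illustrative only; the function name and the phage identifiers in the example are hypothetical.

```python
def dyad_categories(maternal, fetal):
    """Split the phage sequences of one dyad into three categories.

    maternal, fetal: sets of phage ids detected in the maternal plasma
    and the paired cord blood sample, respectively.
    """
    return {
        "maternal_only": maternal - fetal,
        "fetal_only": fetal - maternal,
        "shared": maternal & fetal,
    }


# Hypothetical identifiers, for illustration only.
example = dyad_categories({"phage_a", "phage_b"}, {"phage_b", "phage_c"})
```

A dyad counts as having a shared phage sequence when the "shared" set is non-empty.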

We next set out to determine if phage sequences were more likely to match across maternal-infant dyads compared with unrelated samples. Many phage sequences were either rare (present in only 1 sample) or very common and could not be used to discriminate pairings. Thus, we focused on phage sequences that were identified in exactly 2 samples from the POPE dataset and determined the frequency of dyads among these pairs. Ten phage sequences were present in exactly 2 samples from the POPE dataset, and 3 of the 10 sample pairs associated with these sequences were dyads, compared with 0.5 out of 10 pairs expected to be dyads by random chance. This difference was not statistically significant, owing to the low number of phage sequences identified in exactly 2 samples (Supplemental Figure 1A).
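
The expected value of 0.5 can be reproduced with a short calculation, assuming that under random chance every pair of distinct samples is equally likely to carry a given phage sequence. That null model is an assumption of this sketch; the statistical test behind Supplemental Figure 1A is not described here and is not reproduced. The function name is hypothetical.

```python
from math import comb


def expected_dyad_pairs(n_samples, n_dyads, n_two_sample_phages):
    """Expected number of two-sample phages whose two samples form a dyad.

    Assumes each of the comb(n_samples, 2) possible sample pairs is
    equally likely, so the chance that a random pair is a dyad is
    n_dyads / comb(n_samples, 2).
    """
    p_dyad = n_dyads / comb(n_samples, 2)
    return n_two_sample_phages * p_dyad


# POPE dataset: 20 samples, 10 dyads, 10 phages found in exactly 2 samples.
pope_expected = expected_dyad_pairs(20, 10, 10)
print(round(pope_expected, 1))  # 0.5
```

With 20 samples there are 190 possible pairs, of which 10 are dyads, so each two-sample phage has roughly a 5% chance of landing on a dyad.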

Together, these data indicate that phage DNA in the fetal and maternal circulations may be related to each other, but, given the low coverage and limited sampling of the total phage diversity captured across the five databases, we were not able to detect a strong pattern.

Unique phage sequences in the POPE dataset. We next assessed the heterogeneity of the phage sequences in these samples using the presence of unique phage sequences as a readout. To this end, we counted the number of samples containing each phage sequence. Fifty of the 94 identified phage sequences were present in only 1 sample each (Figure 1E). These 50 unique phage sequences were distributed across 6 cord blood samples and 6 maternal samples, with 1–22 unique phage sequences per sample.
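
Counting the samples that contain each phage sequence, and then listing the sequences seen in only 1 sample, could look like the Python sketch below. The function names and the input layout (sample identifier mapped to a set of phage ids) are assumptions for illustration, not the study's code.

```python
from collections import Counter


def samples_per_phage(presence):
    """Count how many samples contain each phage sequence.

    presence: dict mapping sample_id -> set of phage ids (assumed layout).
    """
    return Counter(p for phages in presence.values() for p in phages)


def unique_phages_per_sample(presence):
    """For each sample, the phage sequences found in that sample only."""
    counts = samples_per_phage(presence)
    return {
        sample: {p for p in phages if counts[p] == 1}
        for sample, phages in presence.items()
    }
```

The values of `samples_per_phage` give the distribution shown as a histogram, and samples with a non-empty set in `unique_phages_per_sample` are those carrying unique sequences.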

These data support the idea that the phageome is individual-specific (52, 55–58) and that this individualization of phage populations starts at the earliest points of phage exposure.

Bacteriophage DNA is present in umbilical cord blood from a second cohort. To validate our findings, we searched for bacteriophage DNA in a second dataset of cfDNA from paired maternal and umbilical cord samples originally reported by Witt et al. (59). This dataset contained 89 maternal samples and 111 fetal samples. Twenty-one of the umbilical cord blood samples were identified by the original study authors as likely contaminated based on bacterial sequencing, so these samples were excluded from our analysis (Figure 2A). The remaining samples contained 62 pairs of maternal and umbilical cord blood samples from maternal-infant dyads. Many of these samples came from deliveries affected by chorioamnionitis (14 pairs) or from preterm deliveries (29 pairs). To analyze these data from the Witt dataset, we used the same methods and criteria detailed above for the POPE dataset.

Figure 2. Bacteriophage DNA in umbilical cord blood in Witt dataset. (A) Description of the cohort from the Witt dataset. (B) Bar graph with the number of phage sequences identified from each phage database among all Witt samples. (C) Bar graph of the number of phage sequences (red) present in each sample. (D) Bar graph of maternal-fetal pairs in which both samples contained phage sequences. Plot depicts the number of phage sequences identified only in maternal samples (red), only in fetal samples (blue), or in both samples from each pair (purple). Numbers above represent the number of phage sequences in each category. (E) Histogram depicting the number of samples in which each phage sequence was present. Red number indicates the number of phage sequences in each category. (F) Venn diagram depicting the overlap in the phage sequences identified in the POPE and Witt cohorts. C–F represent the combined results from the 5 phage databases.

We identified 596 phage sequences from the 5 phage databases across all samples of the Witt dataset (Figure 2B). In the cord blood, 581 phage sequences were identified, and 42 of these were also present in maternal samples (Supplemental Table 5). In this cohort, we found no phage sequences present in more than 75% of samples. No phage sequences were identified in 50 of 124 samples (Figure 2C). Of the remaining 74 samples, 41 contained 4 or more phage sequences. Phage sequences identified in more than 7 samples were classified by their presence in term, preterm, chorioamnionitis-unaffected, and chorioamnionitis-affected samples (Supplemental Table 6). Escherichia virus lambda, elgv_52793, and GVD_30371 sequences appeared to be more prevalent in term than preterm samples. There were 4 phage sequences, MGV-GENOME-0336697, elgv_9621, uvig_539815, and uvig_579209, that were not found in any chorioamnionitis-affected samples but were present in 4 or 5 chorioamnionitis-unaffected samples.

These data indicate that phage DNA is a common constituent of fetal circulations. The enrichment of phage sequences in term and chorioamnionitis-unaffected samples could indicate a protective effect of these phages or their bacterial hosts against preterm labor or chorioamnionitis. Alternatively, the phages or bacterial hosts associated with the term-enriched sequences may be more prevalent later in pregnancy.

Phage sequences overlap in maternal-infant dyads in the Witt dataset. Again, we assessed the overlap of phage sequences across maternal-infant dyads. Fifty-seven phage sequences were identified in more than 1 sample from the Witt dataset, and 33 of these were found across at least 1 maternal-infant dyad (Supplemental Table 5). Among the dyads with bacteriophage DNA present in both maternal and fetal samples, all but 1 dyad had at least 1 shared phage sequence from 1 of the 5 databases (Figure 2D). Eight phage sequences were identified across 4 or more dyads (Supplemental Tables 5 and 7).

To determine whether phage sequences were more likely to be shared between maternal-infant dyads than expected by chance, we again specifically examined the 28 phage sequences that were present across exactly 2 samples from the Witt cohort. We found 11 of the 28 sample pairs associated with these sequences were maternal-infant dyads compared with 0.2 out of 28 pairs expected to be dyads by random chance (P value < 0.001, Supplemental Figure 1B).

These data indicate that the overlap in phage sequences across maternal-infant dyads is unlikely to be due to random chance.

Unique phage sequences in the Witt dataset. We found that 539 of the 596 identified phage sequences were present in a single sample (Figure 2E). Of these, 494 were identified in one or the other of just 2 cord blood samples. Nine cord blood samples and 10 maternal samples contained unique phage sequences, with 1–337 unique phage sequences per sample.

Common phage sequences identified across samples and cohorts. We identified 21 phage sequences in both cohorts (Table 2 and Figure 2F). The most common phage sequences identified were Lambda phage from Escherichia, AcaML1 from Acidithiobacillus, phiDP10.3 from Dickeya, and the uncharacterized gut phages from the GPD (“uvig_578591”, “uvig_576852”, “uvig_461583”, “uvig_315137”), GVD (“GVD_30371”, “GVD_5336”), and ELGV (“elgv_52793”) databases.

Table 2

Phage sequences present in both POPE and Witt datasets

These data indicate that certain phage sequences are prevalent in maternal and fetal circulations.

Limited characterization of phage sequences identified in cfDNA. To characterize the phage sequences identified in the POPE and Witt samples, we extracted the known and predicted morphologies and host bacteria genera from the metadata of the published phage databases. Fewer than 20% of the phage sequences identified had associated morphology or host bacteria predictions in their parent databases. The most common known or predicted morphologies were Siphoviridae (14 phage sequences) in the POPE dataset, and Myoviridae (45 phage sequences), Siphoviridae (31 phage sequences), and Podoviridae (17 phage sequences) in the Witt dataset.

The most common known or predicted bacterial host genera of identified phage sequences were Escherichia (55 phage sequences) and Streptococcus (45 phage sequences) (Table 3). Many of the known or predicted bacterial host genera are human gut inhabitants, such as Escherichia, Enterobacter, Campylobacter, Listeria, and Klebsiella. We also identified 513 phage sequences with unknown bacterial hosts that were originally identified in gut samples. In addition to these gut-associated phage sequences, we identified 67 phage sequences with known or predicted bacterial hosts from Streptococcus, Megasphaera, Prevotella, Bifidobacterium, Roseburia, Faecalibacterium, Clostridium, Ruminococcus, and Lactobacillus, bacterial genera that are also commonly found in the vaginal tract (60–62). Of these 67 phage sequences, 62 appeared exclusively in cord blood samples. Finally, we identified 2 prevalent phage sequences, Acidithiobacillus phage AcaML1 and Dickeya phage phiDP10.3, which were associated with host bacteria that have no known human habitat.

Table 3

Host analysis of phage sequences identified across 2 cohorts

These data indicate that many circulating phage sequences are likely derived from the gut, oral cavity, or vaginal tract, and that the source of the phage sequences in the maternal and fetal circulations may be different.
