Benchmarking MicrobIEM – a user-friendly tool for decontamination of microbiome sequencing data

Study design

An overview of the study design is shown in Fig. 1. We benchmarked six decontamination approaches and their parameters in three mock communities with different sample compositions, and in an environmental low-biomass microbiome dataset. Each mock dataset was available as a dilution series, where we used samples ranging from high bacterial input (108 cells) to low bacterial input (103 cells) in the benchmarking to mimic different microbial source environments. The chosen decontamination algorithms were either sample based (frequency filter, Decontam frequency filter) or control based (presence filter, Decontam prevalence filter, SourceTracker, and the ratio and span filter of our novel tool MicrobIEM). For each algorithm, we also tested tool-specific parameters for contaminant removal, which have to be selected by the user. To evaluate the success of each decontamination approach, we compared common test assessment scores for each dilution of the mock communities. Additionally, the effect of the decontamination algorithms on the sample composition and number of taxa and reads was investigated in an environmental low-biomass dataset.

Fig. 1figure 1

Overview of the decontamination benchmarking study design. Three mock datasets were used for decontamination benchmarking, one with an even, and two with a staggered community structure. Mock communities were available as dilution series covering a wide range of bacterial biomass per sample (108 to 5.55 × 103 bacterial cells). Two sample-based and five control-based decontamination algorithms were compared based on their classification performance into mock and contaminant reads, evaluated by Youden’s index and other evaluation scores. The same parameters and tools were also evaluated in a low-biomass environmental dataset from the skin. Additional information about the decontamination filters implemented in MicrobIEM can be found in Additional file 1: Supplementary Figure 1

Datasets for benchmarking

The datasets consisting of three mock communities and one environmental low-biomass microbiome dataset are described below and an overview is provided in Additional file 1: Supplementary table 1.

Even mock community

As an even mock community, we used the dataset of a previous decontamination benchmarking of Karstens et al. [20]. The complete data preparation methods can be found in the original publication. Briefly, the Zymobiomics D6300 mock community (https://www.zymoresearch.de/collections/zymobiomics-microbial-community-standards/products/zymobiomics-microbial-community-standard) consists of eight bacterial and two fungal species in an even composition in terms of total DNA, with proportions of cell numbers per bacterial species ranging from 6 to 22%. Of this mock community, a serial threefold dilution was prepared, from 1.5 × 109 to 2.3 × 105 input cells. Additionally, one pipeline negative control is available. The V4 region of the bacterial 16S rRNA gene was amplified and sequenced on an Illumina MiSeq® platform (Illumina Inc.). Reads were denoised using DADA2 [33] and annotated using the Silva database [34]. Mock ASVs were defined based on ASVs present in the undiluted sample, with sequences classified as mock that matched the expected 16S reference sequences by Zymobiomics exactly or with one nucleotide difference. All other ASVs in the undiluted sample differed substantially from any expected sequence, as described by Karstens et al. [20] (https://github.com/lakarstens/ControllingContaminants16S). All remaining non-mock ASVs present in the data were classified as contaminants. The final dataset consists of 1,675,028 reads (median 172,915 reads per mock sample, 189,779 reads in the negative control) in 1414 ASVs, of which 9 ASVs were classified as mock and 1405 as contaminants. The ASV table and contaminant classification were taken as submitted by the original authors on Github [https://github.com/lakarstens/ControllingContaminants16S].

Staggered mock community A

To also test decontamination approaches in a more realistic, uneven community structure [28,29,30], we created a staggered mock community called “A”, consisting of 15 strains that differ in their absolute cell counts by two orders of magnitude from 18 to 0.18% (Additional file 1: Supplementary table 2). From this staggered mock community, we prepared a serial tenfold dilution from 109 to 102 cells in three technical replicates per dilution. Additionally, three pipeline negative controls and three PCR controls were processed.

Mock preparation

Aerobic and anaerobic bacteria of the mock community were cultivated with the appropriate medium, temperature and oxygen supply at each of the following steps as summarized in Additional file 1: Supplementary Table 2. Pre-cultures were obtained by inoculating 3 mL of medium, which were then grown for 6 h. Next, 10 μL of the pre-culture was transferred to flasks (baffled flask if oxygen required for growth) containing either 100, 250 or 500 mL of the respective medium depending on their growth capacity. Overnight cultures were centrifuged for 10 min at 3000 × g and washed 3 times. Each culture was aliquoted into one part for storage at − 80 °C and one part for cell number determination. To ascertain the cell number, the optical density (OD) was measured and an OD600 of 1 was equated with 109 cells. Subsequently, a dilution series was prepared, and 50 μL of the dilutions expecting 1, 10, 100, and 1000 cells per plate was plated with a Drigalski spatula. Colony-forming units were counted to determine the cell number. Lastly, the required amount of the strains was mixed to obtain the desired cell number for each strain of the mock community with the composition as described in Additional file 1: Supplementary Table 2.

DNA extraction and sequencing

The microbial DNA of all samples and the three pipeline negative controls was extracted with the UCP Pathogen Kit (Qiagen) according to the manufacturer’s instructions in an elution volume of 80 μL. Cell lysis was performed in screw cap tubes containing the sample, 500 mg of 100 μm diameter zirconia-silica beads, 500 μL Stool Stabilizer (Stratec), and 650 μL of ATL buffer containing 4.3 μL DX buffer, with a Precellys Evolution device (Bertin) shaking twice for 90 s with a 15-s break. The V1–V3 variable region of the 16S rRNA gene was amplified using the primers 27F-YM (5'-AGAGTTTGATYMTGGCTCAG-3') and 534R (5'-ATTACCGCGGCTGCTGG-3') with the Q5 High-Fidelity PCR kit (New England Biolabs) with the following conditions: 98 °C for 10 s, 59 °C for 20 s, 72 °C for 15 s for 25 cycles. In a subsequent 8-cycles PCR reaction, barcodes for all samples and PCR controls were added. Indexed amplicons were purified using AMPure XP beads (Beckman Coulter) with a bead to DNA ratio of 0.7:1 (vol/vol), according to the manufacturer’s instructions. The purified amplicons were quantified with the fluorescent dye-based Qubit® dsDNA HS Assay Kit (Invitrogen) and all samples were pooled equimolarly. Sequencing was carried out on an Illumina MiSeq® platform (Illumina Inc.) using 2 × 300 bp paired-end reads at the Core Facility Microbiome at ZIEL, Institute for Food and Health, Freising, Germany.

Bioinformatic processing of samples

The sequences were denoised using DADA2 [33] with default parameters except truncLen = c(299,280), trimLeft = c(20,17), and maxEE = c(2,6) in the function filterAndTrim(), and annotated using RDP-based annotation formatted for DADA2 [35]. Sequences which differed at least 20% from the expected sequence length were removed as well as singletons. One sample was removed due to experimental failure (103 input cells, 11 reads), leading to a total of 23 mock samples for decontamination analysis. Mock ASVs were defined as sequences matching a 362-bp-long subsection of the V1–V3 region of the expected sequences based on Sanger sequencing of individual mock taxa. Since Sanger sequencing produces only one read-out even in the case of different 16S copy variants per species, we tolerated ambiguous base calls and additionally accepted ASVs differing by up to 4 bp (Levenshtein distance) from expected ASVs. The final dataset of the “staggered mock A” consists of 361,651 reads (median 13,747 reads per mock sample, median 1226 reads in the negative controls) in 293 ASVs, of which 52 ASVs were classified as mock and 241 as contaminants.

Staggered mock community B

To validate our analyses in a second realistic mock community with an uneven community structure, we used a subset of the dataset published by Rauer & de Tomassi et al. [36]. The complete experimental design and data preparation methods can be found in the corresponding manuscript. Briefly, this study compared eight extraction protocols in three mock communities. Here, we used only a subset of eight samples of the three-species spike-in mock community D6321 (https://www.zymoresearch.de/collections/zymobiomics-microbial-community-standards/products/zymobiomics-spike-in-control-ii-low-microbial-load), which were processed using the ZymoResearch extraction buffer. The undiluted samples with 1.1 × 105 bacterial input cells were 1:20 diluted to 5.55 × 103 bacterial input cells, with each of the two dilutions present in four replicates. These replicates underwent different extraction kits and protocols, but shared the same extraction buffer that was associated with low-level contamination. Additionally, four pipeline negative controls (using the same extraction buffer) and two PCR controls were processed along with the samples. The V1–V3 region of the 16S rRNA gene was sequenced using the Illumina MiSeq® platform (Illumina Inc.). Mock ASVs were defined as sequences with Levenshtein distance ≤ 4 to any of the three reference 16S genes provided by ZymoResearch. We selected this subset of samples due to the presence of cross-contaminating mock ASVs in the negative controls, and refer to the dataset here as “staggered mock community B”. The staggered mock B had 195,279 reads (median 25,282 per mock sample, median 6616 reads in the pipeline negative controls). Of all 221 ASVs, 9 ASVs were classified as mock and 212 as contaminants.

Skin microbiome dataset

As a low-biomass environmental microbiome dataset, we chose a skin microbiome dataset published by Hülpüsch et al. [37]. The skin is a low-biomass environment (103 to 107 bacteria [38, 39]) and is therefore representative for an environment susceptible to contaminants. Via qPCR, a bacterial colonization of 106 cells was determined as published by De Tomassi et al. [40]. The longitudinal study investigated the skin microbiome of six healthy individuals and six atopic eczema (atopic dermatitis, AD) patients over the course of 8 weeks. The study was approved by the ethics committee of the Technical University of Munich (187/17S). Complete methods for data generation can be found in the original publication. Briefly, Illumina MiSeq® sequencing (Illumina Inc.) of the V1–V3 16S rRNA gene was performed. In contrast to the previous publication, the sequences were now denoised using DADA2 [33] with default parameters except truncLen = c(299,280), trimLeft = c(20,17), and maxEE = c(2,6) in the function filterAndTrim(), and annotated using AnnotIEM [41]. In total, 209 samples and 12 pipeline negative controls were analyzed (sequencing depth of samples > 5000 reads). The skin microbiome dataset consists of 4,836,304 reads and 21,541 ASVs.

Contaminant removal approaches for benchmarking

In the following section, we describe in detail the chosen decontamination tools used for the benchmarking. As the blacklist approach does not consider sample-specific environments, it is not recommended [20] and was not considered in this benchmarking.

Frequency filter

A frequency filter removes all sequences from a dataset, which do not reach a certain threshold of relative abundance in any sample. The rationale behind this method is that singletons and sequences with low relative abundance often represent incorrect sequences like chimeras, so their removal from microbiome datasets decreases computational times and reduces differences between biological and technical replicates and is therefore advisable [42, 43]. There is no commonly accepted threshold, but values of 0.00005 (0.005%) [44] or 0.0025 (0.25%) [45] have been recommended.

Presence filter

The presence filter removes every sequence which appears in any negative control [20] without considering its abundance or taxonomic annotation.

SourceTracker

SourceTracker is a Bayesian approach to model the sources and proportions of contaminants in microbiome studies. It was published by Knights et al. in 2011 [27] and is available as an R script (version 1.0.1, https://github.com/danknights/sourcetracker). In contrast to other tools, SourceTracker classifies individual reads of a sequence, allowing a sequence to originate from different sources. Source environments can be the “unknown” biological sample, or defined external sources (e.g., laboratory bench, reagents). Thus, SourceTracker can use negative controls as a single external source environment, but works best with additional samples from the environment [27] or knowledge on the sampled communities [20]. SourceTracker has previously achieved great decontamination results when this additional knowledge on the sampled communities or additional types of controls were incorporated [20]. However, as this data is often not available in clinical microbiome studies, we decided to benchmark only the most often encountered scenario in microbiome research, where prior knowledge on expected and contaminant taxa is not reliably available. We therefore only use the pipeline negative controls as source environments per dataset here. The three parameters alpha1 (default 0.001), alpha2 (default 0.1), and beta (default 10) can be adapted to avoid overfitting of the data and increase the sensitivity of the algorithm.

Decontam

Decontam identifies contaminants by a sample-based and a control-based algorithm, which can be used individually or combined. The method is available as an R package (version 1.8.0) and has been published by Davis et al. in 2018 [25].

Decontam frequency filter

The Decontam frequency filter is based on the hypothesis that the relative abundance of contaminating sequences is inversely correlated with the total input material of a sample. Therefore, additional data on DNA concentration is required, such as measured via qPCR or fluorescence dye-based methods. The user-set threshold ranges from 0 to 1 (default 0.1) and can be increased to achieve a stricter contaminant removal.

Decontam prevalence filter

The Decontam prevalence filter is based on negative controls. For each sequence, the presence or absence in controls is compared to environmental samples. Again, the threshold ranges from 0 to 1 (default 0.1) and can be increased to achieve a stricter contaminant removal.

MicrobIEM

MicrobIEM is our novel tool implemented in the statistical software package R [46], providing decontamination, quality control and basic analysis of microbiome (amplicon sequencing) samples. The decontamination core algorithm is implemented as a single R function, and as a complete tool with graphical user interface and basic analysis options for researchers without coding experience. The complete code and user documentation are provided in the Github repository (https://github.com/LuiseRauer/MicrobIEM), and the tool can be used directly through a web browser without installation (https://env-med.shinyapps.io/microbiem/).

Decontamination with MicrobIEM ratio filter and span filter

In MicrobIEM, contaminant removal is based on negative controls (Additional file 1: Supplementary Figure 1). As identifying contaminant features solely by their presence in negative controls is not recommended [20], we developed two new concepts: (1) the ratio of the mean relative abundance of a sequence in negative controls versus in environmental samples, since a “systematic” contaminant (i.e., not sporadic, originating from, e.g., lab consumables) should appear in rather high relative abundance in the expectably empty negative controls (with contaminants defined as: (mean relative abundance of feature in control / mean relative abundance of feature in sample) > threshold), and (2) a span threshold measuring the proportion of negative control samples contaminated with this sequence, since “systematic” contaminants should not appear only sporadically in a small fraction of negative controls (with contaminants defined as (span in controls ≥ threshold)). Therefore, MicrobIEM’s span filter with a threshold of 1 (out of all available controls) is identical to the presence filter, while increasing the span threshold is intended to alleviate the effect of sporadic cross-contamination from samples into negative controls. The number of available thresholds for MicrobIEM’s span filter depends on the number of control samples per dataset. MicrobIEM’s two contamination filters can be applied independently for two types of controls (e.g., PCR controls and pipeline controls as in our dataset). In our benchmarking, we only test MicrobIEM’s two filters on pipeline negative controls, since barely any reads were detected in the available PCR controls.

Additional quality control & data analysis options in MicrobIEM

The decontamination algorithm, additional quality control options, and basic statistical analyses are implemented in a Shiny-based tool with a graphical user interface and interactive plots created with plotly [47, 48]. Thus, researchers without coding experience are able to inspect and explore microbiome data easily, fast, and interactively. An overview of the available filter steps and workflow is given in Additional file 1: Supplementary Figure 2. As input, a feature table with read counts (either ASV or operational taxonomic unit (OTU) table) and a metadata table with additional information per sample (e.g., gender, treatment) and a specification as “sample” or “NEG1”/“NEG2” (controls) is required. Quality control options in MicrobIEM (Additional file 1: Supplementary Figure 2B) include the removal of samples with a low number of total reads, as these samples often represent experimental failure due to technical problems in sampling or amplification [49]. Additionally, features with a low maximum abundance over all samples (e.g., singletons or doubletons) or with a low relative abundance over all samples (equivalent to the frequency filter) can be removed as they often represent spurious sequencing errors or contaminations [45]. All quality control steps can be used independently or in combination. Thresholds for these steps are defined by the user, but their impact on the data can be assessed with the interactive plots offered in MicrobIEM.

After quality control, the final filtered feature table can be explored and visualized interactively with the most common analyses for microbiome research, namely alpha diversity, beta diversity, and the distribution of microbial taxa (taxonomic composition). Two-sided statistical tests (Kruskal–Wallis and Permutational Multivariate Analysis of Variance, PERMANOVA) are provided for alpha and beta diversity analyses, respectively. More details on the statistical analysis in MicrobIEM are provided in the readme on Github (https://github.com/LuiseRauer/MicrobIEM).

Example dataset

For an initial exploration of the functions of MicrobIEM, we provide an artificial example dataset, consisting of a feature table (MicrobIEM_test-data_featurefile.txt) and a metadata table (MicrobIEM_test-data_metafile.txt). The dataset contains 29 microbiome samples that can be investigated by age group, gender, BMI, and medication dosage. Additionally, we provide 1 positive control, and 2 types of negative controls: 3 PCR controls (NEG1) and 2 pipeline controls (NEG2). The example datasets are deposited on Github (https://github.com/LuiseRauer/MicrobIEM/tree/main/MicrobIEM/test-data) and on the Open Science Framework platform (https://osf.io/xvbef/).

Benchmarking evaluation measuresDiagnostic power in mock communities

To evaluate the effect of different decontamination tools in mock community datasets, we compared a tool’s classification into mock or contaminant reads with the true classification, given by the expected sequences per mock. We evaluated accuracy, sensitivity, specificity, precision, Matthews correlation coefficient, and Youden’s index.

Here, sensitivity (also called recall) is the ability of each algorithm to correctly remove a contaminant ASV, and specificity is the ability of each algorithm to correctly keep an expected mock ASV in the dataset. Accuracy measures the proportion of correct classifications among all classifications and is a commonly used evaluation index. However, it can give misleading results in case of class imbalance, i.e., when samples have very high or very low proportions of contaminants. Additionally, accuracy needs to be interpreted in relation to baseline accuracy, which corresponds to contaminant prevalence in our samples. The frequently recommended Matthews correlation coefficient (MCC) quantifies the correlation between the true classification and the tool’s classification and gives fairer results by incorporating information on both correct and incorrect classifications. MCC is also affected by class imbalance and is thus not comparable between studies or samples with different proportions of contaminants. Youden’s index (also called bookmaker informedness) incorporates sensitivity and specificity. It is not affected by class imbalance and is therefore suitable for all levels of contaminant prevalence and comparable between samples and datasets. Precision measures the proportion of correctly removed contaminant ASVs among all removed ASVs.

Sensitivity, specificity, accuracy, and precision range from 0 to 1, with 0.5 indicating a classification as good as random (0 in case of precision). Youden’s index and MCC range from − 1 to 1, with 0 indicating a classification as good as random, and negative values indicating a deterioration compared to the original composition due to reversed labels. For all evaluation measures, 1 indicates a perfect classification.

Evaluation of the skin microbiome dataset

We evaluated the sample composition of the 10 most abundant genera before and after applying the decontamination algorithms, as well as the reduction in reads and number of ASVs. Since the expected sequences are not known in the skin microbiome dataset, we instead evaluated the removed and kept genera per decontamination tool and filter setting. Therefore, lists of typical contaminants and typical skin inhabitants were created at the genus level according to a literature search (Additional file 1: Supplementary Table 3, 4 and 5). Typical contaminants were defined based on ten papers dealing with contaminants (Additional file 1: Supplementary Table 3) [15, 16, 18, 22, 23, 50,51,52,53,54]. To determine skin bacteria, seven papers were selected for the classification, representing a mix of skin microbiome-related original publications, reviews, sequencing studies, and cultivation studies (Additional file 1: Supplementary Table 4) [55,56,57,58,59,60,61]. The occurrences of each skin genus in the seven papers are summarized in Additional file 1: Supplementary Table 5.

Statistical analysis

Benchmarking analysis was performed in R 4.0.2 [46], and MicrobIEM analysis of the skin microbiome dataset was performed in version 0.7 (https://github.com/LuiseRauer/MicrobIEM). Analysis scripts are available on Github for benchmarking of the mock communities (https://github.com/LuiseRauer/Decontamination_benchmarking) and at the Open Science Framework (OSF) platform for benchmarking of the skin microbiome dataset (https://osf.io/yn9sa/).

Comments (0)

No login
gif