The Virus Infectious Disease Ontology (VIDO) serves as an extension of the Infectious Disease Ontology (IDO) Core, designed to represent the entities, processes, and relationships specific to viral infectious diseases. Its scope encompasses the detailed characterization of viruses, including their taxonomy, genetic composition, and replication mechanisms. It also addresses viral diseases, their clinical manifestations, and the biological processes involved in host-virus interactions. It consists of 36 unique classes and—leveraging existing resources where possible—reuses 400 existing ontology terms and 43 object properties from other ontologies. VIDO was developed based on virology literature [37,38,39,40,41]; subject-matter experts were consulted regarding definitions of key terms, such as the definitions for virus type terms that were minted for VIDO.
Major classes
Table 2 presents major VIDO classes. VIDO’s central class is virus, which is imported from NCBITaxon and relabeled from “Viruses” to reflect the OBO convention that class names not be pluralized. Viruses are classified first in terms of their material structure, following the Baltimore Classification [37], under which virus species are sorted into seven groups according to their genome type and the steps members of a species must take to produce messenger RNA during the viral replication cycle [38,39,40,41]. For example, a positive-sense single-stranded RNA virus, such as SARS-CoV-2, has a genome that can be translated into viral proteins immediately upon entry into a cell, while a double-stranded DNA virus, such as Orthopoxvirus variola, must first transcribe its genome into messenger RNA before translation into viral proteins can proceed. VIDO reuses classes from GO reflecting viral capsid composition and virus tropism [25], as well as terms from PRO to represent viral proteins [26], ChEBI for terms such as DNA and RNA [31, 32], and so on.
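As an orientation to the seven groups mentioned above, the following Python sketch tabulates them and flags the property on which the SARS-CoV-2 and Orthopoxvirus variola contrast turns. The one-line group summaries are standard textbook simplifications, not VIDO class definitions, and the names `BaltimoreGroup` and `genome_is_mrna` are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BaltimoreGroup:
    numeral: str        # Roman numeral conventionally used for the group
    genome: str         # nucleic acid packaged in the virion
    route_to_mrna: str  # informal summary of how messenger RNA is produced


# The seven Baltimore groups (standard virology; these are not VIDO class IRIs).
BALTIMORE = {
    1: BaltimoreGroup("I", "dsDNA", "transcription of the DNA genome"),
    2: BaltimoreGroup("II", "ssDNA", "conversion to dsDNA, then transcription"),
    3: BaltimoreGroup("III", "dsRNA", "transcription by a viral RNA-dependent RNA polymerase"),
    4: BaltimoreGroup("IV", "(+)ssRNA", "genome serves directly as mRNA"),
    5: BaltimoreGroup("V", "(-)ssRNA", "transcription into complementary (+) mRNA"),
    6: BaltimoreGroup("VI", "ssRNA-RT", "reverse transcription to dsDNA, then transcription"),
    7: BaltimoreGroup("VII", "dsDNA-RT", "transcription; reverse transcription during genome replication"),
}


def genome_is_mrna(group: int) -> bool:
    """True when the packaged genome can be translated immediately after cell entry."""
    return BALTIMORE[group].genome == "(+)ssRNA"


# Group assignments for the two examples named in the text.
EXAMPLES = {"SARS-CoV-2": 4, "Orthopoxvirus variola": 1}

for virus, group in EXAMPLES.items():
    g = BALTIMORE[group]
    print(f"{virus}: group {g.numeral} ({g.genome}); immediate translation: {genome_is_mrna(group)}")
```

Only group IV genomes act directly as messenger RNA, which is why the replication-stage ordering discussed later cannot be fixed once for all viruses.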
Table 2 Major VIDO classes
Though viruses are classified first in terms of their material structure, their infectious nature means that virus subclasses are also classified, secondarily, under the IDO class infectious structure, whose instances are disposed to be transmitted to, and become part of an infection in, some host. This is captured in the OWL assertion that infectious structure is equivalent to:
The class acellular structure is a subclass of material entity (a broad BFO class that includes any entity that has some matter as part), instances of which do not have any cells as parts. To illustrate infectious disposition, consider that SARS-CoV-2 virions are disposed to be transmitted to hosts via respiration and to localize within host cells, leading to infection and disorder. In this respect, they are said to bear some infectious disposition. This reflects the fact that viruses are evolutionarily selected to infect hosts. Moreover, by distinguishing infectious structure from infectious disposition, we distinguish the infectious potential of viruses from the underlying material basis that grounds that potential. Additionally, we disambiguate the infectious disposition of a given virion from its contribution to any associated viral disease, which in turn grounds the viral disease course that is the manifestation of disease in a host. More concretely, SARS-CoV-2 being infectious is in part explained by its being a positive-sense single-stranded RNA virus; a host developing the disease COVID-19 is partially explained by SARS-CoV-2 being infectious; and a host manifesting various symptoms following infection is partially explained by their having COVID-19.
Infectious disposition provides a link to pathogens more generally, as it is a subclass of pathogenic disposition, which is borne by entities disposed to localize in and form disorders within a host. Note that an infectious disposition is understood as the disposition of its bearer, such as a virus, to be transmitted into a host and to initiate replication. This is not to be confused with an infectious disease, which is the disposition of an affected organism to undergo pathological processes due to a disorder. The class pathogen, on the other hand, is asserted to be equivalent to:
From this, it follows that infectious structure is a subclass of pathogen, as displayed in Fig. 2.
Additionally, subclasses of virus such as bacteriophage – viruses which infect and replicate within or on bacteria – are constrained by the axiom:
And since virus is a subclass of acellular structure, it follows that bacteriophage is an inferred subclass of infectious structure and pathogen, just as one would expect. These examples reflect the general design strategy underwriting VIDO: from the material structure and infectious dispositions of viruses, one can infer plausible alternative taxonomic virus classifications (Fig. 2).
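The inference pattern above can be illustrated with a toy structural-subsumption check in Python. The axioms encoded below are assumed, simplified shapes reconstructed from the prose, not VIDO’s published OWL; in particular, reading pathogen as "material entity bearing some pathogenic disposition" is an assumption. The code handles only conjunctions of named classes and existential restrictions and is no substitute for an OWL reasoner such as those available in Protégé.

```python
from itertools import product

# Assumed, simplified axioms (hypothetical shapes, not VIDO's published OWL).
# Told atomic superclasses:
TOLD = {
    "material entity": set(),
    "acellular structure": {"material entity"},
    "virus": {"acellular structure"},
    "bacteriophage": {"virus"},
    "pathogenic disposition": set(),
    "infectious disposition": {"pathogenic disposition"},
    "infectious structure": set(),
    "pathogen": set(),
}
# Existential restrictions on primitive classes: class -> {(property, filler)}.
RESTRICTIONS = {"bacteriophage": {("bearer of", "infectious disposition")}}
# Defined classes: name -> (required atomic superclasses, required restrictions).
DEFINED = {
    "infectious structure": ({"acellular structure"}, {("bearer of", "infectious disposition")}),
    "pathogen": ({"material entity"}, {("bearer of", "pathogenic disposition")}),
}


def classify():
    supers = {c: set(s) for c, s in TOLD.items()}
    restr = {c: set(r) for c, r in RESTRICTIONS.items()}
    for d, (atoms, rs) in DEFINED.items():  # a definition also gives necessary conditions
        supers[d] |= atoms
        restr.setdefault(d, set()).update(rs)

    def ancestors(c):
        seen, stack = set(), [c]
        while stack:
            x = stack.pop()
            if x not in seen:
                seen.add(x)
                stack.extend(supers.get(x, ()))
        return seen

    changed = True
    while changed:  # repeat until no new subsumption is found
        changed = False
        for c, d in product(list(supers), DEFINED):
            anc = ancestors(c)
            if c == d or d in anc:
                continue
            atoms, rs = DEFINED[d]
            inherited = set().union(*(restr.get(a, set()) for a in anc))
            # Each required restriction (p, f) needs an inherited (p, g) with g subsumed by f.
            ok = atoms <= anc and all(
                any(p == q and f in ancestors(g) for q, g in inherited) for p, f in rs
            )
            if ok:
                supers[c].add(d)
                changed = True
    return {c: ancestors(c) - {c} for c in supers}


INFERRED = classify()
print(sorted(INFERRED["bacteriophage"]))
```

Under these assumptions the sketch infers both infectious structure and pathogen as superclasses of bacteriophage, while virus itself is not classified under infectious structure because no disposition restriction is asserted on it.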
Fig. 2 Protégé display of a portion of asserted and inferred VIDO hierarchies
A core design pattern of VIDO is the representation of the virus replication cycle [38, 42], which encompasses the sequence of stages through which viruses proceed to produce new virions within a host organism. The virus replication cycle covers crucial aspects of overall viral pathogenesis, understood as involving virus transmission, localization, establishment of infection, and the appearance of a viral infectious disorder. The cycle begins with attachment of the virus to specific receptors on the host cell surface, followed by entry into the host cell through processes such as membrane fusion or endocytosis. Following penetration into a cell, a virion initiates replication, which varies considerably with the type of virus, as characterized in the Baltimore Classification. Ultimately, virus replication must yield messenger RNA, which is in turn used to synthesize viral proteins that are then assembled into virions and released from the host cell.
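Because preceded by is transitive, a handful of direct ordering links suffices to entail that attachment precedes every other stage. The Python sketch below computes that transitive closure over a hypothetical set of generic stage labels; the stage names and edges are assumptions for illustration, not VIDO’s class list. Transcription, translation, and genome synthesis are deliberately left mutually unordered, since their order depends on the virus type.

```python
from collections import defaultdict

# Hypothetical direct "precedes" links between generic stage labels.
DIRECT = {
    ("attachment", "entry"),
    ("entry", "transcription"), ("entry", "translation"), ("entry", "synthesis"),
    ("transcription", "assembly"), ("translation", "assembly"), ("synthesis", "assembly"),
    ("assembly", "release"),
}
STAGES = {stage for edge in DIRECT for stage in edge}


def transitive_closure(edges):
    """Return every (a, c) such that a precedes c through a chain of direct links."""
    succ = defaultdict(set)
    for a, b in edges:
        succ[a].add(b)
    closure = set()
    for start in list(succ):
        seen, stack = set(), list(succ[start])
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                closure.add((start, node))
                stack.extend(succ.get(node, ()))
    return closure


PRECEDES = transitive_closure(DIRECT)


def preceded_by(later: str, earlier: str) -> bool:
    """Mirror of RO's 'preceded by': later is preceded by earlier."""
    return (earlier, later) in PRECEDES


print(all(preceded_by(s, "attachment") for s in STAGES - {"attachment"}))
```

Leaving the middle stages unordered in the generic model is what makes room for the virus-specific, conditional orderings that the rules below supply.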
Figure 3 displays how the virus replication cycle is represented in VIDO. Any instance of virus replication stage is necessarily part of some instance of virus replication process, as illustrated by the red arrows reflecting the part of relation reused from RO. Likewise, any instance of virus attachment stage precedes any instance of the other stages, as shown by the yellow arrows reflecting the transitive preceded by relation. Virus transcription generally precedes virus translation, with the exception that positive-sense RNA virus genomes act directly as messenger RNA for immediate translation into viral proteins [43]. The Protégé editor from which the figure was generated relies on OWL for diagram generation, but OWL is not amenable to representing such conditional scenarios. And while SPARQL is useful for querying RDF, the Protégé SPARQL plugin does not support inserting inferred triples into the graph. To address both limitations, we leverage SWRL [44] rules, where “^” denotes logical conjunction and “?” is prepended to variables:
Fig. 3 Virus replication in VIDO
virion(?x) ^ pssRNA(?x) ^ virusReplication(?p) ^ virusSynthesisStage(?y) ^ partOf(?y,?p) ^ virusTranslationStage(?z) ^ partOf(?z,?p) ^ participatesIn(?x,?y) ^ participatesIn(?x,?z) -> precededBy(?y,?z)
virion(?x) ^ dsDNA(?x) ^ virusReplication(?p) ^ virusTranslationStage(?y) ^ partOf(?y,?p) ^ virusTranscriptionStage(?z) ^ partOf(?z,?p) ^ participatesIn(?x,?y) ^ participatesIn(?x,?z) -> precededBy(?y,?z)
In words, if a virion is an instance of the class positive-sense single-stranded RNA virus and participates in the synthesis and translation stages of the same replication cycle, then the preceded by relation is asserted to hold between those stages, with the synthesis stage preceded by the translation stage. Similarly, if a virion is an instance of the class double-stranded DNA virus and participates in the translation and transcription stages of the same replication cycle, then the translation stage is asserted to be preceded by the transcription stage. In this way, VIDO respects the temporal orderings among stages in the virus replication cycle without falsely asserting that virus transcription precedes virus translation in every case.
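To make the behavior of the two rules concrete, here is a minimal Python sketch that evaluates them by naive pattern matching over an in-memory set of facts. It is not SWRL tooling or a Protégé rule engine; the individual names (`v1`, `s1`, `p2`, and so on) are hypothetical, while the predicate names are copied from the rules above.

```python
# Each fact is a tuple: (predicate, arg1[, arg2]). Individual names are hypothetical.
FACTS = {
    # A positive-sense ssRNA virion taking part in one replication process.
    ("virion", "v1"), ("pssRNA", "v1"), ("virusReplication", "p1"),
    ("virusSynthesisStage", "s1"), ("partOf", "s1", "p1"),
    ("virusTranslationStage", "t1"), ("partOf", "t1", "p1"),
    ("participatesIn", "v1", "s1"), ("participatesIn", "v1", "t1"),
    # A dsDNA virion taking part in another replication process.
    ("virion", "v2"), ("dsDNA", "v2"), ("virusReplication", "p2"),
    ("virusTranslationStage", "t2"), ("partOf", "t2", "p2"),
    ("virusTranscriptionStage", "r2"), ("partOf", "r2", "p2"),
    ("participatesIn", "v2", "t2"), ("participatesIn", "v2", "r2"),
}

# (genome class of ?x, class of ?y, class of ?z); each rule concludes precededBy(?y, ?z).
RULES = [
    ("pssRNA", "virusSynthesisStage", "virusTranslationStage"),
    ("dsDNA", "virusTranslationStage", "virusTranscriptionStage"),
]


def members(facts, cls):
    """Individuals asserted to be instances of a class (unary facts)."""
    return {f[1] for f in facts if len(f) == 2 and f[0] == cls}


def apply_rules(facts):
    """A single pass suffices: no rule body mentions precededBy."""
    inferred = set()
    for genome, y_cls, z_cls in RULES:
        for x in members(facts, "virion") & members(facts, genome):
            for p in members(facts, "virusReplication"):
                for y in members(facts, y_cls):
                    for z in members(facts, z_cls):
                        body = {("partOf", y, p), ("partOf", z, p),
                                ("participatesIn", x, y), ("participatesIn", x, z)}
                        if body <= facts:
                            inferred.add(("precededBy", y, z))
    return facts | inferred


RESULT = apply_rules(FACTS)
print(sorted(f for f in RESULT if f[0] == "precededBy"))
```

Note that the partOf conditions on a shared `?p` keep stages from different replication processes from being ordered against each other.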
Unambiguous representation of virus replication has clear value for developing intervention and treatment strategies in rational drug design. For example, only host cells with specific features are susceptible to attachment by SARS-CoV-2 [3]. In humans, the standard route to successful infection involves virion attachment to host alveolar epithelial cells through angiotensin-converting enzyme 2 (ACE2) receptors [45, 46], defined in PRO [26]. These cells may be characterized as bearing an adhesion disposition, which IDO defines in terms of macromolecules disposed to participate in adhesion processes. Following the virus replication cycle, cell penetration often follows attachment: the viral spike protein is cleaved with the aid of the host trans-membrane protease serine 2 (TMPRSS2), after which SARS-CoV-2 fuses with the host cell membrane [47]. An ontological characterization of the SARS-CoV-2 replication cycle allows for inferences about possible targets for drugs and interventions designed to disrupt that cycle. Such disruptions can, moreover, be represented ontologically by extending the GO class negative regulation of biological process (any process that stops, prevents, or reduces the frequency, rate, or extent of a biological process), as in negative regulation of viral life cycle. Extending this pattern to virus replication stages, we introduce negative regulation of virus attachment, and so on for each stage.
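The per-stage pattern can be generated mechanically. The Python sketch below renders one Manchester-syntax class frame per stage; the axiom shape (a subclass of negative regulation of viral life cycle that negatively regulates some stage) is an assumption modeled on common GO practice, not VIDO’s published axiom, and the stage names entry, assembly, and release are hypothetical additions to the stages named in the text.

```python
# Hypothetical stage names: attachment, transcription, translation and synthesis
# stages are named in the text; entry, assembly and release are assumptions.
STAGES = ["attachment", "entry", "transcription", "translation",
          "synthesis", "assembly", "release"]


def negative_regulation_frame(stage: str) -> str:
    """Render an illustrative Manchester-syntax class frame for one stage."""
    return (
        f"Class: 'negative regulation of virus {stage}'\n"
        f"    SubClassOf: 'negative regulation of viral life cycle'\n"
        f"        and ('negatively regulates' some 'virus {stage} stage')"
    )


FRAMES = [negative_regulation_frame(s) for s in STAGES]
print(FRAMES[0])
```

Generating the frames from a single template keeps labels and axioms uniform across stages, which matters if the classes are later matched against drug-target annotations.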
Extensions
The modular design of VIDO supports its extension into more specific virus domains, ensuring its continued relevance as new challenges in virology emerge. The Coronavirus Infectious Disease Ontology (CIDO) is perhaps the most widely used extension, having been leveraged in the search for coronavirus interventions, treatment exploration, and basic research [5]. Since the development of VIDO, the VIDO and CIDO teams have worked to bring the latter under the guardrails of the former [16]. Other extensions of VIDO have since emerged, such as the IoT-MIDO project [48], which leverages infectious disease vocabularies based on IDO to explore patient monitoring and risk assessment, clinical management of patients with infectious diseases, and epidemic risk analysis and surveillance. From another direction, the Covid19-IBO project aims to integrate virus-specific vocabularies from several ontologies containing terms related to COVID-19, in order to gain insight into the impact of the disease on the banking sector in India; the authors provide a schema for identifying similarities and differences across these virus ontologies [49].
Looking ahead, there are several virus-specific extensions of IDO which are within scope of alignment to VIDO. Specifically, the HIV Ontology (IDOHIV) [12], the Influenza Ontology (IDOFLU) [7], and the Dengue Fever Ontology (IDODEN) [13]—none of which have been updated since 2017 [3, 11]—should be brought into alignment with VIDO. Doing so will have the benefit of aligning each to the updated IDO, promoting semantic interoperability among IDO extensions, and encouraging further development of virus-specific ontologies within this ecosystem.
The bacteria infectious disease ontology
The Bacteria Infectious Disease Ontology (BIDO) extension of IDO is designed to serve as a reference ontology for bacterial pathogens and bacterial pathogenesis more generally. Its scope encompasses the detailed characterization of bacteria, including their taxonomy, genetic composition, and reproduction mechanisms. It also addresses bacterial diseases, their clinical manifestations, and the biological processes involved in host-bacteria interactions. It consists of 37 unique classes and, leveraging existing resources where possible, reuses 1400 existing ontology terms and 55 object properties from other ontology projects. BIDO was developed based on the bacteriology literature [50,51,52]; subject-matter experts were consulted regarding definitions of key terms, such as the definitions for bacterium type terms that were newly created for BIDO.
Major classes and relations
BIDO introduces new terms for a variety of bacterium types as subclasses of the undefined NCBITaxon term bacterium, which is relabeled from “Bacteria” to reflect the OBO convention that classes have singular names. Figure 4 provides example subclasses, such as bacilli bacterium, spirilla bacterium, cocci bacterium, and spirochetes bacterium, which reflect bacteria classifications determined by cellular morphology.
Fig. 4 Bacteria type vocabulary in BIDO
BIDO provides rich semantics around these terms by importing additional terms broadly relevant to the bacterial domain from several OBO Foundry ontologies, such as IDO, ChEBI, PRO, and GO. From ChEBI, it imports molecular entity terms that figure in bacterial pathogenesis, such as lipopolysaccharide, capsular polysaccharide, and bacteriocin. PRO provided a source for various bacterial protein terms, while GO imports included bacterial cell components such as Gram-positive-bacterium-type cell wall, pilus, capsule, slime layer, and so on, as well as important cellular process terms such as entry of bacterium into host cell, aerobic respiration, fermentation, and spore dispersal, among others. Additionally, the Ontology of Microbial Phenotypes (OMP) [53] provided phenotype terms characteristic of different bacterium types, such as obligate anaerobe, obligate aerobe, facultative anaerobe, and so on, while the Clinical Measurement Ontology (CMO) [54] provided measurement data terms relating to bacterial infection, such as bacterial infection severity measurement, bacterium count, and bacterial infection severity score. As illustrated in Fig. 5, BIDO includes various logical axioms connecting bacterial pathogenicity to key imported terms of this kind, as well as to IDO terms such as toxin, virulence factor, adhesion factor, and invasion factor. Imported terms are used to provide logical definitions for various bacterium types, such as aerobic bacterium, anaerobic bacterium, and spirilla bacterium, as discussed in more detail below.
Fig. 5 Semantic enrichment of existing bacteria vocabulary in BIDO
BIDO’s major classes extend directly from terms in either IDO or OGMS, following the “hub-and-spoke” methodology employed in VIDO development [16]. BIDO key terms and definitions are displayed in Table 3.
Table 3 Major BIDO classes
BIDO distinguishes between bacterial toxin disorder and bacterial infectious disorder. Instances of the latter are in every case instances of bacterial infection and serve as the material basis for bacterial infectious disease. In contrast, the class bacterial toxin disorder covers any disorder involving bacterial toxins, whether or not it is associated with an infection by the relevant toxin-producing bacteria (Footnote 3). An example illustrating this distinction concerns the causative agent of food botulism, Clostridium botulinum, a bacterial pathogen that is not infectious. When a person consumes food contaminated with toxins produced by C. botulinum, the ingested toxins often cause disorder in that person, leading to the disease. The disorder is not, however, a bacterial infectious disorder, since the bacterium is not disposed to invade or be transmitted to other potential hosts [3, 16]. Of course, some bacterial toxin disorders are also instances of infectious disorder, as when toxins secreted by bacteria serve as virulence factors aiding infection of a host by, say, helping bacteria evade innate and adaptive immune responses.
Similarly, some pathogenic bacteria are infectious while others are not. For example, Yersinia pestis and Vibrio cholerae are disposed to be transmitted to, and become part of, some infection, and so count as infectious agents in the formation of bacterial infectious disorders. Even so, bacteria need not be infectious to count as pathogenic; they need only exhibit virulence. That is, any instance of a bacterium that bears a virulence quality counts as an instance of a pathogen. As pathogens, such bacteria bear instances of pathogenic disposition, and are consequently disposed to establish localization in a host, or to produce toxins transmissible to a host, and thereby form disorder.
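The two distinctions above (pathogenic vs. infectious bacteria, toxin vs. infectious disorders) can be captured in a small Python sketch. The boolean attributes are a hypothetical, closed-world simplification of BIDO’s dispositions and qualities; in OWL the same reasoning would run over class axioms under open-world semantics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bacterium:
    name: str
    bears_virulence_quality: bool
    bears_infectious_disposition: bool  # disposed to be transmitted and become part of an infection


def is_pathogen(b: Bacterium) -> bool:
    # Virulence suffices; an infectious disposition is also a pathogenic disposition.
    return b.bears_virulence_quality or b.bears_infectious_disposition


def is_infectious(b: Bacterium) -> bool:
    return b.bears_infectious_disposition


@dataclass(frozen=True)
class Disorder:
    involves_bacterial_toxin: bool
    is_bacterial_infection: bool


def disorder_classes(d: Disorder) -> set:
    """The two BIDO classes overlap: a disorder may fall under either, both, or neither."""
    classes = set()
    if d.involves_bacterial_toxin:
        classes.add("bacterial toxin disorder")
    if d.is_bacterial_infection:
        classes.add("bacterial infectious disorder")
    return classes


C_BOTULINUM = Bacterium("Clostridium botulinum", True, False)  # as in food botulism
Y_PESTIS = Bacterium("Yersinia pestis", True, True)

FOOD_BOTULISM = Disorder(involves_bacterial_toxin=True, is_bacterial_infection=False)
TOXIN_AIDED_INFECTION = Disorder(involves_bacterial_toxin=True, is_bacterial_infection=True)

print(disorder_classes(FOOD_BOTULISM))
```

Modeling the two disorder classes as independent tests, rather than as a single either/or label, is what lets toxin-mediated infections fall under both.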
Modeled on VIDO’s viral pathogenesis, BIDO introduces bacterial pathogenesis, which has the subclass axiom:
This axiom leverages the GO term toxin biosynthetic process, which characterizes the formation of toxins by cells or organisms, including bacteria that cause disease in various organisms. Unlike viral pathogenesis, however, we cannot assert that every instance of bacterial pathogenesis has an establishment of localization in host, or a process of establishing an infection, as a part. As discussed above, there are cases of bacterial pathogenesis in which bacterial toxins cause disease in a host but are produced by bacteria that never localize in the diseased organism. Additionally, BIDO allows that bacterial pathogenesis may involve infections caused by opportunistic pathogens that do not engage in any pathogen transmission process, such as when bacteria colonizing the surface of a host’s skin realize instances of infectious disposition once that protective barrier is ruptured.
BIDO includes newly added axioms connecting various imported terms to relevant IDO terms. To illustrate, consider the GO term pilus. Pili are hair-like appendages found on many bacteria that help facilitate the adhesion of pathogenic bacterial strains to bodily tissues. Pili thereby increase bacterial replication rates, allow bacteria to colonize host cells, and facilitate tissue infection. Pili thus contribute to the virulence of many pathogenic bacteria. BIDO represents this complexity around the term pilus with the following equivalence axiom:
As logically defined, pilus is an inferred subclass of IDO’s adhesion factor and virulence factor.
Other examples of bacterial virulence factors are represented by terms imported from ChEBI. A capsular polysaccharide is a polysaccharide capsule found on the cell surface of many bacteria that enables both their adhesion to surfaces and their evasion of host immune responses, as well as providing protection from toxins. Lipid A is the lipid portion of the lipopolysaccharide in the outer membrane of Gram-negative bacteria and is responsible for its endotoxicity. As with pilus, we assert that both capsular polysaccharide and lipid A have a virulence factor disposition, so both are inferred subclasses of IDO’s virulence factor. In the case of lipid A, instances also have some endotoxin disposition, and so lipid A is an inferred subclass of endotoxin. As development of BIDO continues, we will add similar axioms to other terms imported from GO, PRO, and ChEBI.
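The pattern shared by pilus, capsular polysaccharide, and lipid A (assert a disposition on an imported term, let classification follow) can be sketched in Python. Here each IDO class is assumed, for illustration only, to be defined by a single characteristic disposition; the real IDO and BIDO definitions are richer, and the mapping names are hypothetical.

```python
# Hypothetical simplification: each IDO class identified by one characteristic disposition.
DEFINED_BY_DISPOSITION = {
    "adhesion factor": "adhesion disposition",
    "virulence factor": "virulence factor disposition",
    "endotoxin": "endotoxin disposition",
}

# Dispositions asserted on imported GO/ChEBI terms, as described in the prose.
ASSERTED = {
    "pilus": {"adhesion disposition", "virulence factor disposition"},
    "capsular polysaccharide": {"virulence factor disposition"},
    "lipid A": {"virulence factor disposition", "endotoxin disposition"},
}


def inferred_superclasses(term: str) -> set:
    """IDO classes whose defining disposition is asserted on the imported term."""
    return {cls for cls, disp in DEFINED_BY_DISPOSITION.items() if disp in ASSERTED[term]}


for term in ASSERTED:
    print(term, "->", sorted(inferred_superclasses(term)))
```

Adding a new imported term then only requires asserting its dispositions; its place under IDO’s virulence factor or endotoxin classes follows from the definitions.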
Extensions
There are currently four bacterial infectious disease ontologies extending from IDO that would stand to benefit from refactoring as domain ontology extensions of BIDO:
1. The Staphylococcus aureus Infectious Disease Ontology (IDO-Staph) [55]
2. The Brucellosis Infectious Disease Ontology (IDOBRU) [56]
3. The Meningitis Infectious Disease Ontology (IDOMEN) [57]
4. The Bacterial Clinical Infectious Disease Ontology (BCIDO) [53, 54]
Each requires updating to the most recent versions of IDO and BFO, during which refactoring to support BIDO as a common reference ontology can be conducted. This would be particularly impactful for BCIDO, whose scope covers bacterial infections, bacteria, and antibiotic treatment, excluding those associated with Mycobacteria. Presentations of BCIDO [54] suggest that it contains many terms that would belong in a bacterial infectious disease reference ontology, but also that BCIDO is not particularly modular. We thus maintain that it would be in the interest of modularity and reuse for the BIDO and BCIDO communities to work towards a division of labor in this domain.
Given the scope of BIDO as a reference ontology, additional potential extensions cover the broad domain of bacterial infectious diseases, and may include ontologies for tuberculosis, infective endocarditis, chlamydia, E. coli infection, and so on.
The mycosis infectious disease ontology
The Mycosis Infectious Disease Ontology (MIDO) is an open-source biomedical ontology built to provide standardized ontological representations specific to fungal infectious diseases. Its scope encompasses the detailed characterization of fungi, including their taxonomy, genetic composition, and reproduction mechanisms. It also addresses fungal diseases, their clinical manifestations, and the processes involved in host-fungus interactions. MIDO consists of 71 unique classes and, leveraging existing resources where possible, reuses 526 existing ontology terms and 39 object properties from other ontology projects. MIDO aims to bring fungal infectious diseases into conversation with other OBO Foundry ontologies related to infectious diseases. It was developed based on the mycology literature [44, 59] and engagement with subject-matter experts in the UNITE community [60], which maintains an open-source database for fungal taxonomy and nomenclature.
Major classes and relations
Fungi are a diverse group of eukaryotic organisms that play important roles in ecosystems. Table 4 highlights key classes and definitions introduced by MIDO, which extends the FOODON [33] term fungus by introducing subclasses for yeast, mold, and dimorphic fungus. Moreover, terms for the anatomical structure of fungi are imported from the OBO Foundry Fungal Gross Anatomy Ontology (FAO) [61], such as the general class fungal structure and subclasses such as hyphal tip, sterigma, and mycorrhiza.
Table 4 Major MIDO classes
Certain fungus types can cause hosts to develop fungal infections, captured by the MONDO Disease Ontology (MONDO) [58] term mycosis, which vary in severity depending on the host tissues they affect. MIDO reuses other terms from MONDO relevant to various mycoses, such as the commonly drawn fourfold distinction between superficial mycosis, cutaneous mycosis, subcutaneous mycosis, and systemic mycosis. An example of systemic mycosis occurs through the inhalation of fungal spores, such as those produced by Histoplasma capsulatum. The MONDO term histoplasmosis (Footnote 4) is reused here to represent the associated MIDO fungal infectious disease. More specifically, inhalation of Histoplasma capsulatum spores above a certain threshold may result in the formation of a fungal infectious disorder in a host, which serves as the physical basis for histoplasmosis. This fungal infectious disease may then be realized in a histoplasmosis infectious disease course, often involving symptoms such as chills, fevers, headaches, and so on. This basic design pattern, inherited from IDO, generalizes to other fungal infectious diseases, such as when an instance of Aspergillus colonizes and grows within a host’s lungs or sinuses, forming an aspergilloma (a fungal infectious disorder), which is the physical basis for MONDO’s aspergillosis (a fungal infectious disease), which may in turn be realized in an aspergillosis infectious disease course.
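The recurring pathogen, disorder, disease, disease course chain can be written down once and instantiated per mycosis. In the Python sketch below, the relation labels ("participates in formation of", "is physical basis of", "realized in") are plain-English placeholders chosen for illustration, not the exact IRIs or labels used in MIDO, IDO, or OGMS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InfectiousDiseasePattern:
    fungus: str          # pathogen initiating the disorder
    disorder: str        # physical basis of the disease
    disease: str         # disease term reused from MONDO
    disease_course: str  # realization of the disease in a host

    def triples(self):
        """Subject-relation-object links; relation labels are illustrative placeholders."""
        return [
            (self.fungus, "participates in formation of", self.disorder),
            (self.disorder, "is physical basis of", self.disease),
            (self.disease, "realized in", self.disease_course),
        ]


HISTOPLASMOSIS = InfectiousDiseasePattern(
    "Histoplasma capsulatum", "fungal infectious disorder",
    "histoplasmosis", "histoplasmosis infectious disease course")
ASPERGILLOSIS = InfectiousDiseasePattern(
    "Aspergillus", "aspergilloma",
    "aspergillosis", "aspergillosis infectious disease course")

for s, r, o in ASPERGILLOSIS.triples():
    print(f"{s} --{r}--> {o}")
```

Keeping the chain as one reusable template mirrors the design goal stated above: a new mycosis needs only its four terms, not a new modeling pattern.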
Following IDO, MIDO adopts the view that “pathogen” should be understood as indexed to specific species and maturity of potential hosts. For example, Ceratocystis paradoxa is a fungal plant pathogen responsible for significant annual losses of pineapple harvests [62], but it is not considered pathogenic to humans. This host- and context-sensitivity also underlies our treatment of opportunistic pathogens, which is particularly important for MIDO since many fungal infections are opportunistic. Following IDO, we maintain that opportunistic pathogens are not simply microbes that become pathogenic by virtue of an “opportunity”; rather, they are microbes that already bear a pathogenic disposition, which may remain latent under normal conditions and be realized under others. This is reflected in opportunistic fungal pathogen, which is constrained by the logical axiom:
inheres in some fungus and (realized in only (process and has part some establishment of localization in host and has part some transmission process and has part some appearance of disorder and not occurs in some immunocompetent organism))
This axiom indicates that any instance of opportunistic fungal pathogen is disposed to establish localization in a host that is not immunocompetent, following a transmission process, from which a disorder emerges. A common example of an opportunistic fungal pathogen is Candida albicans, a yeast commonly found in the human gut flora and often associated with thrush and gastrointestinal infections in immunocompromised human hosts. Aligning fungal pathogen terminology with that of other infectious diseases thus allows for exploration of secondary fungal infections. Of particular value is the identification of patients with weakened immune systems who may thereby be susceptible to fungal infections.
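The "realized in only (...)" part of the axiom is a universal restriction: every realization must satisfy the filler, and a disposition that is never realized satisfies it vacuously. The Python sketch below checks candidate realizations against the filler under that reading. The `Process` type and its boolean flag are hypothetical, and the closed-world booleans differ from OWL’s open-world semantics, where an unstated fact is unknown rather than false.

```python
from dataclasses import dataclass

# Parts required by the filler of "realized in only (...)" in the axiom above.
REQUIRED_PARTS = frozenset({
    "establishment of localization in host",
    "transmission process",
    "appearance of disorder",
})


@dataclass(frozen=True)
class Process:
    parts: frozenset
    occurs_in_immunocompetent_organism: bool


def satisfies_filler(p: Process) -> bool:
    """Does one candidate realization fall under the 'only' filler?"""
    return REQUIRED_PARTS <= p.parts and not p.occurs_in_immunocompetent_organism


def consistent_with_axiom(realizations) -> bool:
    """Universal restriction: all realizations must satisfy the filler (vacuously true if none)."""
    return all(satisfies_filler(p) for p in realizations)


in_immunocompromised_host = Process(REQUIRED_PARTS, occurs_in_immunocompetent_organism=False)
in_healthy_host = Process(REQUIRED_PARTS, occurs_in_immunocompetent_organism=True)
print(consistent_with_axiom([in_immunocompromised_host]), consistent_with_axiom([in_healthy_host]))
```

A realization in an immunocompetent host would therefore be inconsistent with the axiom, which is how the model separates opportunistic fungal pathogens from pathogens that can cause disease in any host.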
From another angle, fungal spores can provoke pathogenic responses that may, in clinical presentation, be difficult to distinguish from allergic reactions such as those induced by pollen inhalation. MIDO can serve as a framework for semantically distinguishing allergic responses from infectious pathogenic responses. Following IDO, allergic responses may be modeled as realizations of hypersensitivity dispositions, involving material entities that bear allergen roles and participate in some process of hypersensitivity. In contrast, pathogenic responses are modeled as realizations of infectious dispositions borne by pathogens that initiate the formation of disorders, which are in turn the physical bases of infectious diseases. When these models are encoded in OWL, associated reasoners can distinguish between allergic and infectious etiologies for otherwise similar manifestations.
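A minimal Python sketch of the distinction follows: it decides the etiology of a response from which disposition is realized, who bears it, and what else is present. The `Response` fields and the coarse bearer labels are hypothetical, and the assumption that the hypersensitivity disposition is borne by the host is made explicit in the code; a real deployment would rely on an OWL reasoner over MIDO and IDO axioms instead.

```python
from dataclasses import dataclass
from enum import Enum


class Etiology(Enum):
    ALLERGIC = "allergic"
    INFECTIOUS = "infectious"
    UNDETERMINED = "undetermined"


@dataclass(frozen=True)
class Response:
    realized_disposition: str               # disposition whose realization is observed
    bearer: str                             # hypothetical coarse label: "host" or "pathogen"
    trigger_roles: frozenset = frozenset()  # roles borne by the inhaled material entity
    disorder_formed: bool = False


def etiology(r: Response) -> Etiology:
    # Allergy: a hypersensitivity disposition (assumed host-borne) realized with an allergen-role bearer.
    if (r.realized_disposition == "hypersensitivity disposition"
            and r.bearer == "host" and "allergen role" in r.trigger_roles):
        return Etiology.ALLERGIC
    # Infection: a pathogen's infectious disposition realized in the formation of a disorder.
    if (r.realized_disposition == "infectious disposition"
            and r.bearer == "pathogen" and r.disorder_formed):
        return Etiology.INFECTIOUS
    return Etiology.UNDETERMINED


SPORE_ALLERGY = Response("hypersensitivity disposition", "host", frozenset({"allergen role"}))
SPORE_INFECTION = Response("infectious disposition", "pathogen", disorder_formed=True)
print(etiology(SPORE_ALLERGY).value, etiology(SPORE_INFECTION).value)
```

The same inhaled spore can appear in both records; what separates the two etiologies is the disposition that is realized, not the triggering material entity.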
Diagnosing fungal infections is inherently challenging due to the diverse and complex nature of fungi [63]. Compounding these diagnostic challenges is an even more pressing issue: the growing resistance of fungi to antifungal treatments. Although some antifungal drugs, such as amphotericin B (an instance of the MIDO class polyene), seem to maintain efficacy after decades of use, many others, including those represented in MIDO by the azole class, have shown decreasing efficacy worldwide due to the emergence of antifungal resistance inhering in fungal strains, such as strains of Aspergillus fumigatus [64, 65]. MIDO offers an opportunity to address such challenges by providing well-defined representations of the mechanisms of antifungal resistance and the genetic factors contributing to it. Such representations may facilitate the study of resistance patterns, aiding the design of more effective treatment plans and the development of next-generation antifungal drugs [66]. Additionally, integrating MIDO with drug discovery databases could enable the identification of novel antifungal compounds: by connecting fungal genes and pathways with existing drug targets, researchers can leverage these links to develop new therapeutic options with greater precision and efficiency [67].
Extensions
Given that MIDO was only recently initiated, there are currently no ontologies extending from it. Moreover, since pathogen research heavily favors bacterial and viral pathogens, few OBO ontologies contain fungal infection classes or relations. Nevertheless, there are numerous reasons to take such modeling seriously, not least the considerable amount of data being curated by organizations such as UNITE, which risks ending up in data silos. MIDO developers envision leveraging this ontology to provide a semantic layer connecting data from the UNITE community to other datasets concerning fungi.
Additionally, MIDO developers envision using MIDO to model incidence and spread of fungal infections, including rare and emerging mycoses, in the interest of supporting outbreak monitoring and response. Ontological characterizations of the complex interrelations of fungal pathogens and their hosts will, moreover, clarify understanding o