Several controlled vocabularies and ontologies that aren’t themselves OBO Foundry ontologies use unsanctioned OBO PURLs. This post is about how to use the Bioregistry to identify which resources are doing this, and to give some insight into how we arrived at this situation.

Background on the OBO Foundry

The OBO Foundry is a set of independent, interoperable biomedical ontologies that aspire to shared development principles. One such principle is a consistent approach for creating persistent uniform resource locators (PURLs) for the local unique identifiers in each ontology. These PURLs follow the form http://purl.obolibrary.org/obo/<PREFIX>_<LOCAL UNIQUE IDENTIFIER>. For example, a prefix might be GO (for the Gene Ontology) and a local unique identifier might be 0032571 (for response to vitamin K in GO), resulting in the PURL http://purl.obolibrary.org/obo/GO_0032571.
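The construction rule can be sketched as a pair of small helpers. This assumes the standard OBO PURL base `http://purl.obolibrary.org/obo/`; the names `make_obo_purl` and `parse_obo_purl` are hypothetical and not part of any library.

```python
from typing import Optional, Tuple

OBO_PURL_BASE = "http://purl.obolibrary.org/obo/"


def make_obo_purl(prefix: str, identifier: str) -> str:
    """Build an OBO PURL of the form <base><PREFIX>_<LOCAL UNIQUE IDENTIFIER>."""
    return f"{OBO_PURL_BASE}{prefix}_{identifier}"


def parse_obo_purl(iri: str) -> Optional[Tuple[str, str]]:
    """Split an OBO PURL back into (prefix, local unique identifier).

    Returns None if the IRI is outside the OBO PURL namespace or lacks
    an underscore separating the prefix from the identifier.
    """
    if not iri.startswith(OBO_PURL_BASE):
        return None
    # Split on the first underscore, since OBO prefixes don't contain one
    prefix, sep, identifier = iri[len(OBO_PURL_BASE):].partition("_")
    if not sep or not prefix or not identifier:
        return None
    return prefix, identifier


print(make_obo_purl("GO", "0032571"))
# http://purl.obolibrary.org/obo/GO_0032571
print(parse_obo_purl("http://purl.obolibrary.org/obo/GO_0032571"))
# ('GO', '0032571')
```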

While most semantic web resources can use arbitrary IRIs (internationalized resource identifiers), the OBO Foundry enforces that its PURLs resolve to something useful for readers (e.g., to the Ontology Lookup Service). The resolver behind these PURLs is implemented and maintained in a GitHub repository with corresponding .htaccess files for each OBO Foundry ontology. A correct and useful configuration for each ontology is a requirement for acceptance to the OBO Foundry.

At the core of the OBO Foundry are several high-quality, well-known, generally useful ontologies such as the Gene Ontology and the Cell Ontology. Inclusion in the OBO Foundry has therefore become a de facto stamp of approval, which (as of writing) 254 ontologies have, for better or worse, successfully sought out.

The Masquerade

Unfortunately, some ontologies and controlled vocabularies have adopted OBO PURLs even though they are not OBO Foundry ontologies. This is a problem for a few reasons:

  1. The PURLs are not registered in the OBO PURL resolver and therefore do not resolve
  2. The quality of the ontology has not been vetted, meaning that low-quality ontologies using OBO PURLs could damage community trust in the OBO Foundry
  3. These PURLs might conflict with other URI prefixes or prefix synonyms, increasing complexity for data integration
  4. Most importantly: it’s confusing.
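To make the first point concrete, here is a toy model of the resolver, assuming it only redirects PURLs whose prefix has a registered configuration. The `REGISTERED` mapping and its `example.org` redirect templates are hypothetical placeholders; they do not reproduce the real per-ontology .htaccess rules.

```python
from typing import Dict, Optional

OBO_PURL_BASE = "http://purl.obolibrary.org/obo/"

# Hypothetical registered configurations: prefix -> redirect template.
# The real resolver uses per-ontology .htaccess files; these targets are placeholders.
REGISTERED: Dict[str, str] = {
    "GO": "https://example.org/ontology/go/{identifier}",
    "CL": "https://example.org/ontology/cl/{identifier}",
}


def resolve(iri: str, registered: Dict[str, str] = REGISTERED) -> Optional[str]:
    """Return a redirect target for a registered OBO PURL, or None (i.e., a 404)."""
    if not iri.startswith(OBO_PURL_BASE):
        return None
    prefix, sep, identifier = iri[len(OBO_PURL_BASE):].partition("_")
    template = registered.get(prefix)
    if not sep or template is None:
        return None  # unregistered prefix: there is nothing to redirect to
    return template.format(identifier=identifier)


print(resolve("http://purl.obolibrary.org/obo/GO_0032571"))
# https://example.org/ontology/go/0032571
print(resolve("http://purl.obolibrary.org/obo/EXAMPLE_0000001"))  # hypothetical prefix
# None
```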

One of the jobs of the Bioregistry is to catalog the URI format strings for identifier resources useful for the life and natural sciences. This allows us to assess how widespread the use of OBO PURLs by non-OBO Foundry resources is, and why it happens. Without further ado, here’s the list of offending resources that appear in the Bioregistry:
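The core check can be sketched in plain Python. This assumes the Bioregistry convention of writing URI format strings with a `$1` placeholder for the local unique identifier; the `Record` type and the `foo`/`bar` entries are hypothetical stand-ins, not the Bioregistry’s actual data model or contents.

```python
from typing import Dict, List, NamedTuple

# Substring that identifies an OBO PURL (covers http:// and https://)
OBO_PURL_SUBSTRING = "purl.obolibrary.org/obo/"


class Record(NamedTuple):
    """Hypothetical, minimal stand-in for a registry entry."""
    in_obo_foundry: bool
    uri_format: str  # "$1" marks where the local unique identifier goes


def expand(uri_format: str, identifier: str) -> str:
    """Expand a Bioregistry-style URI format string for one local identifier."""
    return uri_format.replace("$1", identifier)


def find_masqueraders(records: Dict[str, Record]) -> List[str]:
    """Return prefixes that are not OBO Foundry ontologies but use OBO PURLs."""
    return sorted(
        prefix
        for prefix, record in records.items()
        if not record.in_obo_foundry and OBO_PURL_SUBSTRING in record.uri_format
    )


records = {
    "go": Record(True, "http://purl.obolibrary.org/obo/GO_$1"),
    "foo": Record(False, "http://purl.obolibrary.org/obo/FOO_$1"),  # hypothetical
    "bar": Record(False, "https://example.org/bar/$1"),  # hypothetical
}
print(find_masqueraders(records))
# ['foo']
print(expand(records["foo"].uri_format, "0000001"))
# http://purl.obolibrary.org/obo/FOO_0000001
```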

| prefix | name | evidence |
|---|---|---|
| aeon | Academic Event Ontology | curated |
| cemo | COVID-19 epidemiology and monitoring ontology | extra |
| covoc | CoVoc Coronavirus Vocabulary | curated |
| decipher | DECIPHER CNV Syndromes | biocontext |
| dermo | Human Dermatological Disease Ontology | curated |
| efo | Experimental Factor Ontology | biocontext |
| gorel | GO Relations | biolink |
| hpath | Histopathology Ontology | curated |
| idocovid19 | COVID-19 Infectious Disease Ontology | curated |
| lbo | Livestock Breed Ontology | curated |
| lpt | Livestock Product Trait Ontology | curated |
| mesh | Medical Subject Headings | biocontext |
| msio | Metabolomics Standards Initiative Ontology | curated |
| omia | Online Mendelian Inheritance in Animals | biocontext |
| omim | Online Mendelian Inheritance in Man | biocontext |
| pride | PRIDE Controlled Vocabulary | curated |
| reo | Reagent Ontology | curated |
| roleo | Role Ontology | curated |
| soybase | SoyBase | prefixcommons |
| uniprot.isoform | UniProt Isoform | extra |
| vido | Virus Infectious Disease Ontology | curated |
| vsmo | Ontology for vector surveillance and management | curated |
| xl | Cross-linker reagents ontology | curated |

In the evidence column, there are a few possible entries:

  1. curated - the URI prefix was manually curated in the Bioregistry. This happens when the primary ontology artifact uses OBO PURLs.
  2. extra - a manually curated extra URI prefix in the Bioregistry (in addition to the primary one) uses an OBO PURL. This is usually done so the Bioregistry’s IRI parser can handle incorrectly constructed IRIs that appear in third-party data.
  3. biocontext, biolink, and prefixcommons - other registries have assigned OBO PURLs as their URI expansions.
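A minimal sketch of how these evidence labels could be assigned, assuming each resource exposes its primary curated URI format, any extra curated URI formats, and the URI formats that other registries report. The function and argument names are hypothetical; the example URI formats are made up.

```python
from typing import Dict, List, Optional, Tuple

# Substring that identifies an OBO PURL (covers http:// and https://)
OBO_PURL_SUBSTRING = "purl.obolibrary.org/obo/"


def label_evidence(
    curated: Optional[str],
    extras: List[str],
    mapped: Dict[str, str],
) -> List[Tuple[str, str]]:
    """Return (evidence, uri_format) pairs explaining why a resource uses OBO PURLs.

    curated: the primary URI format curated in the registry, if any
    extras:  additional manually curated URI formats
    mapped:  another registry's metaprefix (e.g., "biocontext") -> its URI format
    """
    # A curated OBO PURL is the strongest evidence, so report it alone
    if curated and OBO_PURL_SUBSTRING in curated:
        return [("curated", curated)]
    rows = [("extra", fmt) for fmt in extras if OBO_PURL_SUBSTRING in fmt]
    # Sort by metaprefix so the output order is deterministic
    rows.extend(
        (metaprefix, fmt)
        for metaprefix, fmt in sorted(mapped.items())
        if OBO_PURL_SUBSTRING in fmt
    )
    return rows


print(label_evidence(
    curated=None,
    extras=[],
    mapped={
        "prefixcommons": "https://example.org/foo/$1",
        "biocontext": "http://purl.obolibrary.org/obo/FOO_$1",
    },
))
# [('biocontext', 'http://purl.obolibrary.org/obo/FOO_$1')]
```

Letting curated evidence short-circuit the other checks keeps one primary explanation per prefix, which matches the single evidence value per row in the table.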

There are probably many more resources doing this (e.g., ones listed in BioPortal) that have not been included in the Bioregistry because of their lack of notability, utility, or reuse.

Here's the code that generated the table (before minor modifications):

```python
import bioregistry
from tabulate import tabulate

# Substring identifying OBO PURLs (matches both http:// and https:// variants)
OBOLIBRARY_SUBSTRING = "purl.obolibrary.org/obo/"

rows = []
for prefix, resource in bioregistry.read_registry().items():
    # Skip actual OBO Foundry ontologies and the "obo" meta-prefix itself
    if resource.get_obofoundry_prefix() or prefix == "obo":
        continue
    name = resource.get_name()
    link = f"[{prefix}]({prefix})"
    # 1. The URI format manually curated in the Bioregistry is an OBO PURL
    if resource.uri_format and OBOLIBRARY_SUBSTRING in resource.uri_format:
        rows.append((link, name, "curated", "", resource.uri_format))
        continue
    # 2. The default URI format the Bioregistry returns for this prefix is an OBO PURL
    elif (uri_format := resource.get_uri_format()) and OBOLIBRARY_SUBSTRING in uri_format:
        rows.append((link, name, "default", "", uri_format))
        continue
    # 3. URI formats reported by other registries mapped to this prefix
    for metaprefix in resource.get_mappings() or {}:
        uri_format = (getattr(resource, metaprefix, None) or {}).get("uri_format")
        if uri_format and OBOLIBRARY_SUBSTRING in uri_format:
            rows.append((link, name, "mapped", metaprefix, uri_format))
    # 4. Extra providers curated in the Bioregistry
    for provider in resource.get_extra_providers():
        if OBOLIBRARY_SUBSTRING in provider.uri_format:
            rows.append((link, name, "extra", provider.code, provider.uri_format))

print(tabulate(rows, headers=["prefix", "name", "type", "code", "uri_format"], tablefmt="github"))
```


Based on the table above, there are several situations in which an OBO PURL appears:

  1. Ontologies that are curated in the OBO flat file format and then converted to the OWL format using ROBOT are automatically given OBO PURLs. For example, this occurs for the Livestock Breed Ontology.
  2. Ontologies that are uploaded to BioPortal likely undergo a similar conversion, resulting in BioPortal PURLs that themselves include OBO PURLs. For example, this occurs for FamPlex and the Vital Sign Ontology.
  3. Some ontologies are or were intended to be submitted to the OBO Foundry. For example:
    • The Academic Event Ontology (AEON) is still a work in progress and will be submitted to the OBO Foundry.
    • The Reagent Ontology (REO) was abandoned and never submitted to the OBO Foundry.
  4. Some ontologies are closely tied to the OBO Foundry ecosystem and are sometimes mistaken for being part of it (e.g., EFO).
  5. Not all ontologies, controlled vocabularies, or other semantic spaces have associated PURLs. Several registries (e.g., Prefix Commons, BioContext, Biolink Model) have “made up” OBO PURLs for non-OBO Foundry resources because of their use case-specific preferences.
  6. In the case of XL, I think that this was an OBO Foundry ontology at some point but was renamed. It’s very difficult to understand the history of the HUPO Proteomics Standards Initiative from the outside.
  7. Some of the examples with evidence “extra” appear because third-party resources incorrectly reference entities using unsanctioned OBO PURLs.
  8. In the case of CEMO, the URI prefix appears to be an artifact of incorrectly configured curation tooling (likely Protégé).
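The first situation follows from how OBO flat file identifiers are translated to OWL: by default, a prefixed identifier such as `PREFIX:LOCALID` becomes an IRI under the OBO PURL base, with the colon replaced by an underscore. The sketch below covers only that default rule for prefixed identifiers; `obo_id_to_iri` is a hypothetical name, and `LBO:0000001` is an illustrative identifier rather than a real term.

```python
OBO_PURL_BASE = "http://purl.obolibrary.org/obo/"


def obo_id_to_iri(curie: str) -> str:
    """Translate a prefixed OBO identifier (PREFIX:LOCALID) to its default OWL IRI.

    Unprefixed identifiers and any other translation rules are out of
    scope for this sketch.
    """
    prefix, sep, local_id = curie.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a prefixed identifier: {curie!r}")
    return f"{OBO_PURL_BASE}{prefix}_{local_id}"


# A resource outside the OBO Foundry still ends up with an OBO PURL this way
print(obo_id_to_iri("LBO:0000001"))
# http://purl.obolibrary.org/obo/LBO_0000001
```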

It’s hard to know for sure what led the developers/maintainers of primary resources, or of third-party resources, to use unsanctioned OBO PURLs. Regardless, it’s still valuable for the community to know about these problems and to potentially use comprehensive resources like the Bioregistry as a guide towards improving interoperability and interpretability.