Several controlled vocabularies and ontologies that aren’t themselves OBO Foundry ontologies use unsanctioned OBO PURLs. This post is about how to use the Bioregistry to identify which resources are doing this and to give some insight into how we arrived in this situation.

Background on the OBO Foundry

The OBO Foundry is a set of independent, interoperable biomedical ontologies that aspire to using shared development principles. One such principle is to use a principled approach for creating persistent uniform resource locators (PURLs) for local unique identifiers in each ontology. These PURLs follow the form http://purl.obolibrary.org/obo/<PREFIX>_<LOCAL UNIQUE IDENTIFIER>. For example, a prefix might be GO (for the Gene Ontology) and local unique identifier might be 0032571 (for response to vitamin K in GO) resulting in the PURL http://purl.obolibrary.org/obo/GO_0032571.

While most semantic web resources allow the use of any IRIs (internationalized resource identifiers), the OBO Foundry enforces that its PURLs resolve to something useful for readers (e.g., to the Ontology Lookup Service). The resolver behind http://purl.obolibrary.org is implemented and maintained in a GitHub repository with corresponding .htaccess files for each OBO Foundry ontology. Correct and useful configuration for each ontology are a requirement for acceptance to the OBO Foundry.

At the core of the OBO Foundry are several high quality, well-known, generally useful ontologies such as the Gene Ontology and the Cell Ontology. Inclusion in the OBO Foundry has therefore become a de facto stamp of approval for ontologies that (until now) 254 ontologies have (for better or worse) successfully sought out.

The Masquerade

Unfortunately, some ontologies and controlled vocabularies have adopted OBO PURLs even though they are not OBO Foundry ontologies. This is a problem for a few reasons:

  1. The PURLs are not registered in https://github.com/OBOFoundry/purl.obolibrary.org/ and therefore do not resolve
  2. The quality of the ontology has not been vetted, meaning that low-quality ontologies using OBO PURLs could damage community trust in the OBO Foundry
  3. These PURLs might conflict with other URIs prefix synonyms, increasing complexity for data integration
  4. Most importantly: it’s confusing.

One of the jobs of the Bioregistry is to catalog the URI format strings for identifier resources useful for the life and natural sciences. This allows us to assess how big the problem of non-OBO Foundry ontologies are using OBO PURLs, and why. Without further ado, here’s the list of offending resources that appear in the Bioregistry:

prefix name evidence uri_prefix
aeon Academic Event Ontology curated http://purl.obolibrary.org/obo/AEON_
cemo COVID-19 epidemiology and monitoring ontology extra http://purl.obolibrary.org/obo/cemo.owl#
covoc CoVoc Coronavirus Vocabulary curated http://purl.obolibrary.org/obo/COVOC_
decipher DECIPHER CNV Syndromes biocontext http://purl.obolibrary.org/obo/DECIPHER_
dermo Human Dermatological Disease Ontology curated http://purl.obolibrary.org/obo/DERMO_
efo Experimental Factor Ontology biocontext http://purl.obolibrary.org/obo/EFO_
gorel GO Relations biolink http://purl.obolibrary.org/obo/GOREL_
hpath Histopathology Ontology curated http://purl.obolibrary.org/obo/MC_
idocovid19 COVID-19 Infectious Disease Ontology curated http://purl.obolibrary.org/obo/COVIDO_
lbo Livestock Breed Ontology curated http://purl.obolibrary.org/obo/LBO_
lpt Livestock Product Trait Ontology curated http://purl.obolibrary.org/obo/LPT_
mesh Medical Subject Headings biocontext http://purl.obolibrary.org/obo/MESH_
msio Metabolomics Standards Initiative Ontology curated http://purl.obolibrary.org/obo/MSIO_
omia Online Mendelian Inheritance in Animals biocontext http://purl.obolibrary.org/obo/OMIA_
omim Online Mendelian Inheritance in Man biocontext http://purl.obolibrary.org/obo/OMIM_
pride PRIDE Controlled Vocabulary curated http://purl.obolibrary.org/obo/PRIDE_
reo Reagent Ontology curated http://purl.obolibrary.org/obo/REO_
roleo Role Ontology curated http://purl.obolibrary.org/obo/RoleO_
soybase SoyBase prefixcommons http://purl.obolibrary.org/obo/
uniprot.isoform UniProt Isoform extra http://purl.obolibrary.org/obo/UniProtKB_
vido Virus Infectious Disease Ontology curated http://purl.obolibrary.org/obo/VIDO_
vsmo Ontology for vector surveillance and management curated http://purl.obolibrary.org/obo/VSMO_
xl Cross-linker reagents ontology curated http://purl.obolibrary.org/obo/XL_

In the evidence column, there are a few possible entries:

  1. curated - this is the URI prefix manually curated in the Bioregistry. This happens when the primary ontology artifact uses OBO PURLs.
  2. extra - this is when there’s a manually curated extra URI prefix in the Bioregistry (in addition to the primary one) that uses an OBO PURL. This usually is done to enable the Bioregistry’s IRI parser to handle cases that appear in third-party data that incorrectly constructs IRIs.
  3. biocontext, biolink, and prefixcommons - this is when other registries have assigned OBO PURLs as their URI expansions

It’s worth noting that there are probably lots more resources doing this, e.g., that are listed in BioPortal, but have not been included in the Bioregistry because of their lack of notability, utility, or reuse.

Here's the code that generated the table (before minor modifications) ```python import bioregistry from tabulate import tabulate OBOLIBRARY_SUBSTRING = "purl.obolibrary.org/obo" rows = [] for prefix, resource in bioregistry.read_registry().items(): if resource.get_obofoundry_prefix() or prefix == "obo": continue name = resource.get_name() contact = resource.get_contact() l = f"[{prefix}](https://bioregistry.io/{prefix})" if resource.uri_format and OBOLIBRARY_SUBSTRING in resource.uri_format: rows.append((l, name, "curated", "", resource.uri_format)) continue elif (uri_format := resource.get_uri_format()) and OBOLIBRARY_SUBSTRING in uri_format: rows.append((l, name, "default", "", uri_format)) continue for metaprefix in resource.get_mappings(): uri_format = (getattr(resource, metaprefix, None) or {}).get("uri_format") if uri_format and OBOLIBRARY_SUBSTRING in uri_format: rows.append((l, name, "mapped", metaprefix, uri_format)) for p in resource.get_extra_providers(): if OBOLIBRARY_SUBSTRING in p.uri_format: rows.append((l, name, "extra", p.code, p.uri_format)) print(tabulate(rows, headers=["prefix", "name", "type", "code", "uri_format"], tablefmt="github")) ```

Rationalizations

Based on the table above, there are several situations in which an OBO PURL appears:

  1. Ontologies that are curated in the OBO flat file format then converted to the OWL format using ROBOT are automatically given OBO PURLs. For example, this occurs for the Livestock Breeding Ontology.
  2. Similarly, ontologies that are uploaded to BioPortal likely undergo a similar procedure that results in BioPortal PURLs that themselves include OBO PURLs. For example, this occurs for FamPlex and the Vital Sign Ontology.
  3. Ontologies are/were intended to be submitted to the OBO Foundry. For example, this includes:
    • The Academic Event Ontology (AEON) is still a work in progress and will be submitted to the OBO Foundry
    • The Reagent Ontology (REO) was abandoned and never submitted to the OBO Foundry
  4. Ontologies that closely used by the OBO Foundry ecosystem, and are somtimes mistaken for being in it (e.g., EFO)
  5. Not all ontologies, controlled vocabularies, or other semantic spaces have associated PURLs. Several registries (e.g., Prefix Commons, BioContext, BioLink Model) have “made up” OBO PURLs for non-OBO Foundry resources because of their use case-specific preferences.
  6. In the case of XL, I think that this was an OBO Foundry ontology at some point but got renamed. It’s very difficult to understand the history of the HUPO Proteomics Standards Initiative from the outside
  7. Some of these examples that appear with evidence “extra” are there because third-party resources incorrectly reference entities using unsanctioned OBO PURLs
  8. In the case of CEMO, it appears the URI prefix is an artifact of incorrect configuration curation tooling (likely Protege).

It’s hard to know for sure for the situation that lead to the developers/maintainers of primary resources using unsanctioned OBO PURLs or the developers/maintainers of third party resources using unsanctioned OBO PURLs. Regardless, it’s still valuable for the community to know about these problems and potentially use comprehensive resources like the Bioregistry as a guide towards improving interoperability and interpretability.