There are a lot of terms that I’ve been throwing around when talking about the Bioregistry, so this blog post is a first draft of a glossary of all of them.

Later, I will revise this further and either put it on the Bioregistry website or make a totally new repo in the Biopragmatics GitHub organization.

## Semantic spaces

While a controlled vocabulary enumerates a set of named entities, a semantic space enumerates a set of stable local identifiers for entities. Most high-quality controlled vocabularies also assign local identifiers to their named entities and are therefore also semantic spaces. For example, the Chemical Entities of Biological Interest (ChEBI) is a well-known ontology in the biomedical domain that is both a controlled vocabulary and a semantic space.

The term local identifier is synonymous with identifier and accession, but has the added qualifier local as a reminder that two semantic spaces may use the same one. For example, the Chemical Entities of Biological Interest (ChEBI) entry for 6-methoxy-2-octaprenyl-1,4-benzoquinone and the Human Disease Ontology (DOID) entry for gender identity disorder share the local identifier of 1234.

### Formalizing local identifiers

It’s often useful to have a regular expression that describes the local identifiers of a given semantic space. For example, both ChEBI and DOID use local identifiers that look like numbers, which match the regular expression `^\d+$`. The `^` and `$` anchor the match to the beginning and end of the string being tested, and appear in exactly the same way in all regular expressions for local identifiers. The `\d` matches a single digit, and the `+` means that the preceding token (`\d`) can be matched one or more times in a row.
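As a minimal sketch of how such a pattern can be used, the following Python snippet checks candidate local identifiers against `^\d+$`. The function name `is_valid_local_identifier` is made up for illustration and is not part of any Bioregistry API.

```python
import re

# Pattern from the text above: one or more digits, anchored at both ends.
LOCAL_ID_PATTERN = re.compile(r"^\d+$")


def is_valid_local_identifier(identifier: str) -> bool:
    """Return True if the whole identifier matches the pattern.

    fullmatch() is used instead of match() because Python's ``$`` also
    matches just before a trailing newline, which is not wanted here.
    """
    return LOCAL_ID_PATTERN.fullmatch(identifier) is not None


print(is_valid_local_identifier("1234"))        # True
print(is_valid_local_identifier("CHEBI:1234"))  # False
```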

### Open Biomedical Ontologies CURIEs

The Open Biomedical Ontologies (OBO) Foundry provides a persistent URL (PURL) service to create stable URIs for biomedical entities curated in its ontologies (e.g., Human Disease Ontology, Phenotype And Trait Ontology). Each of these URIs has four parts:

1. A URI prefix (`http://purl.obolibrary.org/obo/`; always the same)
2. An ontology prefix (`DRON` in the example below)
3. A delimiter (`_`; always the same)
4. An ontology local identifier (`0000005` in the example below)

http://purl.obolibrary.org/obo/DRON_0000005
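The four parts can be separated programmatically. The sketch below is only an illustration: `parse_obo_purl` is a hypothetical helper, and it assumes that the first underscore after the URI prefix is the delimiter, which would not hold for an ontology prefix that itself contains an underscore.

```python
OBO_URI_PREFIX = "http://purl.obolibrary.org/obo/"
DELIMITER = "_"


def parse_obo_purl(uri: str) -> tuple[str, str]:
    """Split an OBO PURL into (ontology prefix, local identifier).

    Assumption: the first underscore after the URI prefix is the delimiter.
    """
    if not uri.startswith(OBO_URI_PREFIX):
        raise ValueError(f"not an OBO PURL: {uri}")
    # Everything after the constant URI prefix, e.g. "DRON_0000005"
    remainder = uri[len(OBO_URI_PREFIX):]
    prefix, delimiter, identifier = remainder.partition(DELIMITER)
    if not delimiter or not prefix or not identifier:
        raise ValueError(f"missing prefix, delimiter, or identifier: {uri}")
    return prefix, identifier


print(parse_obo_purl("http://purl.obolibrary.org/obo/DRON_0000005"))
# ('DRON', '0000005')
```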

Confusingly, the entire combination of the ontology’s prefix, the delimiter, and the ontology’s local identifier (e.g., DRON_0000005) is considered in some contexts to be a local identifier in a theoretical semantic space for OBO, whose URI prefix is http://purl.obolibrary.org/obo/. This confusion led services like Identifiers.org to denote these ontologies as having the “namespace embedded in the local unique identifier” and therefore to include the prefix again in the regular expression pattern describing the local identifiers, e.g., `^DOID:\d+$` for the Human Disease Ontology.

This notation of the regular expression makes no sense for several reasons:

1. The regular expression should correspond to the local identifiers of a semantic space like DOID, not a registry like the OBO PURL system.
2. If you follow the simple algorithm for constructing a CURIE from a prefix and identifier (joining them with a colon), you end up either with local identifiers that themselves look like CURIEs, such as DOID:11337, or with redundant CURIEs, such as DOID:DOID:11337.
3. Identifiers.org doesn’t even handle CURIEs constructed following the rules for embedding the prefix in the local identifier.
4. It creates ambiguities in spreadsheets where columns are supposed to contain local identifiers or CURIEs.

The solution is simply to drop the entire notion of namespaces embedded in local unique identifiers. Since this would require updating a lot of data in a lot of places, the interim solution is to programmatically normalize identifiers and CURIEs to remove instances of this redundancy.
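Here is a rough sketch of what such normalization could look like. The function names are hypothetical, and the case-insensitive comparison of the embedded prefix is my own assumption rather than a documented rule of any registry.

```python
def normalize_identifier(prefix: str, identifier: str) -> str:
    """Strip a redundant embedded prefix from a local identifier.

    Assumption: the embedded prefix is compared case-insensitively.
    """
    redundant = f"{prefix}:"
    if identifier.lower().startswith(redundant.lower()):
        return identifier[len(redundant):]
    return identifier


def normalize_curie(curie: str) -> str:
    """Collapse a redundant CURIE like DOID:DOID:11337 to DOID:11337."""
    prefix, colon, identifier = curie.partition(":")
    if not colon:
        raise ValueError(f"not a CURIE: {curie}")
    return f"{prefix}:{normalize_identifier(prefix, identifier)}"


print(normalize_identifier("DOID", "DOID:11337"))  # 11337
print(normalize_curie("DOID:DOID:11337"))          # DOID:11337
print(normalize_curie("DOID:11337"))               # DOID:11337
```

Identifiers and CURIEs that are already clean pass through unchanged, so the functions can be applied repeatedly to the same spreadsheet column without harm.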

## Registry

A registry is a special kind of semantic space that enumerates other semantic spaces and assigns them local identifiers. Due to the connection with prefix maps and CURIEs, the local identifiers in registries are also colloquially called prefixes.

A registry also collects additional metadata about each semantic space, including its name, its canonical prefix, its stylized prefix, additional prefix synonyms, its homepage, an example local identifier, a regular expression pattern for local identifiers, and one or more URI format strings from both first-party and third-party sources. However, there is a wide variety of metadata standards across biomedical and semantic web registries, and not every registry includes all of these fields.
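To make the list of fields concrete, here is a sketch of a registry entry as a Python dataclass. The class and field names are illustrative and do not reproduce the schema of any real registry. The lowercase canonical prefix and the `$1` placeholder in the URI format string are assumptions about conventions, and the homepage is deliberately left unset.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RegistryEntry:
    """Metadata a registry might hold about one semantic space (illustrative)."""

    prefix: str                            # canonical prefix
    name: str
    stylized_prefix: Optional[str] = None
    synonyms: List[str] = field(default_factory=list)
    homepage: Optional[str] = None
    example: Optional[str] = None          # example local identifier
    pattern: Optional[str] = None          # regex for local identifiers
    uri_formats: List[str] = field(default_factory=list)


# Hypothetical entry; optional fields may be missing in a given registry.
doid = RegistryEntry(
    prefix="doid",
    name="Human Disease Ontology",
    stylized_prefix="DOID",
    example="11337",
    pattern=r"^\d+$",
    uri_formats=["http://purl.obolibrary.org/obo/DOID_$1"],
)
print(doid.stylized_prefix, doid.example)
```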

As with semantic spaces, a high-quality registry should have an associated first-party provider that comprises a website for exploring its entries and their associated metadata.

## Metaregistry

A metaregistry is a special kind of registry that assigns local identifiers to a collection of registries; it could even contain an entry about itself. It collects additional metadata about each registry, such as a description of its metadata standards and capabilities. Most importantly, a metaregistry contains mappings between equivalent entries in its constituent registries. Before the publication of this article, to the best of our knowledge, there were no dedicated metaregistries. Some registries such as FAIRsharing and the MIRIAM/Identifiers.org registry contain limited numbers of entries referring to other registries (e.g., BioPortal), but they do not delineate these records as representing registries, provide additional metadata about them, or provide mappings.

The only metaregistry in the biomedical domain is the Bioregistry.

## Resolver

A resolver uses a registry to generate a URI for a given CURIE based on the registry’s default provider for the semantic space with the given prefix, then redirects the requester to the constructed URI. Resolvers are different from providers in that they are general for many semantic spaces and do not host content themselves. Two well-known resolvers are Identifiers.org and Name-To-Thing.
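The URI construction step can be sketched in a few lines of Python. Everything here is a toy: the `URI_FORMATS` dictionary stands in for a registry, the `$1` placeholder is an assumed convention, and `resolve` is a hypothetical function. A real resolver would go on to send an HTTP redirect to the constructed URI.

```python
# Toy stand-in for a registry: prefix -> default provider's URI format string.
URI_FORMATS = {
    "chebi": "http://purl.obolibrary.org/obo/CHEBI_$1",
    "doid": "http://purl.obolibrary.org/obo/DOID_$1",
}


def resolve(curie: str) -> str:
    """Construct the URI that a resolver would redirect to for a CURIE."""
    prefix, colon, identifier = curie.partition(":")
    if not colon:
        raise ValueError(f"not a CURIE: {curie}")
    # Assumption: prefixes are looked up case-insensitively.
    uri_format = URI_FORMATS.get(prefix.lower())
    if uri_format is None:
        raise KeyError(f"unknown prefix: {prefix}")
    return uri_format.replace("$1", identifier)


print(resolve("CHEBI:1234"))  # http://purl.obolibrary.org/obo/CHEBI_1234
print(resolve("DOID:1234"))   # http://purl.obolibrary.org/obo/DOID_1234
```

The same local identifier, 1234, resolves to two different URIs depending on the prefix, which is exactly why the prefix is needed.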

## Lookup service

A lookup service is like a provider, but generalized to serve many semantic spaces. Some lookup services, like OntoBee, have a URI format string into which a compact identifier can be placed, but many require more complicated programmatic logic to construct a URI. Some well-known lookup services are the OLS, AberOWL, OntoBee, and BioPortal.