Validating the FAIRness of knowledge graphs and ontologies in RDF using the Bioregistry
Using standard CURIE prefixes and URI prefixes in semantic web artifacts such as
Resource Description Framework (RDF)
promotes interoperability, enables reuse in downstream data integration, and
makes data more FAIR. The Bioregistry defines a set of
standard CURIE prefixes and URI prefixes against which RDF files can be
validated/standardized. This blog post describes a new CLI tool
bioregistry validate ttl
in the Bioregistry Python package that can run
validation on Turtle files (a
common serialization of RDF).
RDF data stored in Turtle files typically begins with a stanza defining a prefix map. For example, one of the Turtle files in the Chemotion Knowledge Graph (Chemotion-KG) begins with the following six prefixes:
@prefix nfdicore: <https://nfdi.fiz-karlsruhe.de/ontology/> .
@prefix ns1: <http://purls.helmholtz-metadaten.de/mwo/> .
@prefix ns2: <http://purl.obolibrary.org/obo/chebi/> .
@prefix obo: <http://purl.obolibrary.org/obo/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
The following command can be used to validate it. Using --tablefmt github
results in a nice table that can be formatted into a blog post, otherwise it
outputs text in a more vertical format.
$ bioregistry validate ttl https://github.com/ISE-FIZKarlsruhe/chemotion-kg/raw/4cb5c24af/processing/output_bfo_compliant.ttl
prefix | uri_prefix | issue | solution |
---|---|---|---|
nfdicore | https://nfdi.fiz-karlsruhe.de/ontology/ | non-standard CURIE prefix | Switch to standard prefix: nfdi.core |
ns1 | http://purls.helmholtz-metadaten.de/mwo/ | unknown CURIE prefix | Consider switching to the more specific CURIE/URI prefix pair mwo: http://purls.helmholtz-metadaten.de/mwo/mwo_ |
ns2 | http://purl.obolibrary.org/obo/chebi/ | unknown CURIE prefix |
I was able to directly open an issue on the GitHub repository to give feedback. In general, I think this is a very powerful use of the Bioregistry because it can support groups interested in making knowledge graphs and ontologies towards improving their data and ultimately making it more FAIR.
In case you’re interested in how I implemented this, check this PR. I was able to reuse some ideas from a previous JSON-LD validator, extend them, and abstract the code. Later, I will be able to implement similar validators for XML files, ontologies, and any other resource from which I can extract a prefix map.
I also left a TODO inside the code, since this can be extended with several
other ways of validating URI prefixes. Ultimately, this may get upstreamed into
the curies
package to make it even
more accessible to groups making their own prefix maps or using custom instances
of the Bioregistry.