I am currently supporting Philip Strömert and Noura Rayya in their efforts to modernize and revitalize the Chemical Methods Ontology (CHMO) to support annotation of instrumentation used to produce experimental data captured in the Chemotion electronic laboratory notebook as part of NFDI4Chem. This post is about the adoption of the Simple Standard for Sharing Ontological Mappings (SSSOM) to support interoperability between CHMO and other resources, and the workflow I developed to compare overlapping manual curations from different researchers.

Philip and Noura have already completed the important initial steps of assuming maintainership from the Royal Society of Chemistry, porting the ontology to use a standardized Ontology Development Kit (ODK) layout, and revising the definitions of many classes based on the IUPAC GoldBook.

Landscape of Resources

There are several other NFDI consortia including NFDI4Cat (catalysis), DAPHNE4NFDI (photon and neutron physics), and FAIRmat (materials science) that have similar goals to annotate instrumentation. While each reuses CHMO to some extent for this purpose, DAPHNE4NFDI additionally develops the Photon and Neutron Experimental Techniques (PANET) Ontology and FAIRmat develops the NeXus format and associated NeXus Ontology as part of the NOMAD materials science data management platform.

Further, there are several other resources with similar goals including the Allotrope Foundation Ontology (AFO), the deprecated Physico-chemical Methods and Properties (FIX) ontology, the deprecated Physico-chemical process (REX) ontology, IUPAC GoldBook, and Wikidata.

Establishing Interoperability

To establish interoperability between these many resources, we are using SSSOM to curate exact matches, narrow matches, and broad matches between CHMO terms and external ones in PANET, NeXus (sort of), AFO, FIX, REX, IUPAC GoldBook, and Wikidata.
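To make the shape of such a mapping concrete, here is a minimal sketch in plain Python that writes two mapping rows as tab-separated text. It only uses the standard library and is not how SSSOM Curator or sssom_pydantic serialize files. The justification value is my assumption about how manual curation would be recorded, and a real SSSOM file would also carry a metadata header (for example, a CURIE map) that is omitted here.

```python
import csv
import io

# The four columns that SSSOM requires for each mapping row
COLUMNS = ["subject_id", "predicate_id", "object_id", "mapping_justification"]

# Assumption: manual curation is recorded with this SEMAPV term
MANUAL = "semapv:ManualMappingCuration"

# Both rows describe the same CHMO/FIX pair with different predicates,
# mirroring the predicate disagreement reported in the comparison output
rows = [
    ("CHMO:0000164", "skos:narrowMatch", "FIX:0000401", MANUAL),
    ("CHMO:0000164", "skos:exactMatch", "FIX:0000401", MANUAL),
]


def to_tsv(rows):
    """Serialize mapping rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    writer.writerows(rows)
    return buffer.getvalue()


print(to_tsv(rows))
```

The predicate column is where the exact, narrow, and broad distinction lives, so two curators can agree on a subject-object pair and still disagree on the mapping.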

First, Philip had Ambika, a student research assistant (known as a Hiwi in German), work for several months to manually curate mappings from CHMO to REX, FIX, AFO, and Wikidata (see this PR).

In parallel, I took the opportunity to spin up a new instance of an SSSOM Curator repository within the NFDI Section Metadata Working Group for Ontology Harmonization and Mappings GitHub repository, run lexical prediction to generate candidate mappings from CHMO, and manually curate the results in this PR and this PR over the course of about an hour.

Need for Comparison

The next challenge was to efficiently triage the similarities and differences between my curations and Ambika’s. Therefore, I implemented a workflow for comparing the manually curated mappings in two SSSOM documents in cthoyt/sssom-pydantic#141. This workflow creates a Markdown file describing similarities and differences.

I chained together the following two CLI commands with sssom_pydantic to get the separate mapping files from Ambika’s branch in the NFDI4Chem fork of CHMO, merge them, then run the comparison against my own curations. Note that these won’t be reproducible after the branch is merged and deleted, and the actual results will change as more curation is done.

$ sssom_pydantic merge \
    --input https://github.com/NFDI4Chem/rsc-cmo/raw/refs/heads/Add-tsv-files/src/mappings/fix-mappings.sssom.tsv \
    --input https://github.com/NFDI4Chem/rsc-cmo/raw/refs/heads/Add-tsv-files/src/mappings/afo-mappings.sssom.tsv \
    --input https://github.com/NFDI4Chem/rsc-cmo/raw/refs/heads/Add-tsv-files/src/mappings/rex-mappings.sssom.tsv \
    --input https://github.com/NFDI4Chem/rsc-cmo/raw/refs/heads/Add-tsv-files/src/mappings/wikidata-mappings.sssom.tsv \
    --standardize \
    --output ambika.sssom.tsv
$ sssom_pydantic compare \
    ambika.sssom.tsv \
    https://github.com/nfdi-de/section-metadata-wg-onto/raw/refs/heads/main/sssom/data/positive.sssom.tsv \
    --standardize \
    --standardize-flip \
    --left-label Ambika \
    --right-label Charlie

Since the comparison workflow outputs Markdown, its results can easily be embedded in GitHub issues or my blog, which is itself written in Markdown.
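For readers who want to see the idea behind the subject comparison without reading the implementation, here is a minimal sketch in plain Python. It is not the code in sssom_pydantic: the function names are hypothetical, mappings are reduced to (subject, object) pairs, and predicates are ignored. The example pairs are taken from the FIX comparison table at the end of this post.

```python
from collections import defaultdict


def group_objects(pairs):
    """Index a set of (subject_id, object_id) pairs by subject."""
    index = defaultdict(set)
    for subject, obj in pairs:
        index[subject].add(obj)
    return index


def compare_subjects(left, right):
    """Split subjects into left-only, right-only, and shared, and
    report shared subjects whose object sets differ."""
    left_index, right_index = group_objects(left), group_objects(right)
    shared = left_index.keys() & right_index.keys()
    conflicts = {
        subject: {
            "left": left_index[subject] - right_index[subject],
            "both": left_index[subject] & right_index[subject],
            "right": right_index[subject] - left_index[subject],
        }
        for subject in shared
        if left_index[subject] != right_index[subject]
    }
    return {
        "left_only": left_index.keys() - right_index.keys(),
        "right_only": right_index.keys() - left_index.keys(),
        "both": shared,
        "conflicts": conflicts,
    }


# A small excerpt of each curator's pairs, taken from the comparison output
ambika = {
    ("CHMO:0000141", "FIX:0000004"),
    ("CHMO:0000141", "FIX:0000217"),
    ("CHMO:0000570", "FIX:0000919"),
}
charlie = {
    ("CHMO:0000141", "FIX:0000217"),
    ("CHMO:0000570", "FIX:0000918"),
}

report = compare_subjects(ambika, charlie)
for subject, columns in sorted(report["conflicts"].items()):
    print(subject, columns)
```

Swapping the roles of subject and object in the pairs gives the corresponding object comparison.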

Results

I am happy with the first version of the comparison workflow. Luckily, there were only a small number of discrepancies, all of which have obvious solutions. There were also a few interesting mappings that appear in only my or only Ambika’s curations, which can be reviewed by a third curator (sorry Philip, more work for you).

Next Steps

I think that the comparison workflow can be extended to identify and report on one-to-many, many-to-one, and many-to-many mappings which arise when jointly examining two mapping sets. After Philip and others interact with the results, I’m sure we will be able to extend it with other analyses.
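As a rough illustration of what such an extension could look like, the sketch below pools the pairs from two mapping sets and labels each pair by cardinality. This is not part of sssom_pydantic, the function name is hypothetical, and the pairs are a small excerpt from the comparison output at the end of this post.

```python
from collections import defaultdict


def classify_cardinality(pairs):
    """Label each (subject_id, object_id) pair as 1:1, 1:n, n:1, or n:n
    based on how often its subject and object occur in the pooled set."""
    objects_of = defaultdict(set)
    subjects_of = defaultdict(set)
    for subject, obj in pairs:
        objects_of[subject].add(obj)
        subjects_of[obj].add(subject)

    labels = {}
    for subject, obj in pairs:
        many_objects = len(objects_of[subject]) > 1
        many_subjects = len(subjects_of[obj]) > 1
        if many_objects and many_subjects:
            labels[subject, obj] = "n:n"
        elif many_objects:
            labels[subject, obj] = "1:n"
        elif many_subjects:
            labels[subject, obj] = "n:1"
        else:
            labels[subject, obj] = "1:1"
    return labels


# Pooled excerpt of both curators' pairs
pooled = {
    ("CHMO:0000040", "FIX:0000816"),
    ("CHMO:0000035", "FIX:0000816"),
    ("CHMO:0000570", "FIX:0000919"),
    ("CHMO:0000570", "FIX:0000918"),
}

labels = classify_cardinality(pooled)
for pair in sorted(labels):
    print(pair, labels[pair])
```

Pairs labeled anything other than 1:1 are the ones a reviewer would want to look at first, since exact matches in particular should rarely fan out.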

More generally, the implementation of the comparison workflow is part of a larger suite of workflows that I would like to describe in future posts including:

  1. merging manually curated mappings
  2. generating OWL ontology bridges
  3. incorporating SSSOM into ODK builds, which I will support Damien Goutte-Gattat in documenting in the ODK repository and the OBOOK
  4. unifying this analysis with my other idea for automated evaluation of predicted mappings, which I hope can be used to run future mapping challenges

Without further ado, here’s the comparison, copied verbatim from the output of the compare command above:

Comparison between Ambika and Charlie

  1. CHMO to FIX
  2. CHMO to REX

CHMO to FIX

Subject Comparison

  • 288 entities appear as subjects only in Ambika
  • 19 entities appear as subjects only in Charlie only
  • 138 entities appear as subjects in both

The following 6 subjects (4.3%) appearing in both have conflicting objects:

| subject_id | subject_label | Ambika | both | Charlie |
|---|---|---|---|---|
| CHMO:0000141 | diffraction method | FIX:0000004 (crystallography) | FIX:0000217 (diffraction method) | |
| CHMO:0000164 | electron scattering | FIX:0000666 (electron scattering spectroscopy) | FIX:0000401 (electron scattering) | |
| CHMO:0000255 | flame atomic emission spectroscopy | FIX:0000935 (spark method) | FIX:0000928 (flame atomic emission spectroscopy) | |
| CHMO:0000307 | X-ray emission spectroscopy | FIX:0000673 (X-ray fluorescence spectroscopy) | FIX:0000100 (X-ray emission spectroscopy) | |
| CHMO:0000366 | electron energy loss spectroscopy | FIX:0000664 (electron impact spectroscopy) | FIX:0000663 (electron energy loss spectroscopy) | |
| CHMO:0000570 | proton transfer reaction ion trap mass spectrometry | FIX:0000919 (proton transfer reaction ion trap mass spectrometry) | | FIX:0000918 (proton transfer reaction mass spectrometry) |

Object Comparison

  • 296 entities appear as objects only in Ambika
  • 19 entities appear as objects only in Charlie
  • 138 entities appear as objects in both

The following 2 objects (1.4%) appearing in both have conflicting subjects:

| object_id | object_label | Ambika | both | Charlie |
|---|---|---|---|---|
| FIX:0000629 | pulsed field gel electrophoresis | CHMO:0002315 (pulsed-field electrophoresis) | | CHMO:0002316 (pulsed-field gel electrophoresis) |
| FIX:0000816 | square-wave polarography | CHMO:0000040 (square-wave voltammetry) | CHMO:0000035 (square-wave polarography) | |

Subject-Object Pair Comparison

  • 301 subject-object pairs only appear in Ambika
  • 20 subject-object pairs only appear in Charlie
  • 137 subject-object pairs appear in both

The following 1 subject-object pairs (0.7%) appearing in have conflicting predicates or predicate modifiers:

| subject_id | subject_label | object_id | object_label | warning | Ambika | Charlie |
|---|---|---|---|---|---|---|
| CHMO:0000164 | electron scattering | FIX:0000401 | electron scattering | different predicate | skos:narrowMatch | skos:exactMatch |

CHMO to REX

Subject Comparison

  • 1 entities appear as subjects only in Ambika
  • 18 entities appear as subjects only in Charlie only
  • 0 entities appear as subjects in both

Object Comparison

  • 1 entities appear as objects only in Ambika
  • 18 entities appear as objects only in Charlie
  • 0 entities appear as objects in both

Subject-Object Pair Comparison

  • 1 subject-object pairs only appear in Ambika
  • 18 subject-object pairs only appear in Charlie
  • 0 subject-object pairs appear in both