Inference over Semantic Mappings with SeMRA
Assembling and inferring missing semantic mappings is a timely problem in biomedical data and knowledge integration. I’ve been developing the Semantic Mapping Assembler and Reasoner (SeMRA) as a generic toolkit for this. In this blog post, I highlight its inference capabilities.
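SeMRA implements the chaining and inference rules described in the SSSOM specification. The first rule is inversion: a mapping with a symmetric predicate such as `skos:exactMatch` implies the same mapping in the reverse direction.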
```python
from semra import Mapping, EXACT_MATCH, Reference
from semra.inference import infer_reversible

r1 = Reference(prefix="chebi", identifier="107635", name="2,3-diacetyloxybenzoic")
r2 = Reference(prefix="mesh", identifier="C011748", name="tosiben")
mapping = Mapping(s=r1, p=EXACT_MATCH, o=r2)

# infer the reverse mapping, mesh:C011748 skos:exactMatch chebi:107635
mappings = infer_reversible([mapping])
```
```mermaid
graph LR
    A[2,3-diacetyloxybenzoic<br/>chebi:107635] -- skos:exactMatch --> B[tosiben<br/>mesh:C011748]
    B -. "skos:exactMatch<br/>(inferred)" .-> A
```
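To make the inversion rule concrete without relying on SeMRA's internals, here is a minimal sketch over plain `(subject, predicate, object)` tuples. It is not SeMRA's implementation: the `INVERSES` table and the `infer_inverses` function are hypothetical. The table follows the SKOS mapping properties, assuming `skos:exactMatch`, `skos:closeMatch`, and `skos:relatedMatch` are symmetric and that `skos:broadMatch` and `skos:narrowMatch` are each other's inverse.

```python
# Hypothetical sketch, not SeMRA's API: a mapping is a
# (subject, predicate, object) tuple of CURIE strings.
Triple = tuple[str, str, str]

# Assumed inverse table: symmetric predicates map to themselves,
# while skos:broadMatch and skos:narrowMatch invert to each other.
INVERSES: dict[str, str] = {
    "skos:exactMatch": "skos:exactMatch",
    "skos:closeMatch": "skos:closeMatch",
    "skos:relatedMatch": "skos:relatedMatch",
    "skos:broadMatch": "skos:narrowMatch",
    "skos:narrowMatch": "skos:broadMatch",
}


def infer_inverses(mappings: list[Triple]) -> list[Triple]:
    """Return the input mappings plus the inverse of each invertible one."""
    seen = set(mappings)
    result = list(mappings)
    for s, p, o in mappings:
        inverse_p = INVERSES.get(p)
        if inverse_p is None:
            continue  # e.g., oboInOwl:hasDbXref has no inverse in this table
        inverse = (o, inverse_p, s)
        if inverse not in seen:
            seen.add(inverse)
            result.append(inverse)
    return result


mappings = infer_inverses([("chebi:107635", "skos:exactMatch", "mesh:C011748")])
```

Here `mappings` holds the original ChEBI-to-MeSH mapping followed by the inferred MeSH-to-ChEBI one. Keeping the original mapping in the output matches the diagram, where the inferred edge is added next to the curated one.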
The second rule is transitivity: some predicates, such as `skos:exactMatch`, can be chained, so that if A maps to B and B maps to C, a mapping from A to C can be inferred. SeMRA implements a configuration for chains of length two, which could be extended to chains of arbitrary length.
```python
from semra import Reference, Mapping, EXACT_MATCH
from semra.inference import infer_chains

r1 = Reference.from_curie("mesh:C406527", name="R 115866")
r2 = Reference.from_curie("chebi:101854", name="talarozole")
r3 = Reference.from_curie("chembl.compound:CHEMBL459505", name="TALAROZOLE")
m1 = Mapping(s=r1, p=EXACT_MATCH, o=r2)
m2 = Mapping(s=r2, p=EXACT_MATCH, o=r3)

# infer mesh:C406527 skos:exactMatch chembl.compound:CHEMBL459505 via chebi:101854
mappings = infer_chains([m1, m2])
```
```mermaid
graph LR
    A[R 115866<br/>mesh:C406527] -- skos:exactMatch --> B[talarozole<br/>chebi:101854]
    B -- skos:exactMatch --> C[TALAROZOLE<br/>chembl.compound:CHEMBL459505]
    A -. "skos:exactMatch<br/>(inferred)" .-> C
```
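One way to picture a configuration for chains of length two is a lookup table from a pair of predicates to the predicate of the inferred mapping. The sketch below is hypothetical: `CHAIN_RULES` and `infer_two_hop_chains` are not SeMRA names, and the table entries are assumptions chosen for illustration (two exact matches chain to an exact match, and an exact match next to a broad match yields a broad match).

```python
# Hypothetical sketch, not SeMRA's API.
Triple = tuple[str, str, str]

# Assumed configuration: (first-hop predicate, second-hop predicate) -> inferred predicate.
CHAIN_RULES: dict[tuple[str, str], str] = {
    ("skos:exactMatch", "skos:exactMatch"): "skos:exactMatch",
    ("skos:exactMatch", "skos:broadMatch"): "skos:broadMatch",
    ("skos:broadMatch", "skos:exactMatch"): "skos:broadMatch",
}


def infer_two_hop_chains(mappings: list[Triple]) -> list[Triple]:
    """Return the input plus mappings inferred from chains A -p1-> B -p2-> C."""
    # Index mappings by subject so the second hop out of B can be found directly.
    by_subject: dict[str, list[Triple]] = {}
    for m in mappings:
        by_subject.setdefault(m[0], []).append(m)

    seen = set(mappings)
    result = list(mappings)
    for a, p1, b in mappings:
        for _, p2, c in by_subject.get(b, []):
            p = CHAIN_RULES.get((p1, p2))
            if p is None or c == a:
                continue  # no rule for this predicate pair, or a trivial self-loop
            inferred = (a, p, c)
            if inferred not in seen:
                seen.add(inferred)
                result.append(inferred)
    return result


mappings = infer_two_hop_chains([
    ("mesh:C406527", "skos:exactMatch", "chebi:101854"),
    ("chebi:101854", "skos:exactMatch", "chembl.compound:CHEMBL459505"),
])
```

Each call only looks one hop ahead, so under this sketch longer chains would need repeated calls until no new mappings appear; that fixed-point loop is one plausible route from two-length chains to arbitrary ones.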
The third rule is generalization, which means that a stricter predicate can be relaxed to a less specific one, like `owl:equivalentTo` to `skos:exactMatch`.
```python
from semra import Reference, Mapping, EQUIVALENT_TO
from semra.inference import infer_generalizations

r1 = Reference.from_curie("chebi:101854", name="talarozole")
r2 = Reference.from_curie("chembl.compound:CHEMBL459505", name="TALAROZOLE")
# the curated mapping uses the stricter owl:equivalentTo predicate
m1 = Mapping(s=r1, p=EQUIVALENT_TO, o=r2)

# relax it to infer chebi:101854 skos:exactMatch chembl.compound:CHEMBL459505
mappings = infer_generalizations([m1])
```
```mermaid
graph LR
    A[talarozole<br/>chebi:101854] -- owl:equivalentTo --> B[TALAROZOLE<br/>chembl.compound:CHEMBL459505]
    A -. "skos:exactMatch<br/>(inferred)" .-> B
```
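Generalization only rewrites one mapping at a time, so a sketch needs nothing more than a table from a stricter predicate to the looser one it implies. The `GENERALIZATIONS` table and `infer_generalized` function below are hypothetical, not SeMRA's API, and the single entry is taken from the example above.

```python
# Hypothetical sketch, not SeMRA's API.
Triple = tuple[str, str, str]

# Assumed generalization table: stricter predicate -> looser predicate it implies.
GENERALIZATIONS: dict[str, str] = {
    "owl:equivalentTo": "skos:exactMatch",
}


def infer_generalized(mappings: list[Triple]) -> list[Triple]:
    """Return the input plus a relaxed copy of each mapping with a stricter predicate."""
    seen = set(mappings)
    result = list(mappings)
    for s, p, o in mappings:
        looser = GENERALIZATIONS.get(p)
        if looser is not None and (s, looser, o) not in seen:
            seen.add((s, looser, o))
            result.append((s, looser, o))
    return result


mappings = infer_generalized([
    ("chebi:101854", "owl:equivalentTo", "chembl.compound:CHEMBL459505"),
])
```

Unlike chaining, no other mappings need to be consulted, which is why the same table-driven shape extends naturally to other predicate mutations when domain knowledge supports them.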
The third rule can be generalized further to any kind of mutation of one predicate into another, given some domain knowledge. For example, because some resources are curated in the OBO flat file format, they record `oboInOwl:hasDbXref` relations where `skos:exactMatch` is implied.
```python
from semra import Reference, Mapping, DB_XREF
from semra.inference import infer_dbxref_mutations

r1 = Reference.from_curie("doid:0050577", name="cranioectodermal dysplasia")
r2 = Reference.from_curie("mesh:C562966", name="Cranioectodermal Dysplasia")
m1 = Mapping(s=r1, p=DB_XREF, o=r2)

# we're 99% confident doid-mesh dbxrefs actually are exact matches
mappings = infer_dbxref_mutations([m1], {("doid", "mesh"): 0.99})
```
```mermaid
graph LR
    A[cranioectodermal dysplasia<br/>doid:0050577] -- oboInOwl:hasDbXref --> B[Cranioectodermal Dysplasia<br/>mesh:C562966]
    A -. "skos:exactMatch<br/>(inferred)" .-> B
```
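The dbxref example attaches a confidence to a pair of prefixes. The sketch below shows one way such a prefix-keyed mutation could work. It is hypothetical: `ScoredMapping`, `prefix`, and `mutate_dbxrefs` are not SeMRA names, and combining confidences by multiplication is an assumption made for illustration, not a description of how SeMRA scores inferred mappings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredMapping:
    """Hypothetical stand-in for a mapping with a confidence, not SeMRA's Mapping."""

    s: str
    p: str
    o: str
    confidence: float = 1.0


def prefix(curie: str) -> str:
    """Return the prefix part of a CURIE such as 'doid:0050577'."""
    return curie.split(":", 1)[0]


def mutate_dbxrefs(
    mappings: list[ScoredMapping],
    pairs: dict[tuple[str, str], float],
    old: str = "oboInOwl:hasDbXref",
    new: str = "skos:exactMatch",
) -> list[ScoredMapping]:
    """Return the input plus a copy, with the `new` predicate, of each `old`
    mapping whose (subject prefix, object prefix) pair has a curated confidence."""
    result = list(mappings)
    for m in mappings:
        if m.p != old:
            continue
        pair_confidence = pairs.get((prefix(m.s), prefix(m.o)))
        if pair_confidence is None:
            continue  # no domain knowledge for this prefix pair, so leave it alone
        # Assumption: the inferred confidence is the product of both confidences.
        result.append(ScoredMapping(m.s, new, m.o, m.confidence * pair_confidence))
    return result


mappings = mutate_dbxrefs(
    [ScoredMapping("doid:0050577", "oboInOwl:hasDbXref", "mesh:C562966")],
    {("doid", "mesh"): 0.99},
)
```

Keying the table on ordered prefix pairs means a DOID-to-MeSH cross-reference is mutated while a MeSH-to-DOID one is not; a curator would add both directions if both are trusted.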
There’s a lot more to say about semantic mappings. A good first place to look, before getting into the guts of the code, is the accompanying manuscript:
Assembly and reasoning over semantic mappings at scale for biomedical data integration
Hoyt, C. T., Karis, K., and Gyori, B. M.
bioRxiv, 2025.04.16.649126