In 2018, Domingo-Fernández et al. published *ComPath: An ecosystem for exploring, analyzing, and curating mappings across pathway databases*, describing the overlap between human pathways in KEGG, Reactome, and WikiPathways. A lot of the underlying machinery I developed to support this project has been improved since, and it’s time to extend the search to organisms other than humans and to other databases. This blog post is about some additional relation types needed to capture the relations between pathways appearing in these databases.

Like many of my blog posts, this one was inspired by a tweet. After the following discussion, I thought it would be good to better organize the ideas and elaborate.

This blog post will follow apoptosis, one of the most ubiquitous pathways in biology that covers all manner of programmed cell death. This blog post isn’t about the nitty-gritty difference between pathways, biological processes, and mechanisms - so we will consider all variants of apoptosis and apoptotic process effectively the same.

| Resource | Prefix | Identifier |
| --- | --- | --- |
| Gene Ontology (GO) | go | GO:0006915 |
| Medical Subject Headings (MeSH) | mesh | D017209 |
| Kyoto Encyclopedia of Genes and Genomes (KEGG) | kegg.pathway | map04210 |
| NCI Thesaurus (NCIT) | ncit | C17557 |

KEGG, Reactome, and WikiPathways all provide human-specific variants of these pathways (below), as well as variants for many other species, both model organisms and non-model organisms.

| Resource | Prefix | Identifier |
| --- | --- | --- |
| KEGG | kegg.pathway | hsa04210 |
| Reactome | reactome | R-HSA-109581 |
| WikiPathways | wikipathways | WP254 |

## Pathways are Equivalent

Two pathways that describe the same process can be considered equivalent, and represented with skos:exactMatch, only if they both have the same species specificity. The following relationships are between the non-species-specific pathways for apoptosis:

| Subject | Predicate | Object |
| --- | --- | --- |
| GO:0006915 | skos:exactMatch | mesh:D017209 |
| GO:0006915 | skos:exactMatch | kegg.pathway:map04210 |
| mesh:D017209 | skos:exactMatch | kegg.pathway:map04210 |

The following relationships are between the human-specific pathways for apoptosis in KEGG, Reactome, and WikiPathways:

| Subject | Predicate | Object |
| --- | --- | --- |
| kegg.pathway:hsa04210 | skos:exactMatch | reactome:R-HSA-109581 |
| kegg.pathway:hsa04210 | skos:exactMatch | wikipathways:WP254 |
| wikipathways:WP254 | skos:exactMatch | reactome:R-HSA-109581 |

Similarly, the following relationships are between the cow-specific (Bos taurus; BTA) pathways for apoptosis in KEGG, Reactome, and WikiPathways:

| Subject | Predicate | Object |
| --- | --- | --- |
| kegg.pathway:bta04210 | skos:exactMatch | reactome:R-BTA-109581 |
| kegg.pathway:bta04210 | skos:exactMatch | wikipathways:WP1018 |
| wikipathways:WP1018 | skos:exactMatch | reactome:R-BTA-109581 |

While equivalences begin to tame the ontology of pathways, they still leave out links from the GO, MeSH, and NCIT terms to Reactome and WikiPathways.

## Species-Specific Variant of a Pathway

GO, MeSH, NCIT, and many other nomenclatures do not contain species-specific variants of their pathways. However, KEGG contains both a parent pathway, whose identifier is prefixed with map, and species-specific pathways, whose identifiers are prefixed with KEGG’s internal three- or four-letter species code.

| Subject | Predicate | Object |
| --- | --- | --- |
| kegg.pathway:hsa04210 | speciesSpecific | kegg.pathway:map04210 |
| kegg.pathway:bta04210 | speciesSpecific | kegg.pathway:map04210 |
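
Because the KEGG parent differs from its species-specific children only by prefix, these triples could be generated rather than curated by hand. The following is a minimal sketch, not part of ComPath or PyOBO. The function names `kegg_parent` and `species_specific_triples` are made up for illustration, and the regular expression assumes that an identifier is a lowercase three- or four-letter prefix followed by five digits.

```python
import re
from typing import Iterable, Iterator, Optional, Tuple

# Assumption: a KEGG pathway identifier is "map" or a 3-4 letter organism
# code, followed by five digits. Other KEGG prefixes are out of scope here.
KEGG_PATHWAY_RE = re.compile(r"^([a-z]{3,4})(\d{5})$")


def kegg_parent(identifier: str) -> Optional[str]:
    """Return the "map" parent of a species-specific pathway, or None for a parent."""
    match = KEGG_PATHWAY_RE.match(identifier)
    if match is None:
        raise ValueError(f"unexpected KEGG pathway identifier: {identifier}")
    prefix, number = match.groups()
    if prefix == "map":
        return None  # already the non-species-specific pathway
    return f"map{number}"


def species_specific_triples(identifiers: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (subject, predicate, object) triples linking children to parents."""
    for identifier in identifiers:
        parent = kegg_parent(identifier)
        if parent is not None:
            yield (
                f"kegg.pathway:{identifier}",
                "speciesSpecific",
                f"kegg.pathway:{parent}",
            )


triples = list(species_specific_triples(["hsa04210", "bta04210", "map04210"]))
```

Here `triples` contains the two rows of the table above, and the parent `map04210` itself produces no triple.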

It should generally hold that when X speciesSpecific Y and Y skos:exactMatch Z are true, X speciesSpecific Z is also true. This allows KEGG to serve as a bridge between the species-specific and non-species-specific pathway worlds. However, Domingo-Fernández et al. showed that there are huge discrepancies between KEGG, Reactome, and WikiPathways, so there is still a need to curate or infer the same kinds of relationships in Reactome and WikiPathways.
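
The inference rule above can be sketched in a few lines of Python. This is only an illustration over plain tuples, not how ComPath or Biomappings implement it. The function name `infer_species_specific` is hypothetical, and the sketch applies a single hop of skos:exactMatch, treated as symmetric, rather than computing a full transitive closure.

```python
from collections import defaultdict


def infer_species_specific(triples):
    """Apply: X speciesSpecific Y and Y skos:exactMatch Z => X speciesSpecific Z."""
    triples = set(triples)

    # Index exact matches in both directions, since skos:exactMatch is symmetric.
    exact = defaultdict(set)
    for subj, pred, obj in triples:
        if pred == "skos:exactMatch":
            exact[subj].add(obj)
            exact[obj].add(subj)

    inferred = set()
    for subj, pred, obj in triples:
        if pred == "speciesSpecific":
            for match in exact.get(obj, ()):
                inferred.add((subj, "speciesSpecific", match))

    # Only report triples that were not already asserted.
    return inferred - triples


curated = [
    ("kegg.pathway:hsa04210", "speciesSpecific", "kegg.pathway:map04210"),
    ("GO:0006915", "skos:exactMatch", "kegg.pathway:map04210"),
    ("mesh:D017209", "skos:exactMatch", "kegg.pathway:map04210"),
]
new_triples = infer_species_specific(curated)
```

With these inputs, the human KEGG pathway is inferred to be a species-specific variant of both the GO and the MeSH term, which is the bridging role described above.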

Unfortunately, Reactome and WikiPathways do not (yet) have parent terms for non-species-specific pathways. Asking about this was the point of the tweet that inspired this blog post. Because Reactome uses a standardized nomenclature where all variants of each pathway across species have the same numerical part to their identifier (e.g., R-HSA-109581 and R-BTA-109581), they could institute a similar parent nomenclature like KEGG’s. WikiPathways identifiers do not have this sort of regularity, but they have the benefit of being highly receptive to external input and improvements.
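
To illustrate how regular the Reactome identifiers are, here is a rough sketch that groups identifiers by their shared numerical part. The function name `group_reactome_variants` is hypothetical, and the pattern assumes a three-letter uppercase species code with no version suffix. It does not propose what a parent identifier should look like, since that decision belongs to Reactome.

```python
import re
from collections import defaultdict

# Assumption: identifiers look like R-<three-letter species code>-<number>.
REACTOME_RE = re.compile(r"^R-([A-Z]{3})-(\d+)$")


def group_reactome_variants(identifiers):
    """Map each numerical part to its {species code: identifier} variants."""
    groups = defaultdict(dict)
    for identifier in identifiers:
        match = REACTOME_RE.match(identifier)
        if match is None:
            continue  # skip anything that does not follow the assumed pattern
        species, number = match.groups()
        groups[number][species] = identifier
    return dict(groups)


groups = group_reactome_variants(["R-HSA-109581", "R-BTA-109581", "R-HSA-419771"])
```

In this example, the group for `109581` holds both the human and the cow apoptosis pathways, so each group corresponds to one candidate parent term.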

Sidebar: I’ve seen an elegant solution for this in OBO that defines child terms with an intersection of the Relation Ontology relation RO:0002160 (only in taxon) to a given species and the parent term, but this is an unnecessarily complicated alternative for the goal of representing the relation between two entities.

## Pathways are Orthologs

Two genes with similar evolutionary history and function appearing in two organisms are called orthologs. Orthology is incredibly important for studying biology because it allows us to make inferences about how human biology works by studying model organisms like mice and rats. There are several databases collecting orthology relationships, such as HomoloGene.

It follows that orthology could be applied to pathways as well. In fact, Reactome’s web interface already has a box below each pathway linking to the orthologous pathways as seen on https://reactome.org/content/detail/R-HSA-109581:

However, this information is not programmatically available (AFAIK), and it is not available for other databases like WikiPathways and KEGG. Therefore, we can introduce a relationship orthology to start curating triples like:

| Subject | Predicate | Object |
| --- | --- | --- |
| kegg.pathway:hsa04210 | orthology | kegg.pathway:bta04210 |

Orthology relationships effectively convey the same information as speciesSpecific with the advantage that they do not require the addition of a parent term. However, N orthologous pathways form a complete subgraph (also called a clique in graph theory) with N * (N - 1) / 2 edges. Depending on the downstream use case, these kinds of subgraphs can be problematic.
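
To make the growth of the clique concrete, the following sketch enumerates every pairwise orthology triple for a set of pathways. The function name `orthology_clique` is made up for this post. The mouse (mmu) and rat (rno) identifiers are included only as illustrative extra members, and each unordered pair is emitted once because the relation is treated as symmetric.

```python
from itertools import combinations


def orthology_clique(pathways):
    """Return one orthology triple for each unordered pair of pathways."""
    return [
        (left, "orthology", right)
        for left, right in combinations(sorted(pathways), 2)
    ]


pathways = [
    "kegg.pathway:hsa04210",
    "kegg.pathway:bta04210",
    "kegg.pathway:mmu04210",  # illustrative additional organism
    "kegg.pathway:rno04210",  # illustrative additional organism
]
edges = orthology_clique(pathways)
n = len(pathways)
```

For these four pathways, `len(edges)` equals `n * (n - 1) // 2`, which is 6, whereas linking each pathway to a shared parent with speciesSpecific would need only 4 triples. The gap widens quadratically as more organisms are added.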

Because kegg.pathway:hsa04210 skos:exactMatch reactome:R-HSA-109581, we can infer reactome:R-HSA-109581 orthology kegg.pathway:bta04210. However, I think it would be best to only curate orthology relationships within a given database, because curating them across databases would increase the size (N) of the clique.

## Pathway is About a Concept

KEGG, Reactome, and WikiPathways not only include pathways, but also other “maps” about specific topics such as diseases, families of proteins, and other biological entities.

For example, KEGG has an entry kegg.pathway:hsa05010 entitled Alzheimer disease - Homo sapiens (human). When using Gilda to generate lexical matchings, the MeSH entry mesh:D000544 (Alzheimer Disease) appeared highly ranked. However, KEGG’s notion of a pathway and MeSH’s notion of a disease are not the same, and these two terms should not be considered equivalent. For cases like this, which occur in Reactome and WikiPathways as well as KEGG, we can introduce a new relationship pathwayAbout. It turns out that WikiPathways also has an Alzheimer’s disease “pathway”.

| Subject | Predicate | Object |
| --- | --- | --- |
| kegg.pathway:hsa05010 | pathwayAbout | mesh:D000544 |

Note that KEGG and WikiPathways both have specificity in their pathways for organisms, but diseases in MeSH and other nomenclatures aren’t typically stratified by their target organisms. Therefore, the mouse-specific Alzheimer’s disease pathway in WikiPathways (wikipathways:WP2075) could also have the same relationship.

Another example is opsins - a family of light-sensitive proteins. Reactome has a pathway reactome:R-HSA-419771 (Opsins) that is not the same as the MeSH entry mesh:D055355 (Opsins) describing the protein family.

There is specific interest in connecting disease maps appearing in pathway databases to the diseases themselves. WikiPathways has already begun doing this as can be seen on https://www.wikipathways.org/index.php/Pathway:WP2059.

It might be justified to propose an alternate relationship with more specific semantics. More information on various disease-specific curation projects outside major pathway databases can be found at https://disease-maps.org.

## Disease-specific Variant of a Pathway

This is a bit of an afterthought, but it might be worth mentioning that there are places, like NeuroMMSig, that curate disease-specific variants of pathways. These would need their own dedicated relationships to connect to the “canonical” pathway and to the disease that they describe.

I have to give a huge shout-out to Daniel Domingo-Fernández, Josep Marín-Llaó, Carlos Bobis-Álvarez, and Yojana Gadiya, who have done the curation in the ComPath project, and to Ben Gyori, who laid the groundwork for improving the lexical mappings with the Gilda software and contributed tons of curations for MeSH-GO mappings.

There are still many disjoint resources that need normalization, including the Pathway Ontology, which looks to have lots of information. I’ll be working on it via this GitHub issue.

There’s also PathBank (curated by Yojana but not appearing in the original ComPath publication), BioCyc, MetaCyc, and many others. Each must first be included in PyOBO, as I described in a previous post, before getting into curation so I can reuse all the code.

I’m not super happy with any of the names I’ve given to relationships in this post, either, and I’m open to suggestions for improvement. We also discussed using skos:broader and skos:narrower as alternatives. Further, I’d love to see these kinds of relationships appear in the Relation Ontology itself, but unfortunately I have not been super successful in petitioning them for improvements in the past, so I may start another open ontology project focused on relationships themselves.

This is all part of a greater effort, Biomappings, which Ben and I have been working on to make it much easier to curate equivalences and related mappings. I’ll have more to say about that in a future post.