My research is focused on the generation and application of biological knowledge graphs in drug discovery and precision medicine. Recently, I’ve had a specific focus on knowledge graph embedding methodologies and downstream application of link prediction.
While several public terminologies are useful for curating biomedical relations, there is often a need to develop new controlled vocabularies, thesauri, taxonomies, and ontologies to describe new biological phenomena. I led the team that created the Curation of Neurodegeneration Supporting Ontology (CONSO).
After identifying named entities within scholarly articles, their relations can be extracted and encoded in a knowledge graph. I led the team that created the knowledge graph Curation of Neurodegeneration in BEL (CONIB) and later the knowledge graph TauBase.
I also led the same team in re-curating the knowledge graphs published during the AETIONOMY project and developed a novel rational enrichment workflow. Because curation is slow and costly, prioritizing articles is crucial. I’ve developed semi-automated curation workflows based on a new metric for information density in regions of knowledge graphs.
I also serve on the Biological Expression Language Committee that facilitates the improvement of the language as a modeling paradigm in systems and networks biology.
Data Integration and Harmonization
To check the syntax and semantics of these knowledge graphs, I developed PyBEL. To interactively explore these graphs in a web-based environment and identify biological contradictions, I developed BEL Commons.
Finally, to integrate the rich biological data sources available to the public, I developed Bio2BEL. Along the way, I supported the ComPath project, which used the Bio2BEL framework to curate equivalencies and hierarchical relations between entries in several major pathway databases (e.g., KEGG, WikiPathways, Reactome), and later the PathMe project, which harmonized those databases in BEL as a common schema.
Knowledge Graph Embeddings
Knowledge graph embedding methods learn latent representations for the nodes and edges in a graph to support clustering, link prediction, entity disambiguation, and other downstream machine learning tasks.
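To make "latent representations" concrete, here is a minimal sketch of how a translational model in the style of TransE scores a triple: a relation is treated as a vector that should carry the head entity close to the tail entity. The embeddings below are hand-picked toy values rather than learned ones, and the function name `transe_score` is only illustrative. It is not taken from any particular library.

```python
import math

# Toy 2-dimensional embeddings, hand-picked for illustration (not learned).
entity_embeddings = {
    "aspirin": [0.0, 1.0],
    "PTGS2": [1.0, 1.0],
    "MAPT": [-1.0, 0.0],
}
relation_embeddings = {
    "inhibits": [1.0, 0.0],
}


def transe_score(head, relation, tail):
    """Return the negative Euclidean distance between (head + relation) and tail.

    Scores closer to zero mean the triple is considered more plausible.
    """
    h = entity_embeddings[head]
    r = relation_embeddings[relation]
    t = entity_embeddings[tail]
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


plausible = transe_score("aspirin", "inhibits", "PTGS2")
implausible = transe_score("aspirin", "inhibits", "MAPT")
print(plausible, implausible)
```

In a real model, the vectors are learned so that triples in the graph score higher than randomly corrupted ones. The same scores can then be reused for clustering or link prediction.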
I’ve worked on PyKEEN, a PyTorch reimplementation of several recent knowledge graph embedding models with a focus on reproducibility. I’ve also developed BioKEEN, which connects biological knowledge graphs in BEL (notably from Bio2BEL) directly to the PyKEEN pipeline.
The link prediction task in knowledge graphs is isomorphic to several tasks in drug discovery and precision medicine.
Predicting links between genes/proteins and diseases accomplishes target identification/prioritization. I’ve worked on GuiltyTargets, which embedded proteins from protein-protein interaction networks annotated with disease-specific differential gene expression patterns. These embeddings were used for positive-unlabeled learning using disease-specific gene lists. While this method works well, it is single-task: it only works on one disease at a time.
Predicting links between chemicals and diseases accomplishes drug repositioning (in the case when the chemical is a known drug) or otherwise novel drug discovery. I’ve worked on DrugReLink, which uses Hetionet to make these predictions for a given chemical or disease.
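Framed this way, drug repositioning reduces to ranking candidate diseases for a given chemical and discarding links that are already known. The sketch below assumes some trained model exposes a scoring function. The lookup table `toy_scores`, the `treats` relation, and the entity names are hypothetical placeholders, not output from DrugReLink or Hetionet.

```python
def rank_candidates(score, head, relation, candidates, known_triples=frozenset()):
    """Rank candidate tails for (head, relation, ?) from most to least plausible.

    Triples already present in the graph are skipped so only new links are proposed.
    """
    novel = [t for t in candidates if (head, relation, t) not in known_triples]
    return sorted(novel, key=lambda t: score(head, relation, t), reverse=True)


# Hypothetical scores standing in for a trained model's output.
toy_scores = {
    ("drug_a", "treats", "disease_x"): 0.9,
    ("drug_a", "treats", "disease_y"): 0.2,
    ("drug_a", "treats", "disease_z"): 0.7,
}


def toy_score(head, relation, tail):
    """Look up a placeholder score; unseen triples default to 0.0."""
    return toy_scores.get((head, relation, tail), 0.0)


known = {("drug_a", "treats", "disease_x")}
ranking = rank_candidates(
    toy_score, "drug_a", "treats", ["disease_x", "disease_y", "disease_z"], known
)
print(ranking)
```

Swapping the head and tail roles gives the complementary query: ranking candidate chemicals for a given disease.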
Because many compounds fail in the clinic due to undesirable side effects, predicting those side effects during early-stage drug discovery could drastically improve its efficiency. I’ve worked on SEffNet, which uses a network composed of drug-disease, drug-side effect, drug-target, and drug-drug links to predict compounds’ side effects and give insight into the targets mediating those side effects.
Some of my ongoing work applies these methods to precision medicine. I represent patients as nodes in networks, create edges to biological entities based on clinical measurements (e.g., gene expression), and then embed those nodes for downstream machine learning tasks such as subgroup identification and survival analysis.
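One simple way to picture the patient-as-node idea is to turn per-patient measurements into triples that can be added to an existing knowledge graph. The sketch below is only an assumption about how that could look: the threshold, the relation names `overexpresses` and `underexpresses`, and the toy values are all illustrative, and the actual work may discretize measurements differently.

```python
def patient_edges(measurements, threshold=1.0):
    """Build (patient, relation, gene) triples from per-patient expression values.

    `measurements` maps a patient id to {gene: log2 fold change}. Values at or
    beyond +/- threshold become edges; the rest are treated as unchanged and skipped.
    """
    edges = []
    for patient, genes in sorted(measurements.items()):
        for gene, value in sorted(genes.items()):
            if value >= threshold:
                edges.append((patient, "overexpresses", gene))
            elif value <= -threshold:
                edges.append((patient, "underexpresses", gene))
    return edges


# Toy measurements, invented purely for illustration.
toy_measurements = {
    "patient_1": {"APP": 1.8, "MAPT": -0.2},
    "patient_2": {"APP": -1.3, "MAPT": 2.1},
}
edges = patient_edges(toy_measurements)
for edge in edges:
    print(edge)
```

Once these triples sit alongside the biological edges, the patient nodes can be embedded with the same methods as any other entity.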
- Biomappings: Community Curation of Mappings between Biomedical Entities at International Society of Biocuration 2021 Virtual Conference (online) on October 5, 2021
- Current Issues in Theory, Reproducibility, and Utility of Graph Machine Learning in the Life Sciences at (online) on September 23, 2021
- Perspectives on Knowledge Graph Embedding Models in/out of Biomedicine at AstraZeneca (online) on April 6, 2021
- Future Directions for WikiPathway Meta-curation at WikiPathways Developers Conference Call (online) on January 6, 2021
- The Biological Expression Language and PyBEL in 2020 at COVID-19 Disease Map Community (online) on July 10, 2020
- Introduction to the Biological Expression Language and the Rational Enrichment Workflow at CoronaWhy (online) on May 6, 2020
- Generation and Application of Biomedical Knowledge Graphs at Harvard Medical School in Boston, USA on July 19, 2019
- Applications of Knowledge Graphs in Drug Discovery at Computational Drug Discovery Group, University of Leiden in Leiden, Netherlands on November 5, 2019
- The PyBEL Ecosystem in 2018 at 2018 BEL Community Meeting in Boston, USA on May 18, 2018
- From Knowledge Assembly to Hypothesis Generation at Bio-IT World in Boston, USA on May 17, 2018
My publications are listed in many places, so there’s not much good in writing them all out again on this site. Check one of the following: