My research focuses on the generation and application of biological knowledge graphs in drug discovery and precision medicine. Recently, I’ve concentrated on knowledge graph embedding methodologies and downstream applications of link prediction.

Research Workflow

Biocuration

While several public terminologies are useful for curating biomedical relations, new controlled vocabularies, thesauri, taxonomies, and ontologies are often needed to describe new biological phenomena. I led the team that created the Curation of Neurodegeneration Supporting Ontology (CONSO).

After identifying named entities within scholarly articles, their relations can be extracted and encoded in a knowledge graph. I led the team that created the knowledge graph Curation of Neurodegeneration in BEL (CONIB) and later the knowledge graph TauBase.

I also led the same team in re-curating the knowledge graphs curated and published during the AETIONOMY project and in developing a novel rational enrichment workflow. Because curation is costly and time-consuming, prioritizing articles is crucial. I’ve developed semi-automated curation workflows based on a new metric for information density in regions of knowledge graphs.

I also serve on the Biological Expression Language Committee that facilitates the improvement of the language as a modeling paradigm in systems and networks biology.

Data Integration and Harmonization

To check the syntax and semantics of these knowledge graphs, I developed PyBEL. To interactively explore these graphs in a web-based environment and identify biological contradictions, I developed BEL Commons.

Finally, to integrate the rich biological data sources available to the public, I developed Bio2BEL. During the process, I supported the ComPath project, which used the Bio2BEL framework to curate equivalences and hierarchical relations between entries in several major pathway databases (e.g., KEGG, WikiPathways, Reactome), and later the PathMe project, which harmonized those databases in BEL as a common schema.

Knowledge Graph Embeddings

Knowledge graph embedding methods learn latent representations for the nodes and edges in a graph to support clustering, link prediction, entity disambiguation, and other downstream machine learning tasks.
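To make the idea of latent representations concrete, here is a minimal sketch of a translational scoring function in the style of TransE, one well-known embedding model. It is an illustration only, not the PyKEEN implementation: the entity names, the 2-dimensional vectors, and the helper functions `transe_score` and `rank_tails` are all hypothetical, and real embeddings are learned rather than written by hand.

```python
import math

# Toy 2-dimensional embeddings. All names and values are hypothetical;
# in practice these vectors are learned from the graph during training.
entity_embeddings = {
    "drug_a": [0.0, 0.0],
    "disease_x": [1.0, 1.0],
    "disease_y": [3.0, -2.0],
}
relation_embeddings = {"treats": [1.0, 1.0]}


def transe_score(head, relation, tail):
    """Return the negative Euclidean distance between (head + relation) and tail.

    Higher (closer to zero) scores mean the triple is more plausible.
    """
    h = entity_embeddings[head]
    r = relation_embeddings[relation]
    t = entity_embeddings[tail]
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def rank_tails(head, relation, candidates):
    """Sort candidate tail entities from most to least plausible."""
    return sorted(
        candidates,
        key=lambda tail: transe_score(head, relation, tail),
        reverse=True,
    )


ranking = rank_tails("drug_a", "treats", ["disease_x", "disease_y"])
```

Ranking candidate tails by score is the basic operation behind link prediction; clustering and disambiguation instead operate on the entity vectors directly.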

I’ve worked on PyKEEN, a PyTorch reimplementation of several recent knowledge graph embedding models with a focus on reproducibility. I’ve also developed BioKEEN, which connects biological knowledge graphs in BEL (notably from Bio2BEL) directly to the PyKEEN pipeline.

Predictions

The link prediction task in knowledge graphs is isomorphic to several tasks in drug discovery and precision medicine.
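One way to see the correspondence is that each task reduces to scoring triples that are absent from the graph. The sketch below enumerates such candidate triples for a single relation; the triples, the `treats` relation name, and the `candidate_links` helper are hypothetical, and a real workflow would pass the candidates to a trained model for scoring.

```python
from itertools import product

# A toy set of known (head, relation, tail) triples; all names are hypothetical.
known_triples = {
    ("chemical_1", "treats", "disease_a"),
    ("chemical_2", "treats", "disease_b"),
}


def candidate_links(triples, relation):
    """List every head/tail combination for a relation that is not yet in the graph."""
    heads = sorted({h for h, r, _ in triples if r == relation})
    tails = sorted({t for _, r, t in triples if r == relation})
    return [
        (h, relation, t)
        for h, t in product(heads, tails)
        if (h, relation, t) not in triples
    ]


candidates = candidate_links(known_triples, "treats")
```

Swapping the relation and entity types (gene to disease, chemical to disease, drug to side effect) changes the application while the underlying task stays the same.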

Predicting links between genes/proteins and diseases accomplishes target identification/prioritization. I’ve worked on GuiltyTargets, which embedded proteins from protein-protein interaction networks annotated with disease-specific differential gene expression patterns. These embeddings were used for positive-unlabeled learning using disease-specific gene lists. While this method worked well, it was single-task, handling only one disease at a time.

Predicting links between chemicals and diseases accomplishes drug repositioning (in the case when the chemical is a known drug) or otherwise novel drug discovery. I’ve worked on DrugReLink, which uses Hetionet to make these predictions for a given chemical or disease.

Because many compounds fail in the clinic due to undesirable side effects, predicting side effects during early-stage drug discovery could drastically improve its efficiency. I’ve worked on SEffNet, which uses a network composed of drug-disease, drug-side effect, drug-target, and drug-drug links to predict compounds’ side effects and give insight into the targets mediating those side effects.

Some of my ongoing work applies these methods to precision medicine. I represent patients as nodes in networks, create edges to biological entities based on clinical measurements (e.g., gene expression), and then embed those nodes for downstream machine learning tasks such as subgroup identification and survival analysis.
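As a rough sketch of how patient nodes might be connected to biological entities, the example below turns per-patient expression values into edges. Everything here is an assumption made for illustration: the z-score inputs, the cutoff of 2.0, the relation names, and the `patient_edges` helper are hypothetical and do not describe the actual workflow.

```python
def patient_edges(measurements, threshold=2.0):
    """Create (patient, relation, gene) edges from expression z-scores.

    The threshold and the relation names are illustrative assumptions.
    """
    edges = []
    for patient, values in sorted(measurements.items()):
        for gene, z_score in sorted(values.items()):
            if z_score >= threshold:
                edges.append((patient, "overexpresses", gene))
            elif z_score <= -threshold:
                edges.append((patient, "underexpresses", gene))
    return edges


# Hypothetical expression z-scores for two patients.
measurements = {
    "patient_1": {"GENE_A": 2.5, "GENE_B": 0.1},
    "patient_2": {"GENE_A": -3.0, "GENE_B": 2.0},
}
edges = patient_edges(measurements)
```

The resulting edges can be merged with an existing knowledge graph, so that patient nodes are embedded in the same space as genes and diseases.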

Presentations

  1. Assembly and inference over semantic mappings to support the NFDI Terminology Service (invited) at TS4NFDI Community Workshop
  2. Assembly and Reasoning over Semantic Mappings at Scale at Biocuration 2024
  3. Assembly of Domain Knowledge at Scale in Biomedicine and Beyond (invited) at Harvard Medical School - Laboratory of Systems Pharmacology Meeting
  4. Machine-assisted integration of data and knowledge at scale to support biomedical discovery (invited) at NIH BISTI Seminar
  5. Introduction to WPCI 2023 at Winter 2023 Workshop on Prefixes, CURIEs, and IRIs
  6. Democratizing Biocuration, or, How I Learned to Love the Drive-by Curation (invited) at International Society of Biocuration Annual General Meeting
  7. Standardization of chemical prefixes, CURIEs, URIs, and semantic mappings at Ontologies4Chem Workshop 2023
  8. Improving the reproducibility of cheminformatics workflows with chembl-downloader at RDKit User Group Meeting 2023
  9. Improving ontology interoperability with Biomappings (invited) at OBO Academy - Monarch Training Series
  10. Modern prefix management with the Bioregistry and `curies` (invited) at OBO Academy - Monarch Training Series
  11. Promoting the longevity of curated scientific resources through open code, open data, and public infrastructure at Biocuration 2023
  12. Using dashboards to monitor ontology standardisation and community activity at Ontology Summit 2023
  13. Introduction to WPCI 2022 at 2022 Workshop on Prefixes, CURIEs, and IRIs
  14. The Bioregistry, CURIEs, and OBO Community Health at International Conference on Biomedical Ontology (ICBO)
  15. Axiomatizing Chemical Roles (invited) at Ontologies4Chem Workshop 2022
  16. Closing the Semantic Gap: Identifying Missing Mappings and Merging Equivalent Concepts to Support Knowledge Graph Assembly at Harvard Medical School - Sorger Lab Meeting
  17. Knowledge Graph Embedding with PyKEEN in 2022 (invited) at Knowledge Graph Conference (KGC 2022)
  18. A Unified Framework for Rank-based Evaluation Metrics for Link Prediction in Knowledge Graphs at Graph Learning Benchmarks (GLB 2022)
  19. The Biopragmatics Stack: Biomedical and Chemical Semantics for Humans (invited) at Machine-Actionable Data Interoperability for Chemical Sciences (MADICES)
  20. Introduction to WPCI 2021 at 2021 Workshop on Prefixes, CURIEs, and IRIs
  21. Biomappings: Community Curation of Mappings between Biomedical Entities (poster) at 4th Session of the International Society of Biocuration 2021 Virtual Conference
  22. Current Issues in Theory, Reproducibility, and Utility of Graph Machine Learning in the Life Sciences (invited) at Graph Machine Learning in Industry
  23. The Bioregistry: A Metaregistry for Biomedical Entities at 12th International Conference on Biomedical Ontologies
  24. Perspectives on Knowledge Graph Embedding Models in/out of Biomedicine (invited) at AstraZeneca
  25. Future Directions for WikiPathway Meta-curation at WikiPathways Developers Conference Call
  26. The Biological Expression Language and PyBEL in 2020 at COVID-19 Disease Map Community Meeting
  27. Introduction to the Biological Expression Language and the Rational Enrichment Workflow (invited) at CoronaWhy
  28. Applications of Knowledge Graphs in Drug Discovery (invited) at Computational Drug Discovery Group, University of Leiden
  29. Maintenance and Enrichment of Disease Maps in Biological Expression Language (poster) at 4th Disease Maps Community Meeting
  30. Generation and Application of Biomedical Knowledge Graphs (invited) at Harvard Medical School
  31. Identifying Drug Repositioning Candidates using Representation Learning on Heterogeneous Networks (poster) at The Eighth Joint Sheffield Conference on Chemoinformatics
  32. The PyBEL Ecosystem in 2018 at OpenBEL Community Meeting
  33. From Knowledge Assembly to Hypothesis Generation at Bio-IT World
  34. Knowledge Assembly in Systems and Networks Biology (poster) at Bio-IT World
  35. The Human Brain Pharmacome: An Overview (poster) at 3rd European Conference on Translational Bioinformatics
  36. Gene Set Analysis using Phenotypic Screening Data (poster) at Research, Innovation and Scholarship Expo 2015

Publications

My publications are listed in many places, so there’s little point in listing them all again on this site. Check one of the following: