My name is Charles Tapley Hoyt (he/his). I’m a scientist
in the Institute of Inorganic Chemistry
at RWTH Aachen University.
I’m building my own research group focused on software development, data standardization/FAIRification/integration, and
applications of ML/AI in the chemical, biological, and health sciences, specifically in drug discovery and precision medicine.
Through my position at RWTH Aachen University, I’m establishing academic collaborations through German, European, and international grants and developing project-based contracts for organizations with unmet business needs addressed by the semantic technologies and capabilities that I’ve developed and write about here. Privately, I can offer consulting services, speaking engagements, and training for organizations interested in these topics.
Here are some more details about me and my research. You
can download my résumé
(single page), CV, or see
my ORCiD page at
https://orcid.org/0000-0003-4423-4370.
Content on this site is licensed as
CC BY 4.0. See
also my family recipe blog.
Recent Posts
-
International Society of Biocuration Presents: Curate This!
While researchers typically communicate their work through poster presentations, oral presentations, and written communication, programmers often give (live) demonstrations. I’m not aware of any technical or practical barriers that would keep curators from doing the same, and I have always wished that curators did this more often. This post is about how I planned to make this a reality by starting a podcast with the International Society for Biocuration (ISB) entitled ISB Presents: Curate This!.
-
Efficient Bulk Access to Citations in OpenCitations
OpenCitations aggregates and deduplicates bibliographic information from CrossRef, Europe PubMed Central, and other sources to construct a comprehensive, open index of citations between scientific works. This post describes the opencitations-client package, which wraps the OpenCitations API and implements an automated pipeline for locally downloading, caching, and accessing OpenCitations in bulk.
-
Challenges with Semantic Mappings
There are many challenges associated with the curation, publication, acquisition, and usage of semantic mappings. This post examines their philosophical, technical, and practical implications, highlights existing solutions, and describes opportunities for next steps for the community of curators, semantic engineers, software developers, and data scientists who make and use semantic mappings.
-
Semantic Mappings Enable Automated Assembly
Data and knowledge originating from heterogeneous sources often use heterogeneous controlled vocabularies and/or ontologies for annotating named entities. Semantic mappings are essential for resolving these discrepancies and integrating the sources in a coherent way. This post highlights how this looks in two scenarios: when constructing a knowledge graph for graph machine learning and when constructing comprehensive lexica for natural language processing, text mining, and curation.
-
Mapping from SSSOM to JSKOS
JSKOS (JSON for Knowledge Organization Systems) is a JSON-based data model for representing terminologies, thesauri, classifications, and other semantic artifacts. Like the Simple Standard for Sharing Ontological Mappings (SSSOM), it can also encode semantic mappings. This post is about developing and implementing a crosswalk between them in the sssom-pydantic Python package.
-
Mapping from SSSOM to Wikidata
At the 4th Ontologies4Chem Workshop in Limburg an der Lahn, I proposed an initial crosswalk between the Simple Standard for Sharing Ontological Mappings (SSSOM) and the Wikidata semantic mapping data model. This post describes the motivation for this proposal and the concrete implementation I’ve developed in sssom-pydantic.
-
Validating Prefix Maps in LinkML Schemas
LinkML enables defining data models and data schemas in YAML informed by semantic web best practices. As such, each definition includes a prefix map. Similarly to my previous posts on validating the prefix maps appearing in Turtle files and in unfamiliar SPARQL endpoints, this post describes a new extension to the Bioregistry that validates prefix maps in LinkML definitions.
-
Books I Read in 2025
Here are the books I read in 2025. My goal for the year was to get some more variety, and I think I managed that.
-
Annotating the Literature with Named Entity Recognition
Annotating the literature with mentions of key concepts from a given domain is often the first step towards extracting more substantial structured knowledge. This can be challenging, as it typically involves acquiring and processing the relevant literature and ontologies, then installing and applying difficult-to-use named entity recognition (NER) workflows. This post highlights software components I’ve implemented to simplify this workflow. I demonstrate it by annotating the biomedical literature available through PubMed with Medical Subject Headings (MeSH) terms, and also comment on how this can be generalized to other natural sciences, engineering, and humanities disciplines.
-
Machine-Actionable Training Materials at BioHackathon Germany 2025
I recently attended the 4th BioHackathon Germany hosted by the German Network for Bioinformatics Infrastructure (de.NBI). I participated in the project On the Path to Machine-actionable Training Materials in order to improve the interoperability between DALIA, TeSS, mTeSS-X, and Schema.org. This post gives a summary of the activities leading up to the hackathon and the results of our happy hacking.