The company and community that surround the Biological Expression Language (BEL) are enigmatic, to say the least. This post represents the best I could do to tell the history of Selventa and BEL.
If you’ve read my last few posts, you know that I’m making the best of quarantine time by being quite silly with the way that I’m talking about science. I have a habit of injecting opinion, but to tell the story of Selventa and the Biological Expression Language, I will try to refrain whenever possible. In the following, I chronicle the history of Selventa, the people who worked there, and the community that emerged from their work. It is not a complete history.
There are obvious things that I cannot know about the inner workings of Selventa, despite the fact that I used to walk by their building on the way to my internship at Pfizer on Cambridge Park Drive in 2013-14. There are things that I’ve learned by word of mouth, some of which I think are worth sharing and some of which I think are best considered gossip. There are things that I know because of my time and work at Fraunhofer that I’m not able to share due to non-disclosure agreements, too, though a secondary purpose of this post is to show off just how many people have been involved either directly or tangentially in this community, so I will try my best to share what I can. There were even a few cases where I found references to things I thought were under NDA on the internet, so I feel perfectly fine with sharing them. I’ve put references to everything that can be qualified throughout this post.
There are also things that I’m aware of that I’ve chosen to exclude because of their lack of relevance, quality, impact, or contribution to the community. For example, I have chosen to exclude some papers that have claimed to use BEL for modeling purposes, but have not shared their underlying knowledge graphs. I have also chosen to exclude papers that present new algorithms for BEL graphs that do not share code or examples.
Finally, I am not perfect and do not know everything. I’m certain I’ve missed something important, because it turns out that a lot of people have been working on BEL in the last twenty years. I would be happy to accept suggestions to add things. If a part of this post is about you and you think that I have portrayed you incorrectly, please get in touch. I also plan on maintaining this post as time goes on and more cool things are published in this community. And so, we begin on a dark and stormy night in 2001….
2001 Selventa is founded by Keith Elliston (originally as GenStruct, Inc.)
September 8th, 2003 Selventa raises $6.5M in a Series A with lead investors Flagship Pioneering and A.M. Pappas & Associates. Later they will raise $500K and $5M in successive Venture Rounds. The link on Flagship Pioneering to this event is dead.
Sometime in 2003 Biological Expression Language is created by Dexter Pratt while working at Selventa (ref)
Sometime in 2006 Selventa breaks even and has positive cash flow (ref). Keith Elliston’s LinkedIn profile claims that it was profitable in 2008 and 2009.
May 17th, 2010 Selventa raises $500K in a Venture Round. This is the second of three rounds of funding, the last of which will occur in late 2011.
November 10th, 2010 Selventa rebrands from GenStruct, Inc. to Selventa (ref)
December 8th, 2010 Selventa replaces Keith Elliston as CEO with David de Graaf (ref).
November 29th, 2011 Selventa raises $5M in its final Venture Round.
April 27th, 2012 The OpenBEL Consortium begins and establishes http://openbel.org as a community resource (ref, written by Jordan Hourani on July 9th, 2012). There has been (and remains) great conflation between the name of the Biological Expression Language, the OpenBEL Framework (see next line), and the OpenBEL Consortium. Skipping ahead a few years: with the later deprecation and abandonment of the OpenBEL Framework, whose organization on GitHub also hosted the OpenBEL Consortium’s website, it became unclear how maintenance should proceed.
May 23rd, 2012 Kevin Davies writes Ring My BEL: Selventa Releases Biological Expression Language to be published on the Bio-IT World Website
May 23rd, 2012 OpenBEL joins Twitter @openbel and posts its first tweet, a re-tweet of a Selventa link to the previously mentioned Kevin Davies article. It’s not clear who was the author or who currently holds the credentials. It’s also not clear at this time whether the Twitter account was for the OpenBEL framework, or the OpenBEL Consortium, which Selventa would organize later that year.
May 31st, 2012 The first and second papers on the Network Perturbation Amplitude were published simultaneously in different journals, respectively authored by Florian Martin (Philip Morris International) and Julia Hoeng (Philip Morris International). Each paper included several authors from both Philip Morris International (PMI) and Selventa.
June 18th, 2012 Selventa discloses interest from, and collaboration with, several pharmaceutical and software companies as well as academic, governmental, and non-profit groups, and plans to organize an external non-profit organization (The OpenBEL Consortium) to facilitate the community around the Biological Expression Language (ref 1, ref 2). This list includes Pfizer, Merck, Thomson Reuters (the department that was involved was later spun off into Clarivate Analytics to support the Metabase/Metacore), Fraunhofer, Harvard Medical School, IDBS (listed, but I wasn’t able to figure out who they were), Linguamatics, and Entagen (since dissolved).
July 25th, 2012 The OpenBEL Google group is created. Through 2020, it remains a semi-active place for discussion in the BEL Community.
Fall 2012 Ted Slater, Selventa V.P. of Knowledge Engineering from 2002-2004 and later returning as CTO between 2012-2013, along with Dr. Diane H. Song, marketing, publish Biological Expression Language (BEL): Ringing In A Common Language For The Life Sciences in the Fall 2012 issue of Drug Discovery World as well as a companion piece Saved by the BEL - ringing in a common language for the life sciences
August 26th, 2013 OpenBEL becomes a Linux Foundation Collaborative Project (ref 1, ref 2) as a first attempt at identifying external funding. I was unable to find evidence of when this ended, but I have heard from members at the time that it was ultimately unsuccessful and dissolved.
October 13th, 2013 Following the establishment of the sbv IMPROVER initiative by Philip Morris International and subsequent publications in 2011, 2012, and 2013, their first Network Verification Challenge was held between October 2013 and March 2014. It was published by Sam Ansari (PMI; with shared first authorship for all members of the sbv IMPROVER team) and marked the first use of BEL in the sbv IMPROVER (ref).
November 2nd, 2013 William Hayes, CTO of Selventa from 2012-2016, released the first version of the bel.rb Ruby package on rubygems. This likely marked the end of Selventa’s support for the OpenBEL Framework, as Java was going out of style and their codebase had not aged well. However, it’s generally hard to tell when software projects are dead. The maintainers, Anthony Bargnesi and Nick Bargnesi, continued to make intermittent maintenance commits to the OpenBEL Framework’s codebase through June 24th, 2015.
February 2014 Ted Slater publishes a review of BEL, Recent advances in modeling languages for pathway maps and computable biological networks which continues to serve as the appropriate paper to reference for the Biological Expression Language itself. When you skip ahead it might seem obvious that I’m collating information to put together a new reference paper describing the updates from the following six years.
July 11, 2014 Florian Martin (Philip Morris International) and colleagues published their third paper (I think; they have been quite prolific in the 2010s) describing the Network Perturbation Amplitude analysis, this time with no co-authors from Selventa.
Sometime between 2015-2017 With the withdrawal of support from Christoph Brockel (sometime between 2015 and 2017, when he left Pfizer), Pfizer divests from BEL. Its internal BEL-based analytical platform, the Causal Reasoning Engine, and its underlying knowledgebase are publicized, but never released.
January 23rd, 2015 In concert with the sbv IMPROVER’s adoption of BEL from PMI, the fifth iteration of the BioCreative Challenge hosts its first BEL-specific text mining challenge. It was organized by OntoGene (Fabio Rinaldi), the sbv IMPROVER/PMI (Sam Ansari, Julia Hoeng), and Fraunhofer (Juliane Fluck, Martin Hofmann-Apitius), following in the footsteps of the sbv IMPROVER Network Verification Challenge.
Sometime before April 2015 The second iteration of the sbv IMPROVER’s Network Verification Challenge was hosted with a focus on COPD. It’s not clear when this happened, so I’ll say before April 17th, 2015 because the CausalBioNet paper (see next bullet point) used the results. On May 15, 2015, Stéphanie Boué (PMI, with shared first authorship with the sbv IMPROVER team) published a summary of the challenge in F1000 Research.
April 17th, 2015 Stéphanie Boué (PMI) publishes the Causal Biological Networks Database (CausalBioNet) in Oxford Database as a summary of the results of the curation done in the second iteration of the sbv IMPROVER’s Network Verification Challenge. This is the first evidence I found of the participation of Anselmo Di Fabio’s company, Applied Dynamic Solutions (ADS), LLC, in the BEL Community, though the metadata listed on the paper’s page is wrong so it’s not obvious which co-authors had affiliations to that organization at the time, besides Anselmo. Later, William Hayes will join ADS after the dissolution of Selventa.
June 16th, 2015 Justyna Szostak (PMI) and Sumit Madan (Fraunhofer) publish the BELIEF text mining workflow following the fifth BioCreative challenge in Oxford Database. Here is another case where I omitted several other papers following the BioCreative challenge, as none of the other solutions were accessible. This is very, very sad in my opinion.
November 9th, 2015 Afroza Khanam Irin (Fraunhofer) publishes Computational Modelling Approaches on Epigenetic Factors in Neurodegenerative and Autoimmune Diseases and Their Mechanistic Analysis, which outlined a possible addition to the BEL specification to allow the codification of epigenetic modifications in BEL. Unfortunately, this proposal was not considered until the 2018 OpenBEL Consortium meeting, and it is still under a very slow debate.
Sometime between 2015 and 2018 Luc Canard (Sanofi) became involved in Fraunhofer’s BEL activities through the AETIONOMY project, which culminated in this publication.
Sometime in 2016 Selventa dissolves (ref). I think this is where this story gets interesting - because it’s also the part that we will be able to understand the least from an outside perspective. If you search the internet for Selventa, you will indeed find lots of well-written press releases describing the contracts they had made over the years with several notable biotech and pharmaceutical companies. I’ve heard gossip that the reason it fell apart was because of mismanagement, but I can’t weigh in on that.
Later than sometime in 2016 The Selventa team disperses. Part of the technology team that supported the OpenBEL Framework moved to Applied Dynamic Solutions (ADS), LLC. Some of the computational biologists moved to PatientsLikeMe (PLM) (in waves), and I believe some of the computational team moved directly to Philip Morris International. Many continued working together, and with the industrial support for the BEL infrastructure in which they had invested, PMI patronized ADS to fill the void. Before Selventa’s dissolution, Dexter Pratt had already moved to UCSD and begun work on the NDEx project. Ex-CEO Keith Elliston and then-CEO David de Graaf continued their careers in VC and entrepreneurship. Luckily, we have LinkedIn to figure this kind of stuff out.
During its 15 years of operation, Selventa contracted an enormous amount of curation to generate BEL content. I’m not sure what the actual number is, but I’ve heard that it has millions of edges in it. After the closure of Selventa, it’s not obvious what happened to the intellectual property of the company. Dexter Pratt asked through the OpenBEL Google Group where it was, and Nimisha Schneider claimed that Alexion might own it now. I’m quite interested to know about the fate of this trove of curated content, as many of the Selventa papers alluded to its existence (but the reviewers didn’t seem to mind that they were publishing academic material while claiming industrial secrecy. I’m sorry to inject opinion here but I would rather we not have industrial publications than ones that can’t be reproduced.)
October 1st, 2016 Sumit Madan publishes the second and final publication on the BELIEF text mining pipeline. The lack of updates to this service and lack of further publications might lead the reader to believe the project is abandoned.
October 9th, 2016 Following a long hiatus in the development of open source software to support BEL, Charles Tapley Hoyt (Fraunhofer; that’s me!), Andrej Konotopez (Fraunhofer), and Christian Ebeling (Fraunhofer) release the first version of the PyBEL Python package. It was later published in Oxford Bioinformatics. I may be biased, but I think this marked the beginning of the rejuvenation of the BEL community. Many more developments from me and colleagues at Fraunhofer follow for the next 3 years through my master’s and doctoral work.
November 1st, 2016 Daniel Domingo Fernández (Fraunhofer) publishes the NeuroMMSig Web server in Oxford Bioinformatics, containing one of the first publicly available BEL knowledge graphs as well as one of the first publicly usable algorithms for BEL graphs.
Sometime in 2016 Cohen Veteran’s BioSciences contracts Fraunhofer to curate a knowledge graph for PTSD and TBI, supported by PyBEL and BEL Commons (ref). In addition, Exaptive developed additional software for visualization.
January 24th, 2017 Asif Emon (Fraunhofer) publishes Using Drugs as Molecular Probes: A Computational Chemical Biology Approach in Neurodegenerative Diseases, which both jump-started the chemoinformatics side of BEL and inspired the later Bio2BEL project.
February 22nd, 2017 John Bachman (Harvard Medical School; HMS) and Ben Gyori (HMS) begin to integrate PyBEL into the INDRA project, divesting from a previous RDF dump of the Selventa Large Corpus whose provenance was untraceable (I’m not sure whether this is true). INDRA was published in Molecular Systems Biology later that year.
February 26th, 2017 The sixth BioCreative Challenge hosts a text mining challenge for BEL. It was led by Fraunhofer (Juliane Fluck, Sumit Madan, Martin Hofmann-Apitius) and Philip Morris International (Justyna Szostak). Again, almost none of the software published for this track included a demo.
May 22nd, 2017 The final version of bel.rb (v1.1.2) is released. The code remains nonfunctional, putting an unofficial end to the
With the abandonment of the OpenBEL Framework and bel.rb, PyBEL remains the only open-source/user-facing BEL software (for a short time, see 2018).
June 11th, 2017 Charles Tapley Hoyt (Fraunhofer) deploys BEL Commons as the first interactive exploration tool for BEL following the abandonment of the OpenBEL Framework and Cytoscape tool. It is later published in Oxford Database and open-sourced.
August 9th, 2017 Fraunhofer starts the open-source Bio2BEL project on GitHub. This is a data and knowledge integration effort similar to Pathway Commons for BioPAX and Bio2RDF for RDF, but with a wider range of knowledge included and much greater focus on reproducibility and automation. It was later pre-printed but, as of late April 2020, has not yet been accepted for publication.
January 31, 2018 William Hayes (now of ADS/BioDati, Inc.) announces the launch of the BEL.bio website as a replacement for the OpenBEL website. It also announced the first release of their bel python package that would serve as a backend for their upcoming product.
While the announcement caused some confusion throughout the community as to whether OpenBEL was a site for the BEL community and whether it should be deprecated in favor of a new website advertising another company’s product, some were happy to see leadership coming from an organization (be it academic or industrial) that would be able to commit to long term maintenance. Author note: as a recently started PhD student, I wasn’t in a position to support industrial usage of PyBEL. If you’ve ever worked with the industry, especially as a software developer, you know how needy they are. There were users from a certain company asking for help on a weekly basis until I offered the ultimatum that they should pay for this kind of consultancy. Ultimately, I appreciated William and Anselmo’s leadership from BioDati and saw the advantage in having several complementary software ecosystems. 2018 and 2019 would be big years for me and PyBEL, and our focus would diverge from the nominal curation interface and network visualization in BioDati.
February 23rd, 2018 Michaela Gündel (Fraunhofer) publishes the BEL2ABM workflow in Oxford Database, demonstrating that the use cases of BEL were evolving much further than Selventa and PMI’s published use cases.
May 14th, 2018 The 2018 OpenBEL Community Meeting occurs coincident with Bio-IT World in Boston, MA with stakeholders from PMI, Fraunhofer, ADS/BioDati, Harvard Medical School, and several others. Together we nominated William Hayes (BioDati), Natalie Catlett (now at PatientsLikeMe), John Bachman (Harvard Medical School), and Charles Tapley Hoyt (still me, at the time Fraunhofer) to serve as the BEL Language Committee going forward, due to our mixed statuses in industry and academia as well as our mixed roles as tool developers and tool users. We agreed on guidelines for BEL Enhancement Proposals and published them at http://bep.bel.bio. Videos from this event are available at https://www.youtube.com/playlist?list=PLwXD2R4UjER0IfAQpqxOBkSe08gTPws41.
June 4th, 2018 Dexter Pratt (on behalf of the Cytoscape Consortium) contracts Fraunhofer to improve interoperability between BEL and NDEx through the PyBEL framework. The results were posted to GitHub in their own repository but the utility of the CX format and NDEx interchange were eventually incorporated into the core of PyBEL.
June 27th, 2018 The third sbv IMPROVER Network Verification Challenge was held at PMI in Neuchâtel, Switzerland. Further improvements were made to the CausalBioNet to investigate xenobiotic metabolism and causal biological networks. The three winners were from Charité, University of Bonn, and the Swiss Institute of Bioinformatics - demonstrating further the reach of the BEL community and PMI’s excellent stewardship and engagement.
November 19th, 2018 Charles Tapley Hoyt (Fraunhofer) publishes the first version of a git-based workflow that uses Continuous Integration for writing BEL code in a team environment on GitHub. It is later used in Oxford Database with the rational enrichment workflow (see below).
December 10th, 2018 Following proposals and reviews submitted after the 2018 OpenBEL Community Meeting, William Hayes publishes the BEL v2.1 standard on behalf of the BEL Language Committee.
December 13th, 2018 Daniel Domingo-Fernández (Fraunhofer) publishes the ComPath pathway equivalence database and the ComPath web application in Nature as a first step towards unifying major public pathway databases in BEL. The source code and underlying data were published on GitHub.
February 15th, 2019 Mehdi Ali (University of Bonn) publishes the BioKEEN machine learning package in Oxford Bioinformatics, introducing the BEL community to an entirely new type of qualitative analysis and hypothesis generation using BEL.
May 15th, 2019 Daniel Domingo-Fernández (Fraunhofer) publishes PathMe, the first software integrating KEGG, Reactome, and WikiPathways (in their variety of formats including XML, BioPAX, and GPML/RDF) as well as the accompanying PathMe web application in BMC Bioinformatics.
July 29th, 2019 At some point in 2019, the website (belframework.org) hosting the BEL resources necessary for all BEL files was allowed to expire (I think the transMart Foundation was paying for it and William was responsible, but I’m not sure). With this abandonment, previously written BEL files could no longer be compiled without a new resources server being deployed and the BEL files updated. Luckily, the website was being built from a repository on the OpenBEL GitHub organization, so only the files needed to be updated. The responsibility for maintenance of the Selventa Large Corpus and Selventa Small Corpus (previously released by Selventa under the CC-BY-3.0 license) was taken by Charles Tapley Hoyt (Fraunhofer) and moved to a new GitHub repository.
August 5th, 2019 Farah Humayun (University of Bonn) and Daniel Domingo-Fernández (Fraunhofer) make the first commit to the Heme Knowledge Graph (HemeKG). It is later published in Frontiers in Bioengineering and Biotechnology.
September 24, 2019 Charles Tapley Hoyt (Fraunhofer) announces the release of the BEL v2.2 specification on behalf of the BEL Language Committee.
December 9th, 2019 After a large curation project around neurodegenerative diseases and tauopathies led by Stephan Gebel and Charles Tapley Hoyt, Fraunhofer makes its last public commit to the Curation of Neurodegeneration in BEL (CONIB) project before the departure of Charles Tapley Hoyt following his PhD, as interest in public curation in this project dwindled.
April 11th, 2020 Daniel Domingo-Fernández (Fraunhofer) releases the COVID19 disease map along with a pre-print on bioRxiv. Additionally, the paper serves as a reference for Fraunhofer’s new OrientDB instance that holds BEL and their Biomedical Knowledge Miner web application
April 3rd, 2020 Jeremy Zucker (Pacific Northwest National Labs) and students of Robert Ness/Olga Vitek (Northeastern; my alma mater!) begin developing pipelines for generating causal models (SCMs) from BEL graphs for modeling of COVID-19, resulting in further interest in BEL in the CoronaWhy working group.
May 9th, 2020 The OpenBEL community website, https://openbel.org is taken down.
There are a few things that I would like to mention as afterthoughts that I don’t know where to place on the timeline.
One of the most egregious omissions I have made is the date of the BEL 2.0 release and the events that led up to it. Even crazier, I don’t know much of the pre-2016 history of how my group at Fraunhofer got involved with Selventa - perhaps it was their long history of text mining (since the dictionary and CRF days) that got these groups together. If someone has that information, I would be really glad to include it here.
During my time at Fraunhofer, there were a lot of people working with BEL. Many of them made tools and algorithms that never got published, so unfortunately, they are not included in this history.
The Chemotoxicogenomics Database had been converted to BEL by Thomas Weigers a long time ago, when the XML BEL format existed (another thing that I think wasn’t worth bringing up). I corresponded with him about it when I was at Fraunhofer, but unfortunately don’t have access to the emails anymore to double check exactly what we talked about. He did send me the database as XBEL, which he said he made with a script he wrote but didn’t have anymore. Ultimately, I decided to re-write the converter to play nicer in the Bio2BEL ecosystem, which worked for a while and then broke because its downstream dependency for parsing the database wasn’t updated. I never got around to re-writing this again.
I’m not sure what happened at PatientsLikeMe, but from LinkedIn I can tell that there was a mass migration of ex-Selventa to Quartz Bio, who also appear to be hiring BEL people to join their team. After burning out from finishing my PhD in late 2019, I haven’t been proactive about keeping up with many people (quarantine life isn’t making me feel that kind of motivation either, which is somehow categorically different from the entire day I spent researching and preparing this blog post), though I do personally know some of them and could ask…
One of the other peculiarities about the history of BEL is the adjacent history of BioPAX. This happened way before my time, so I wonder why they diverged so much. I think that they’re ultimately trying to accomplish the same thing, which is to be a place for putting structured information (and modeling, to an extent). It’s most definitely the case that BioPAX has achieved greater popularity and penetration, due to (in my opinion) the excellent labs that are backing the standard and the fire hydrant of high impact papers coming out of them. However, I think this success might be holding BioPAX back, because many of these papers are simply pulling content from Pathway Commons to do gene set enrichment analysis. Now, if you’re working adjacent to an oncology unit in a hospital, then this is all you need to do to get good results, write high impact papers, and ultimately help patients. Don’t get me wrong - I’m much happier to see papers that achieve their scientific goals using simple methods. I just think there’s much more potential for BioPAX. I think we’re seeing that movement happen for BEL already.
Then there’s that idea of converting between BioPAX and BEL. It’s sort of a non-starter, since all BioPAX is encoded differently - just because data is able to be stored in a given format (e.g., ontology deriving from BioPAX, RDF, etc.), it doesn’t necessarily mean that the content can be put together with other content in the same format in a meaningful way.
As I close, I will again acknowledge my biases. I’m quite proud of the work I did during my master’s and doctoral work at Fraunhofer. I’m thankful for all of the people who were interested in my projects, contributed to them, and then joined me as co-authors on my publications. When it came to writing this history, I was in a situation where I had lots of highly granular information to share on the things that I worked on and also the desire to share as much of it as possible. I hope I did a good enough job at laying out the landscape of the other things going on outside of my perspective. If you’ve got something to add, all of my contact information is available on the footer of my blog. Or make a pull request against this page directly. Or tweet at me @cthoyt.