An Incomplete History of Selventa and the Biological Expression Language (BEL)
The company and community that surround the Biological Expression Language (BEL) are enigmatic, to say the least. This post represents the best I could do to tell the history of Selventa and BEL.
If you’ve read my last few posts, you know that I’m making the best of quarantine time by being quite silly with the way that I’m talking about science. I have a habit of injecting opinion, but to tell the story of Selventa and the Biological Expression Language, I will try to refrain whenever possible. In the following, I chronicle the history of Selventa, the people who worked there, and the community that emerged from their work. It is not a complete history.
There are obvious things that I cannot know about the inner workings of Selventa, despite the fact that I used to walk by their building on the way to my internship at Pfizer on Cambridge Park Drive between 2013 and 2014. There are things that I’ve learned by word of mouth, some of which I think are worth sharing and some of which I think are best considered gossip. There are also things that I know because of my time and work at Fraunhofer that I’m not able to share due to non-disclosure agreements, though a secondary purpose of this post is to show off just how many people have been involved either directly or tangentially in this community, so I will try my best to share what I can. There were even a few cases where I found references on the internet to things I thought were under NDA, so I feel perfectly fine with sharing those. I’ve put references to everything that can be qualified throughout this post.
There are also things that I’m aware of that I’ve chosen to exclude because of their lack of relevance, quality, impact, or contribution to the community. For example, I have chosen to exclude some papers that have claimed to use BEL for modeling purposes, but have not shared their underlying knowledge graphs. I have also chosen to exclude papers that present new algorithms for BEL graphs that do not share code or examples.
Finally, I am not perfect and do not know everything. I’m certain I’ve missed something important, because it turns out that a lot of people have been working on BEL in the last twenty years. I would be happy to accept suggestions to add things. If a part of this post is about you and you think that I have portrayed you incorrectly, please get in touch. I also plan on maintaining this post as time goes on and more cool things are published in this community. And so, we begin on a dark and stormy night in 2001….
2001
2001 Selventa is founded by Noubar Afeyan as Genstruct, Inc. The concept was to create a computational complement to the company Beyond Genomics, Inc. - which was a systems biology company also founded by AGTC (a predecessor fund to Flagship).
2002
Summer 2002 Keith Elliston is hired by Flagship Ventures (later renamed Flagship Pioneering) to work with the founding team of Genstruct (Navin Chandra, Justin Sun, Ted Slater, Dexter Pratt) to find a technology and business model for the company. Keith was brought in by Jim Serum, who co-founded Viaken Systems with Elliston, and was on the board of Genstruct.
October 2002 Keith submits a funding proposal and business plan for Genstruct, based upon a new approach to using artificial intelligence for biological networks proposed by Navin Chandra and developed by the team. The board approves the plan, approves investment in Genstruct, and hires Keith as the CEO.
2003
January 2003 Genstruct does its first commercial project, a pilot project with Pfizer, in which they successfully identify the mechanism of action of an unknown cancer drug. This leads to a collaboration between the companies lasting more than 10 years.
Spring 2003. Biological Expression Language is created by Dexter Pratt, Navin Chandra, Keith Elliston, and Ted Slater. This development is based on work done by Navin at Perot Systems and Nets, work done by Dexter on CycL with Cycorp, and on biological ontologies by Keith and Ted.
August 20th, 2003 Genstruct files a patent that outlines many of the core ideas of modern systems biology years before they became mainstream. While it doesn’t mention BEL explicitly, it’s obvious that many of the ideas in this application became part of the BEL standard when it was later released to the public. The patent was accepted 8 years later in 2011. Thanks to Ted Slater for bringing attention to this!
September 8th, 2003 Genstruct raises $6.5M in a Series A with lead investors Flagship Pioneering and A.M. Pappas & Associates, during which they acquire the Pappas portfolio company Incellico, Inc. (led by John Wilbanks and Toby Segaran).
2004
2004 Toby Segaran, working with Justin Sun and other developers at the company, develops the first practical version of the Genstruct Inference Engine. The inference engine implements reverse causal reasoning using a graph-based approach, and identifies upstream causes (mechanisms) for downstream observations (state changes). Bill Ladd develops the statistical methods that power the inference engine and that are used to evaluate the results of simulations.
February 24, 2004 Genstruct appoints Doug Lauffenburger, Director of the Biological Engineering Division and Uncas & Helen Whitaker Professor of Biological Engineering, Chemical Engineering and Biology at the Massachusetts Institute of Technology (MIT), to the board of directors.
November 2004 Genstruct and Pfizer extend and expand their partnership, applying the Genstruct platform to various R&D and Toxicology programs throughout Pfizer. (https://www.genomeweb.com/archive/genstruct-pfizer-expand-research-partnership#.X3jczJNKh24)
2006
August 2006 Genstruct and GSK extend their partnership to apply the Genstruct platform to define compound mechanisms of action in Oncology (https://www.fdanews.com/articles/61568-genstruct-extends-collaboration-in-cancer-with-glaxosmithkline)
Sometime in 2006 Genstruct reaches cash flow break-even (ref). Keith Elliston’s LinkedIn profile claims that it was cash flow positive in 2008 and 2009.
2008
April 30, 2008 Genstruct and Sirtris win the BioIT World Best Practices Award for their work using the Genstruct Platform to identify the mechanisms of action for the Sirtris Sirt1 activators. Sirtris was acquired by GSK to further develop its Sirt1 activators.
July 2008 Genstruct begins a collaboration with Manuel Pietsch and his group on the use of the Genstruct technology to assess the strength and extent of toxicity using network analysis.
2009
2009 Genstruct develops the “Network Perturbation Amplitude” algorithm, based on early work done by Jim Watters on pathway expression activation.
September 2009 Board appoints Chris Varma from Flagship Ventures as Executive Chairman.
2010
Jan 2010 Keith Elliston resigns from the company, Chris Varma named CEO (though this is missing from his LinkedIn profile).
May 17th, 2010 Genstruct raises $500K in a Venture Round. This is the second of three rounds of funding, the last of which will occur in late 2011.
May 2010 Genstruct hires David de Graaf as CSO.
July 2010 Chris Varma leaves Genstruct to join Third Rock Ventures.
November 10th, 2010 Genstruct is rebranded as Selventa (ref)
December 2010 David de Graaf named CEO.
2011
June 23rd, 2011 Selventa makes its first tweet from @selventa.
November 29th, 2011 Selventa raises $5M in its final Venture Round.
2012
April 27th, 2012 The OpenBEL Consortium begins and establishes http://openbel.org as a community resource (ref, written by Jordan Hourani on July 9th, 2012). There has been (and remains) great conflation between the name of the Biological Expression Language, the OpenBEL Framework (see next line), and the OpenBEL Consortium. Skipping ahead a few years: with the later deprecation and abandonment of the OpenBEL Framework, whose organization on GitHub also hosted the OpenBEL Consortium’s website, it became unclear how maintenance should proceed.
May 18th, 2012 Selventa makes their first public commit to their open source Java ecosystem, the OpenBEL Framework.
May 23rd, 2012 Kevin Davies publishes Ring My BEL: Selventa Releases Biological Expression Language on the Bio-IT World website.
May 23rd, 2012 OpenBEL joins Twitter @openbel and posts its first tweet, a re-tweet of a Selventa link to the previously mentioned Kevin Davies article. It’s not clear who was the author or who currently holds the credentials. It’s also not clear at this time whether the Twitter account was for the OpenBEL framework, or the OpenBEL Consortium, which Selventa would organize later that year.
May 31st, 2012 The first and second papers on the Network Perturbation Amplitude were published simultaneously in different journals, respectively authored by Florian Martin (Philip Morris International) and Julia Hoeng (Philip Morris International). Each paper contained several authors from both Philip Morris International (PMI) and Selventa.
June 18th, 2012 Selventa discloses interest from and collaboration with several pharmaceutical and software companies as well as academic, governmental, and non-profit groups, and announces plans to organize an external non-profit organization (The OpenBEL Consortium) to facilitate the community around the Biological Expression Language (ref 1, ref 2). This list includes Pfizer, Merck, Thomson Reuters (the department that was involved was later spun off into Clarivate Analytics to support the Metabase/Metacore), Fraunhofer, Harvard Medical School, IDBS (listed, but I wasn’t able to figure out who they were), Linguamatics, and Entagen (since dissolved).
July 25th, 2012 The OpenBEL Google group is created. Through 2020, it remains a semi-active place for discussion in the BEL Community.
August 9th, 2012 David Fryburg, Selventa’s Chief Medical Officer from 2011-2015, authored a company profile in Future Medicine (paywall).
Fall 2012 Ted Slater, Selventa V.P. of Knowledge Engineering from 2002-2004 and later returning as CTO between 2012-2013, along with Dr. Diane H. Song, marketing, publish Biological Expression Language (BEL): Ringing In A Common Language For The Life Sciences in the Fall 2012 issue of Drug Discovery World as well as a companion piece Saved by the BEL - ringing in a common language for the life sciences
2013
August 26th, 2013 OpenBEL becomes a Linux Foundation Collaborative Project (ref 1, ref 2) as a first attempt at identifying external funding. I was unable to find evidence of when this ended, but I have heard from members at the time that it was ultimately unsuccessful and dissolved.
October 13th, 2013 Following the establishment of the sbv IMPROVER initiative by Philip Morris International and subsequent publications in 2011, 2012, and 2013, their first Network Verification Challenge was held between October 2013 and March 2014. It was published by Sam Ansari (PMI; with shared first authorship for all members of the sbv IMPROVER team) and marked the first use of BEL in the sbv IMPROVER (ref).
November 2nd, 2013 William Hayes, CTO of Selventa from 2012-2016, released the first version of the bel.rb Ruby package on rubygems. This likely marked the end of Selventa’s support for the OpenBEL Framework, as Java was going out of style and their codebase had not aged well. However, it’s generally hard to tell when software projects are dead. The maintainers, Anthony Bargnesi and Nick Bargnesi, continued to make intermittent maintenance commits to the OpenBEL Framework’s codebase through June 24th, 2015.
November 23rd, 2013 Natalie Catlett (Selventa) and colleagues publish the Reverse Causal Reasoning algorithm in BMC Bioinformatics, this time with no co-authors from PMI.
2014
January 23rd, 2014 With the death of the OpenBEL Framework in sight, the @openbel Twitter account began to publicize the bel.rb Ruby package.
February 2014 Ted Slater publishes a review of BEL, Recent advances in modeling languages for pathway maps and computable biological networks which continues to serve as the appropriate paper to reference for the Biological Expression Language itself. When you skip ahead it might seem obvious that I’m collating information to put together a new reference paper describing the updates from the following six years.
July 11, 2014 Florian Martin (Philip Morris International) and colleagues published their third paper (I think; they have been quite prolific in the 2010s) describing the Network Perturbation Amplitude analysis, this time with no co-authors from Selventa.
2015
Sometime between 2015-2017 With the withdrawal of support from Christoph Brockel (sometime between 2015 and 2017, when he left Pfizer), Pfizer divests from BEL. Its internal BEL-based analytical platform, the Causal Reasoning Engine, and its underlying knowledgebase are publicized, but never released.
January 23rd, 2015 In concert with the sbv IMPROVER’s adoption of BEL from PMI, the fifth iteration of the BioCreative Challenge hosts its first BEL-specific text mining challenge. It was organized by OntoGene (Fabio Rinaldi), the sbv IMPROVER/PMI (Sam Ansari, Julia Hoeng), and Fraunhofer (Juliane Fluck, Martin Hofmann-Apitius), following in the footsteps of the sbv IMPROVER Network Verification Challenge.
March 2015 David de Graaf steps down as CEO of Selventa (formerly Genstruct) and joins Flagship VentureLabs.
Sometime before April 17th, 2015 The second iteration of the sbv IMPROVER’s Network Verification Challenge was hosted with a focus on COPD. It’s not clear when this happened, so I’ll say before April 17th, 2015 because the CausalBioNet paper (see the next entry) used the results. On May 15, 2015, Stéphanie Boué (PMI, with shared first authorship with the sbv IMPROVER team) published a summary of the challenge in F1000 Research.
April 17th, 2015 Stéphanie Boué (PMI) publishes the Causal Biological Networks Database (CausalBioNet) in Oxford Database as a summary of the results of the curation done in the second iteration of the sbv IMPROVER’s Network Verification Challenge. This is the first evidence I found of the participation of Anselmo Di Fabio’s company, Applied Dynamic Solutions (ADS), LLC, in the BEL Community, though the metadata listed on the paper’s page is wrong so it’s not obvious which co-authors had affiliations to that organization at the time, besides Anselmo. Later, William Hayes will join ADS after the dissolution of Selventa.
June 16th, 2015 Justyna Szostak (PMI) and Sumit Madan (Fraunhofer) publish the BELIEF text mining workflow following the fifth BioCreative challenge in Oxford Database. Here is another case where I omitted several other papers following the BioCreative challenge, as none of the other solutions were accessible. This is very, very sad in my opinion.
November 9th, 2015 Afroza Khanam Irin (Fraunhofer) publishes Computational Modelling Approaches on Epigenetic Factors in Neurodegenerative and Autoimmune Diseases and Their Mechanistic Analysis, which outlined a possible addition to the BEL specification to allow the codification of epigenetic modifications in BEL. Unfortunately, this proposal was not considered until the 2018 OpenBEL Consortium meeting, and it is still under a very slow debate.
Sometime between 2015 and 2018 Luc Canard (Sanofi) became involved in Fraunhofer’s BEL activities through the AETIONOMY project, which culminated in this publication.
2016
Sometime in 2016 Selventa dissolves (ref). I think this is where this story gets interesting - because it’s also the part that we will be able to understand the least from an outside perspective. If you search the internet for Selventa, you will indeed find lots of well-written press releases describing the contracts they had made over the years with several notable biotech and pharmaceutical companies. I’ve heard gossip that the reason it fell apart was because of mismanagement, but I can’t weigh in on that.
After the dissolution in 2016 The Selventa team disperses. Part of the technology team that supported the OpenBEL Framework moved to Applied Dynamic Solutions (ADS), LLC. Some of the computational biologists moved to PatientsLikeMe (PLM) (in waves), and I believe some of the computational team moved directly to Philip Morris International. Many continued working together, and because PMI had invested in the BEL infrastructure and needed industrial support for it, PMI patronized ADS to fill the void. Before the dissolution, Dexter Pratt had already moved to UCSD and begun work on the NDEx project. Ex-CEO Keith Elliston and current CEO David de Graaf continued their careers in VC and entrepreneurship. Luckily, we have LinkedIn to figure this kind of stuff out.
During its 15 years of operation, Selventa performed an enormous amount of curation to generate BEL content. I’m not sure what the actual number is, but I’ve heard that it has millions of edges in it. After the closure of Selventa, the intellectual property of the company was sold to Alexion. Dexter Pratt asked through the OpenBEL Google Group where it was, and Nimisha Schneider claimed that Alexion might own it now. I’m quite interested to know about the fate of this trove of curated content, as many of the Selventa papers alluded to its existence (but the reviewers didn’t seem to mind that they were publishing academic material while claiming industrial secrecy). I’m sorry to inject opinion here, but I would rather we not have industrial publications than ones that can’t be reproduced.
October 1st, 2016 Sumit Madan publishes the second and final publication on the BELIEF text mining pipeline. The lack of updates to this service and lack of further publications might lead the reader to believe the project is abandoned.
October 9th, 2016 Following a long hiatus in the development of open source software to support BEL, Charles Tapley Hoyt (Fraunhofer; that’s me!), Andrej Konotopez (Fraunhofer), and Christian Ebeling (Fraunhofer) release the first version of the PyBEL Python package. It was later published in Oxford Bioinformatics. I may be biased, but I think this marked the beginning of the rejuvenation of the BEL community. Many more developments from me and colleagues at Fraunhofer follow for the next 3 years through my master’s and doctoral work.
October 25, 2016 Christian Ebeling (Fraunhofer) presents PyBEL at the tranSMART Foundation 2016 Annual Meeting.
November 1st, 2016 Daniel Domingo Fernández (Fraunhofer) publishes the NeuroMMSig Web server in Oxford Bioinformatics, containing one of the first publicly available BEL knowledge graphs as well as one of the first publicly usable algorithms for BEL graphs.
Sometime in 2016 Cohen Veteran’s BioSciences contracts Fraunhofer to curate a knowledge graph for PTSD and TBI, supported by PyBEL and BEL Commons (ref). In addition, Exaptive developed additional software for visualization.
Sometime in 2016-2017 Boehringer Ingelheim contracts Fraunhofer to curate a knowledge graph for psychiatric conditions in the BEL4IMOCEDE project (ref 1, ref 2), supported by PyBEL and BEL Commons.
2017
January 24th, 2017 Asif Emon (Fraunhofer) publishes Using Drugs as Molecular Probes: A Computational Chemical Biology Approach in Neurodegenerative Diseases, which both jump-started the chemoinformatics side of BEL and inspired the later Bio2BEL project.
February 22nd, 2017 John Bachman (Harvard Medical School; HMS) and Ben Gyori (HMS) begin to integrate PyBEL into the INDRA project, divesting from a previous RDF dump of the Selventa Large Corpus whose provenance was untraceable (I’m not sure whether this is true). INDRA was published in Molecular Systems Biology later that year.
February 26th, 2017 The sixth BioCreative Challenge hosts a text mining challenge for BEL. It was led by Fraunhofer (Juliane Fluck, Sumit Madan, Martin Hofmann-Apitius) and Philip Morris International (Justyna Szostak). Again, almost none of the software published for this track included a demo.
May 22nd, 2017 The final version of bel.rb (v1.1.2) is released. The code remains nonfunctional, putting an unofficial end to the bel.rb project. With the abandonment of the OpenBEL Framework and bel.rb, PyBEL remains the only open-source/user-facing BEL software (for a short time, see 2018).
June 11th, 2017 Charles Tapley Hoyt (Fraunhofer) deploys BEL Commons as the first interactive exploration tool for BEL following the abandonment of the OpenBEL Framework and Cytoscape tool. It is later published in Oxford Database and open source’d.
August 9th, 2017 Fraunhofer starts the Bio2BEL project as open source on GitHub. This is a data and knowledge integration effort similar to Pathway Commons for BioPAX and Bio2RDF for RDF, but with a wider range of knowledge included and much greater focus on reproducibility and automation. It was later pre-printed but, as of late April 2020, has not yet been accepted for publication.
2018
January 30, 2018 BioDati, Inc. officially forms as a spin-off of ADS and announces that it is developing a product called BioDati Studio for BEL curation and visualization.
January 31, 2018 William Hayes (now of ADS/BioDati, Inc.) announces the launch of the BEL.bio website as a replacement for the OpenBEL website. It also announced the first release of their bel Python package that would serve as a backend for their upcoming product.
While the announcement caused some confusion throughout the community as to whether OpenBEL was a site for the BEL community and whether it should be deprecated in favor of a new website advertising another company’s product, some were happy to see leadership coming from an organization (be it academic or industrial) that would be able to commit to long term maintenance. Author note: as a recently started PhD student, I wasn’t in a position to support industrial usage of PyBEL. If you’ve ever worked with the industry, especially as a software developer, you know how needy they are. There were users from a certain company asking for help on a weekly basis until I offered the ultimatum that they should pay for this kind of consultancy. Ultimately, I appreciated William and Anselmo’s leadership from BioDati and saw the advantage in having several complementary software ecosystems. 2018 and 2019 would be big years for me and PyBEL, and our focus would diverge from the nominal curation interface and network visualization in BioDati.
February 23rd, 2018 Michaela Gündel (Fraunhofer) publishes the BEL2ABM workflow in Oxford Database, demonstrating that the use cases of BEL were evolving much further than Selventa and PMI’s published use cases.
May 14th, 2018 The 2018 OpenBEL Community Meeting occurs coincident with Bio-IT World in Boston, MA with stakeholders from PMI, Fraunhofer, ADS/BioDati, Harvard Medical School, and several others. Together we nominated William Hayes (BioDati), Natalie Catlett (now at PatientsLikeMe), John Bachman (Harvard Medical School), and Charles Tapley Hoyt (still me, at the time Fraunhofer) to serve as the BEL Language Committee going forwards, due to our mixed statuses in industry and academia as well as our mixed roles as tool developers and tool users. We agreed on guidelines for BEL Enhancement Proposals and published them at http://bep.bel.bio. Videos from this event are available at https://www.youtube.com/playlist?list=PLwXD2R4UjER0IfAQpqxOBkSe08gTPws41.
Also May 14th, 2018 Christian Ebeling (Fraunhofer) publishes an Atom plugin for BEL syntax highlighting at https://atom.io/packages/language-bel.
June 4th, 2018 Dexter Pratt (on behalf of the Cytoscape Consortium) contracts Fraunhofer to improve interoperability between BEL and NDEx through the PyBEL framework. The results were posted to GitHub in their own repository but the utility of the CX format and NDEx interchange were eventually incorporated into the core of PyBEL.
June 27th, 2018 The third sbv IMPROVER Network Verification Challenge was held at PMI in Neuchatel, Switzerland. Further improvements were made to the CausalBioNet to investigate Xenobiotic metabolism and causal biological networks. The three winners were from Charité, University of Bonn, and the Swiss Institute of Bioinformatics - demonstrating further the reach of the BEL community and PMI’s excellent stewardship and engagement.
November 19th, 2018 Charles Tapley Hoyt (Fraunhofer) publishes the first version of a git-based workflow that uses Continuous Integration for writing BEL code in a team environment on GitHub. It is later used in Oxford Database with the rational enrichment workflow (see below).
December 10th, 2018 Following proposals and reviews submitted after the 2018 OpenBEL Community Meeting, William Hayes publishes the BEL v2.1 standard on behalf of the BEL Language Committee.
December 11th, 2018 Fraunhofer and Harvard Medical School jointly publish a workflow using INDRA and PyBEL for rationally enriching BEL graphs in Oxford Database available on GitHub (ref).
December 13th, 2018 Daniel Domingo-Fernández (Fraunhofer) publishes the ComPath pathway equivalence database and the ComPath web application in Nature as a first step towards unifying major public pathway databases in BEL. The source code and underlying data were published on GitHub.
2019
February 15th, 2019 Mehdi Ali (University of Bonn) publishes the BioKEEN machine learning package in Oxford Bioinformatics, introducing the BEL community to an entirely new type of qualitative analysis and hypothesis generation using BEL.
February 27th, 2019 Dénes Türei (EMBL, University of Heidelberg) makes the first commit towards integrating BEL with OmniPath.
September 3rd, 2019 After nearly a decade of non-reproducible papers, PMI publishes an open-source (albeit in R) implementation of their Network Perturbation Amplitude analysis in BMC Bioinformatics.
May 15th, 2019 Daniel Domingo-Fernández (Fraunhofer) publishes PathMe, the first software integrating KEGG, Reactome, and WikiPathways (in their variety of formats including XML, BioPAX, and GPML/RDF) as well as the accompanying PathMe web application in BMC Bioinformatics.
July 29th, 2019 At some point in 2019, the website (belframework.org) hosting the BEL resources necessary for all BEL files was allowed to expire (I think the tranSMART Foundation was paying for it and William was responsible, but I’m not sure). With this abandonment, previously written BEL files could no longer be compiled without a new resources server being deployed and the BEL files updated. Luckily, the website was being built from a repository on the OpenBEL GitHub organization, so only the files needed to be updated. The responsibility for maintenance of the Selventa Large Corpus and Selventa Small Corpus (previously released by Selventa under the CC-BY-3.0 license) was taken by Charles Tapley Hoyt (Fraunhofer) and moved to a new GitHub repository.
August 5th, 2019 Farah Humayun (University of Bonn) and Daniel Domingo-Fernández (Fraunhofer) make the first commit to the Heme Knowledge Graph (HemeKG). It is later published in Frontiers in Bioengineering and Biotechnology.
September 24, 2019 Charles Tapley Hoyt (Fraunhofer) announces the release of the BEL v2.2 specification on behalf of the BEL Language Committee.
December 9th, 2019 After a large curation project around neurodegenerative diseases and tauopathies led by Stephan Gebel and Charles Tapley Hoyt, Fraunhofer makes its last public commit to the Curation of Neurodegeneration in BEL (CONIB) project. This came before the departure of Charles Tapley Hoyt following his PhD, after which interest in public curation in this project dwindled.
2020
March 4th, 2020 The Hetionet project adopts BEL as a distribution format for its integrative network suited for drug repositioning and target prioritization.
March 4th, 2020 The OpenBioLink project adopts BEL as a distribution format for benchmarking link prediction tasks on biological networks.
April 11th, 2020 Daniel Domingo-Fernández (Fraunhofer) releases the COVID19 disease map along with a pre-print on bioRxiv. Additionally, the paper serves as a reference for Fraunhofer’s new OrientDB instance that holds BEL and their Biomedical Knowledge Miner web application
April 3rd, 2020 Jeremy Zucker (Pacific Northwest National Labs) and students of Robert Ness/Olga Vitek (Northeastern; my alma mater!) begin developing pipelines for generating causal models (SCMs) from BEL graphs for modeling of COVID-19, resulting in further interest in BEL in the CoronaWhy working group.
May 9th, 2020 The OpenBEL community website, https://openbel.org is taken down.
There are a few things that I would like to mention as afterthoughts that I don’t know where to place on the timeline.
One of the most egregious omissions I have made is the date of the BEL 2.0 release and the events that led up to it. Even crazier, I don’t know much of the pre-2016 history of how my group at Fraunhofer got involved with Selventa - perhaps it was their long history of text mining (since the dictionary and CRF days) that got these groups together. If someone has that information, I would be really glad to include it here.
During my time at Fraunhofer, there were a lot of people working with BEL. Many of them made tools and algorithms that never got published, so unfortunately, they are not included in this history.
The Chemotoxicogenomics Database had been converted to BEL by Thomas Weigers a long time ago, when the XML BEL format existed (another thing that I think wasn’t worth bringing up). I corresponded with him about it when I was at Fraunhofer, but unfortunately don’t have access to my emails anymore to double check exactly what we talked about. He did send me the database as XBEL, which he said he made with a script he wrote but didn’t have anymore. Ultimately, I decided to re-write the converter to play nicer in the Bio2BEL ecosystem, which worked for a while and then broke because its downstream dependency for parsing the database wasn’t updated. I never got around to re-writing this again.
I’m not sure what happened at PatientsLikeMe, but from LinkedIn I can tell that there was a mass migration of ex-Selventa to Quartz Bio, who also appear to be hiring BEL people to join their team. After burning out from finishing my PhD in late 2019, I haven’t been proactive about keeping up with many people (quarantine life isn’t making me feel that kind of motivation either, which is somehow categorically different from the entire day I spent researching and preparing this blog post), though I do personally know some of them and could ask…
One of the other peculiarities about the history of BEL is the adjacent history of BioPAX. This happened way before my time, so I wonder why they diverged so much. I think that they’re ultimately trying to accomplish the same thing, which is to be a place for putting structured information (and modeling, to an extent). It’s most definitely the case that BioPAX has achieved greater popularity and penetration, due to (in my opinion) the excellent labs that are backing the standard and the fire hydrant of high impact papers coming out of them. However, I think this success might be holding BioPAX back, because many of these papers are simply pulling content from Pathway Commons to do gene set enrichment analysis. Now, if you’re working adjacent to an oncology unit in a hospital, then this is all you need to do to get good results, write high impact papers, and ultimately help patients. Don’t get me wrong - I’m much happier to see papers that achieve their scientific goals using simple methods. I just think there’s much more potential for BioPAX. I think we’re seeing that movement happen for BEL already.
Then there’s that idea of converting between BioPAX and BEL. It’s sort of a non-starter, since all BioPAX is encoded differently - just because data is able to be stored in a given format (e.g., ontology deriving from BioPAX, RDF, etc.), it doesn’t necessarily mean that the content can be put together with other content in the same format in a meaningful way.
As I close, I will again acknowledge my biases. I’m quite proud of the work I did during my master’s and doctoral work at Fraunhofer. I’m thankful for all the people who were interested in my projects, contributed to them, and then joined me as co-authors on my publications. When it came to writing this history, I was in a situation where I had lots of highly granular information to share on the things that I worked on and also the desire to share as much of it as possible. I hope I did a good enough job at laying out the landscape of the other things going on outside my perspective.
If you’ve got something to add, all of my contact information is available in the footer of my blog. Or make a pull request against this page directly. Special thanks to Keith Elliston for suggesting these changes.