The other day I saw a tweet lamenting what a drag literature review is when preparing to write a thesis.

I agree. I felt the same pain last fall when I wrote my doctoral thesis. Luckily, I had a strategy that made it a bit easier.

I learned it from one of my professors when I was doing my master’s degree in Life Science Informatics. Each semester, we had a seminar course in which each student was assigned research articles to read and present to the class with a short slide deck. Later, I joined his research group and realized that this course served as a literature review for him just as much as for us.

So later, as a Ph.D. student, I volunteered to run the seminar myself. I co-opted the concept and planned the course to cover many of the topics I found interesting for my thesis. I already knew some of the papers very well, and a few were ones I had always been meaning to read. I tried to pick the most recent papers for each topic when possible, but also threw in a few classics.

On the first day of the seminar, I shared the following course information. I thought it was important to make my expectations about the students’ prior knowledge clear. Since they all came from the same master’s program, I decided it was enough that they had passed one of the first-semester lectures, “Biological Databases”, which covered many of the resources and databases used in the systems and networks biology community. I also outlined the course content and what was expected of the students, then shared everything as a Google Doc so they could read it over and add comments.

I also made a list of possible papers and a tentative schedule so that students could look them over and decide which papers they found most interesting. The topics were arranged in a logical order to tell the story of my thesis, and for each section there were a few papers that I thought were very important and a few extras in case there was a lot of interest. On the first day of the seminar, I also went through the whole list and explained the topics to the students. I shared this list via Google Docs as well, so they could claim papers for their presentations. Below is the final list of papers in the order in which they were presented. In the end, we agreed on assignments so that every paper I considered most important was presented. About 40% of the class picked a paper on the first day; the rest took the following week to decide, ask questions, or propose new papers.

Another consideration when picking the papers was to include work by my colleagues that I found interesting and helpful. I then invited them to attend the corresponding sessions and moderate the discussion afterwards. We were able to invite one of my collaborators, Mehdi Ali (he’s a really good guy!), to discuss his work on using deep learning for relation extraction in natural language processing. I think that was probably the most engaging day of the whole series.

I added one new element to this course compared to the seminar I had attended as a student: each student was responsible not only for presenting the paper assigned from my list, but also for finding a relevant pre-print on the same or a similar topic and submitting a peer review through the pre-print server. When I was a student, I noticed that many students did not read the references of the paper they were assigned in our seminars and had not considered other research similar to their paper. Asking them to find their own papers made the process more creative and fun, and it directly prepared them to answer questions at the end of the presentation like “What will the authors do next?” or “How will this research be used by others?”

One of the funny things that happened during the pre-print presentations was that the students found several of my own pre-prints and presented those. I suppose this was inevitable, given how closely my recent work overlapped with the topics I had chosen. The next time I host a seminar, I would explicitly encourage students to check out my pre-prints, because I know the work very well and could moderate a good discussion.

I learned a lot through the process of preparing this seminar. Its outline became the outline of my thesis, and many of the discussions turned into points that I addressed explicitly in my writing. I wouldn’t say that I was taking advantage of the students; we all benefited from the experience. I hope this gives you some ideas about how you might do this yourself, whether you’re a doctoral student, a postdoc, or something else!

Course Information

  • Title: Knowledge Assembly, Data Integration, and Modeling in Systems and Networks Biology
  • Period: Winter Semester 2018/2019
  • Location: Endenicher Allee 19A, Room U.105 on Wednesdays 13.00-14.30


Students should be comfortable with the material presented in the Biological Databases lecture during the first semester of the LSI curriculum.


Students will have the opportunity to practice reading, presenting, and discussing recent biomedical literature on the topics of knowledge assembly, data integration, and modeling in systems and networks biology.


Students will be assigned papers on the holistic process of knowledge discovery in systems and networks biology and will present on them. The papers focus on knowledge assembly (e.g., natural language processing, modeling formalisms and formats, reasoning techniques), data integration (e.g., practical scenarios applying techniques at the data, knowledge, and analytical levels), and modeling strategies (e.g., rule-based modeling, agent-based modeling, mathematical modeling, hypothesis generation with knowledge-based approaches).


Students will be assigned an article to read and present during a thirty (30) minute lecture. One goal of this lecture is to demonstrate an understanding not only of the material presented in the article but also of the relevant background information; this may entail following the references and reading other articles. Another goal is not only to educate but also to entertain the audience. Students will also be expected to find a relevant pre-print on arXiv, bioRxiv, or another pre-print server and post a peer review for the authors on the corresponding service. Following the presentation of their assigned article, students should include one to three slides briefly explaining the relevance of the pre-print they found.

Method of Performance Review

Students will be assessed on their understanding of the assigned topic, the quality of their presentation, and their participation. Students who miss more than two seminars without a doctor’s note will not pass the course.


Week 0 - October 10th, 2018 - Syllabus Week

This week there will be a short discussion of the syllabus and no presentations. For those in Bonn who aren’t aware of this wonderful tradition, welcome to Syllabus Week.

Week 1 - October 31st, 2018 - Named Entity Recognition

Leser, U., & Hakenberg, J. (2005). What makes a gene name? Named entity recognition in the biomedical literature. Briefings in Bioinformatics, 6(4), 357–369.


Bachman, J. A., Gyori, B. M., & Sorger, P. K. (2018). FamPlex: A resource for entity recognition and relationship resolution of human protein families and complexes in biomedical text mining. BMC Bioinformatics, 19(1), 1–14.


Week 2 - November 7th, 2018 - Identifiers

Laibe, C., & Le Novère, N. (2007). MIRIAM Resources: tools to generate and resolve robust cross-references in Systems Biology. BMC Systems Biology, 1, 58.


Juty, N., Le Novère, N., & Laibe, C. (2012). Identifiers.org and MIRIAM Registry: Community resources to provide persistent identification. Nucleic Acids Research, 40(D1), D580–D586.


Week 3 - November 14th, 2018 - Information Extraction

Novichkova, S., et al. (2003). MedScan, a natural language processing engine for MEDLINE abstracts. Bioinformatics, 19(13), 1699–1706.


Ali, M., et al. (2017). Automatic Extraction of BEL-Statements based on Neural Networks. Proceedings of BioCreative VI Challenge and Workshop, (October).


Week 4 - November 21st, 2018 - Knowledge Representations

Demir, E., et al. (2010). The BioPAX community standard for pathway data sharing. Nature Biotechnology, 28(9), 935–942.


Hucka, M., et al. (2003). The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics (Oxford, England), 19(4), 524–31.


Week 5 - November 28th, 2018 - Knowledge Representations (cont.)

Le Novère, N., et al. (2009). The Systems Biology Graphical Notation. Nature Biotechnology, 27(8), 735–741.


Carbon, S., et al. (2017). Expansion of the gene ontology knowledgebase and resources: The gene ontology consortium. Nucleic Acids Research, 45(D1), D331–D338.


Week 6 - December 12th, 2018 - Pathway Databases and Semantic Data Integration

Croft, D., et al. (2014). The Reactome pathway knowledgebase. Nucleic Acids Research, 42(D1), D472–D477. AND Fabregat, A., et al. (2018). The Reactome Pathway Knowledgebase. Nucleic Acids Research, 46(D1), D649–D655.


Cerami, E. G., et al. (2011). Pathway Commons, a web resource for biological pathway data. Nucleic Acids Research, 39(D1), D685–D690.


Khatri, P., Sirota, M., & Butte, A. J. (2012). Ten years of pathway analysis: Current approaches and outstanding challenges. PLoS Computational Biology, 8(2).


Gligorijević, V., & Pržulj, N. (2015). Methods for biological data integration: perspectives and challenges. Journal of The Royal Society Interface, 12(112), 20150571.


Week 8 - January 16th, 2019 - Applications

Saqi, M., et al. (2018). Navigating the disease landscape: knowledge representations for contextualizing molecular signatures. Briefings in Bioinformatics, 1–15.


Himmelstein, D. S., et al. (2017). Systematic integration of biomedical knowledge prioritizes drugs for repurposing. eLife, 6.


Week 9 - January 23rd, 2019 - Applications

Lopez, C. F., et al. (2013). Programming biological models in Python using PySB. Molecular Systems Biology, 9(646), 646.


Gyori, B. M., et al. (2017). From word models to executable models of signaling networks using automated assembly. Molecular Systems Biology, 13(11), 954.