Robert van den Berg is currently the Senior Director, Head Data Sciences & Computational Vaccinology for Vaccines R&D at GlaxoSmithKline, a position he reached after over a decade in the company – in addition to his prior work in academia. We caught up with him for a discussion on the current trends in multi-omics approaches and novel technologies.

PF: It’s a privilege to be able to talk to you about the leading topics in Bioinformatics today, Dr. van den Berg. You have a rather long career in biological data analytics – are there any highlights you would like to share with us?

RB: It was really exciting to start my career out in the omics fields – it felt really pioneering, working at the interface between biology and data science, during my PhD. Perhaps not one of my highlights, but in general it is astonishing to see data science join the mainstream of biomedical research. As a corollary, it is great to see talented individuals being introduced to our world – the life sciences – from other sectors such as tech, which will enable them to make a real impact on public health and improve the livelihoods of people around the world. We have seen technology revolutionize biomedical science – from microarrays and next generation sequencing to single-cell analysis. We have also seen imaging techniques make groundbreaking progress at breakneck speed. This progress is supported by advancements not only in the life sciences, but throughout the technology industry and the other sectors it strongly interfaces with. The computing power available for big data analysis is unprecedented – and I suspect it will continue to be, as further improvements arrive and as advancements in AI and Deep Learning make their way into the biomedical sciences.

PF: Your career fortuitously coincided with the rapid expansion of computational approaches to biology – what was it like seeing such an accelerated pace of data generation from across the life sciences, and how did you feel the impact of the big data revolution? 

RB: It has definitely been really thrilling to see – the possibility of generating this much data has opened many avenues for investigating complex questions and generating novel insights. It has also given rise to other questions, such as how to improve our own experiment design to generate the right data to answer these complex questions. We also have to think about other concerns: assessing the reusability of such large datasets in other applications, considering the future-proofing of such data, and deciding what kind of metadata we need to capture. I also think the community collectively made the right decision in encouraging data sharing across scientific publications, and I am really proud of the large repositories we have that store such data for sharing. These datasets aren’t always reusable by everyone – sometimes data is partly redacted – but we have made great strides; there is just room for improvement.

PF: Unlike other natural sciences, such as physics – where the most data-rich fields tend to be concentrated around a few large colliders – biology and the life sciences operate in a much more distributed and “democratized” structure. This means that the industry – and academia – have very different, and perhaps higher and more redundant, infrastructural needs. How do you see this being addressed as more organizations seek to harness richer data analytics and approaches? Is collaboration the answer?

RB: That is actually a great question – I think the architecture of data in the life sciences means all the players and actors have different motivations for conducting their research and using the tools they choose to use. Compared to the physical sciences, the cost of research is not so high that it becomes impossible to perform – allowing people to generate data independently. There are common drivers for harmonizing these endeavors of course – such as the standards set by journals for sharing data and making code available. Obviously standardization and harmonization still have a long way to go – especially compared to physics, as not all datasets have comparable metadata. It could be that this data architecture across the field results in different levels of cooperation – although the COVID-19 pandemic has certainly illustrated the need for greater collaboration, leading to an enormous amount of research being generated in a very short period of time. The community was feverishly at work trying to find a way out of the pandemic and, in the process, people published much of their work early on pre-print servers, accelerating the pace at which other researchers could use it. We have also seen a greater push for transparency and information sharing over the pandemic, and these factors definitely contribute to a shift towards more standardized definitions and greater cooperation in the life sciences.

PF: The omics fields present unique challenges with regard to the amount of data they can generate – we have gone from the Human Genome Project to now generating nearly a zettabase of sequence data per year, and this is without considering proteomics, metabolomics, transcriptomics and other layers. Do you feel we need to take a step back and reconsider the amount of data we can use?

RB: Definitely an interesting question – from a scientific perspective, I think the key is designing experiments, based on the latest insights, that generate data for testing our hypotheses or for demonstrating advancements in the field. It can be difficult to step back and pause; however, as a company we are not only discussing the value of generating new data, but also looking at maximizing the value of existing data – good quality data with the right metadata has extremely high reusability. There are still other things to consider – there is a lot to gain from new data, and the difficulty of convincing people to stop generating the amount of data they think they need should not be underestimated.

PF: Multi-omics approaches remain one of the most valuable tools to obtain holistic pictures of pathways involved in bodily functions and disease mechanisms. However, integrative approaches remain limited and require significant work to harmonize heterogeneous datasets. How do you see this aspect of the field improving in the near future? 

RB: There are so many large challenges in this domain – one cursory look at the papers being published can identify progress aimed at key deficiencies. These include the availability of high-resolution reference maps; consortia like the Human Cell Atlas project have set out to provide these. There is definitely value in harmonizing and “bringing data together”, so to speak, to the benefit of everyone. There is also a constant flow of novel computational methods aimed at improving our approach to multi-omic integration and at performing better comparisons. We can also see that newer concepts, such as AI and Machine Learning, have begun entering the field. The heterogeneity of datasets and experimental variability is indeed a large part of the problem that needs to be addressed, and for that I think improving our experimental design – particularly with a healthy dose of forward thinking, with the data analysis and interpretation in mind – can simplify integration the most. It is similar to the beginning of the microarray and sequencing eras – people were facing big technological issues, but we eventually addressed them through perseverance.

PF: Your current work involves significant multi-omics approaches in improving vaccine research and design – do you feel the pandemic has reignited interest in the field? Multi-omics approaches have long had some of their most significant impacts in the field of oncology, perhaps because of the high global burden of cancer and its involvement in bodily processes in all layers that multi-omics can engage with. Has the pandemic changed that balance? 

RB: The pandemic, and the impact of vaccines in tackling it, has certainly brought vaccines front and center in people’s minds. We have also seen the pandemic move RNA-based vaccines from a promising future technology (one that was decades in the making) to a proven, commercially viable implementation. However, it still looks like further progress will be required for SARS-CoV-2 vaccines in the future, to deal with waning immunity and novel variants. Having a deep understanding of how various vaccines work – the quality and the duration of the immunity they generate, the severity of potential side effects, and the variants they work against – will remain critical for advancing the field. Multi-omic technologies have a large role to play in facilitating these advancements, specifically by providing an understanding of the mechanisms by which vaccines work at a molecular level – involving the entire adaptive and innate immune systems. Multi-omics can also provide data from different levels – tissue cultures, or organs such as lymph nodes or injection sites – which can assist in teasing apart the exact effects of a vaccination.

PF: The advent of AI and quantum computing will exponentially increase both our data generation and interpretation capacities. We already see deep learning as a possible answer in improving our multi-omics integration approaches – what kinds of progress do you expect to see as the adoption of these technologies increases throughout the field?

RB: I think it is very interesting to be active in this field; the opportunities that pioneering technologies provide allow us to re-examine our field as a whole as we find ways to adopt them. It will be exciting to see how quantum computing will develop in the life sciences. Currently it is much more developed in physics, which can provide the infrastructure at key expert centers; it will be interesting to see if its eventual adoption in the life sciences brings with it new needs for harmonization and standardization in our field. I am also very curious to see how the algorithms that will run on quantum computers will be designed to assist in solving multi-omics problems. Formulating multi-omics questions in forms that quantum computers can deal with will certainly be an intriguing challenge to see addressed. For machine learning, I think we already see increasing efforts on that front with integration approaches – it is trending in publications across a plethora of applications, and looks like it is very much here to stay. And indeed, there are other, more familiar challenges we can anticipate: we have a lot of data, though it may not all be as useful as we had hoped. We need to continue to find ways to capture the “spirit” of the data in more innovative and effective ways to increase its reusability and applicability across wider implementations, while also increasing its homogeneity to eliminate biases.

PF: You will be attending our Bioinformatics Strategy Meeting in Boston, where you will be facilitating a discussion on the challenges that multi-omics approaches face with regard to data analysis. Are there any other insights you would like to share with us prior to the event? Are you looking forward to the resumption of physical conferences?

RB: I am absolutely looking forward to physical conferences happening again – and the unique format of the Bioinformatics Strategy Meeting creates the most personal setting for scientific discourse. I cannot wait to meet other people active in the field and learn from their experiences; hopefully, they will also learn a little bit from mine. Discussions are always more effective when the attendees bring their own insights and ideas to the table – and the Strategy Meetings are perfect for facilitating this.

Tuğçe Freeborough-Gerrard, Producer, Proventa International

Nick Zoukas, Former Editor, PharmaFEATURES

Join Proventa International’s Bioinformatics Strategy Meeting in Boston to discuss the latest impact of novel data management, analysis and interpretation trends in multi-omics approaches, as well as the wider field of Bioinformatics. Meet and discuss cutting-edge topics with world-renowned experts and stakeholders to kickstart your next collaboration. 
