Building the Map of Life, our single source of healthcare R&D data powered by data42

Sam Khalil
6 min read · Jun 10, 2021

Accessible, searchable and connected R&D data in one single location

This is an exciting moment for our deeply committed teams as they have now laid the foundations that will allow us to start a new era of data-empowered R&D innovation.

Over the last decade, biopharma’s R&D model has had to shift radically to improve the odds of success in the lengthy and costly drug development process. The goal of bringing innovative treatment options to underserved patient populations, while adapting to new technologies, makes optimizing our R&D processes urgent. Most importantly, it carries the hope of addressing some of society’s most challenging healthcare issues.

Advances in data science technologies such as machine learning and artificial intelligence promised healthcare breakthroughs, but the clinical data they needed (all values recorded within a clinical trial) had been collected and stored in a form that most of the proposed solutions could not use.

Historically, data from clinical trials were collected and analyzed to answer predetermined questions. The data were collected to serve a single purpose, and subsequent use of the data was limited due to storage in different databases and platforms. Years of uncatalogued and unconnected data were fragmented across our organization and with external partners. This made exploratory analysis and scientific decision-making in our research and development difficult, time-consuming, and expensive — so much so that, in numerous cases, the work was not feasible and answers to questions remained out of reach.

Additionally, access to the data itself was complex and challenging because it was locked in silos and spread across unconnected data modalities. Data collected over decades therefore needed to be centralized and, more importantly, harmonized before any meaningful big data analysis could be conducted and insights generated.

Two years ago, we therefore started to build a connected data environment that would not only make our research and development data discoverable but, more importantly, serve as the basis for generating new insights. It took dedicated effort from a diverse range of committed intrapreneurs, visionary stakeholders and passionate scientists at Novartis to turn a ‘bold vision’ into reality.

With our data harmonized, validated, linked, and anonymized, the burden of curation and verification is taken off the shoulders of researchers and analysts, leaving them free to focus their time on the scientific questions of interest.

You can learn more about data42 and our journey in this article from Achim Plueckebaum. In upcoming articles, our teams will highlight the enormous effort that has gone into ingesting, cleaning, unifying, harmonizing to common standards and, most importantly, anonymizing over 25 years’ worth of Novartis R&D data.

Once this was achieved, we set about building the tools that allow anyone in the organization to discover the data they need easily, in a usable format, with simple governance and no silos.

Our ‘Map of Life’ product was finally born, with a clear ethos: explore, collaborate, innovate.


The Map of Life surfaces more than two decades of research and development data, covering more than 2,800 trials across more than 500 indications, and enables deeper analysis and insight generation than was previously possible.

Map of Life — Powered by data42

EXPLORE: Look across data and make new connections

One of the aims of data42 is to democratize data access, so our first step was to remove the barriers and friction that traditionally separated bench researchers from clinical researchers, which could in turn quicken the pace of innovation. We set out to create a unique platform that allows Novartis scientific researchers and drug developers to form new hypotheses, explore, collaborate, and make data-driven decisions based on a holistic data pool, with ease and speed.

Data scientists are often said to spend around 80% of their time finding data and preparing it for analysis, and only 20% actually analyzing it. With the Map of Life, they now have at their fingertips one of the largest collections of FAIR-ified (findable, accessible, interoperable and reusable), analysis-ready data in the pharmaceutical industry.

For the first time, clinical trial data and omics data also sit within the same environment. And because a shared language is applied to all data (so that variables such as weight, temperature and time are described consistently), we are now able to see patient stories at a population level.
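To make the idea of a shared language concrete, here is a minimal sketch of what harmonizing one variable across two sources could look like. The variable names, the mapping table and the records are all hypothetical and purely illustrative; they do not describe how data42 actually stores or transforms data.

```python
# Hypothetical mapping from source-specific variable names to one shared
# name and unit. Each entry is (canonical_name, multiplier to canonical unit).
# None of these names come from data42 itself.
CANONICAL = {
    "wt_kg": ("weight_kg", 1.0),
    "weight_lb": ("weight_kg", 0.45359237),  # pounds to kilograms
    "body_weight_g": ("weight_kg", 0.001),   # grams to kilograms
}


def harmonize(record):
    """Return a copy of record with known variables renamed and rescaled."""
    out = {}
    for name, value in record.items():
        if name in CANONICAL:
            target, factor = CANONICAL[name]
            out[target] = round(value * factor, 3)
        else:
            out[name] = value  # unknown variables pass through unchanged
    return out


# Two made-up records that describe weight in different ways.
trial_a = {"patient_id": "A-001", "wt_kg": 70.0}
trial_b = {"patient_id": "B-014", "weight_lb": 154.0}

harmonized = [harmonize(trial_a), harmonize(trial_b)]
```

After this step both records carry the same `weight_kg` variable, which is what allows them to be compared or pooled directly.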

Intuitive graphical tools enable users to search, discover and explore clinical trial and experimental data available on the platform. Users can then select and pool patients or trials of interest for further analysis.
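In the Map of Life this selection is done through graphical tools, but the underlying idea of pooling patients across trials can be sketched in a few lines. The records, field names and the `pool` helper below are invented for illustration and are not part of the platform.

```python
# Made-up, already harmonized patient records from three hypothetical trials.
patients = [
    {"trial": "T1", "patient_id": "T1-01", "indication": "asthma", "age": 34},
    {"trial": "T1", "patient_id": "T1-02", "indication": "asthma", "age": 71},
    {"trial": "T2", "patient_id": "T2-01", "indication": "psoriasis", "age": 45},
    {"trial": "T3", "patient_id": "T3-01", "indication": "asthma", "age": 58},
]


def pool(records, indication, min_age=0):
    """Select patients with a given indication and minimum age, across trials."""
    return [
        r for r in records
        if r["indication"] == indication and r["age"] >= min_age
    ]


# Pool asthma patients aged 50 or over, regardless of which trial they were in.
cohort = pool(patients, "asthma", min_age=50)
trials_in_cohort = sorted({r["trial"] for r in cohort})
```

A filter like this only makes sense once every trial describes indication and age in the same way, which is why harmonization comes first.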

In addition, without leaving the platform, our teams of researchers now have access to all the tools required to run complex, cross-functional analyses in a collaborative environment.

Here are some examples of questions our teams can now explore in seconds rather than months:

Example of a question our teams can now rapidly answer by searching the data in the Map of Life.

COLLABORATE: Work together in a shared and secure environment

We strongly believe that the diversity in background, goals, and approaches between users will provide fertile ground for collaboration and innovation.

Our teams now have access to powerful analytical tools in a collaborative and highly secure environment. Researchers can work together, find the data they need, share it easily and perform advanced analyses, all within the data42 environment. This provides the foundation for enhancing and accelerating the drug development process, and potentially for finding answers to fundamental scientific and medical questions that could have a profound impact on patients’ wellbeing.

The Map of Life is part of a transformative movement working at the intersection of biology and data science to unlock the potential of our people and our data to transform medicine. Once this potential is unlocked and in combination with the entrepreneurial mindset of our teams, we will enable previously unfeasible questions to be asked and uncover answers that were previously beyond our reach.

Our vision is to create a data-powered global research and development community, able to innovate at scale to get better medicines to more patients, faster.

Photo by Octavian Rosca on Unsplash

INNOVATE: Answer questions you never thought possible

Some of the many ways data42 and the Map of Life will allow us to accelerate the R&D pipeline and bring treatments faster to patients include:

  • Optimization of our clinical trials: learning from past experience across disease areas to inform the design of future trials.
  • Synthetic and external control arms: statistical modelling approaches for rapid evaluation of comparative effectiveness.
  • Virtual Proof of Concept studies: identifying potential connections between existing drugs and other disease states by interconnecting patient data.
  • Genomic data mining: using AI approaches to identify underlying genetic characteristics associated with disease or treatment response, including polygenic risk scores for conditions such as cardiovascular disease and diabetes.
  • And many other ideas to be unleashed…


The Map of Life is the first step of our journey into uncharted territory, one that will help to identify and address patients’ unmet needs by ensuring that existing data assets are harnessed to their fullest extent.

data42 holds the promise of bringing the impossible into the realm of the possible: connecting all existing data sources from bench to bedside in a truly collaborative manner, and allowing maximal use of current and, importantly, future data science technologies.

What’s next?

Stay tuned, as our teams will tell more stories about some of the innovations being made possible through our data42 platform.


To all our data42 team members and collaborators who have worked tirelessly to make this dream happen, a huge thank you! And to my team in particular, I couldn’t be more proud of what you have achieved. Merci!




Sam Khalil is data42 Head of Science at Novartis — Research & Development (R&D)