Science In Silico: How Drug Discovery Uses High Performance Computing
21st July 2024 - Scott Marshall and Andrew Holway
High performance computing (HPC) has a long history of helping researchers unearth groundbreaking scientific discoveries, and as innovative new digital technologies continue to be embraced by the scientific community, HPC is becoming an increasingly important tool in the researcher's arsenal.
Within a scientific context, high performance computing generally refers to running complex, computationally demanding calculations at high load on commodity hardware. The scope of HPC systems can vary depending on the exact task(s) being performed, and whilst it may be easy to jump to thoughts of supercomputers and large clusters running massive scale simulations, a single machine utilising close to all of its CPU resources to complete a task could still be considered HPC. A good way to define high performance computing, then, is the maximising of available computing resources to run computationally intensive tasks.
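To make that definition a little more concrete, the short sketch below (a purely illustrative example of our own, not drawn from any of the companies discussed later) shows the smallest version of the idea in practice: a single machine fanning a compute-heavy calculation out across every CPU core it has, then combining the results.

```python
# Minimal illustration of the "single machine" end of HPC: saturating every
# available CPU core with a compute-heavy task. The workload here (naive
# prime counting) is a stand-in for any embarrassingly parallel calculation.
from multiprocessing import Pool, cpu_count

def count_primes(bounds):
    """Count primes in [start, end) by trial division -- deliberately CPU-heavy."""
    start, end = bounds
    count = 0
    for n in range(max(start, 2), end):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

if __name__ == "__main__":
    limit, workers = 2_000_000, cpu_count()
    step = limit // workers
    chunks = [(i * step, (i + 1) * step) for i in range(workers)]
    with Pool(workers) as pool:                      # one worker per core
        total = sum(pool.map(count_primes, chunks))  # fan out, then reduce
    print(f"{total} primes below {limit} found using {workers} cores")
```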
The Biotech Sector
In this series of articles, we discuss how HPC is being used by the biotechnology industry, and try to determine the true value of in silico research. First off, we take a look at 10 UK-based drug discovery companies, all of which are using computationally complex scientific methods in their goal of creating brand new treatments.
As an industry, biotechnology has long embraced HPC's potential as a research-enhancing tool, tracing back to the 1970s and Martin Karplus' Nobel Prize-winning work developing molecular dynamics simulations.
The sector's history is full of other success stories showing the utility of HPC, including the use of virtual screening and molecular docking simulations in the creation of the COVID-19 vaccines, increasingly powerful HPC architectures helping to reduce the cost of DNA sequencing by a factor of 10, and the recent news that a cancer drug discovered with the help of a supercomputer was approved for clinical trials in June 2024.
The Companies
Formed in 2013, the London-based Benevolent is one of the more high profile companies in the cohort. Leveraging its proprietary AI-based drug discovery platform, the company currently has a potential treatment for ulcerative colitis in clinical trials, as well as a number of other molecules in its pipeline, including some in partnership with pharma giant AstraZeneca.
Despite only being founded in 2021, Charm Therapeutics has proven highly successful at securing high-value venture funding, with the most recent investment raising around £16 million from NVIDIA. They are using their in-house protein folding algorithms to identify potential molecules to develop through their drug discovery pipeline.
The smallest company in the cohort in terms of funding, Evariste has an impressive pipeline of potential cancer treatments in progress, despite still being in the pre-seed phase. They recently joined the NVIDIA Inception Program, and will be hoping to use this support to further develop their Frobenius drug discovery platform.
Now based in Cambridge, Exscientia initially spun out of the University of Dundee, an institution with an excellent global reputation in the life sciences. The company touts itself as the creator of the world's first AI-designed molecule to enter clinical trials, although that treatment ultimately proved unsuccessful, as is the case with about 90% of all potential medicines. Exscientia had a highly successful IPO in 2021, valuing the company at around £2 billion.
The focus of HealX’s work is finding new treatments for rare diseases that are typically overlooked by the medical community. They have found success using AI-enhanced discovery techniques to supplement traditional methods, and currently have at least one potential therapeutic working its way through clinical trials.
Currently still in the seed phase, Kuano is taking a novel approach to drug discovery by using machine learning and quantum computing techniques to study enzyme reactions to find potential new treatments.
London-based Multiomic Health is seeking to find new precision medicine treatments for metabolic syndrome-related diseases such as diabetes. As the name suggests, they take a multi-omic approach, combining multiple layers of biological data, such as genomic, proteomic and metabolomic datasets, to inform their drug development.
Pharmenable Therapeutics is a spin-out of the University of Cambridge, basing their work on research originally conducted at the famous institution. They offer an AI-informed small molecule discovery platform, and have partnered with larger therapeutics companies such as Denali to help drive their projects forward.
Another University of Cambridge spinout, Phoremost has been developing their proprietary protein degradation platform since 2014. Using this technology, along with other in-house molecular simulation systems, they are pushing a number of potential cancer treatments through their pipeline.
Founded in London in 2019, Relation Therapeutics aims to combine machine learning with traditional biological expertise to develop new drugs. They recently completed their seed funding round, raising over £50 million, and have previously leveraged access to the CAMBRIDGE-1 supercomputer to develop treatments for osteoporosis.
These 10 companies are at a range of stages in their development and funding lifecycles, but it is no surprise to see them clustered around the traditional biotech power bases of Oxford, Cambridge and London, home to some of the UK's most renowned universities.
Their funding comes from a broad group of investors; however, there are a couple of common denominators. Dr. Jonathan Milner, a renowned life sciences entrepreneur, has invested in 5 of the 10 businesses, and NVentures, the investment arm of NVIDIA, has also backed a number of the companies.
The Pharmaceutical Process
The lifecycle of taking a new drug to market can, generally speaking, be split into four main phases: discovery, pre-clinical research, clinical trials and market approval.
When we look at these different phases, it appears that the need for high performance computing resources is greatest in the initial discovery phase. There are examples of HPC use throughout the drug development process, such as in organising and analysing the massive amounts of data clinical trials produce, but the bulk of HPC use is usually front-loaded.
At the very start of the drug discovery process, researchers need to wade through a massive pool of data to begin to identify molecules which may have the potential to become a new therapeutic drug in the future.
Our cohort of companies use artificial intelligence and machine learning techniques, which are extremely compute heavy, to analyse and organise all the available data sources. To do this, they will generally create a centralised knowledge graph, a repository of information where links and interactions between data points can be mapped.
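As a rough illustration of what such a knowledge graph looks like under the hood, the sketch below builds a tiny one in Python using the networkx library. The entity names and relationships are invented purely for the example and do not describe any real company's data.

```python
# Toy sketch of a drug discovery knowledge graph: nodes are biomedical
# entities, edges are typed relationships extracted from papers, assay data
# and trial results. All names below are invented for illustration.
import networkx as nx

kg = nx.MultiDiGraph()

# Entities (nodes) carry a "kind" attribute so queries can tell compounds,
# protein targets and diseases apart.
kg.add_node("compound_X", kind="small_molecule")
kg.add_node("protein_Y", kind="target")
kg.add_node("disease_Z", kind="disease")

# Relationships (edges) record the type of interaction and its provenance.
kg.add_edge("compound_X", "protein_Y", relation="inhibits", source="assay_data")
kg.add_edge("protein_Y", "disease_Z", relation="implicated_in", source="literature")

# A simple two-hop query: which diseases might compound_X be relevant to?
for _, target, d1 in kg.out_edges("compound_X", data=True):
    for _, disease, d2 in kg.out_edges(target, data=True):
        print(f"compound_X -[{d1['relation']}]-> {target} -[{d2['relation']}]-> {disease}")
```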
Once the data has been pooled and a potential molecule has been identified, computational chemists and bioinformaticians will then move onto conducting in silico validation and optimisation research. Computationally complex techniques such as molecular dynamics simulations are used to assess how the chosen molecule interacts with thousands of different target compounds, further building out the knowledge graph.
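The molecular dynamics and docking runs themselves require dedicated simulation engines, but a much simpler in silico step from the same stage of the pipeline, a drug-likeness property filter, gives a feel for what this kind of screening code can look like. The sketch below uses the open source RDKit toolkit and a handful of well-known molecules purely as an example; it is not a description of how any particular company in the cohort works.

```python
# Simplified stand-in for one in silico screening step: filtering candidate
# molecules on Lipinski's "rule of five" drug-likeness properties. Full
# molecular dynamics or docking runs are far more compute-intensive and use
# dedicated engines; this only shows the general shape of such a filter.
from rdkit import Chem
from rdkit.Chem import Descriptors

# SMILES strings for a few example molecules (the last is deliberately
# oversized so that it fails the filter).
candidates = {
    "aspirin": "CC(=O)OC1=CC=CC=C1C(=O)O",
    "caffeine": "CN1C=NC2=C1C(=O)N(C)C(=O)N2C",
    "oversized_alkane": "C" * 60,
}

def passes_rule_of_five(mol):
    """Return True if the molecule meets Lipinski's drug-likeness criteria."""
    return (
        Descriptors.MolWt(mol) <= 500
        and Descriptors.MolLogP(mol) <= 5
        and Descriptors.NumHDonors(mol) <= 5
        and Descriptors.NumHAcceptors(mol) <= 10
    )

for name, smiles in candidates.items():
    mol = Chem.MolFromSmiles(smiles)
    if mol is not None:
        verdict = "progress" if passes_rule_of_five(mol) else "discard"
        print(f"{name}: {verdict}")
```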
Once this virtual screening process has been completed, and a molecule successfully chosen for progression through the pipeline, it appears that the compute resources needed, and therefore HPC spend, drop fairly quickly. In the pre-clinical phase, computational modelling and in silico simulations are replaced with the need for human interventions. A combination of in vitro (test tube) and in vivo (animal) laboratory techniques are used to validate a treatment’s safety and efficacy. Finally, if these tests prove successful, the new drug can be put forward for clinical trials.
Historically, the discovery and preclinical phases have taken between 5 and 7 years. However, an article published in Nature in 2022 presented evidence that companies, much like the ones identified in this article, that were using HPC and AI-enabled technologies were completing this journey in less than 4 years. This is a massive improvement, and the process should only shorten further as new technologies evolve and organisations optimise their HPC resources.
Research and Development and HPC Spend
There is a general consensus that between 5% and 15% of an established pharmaceutical company's annual R&D spending goes on computational resources. For startups and SMEs, though, especially ones using in silico discovery methods, this percentage is likely much higher.
From their annual budgets, we know that in 2023, Exscientia spent £128 million on R&D, and Benevolent £60 million, whilst HealX spent around £19 million in 2022. Even if we take an overly conservative estimate of how much was diverted towards computing, it seems clear that companies are regularly spending an appreciable portion of their total budgets on these resources.
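As a rough back-of-the-envelope illustration, even the conservative 5% figure quoted above would put Exscientia's 2023 computational spend in the region of £6.4 million and Benevolent's at around £3 million, and for in silico-first companies the real proportion is likely to be higher still.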
There is an argument to be made that drug discovery startups are pure R&D companies: as they haven't yet made it to manufacturing or distributing a treatment, 100% of their total budget could be considered R&D costs.
The amount of these resources being allocated to HPC can depend on a number of things, including the size and maturity of the organisation, how reliant on computationally complex techniques their drug discovery methodology is, and at what point in the drug discovery pipeline a treatment is at.
Generally, breakdowns of R&D budgets are closely guarded, but we do know from a discussion with HealX that they are currently spending around £350,000 a year on HPC resources from Amazon Web Services. Interestingly, they were still spending that amount, even though none of the molecules in their pipeline were at a computationally intense period in their lifecycle.
The Cost of Inefficient HPC
The widespread integration of computationally intensive technologies brings with it an inevitable rise in a biotech company’s need for computational resources, but setting up and maintaining the necessary high performance computing infrastructure can be risky and expensive.
HPC professionals with the required level of technical skill and experience do not come cheap, and the growing understanding of the necessity of HPC has led to companies across a wide variety of sectors hunting for the relevant expertise in an ever-shrinking talent pool.
Not only are such high level HPC experts in short supply, but they are becoming increasingly expensive to employ. The average wage of an HPC Infrastructure Architect in the UK private sector in 2020 was approximately £68,000. In 2024, this had shot up by nearly 25% to £84,000.
Added to this, HPC employee attrition costs are known to be very high. These are highly specialised roles, and for companies with only one or two such positions, the cost of losing an HPC specialist can be many times their individual salary. Finding a replacement in a tight market can be very difficult, and the losses to productivity can end up being astronomical.
This scarcity of HPC expertise, and the rising cost of hiring it, has led some companies to try to plug the knowledge gap in-house by placing HPC infrastructure in the hands of team members who don't have the skillset or experience to manage it effectively, inevitably leading to inefficient, risk-heavy setups.
Additionally, due to the extremely sensitive nature of medical data, all research needs to be in full compliance with regulatory guidelines. Horror stories abound of unauthorised access to patient data being granted due to poor HPC infrastructure management, and breaches like this could result in costly fines and lawsuits.
Darwinist offers a solution to these potential pitfalls, leveraging our in-house HPC expertise and technology to ensure that a company's high performance computing workflows are efficient and compliant.