AI for Drug Discovery

In the 6th ‘AI for Good’ seminar of Trinity Term, Charlotte Deane gave a thought-provoking talk on ‘AI for Drug Discovery’, yet another area where artificial intelligence and machine learning (AI/ML) can improve the world. Charlotte is Professor of Structural Bioinformatics, leading the Oxford Protein Informatics Group at the Department of Statistics, and Chief Scientist of Biologics AI at Exscientia (a pharmatech company based in the Oxford Science Park).

As someone with very little knowledge of drug discovery, I came to the talk worrying that I wouldn’t understand a thing about this fancy research topic. Luckily, Charlotte gave a succinct introduction to the basic concepts to bring the non-expert audience on board. We could begin to appreciate the enormous potential (and limitations) of AI/ML for facilitating drug discovery, or even for altering the conventional drug discovery dynamic, which has been biased towards high-income populations who can afford more expensive, novel drugs.

A drug is “a medicine or other substance which has a physiological effect when injected or otherwise introduced into the body”. Behind this simple definition, however, lies a long (12-14 years) and costly ($2.5 billion in R&D costs per approved drug) drug discovery pipeline involving multiple stages of pre-clinical and clinical research. Computational methods have long been used to reduce the time and financial costs of all stages of drug discovery, and AI adds novel, powerful tools that advance these methods further in this big data era.

The very first stage of drug discovery, ‘Target Discovery’, is perhaps one of the most challenging. At this stage, scientists need to identify the correct target (e.g. a particular site on a protein molecule) where some of the 10^60 synthetically accessible drug-like molecules might physically fit and induce the desired physiological effects. Charlotte showed a picture of SARS-CoV-2 replicating among human cells as a ‘simple case study’ (since the virus’s genome, and thus its proteins, are much simpler than a human’s). There were thousands of different components in the picture, and the correct target was one tiny dot - essentially a needle in a haystack. To identify the right target, scientists need to process tonnes of information about, for example, the function of every single protein involved and how to model the proteins’ structures accurately for computational analysis.

Similarly difficult is the second stage, ‘Target Validation’. This is Charlotte’s main research area: how to design a molecule that would theoretically survive the digestive tract and act as a drug on the specific target with the intended effects, while having minimal side effects. Researchers have developed AI-based platforms that integrate, essentially, all existing relevant knowledge and information available from published literature, commercial data, known information about different proteins, electronic health records and so on. Human experts can then make sense of the AI’s output and derive a ‘target product profile’ (TPP) that characterises the key criteria for a chemical to be considered a potential candidate.

The TPP can then be encoded into an AI/ML algorithm (or several different algorithms) that screens all sorts of proprietary or publicly available data to filter out a population of molecules meeting the optimised sets of criteria – extremely tedious work that is greatly expedited by AI/ML approaches. Up to this point everything is still theoretical, and there are still far too many potential molecules to test in actual lab experiments, but once again AI/ML is here to help. Scientists set up an active learning cycle: they select 10-20 prime molecules for synthesis and lab testing, and feed the test results back to the active learning algorithms, which improve with each cycle of lab-based validation. Soon enough, they are able to zoom in on a promising subset of candidates.
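The active learning cycle described above can be sketched in a few lines of code. This is a toy illustration, not Exscientia’s actual platform: the candidate pool, the ‘lab test’ and the feedback step are all made up stand-ins for a real predictive model, synthesis and assay, and the numbers are purely illustrative.

```python
import random

random.seed(0)

# Hypothetical pool of candidate molecules, each with a model-predicted score.
candidates = [{"id": i, "predicted": random.random()} for i in range(1000)]

def lab_test(molecule):
    # Stand-in for synthesis and lab testing: the 'true' result is modelled
    # here as the prediction plus noise.
    return molecule["predicted"] + random.gauss(0, 0.05)

def active_learning_cycle(pool, n_cycles=3, batch_size=15):
    """Each cycle: pick the top-ranked molecules, 'test' them in the lab,
    and feed the results back to re-rank the remaining pool."""
    tested = []
    for _ in range(n_cycles):
        # Select 10-20 prime molecules for synthesis and lab testing.
        pool.sort(key=lambda m: m["predicted"], reverse=True)
        batch, pool = pool[:batch_size], pool[batch_size:]
        for m in batch:
            m["measured"] = lab_test(m)
        tested.extend(batch)
        # Feedback step: a real platform would retrain the model here; we
        # mimic that by shifting predictions by the mean observed error.
        correction = sum(m["measured"] - m["predicted"] for m in batch) / len(batch)
        for m in pool:
            m["predicted"] += correction
    return tested, pool

tested, remaining = active_learning_cycle(candidates)
print(len(tested), len(remaining))  # 45 955
```

The key design point is the loop itself: each round of (expensive) lab validation informs the next round of (cheap) computational ranking, so the pool narrows towards promising candidates far faster than exhaustive testing would.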

Towards the end of the talk, Charlotte shared a thought-provoking story from her research, when her team encountered a classic ML problem. They used deep-learning image recognition approaches to screen for molecules that bind certain proteins. The theoretical chemical structures of potential molecules were converted into mathematical ‘pictures’ and passed into their AI algorithm, which was trained and cross-validated on various datasets. At first, the algorithm showed substantial improvement over conventional methods, but once it was tested on different datasets, the performance dropped significantly with no clear reason - the parametrisation of the algorithm, like that of many other ML solutions, is largely a ‘black box’. They figured out the problem after careful scrutiny of the data and algorithm, but this is an important reminder that AI/ML are special tools that do not necessarily ‘follow your lead’: one must use them with abundant knowledge of the underlying (research) questions and always carefully scrutinise the algorithm and its output.

On that note, Charlotte concluded her talk by highlighting not only the massive potential of AI for drug discovery, but also the prevailing challenges in the field, from data availability, to bias in existing data, to the possible misuse of AI to swiftly discover bioweapons. Despite the promise and excitement around AI for drug discovery, it remains a tool to be used by human experts in meticulously designed studies, and we are still very far from having AI algorithms design drugs for us.


Peter Ka Hung Chan is a Research Fellow at Reuben College, and an Oxford British Heart Foundation Centre of Research Excellence Intermediate Transition Research Fellow in the Clinical Trial Service Unit and Epidemiological Studies Unit (CTSU).