The pipeline from drug discovery through development to approval is complex and lengthy. However, machine learning (ML) is beginning to drive innovation at every stage of drug development. Target validation, identification of prognostic biomarkers and analysis of digital pathology data in clinical trials are some of the areas in which ML can be applied.
There are two main approaches to applying ML: supervised and unsupervised learning. Unsupervised learning algorithms learn patterns from unlabelled data. Supervised learning algorithms, on the other hand, are trained on labelled data: a set of training examples in which each input is paired with a known output.
Supervised learning methods are used to predict categories or continuous values for new data. Unsupervised learning is primarily used for exploratory purposes, building models that cluster data into groups not specified in advance by the user. This technique helps to identify hidden patterns within input data, whereas supervised learning methods predict future outputs from a model trained on known input and output data. According to a 2020 review, supervised learning techniques such as Support Vector Machines, deep learning and regression methods have already been applied to biomedical challenges in the last decade.
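The difference between the two approaches can be sketched in a few lines of Python. This is a minimal illustration, not a method taken from any of the studies discussed here: the compounds, their two numeric descriptors and the "active"/"inactive" labels are all made up, and the nearest-centroid classifier and k-means clustering are simply two small, well-known algorithms chosen to stand in for supervised and unsupervised learning respectively.

```python
import math

def mean_point(points):
    """Average a list of (x, y) points."""
    return (sum(x for x, _ in points) / len(points),
            sum(y for _, y in points) / len(points))

def fit_centroids(points, labels):
    """Supervised: use the known labels to learn one centroid per class."""
    by_label = {}
    for point, label in zip(points, labels):
        by_label.setdefault(label, []).append(point)
    return {label: mean_point(group) for label, group in by_label.items()}

def predict_label(centroids, point):
    """Assign a new point to the class with the closest centroid."""
    return min(centroids, key=lambda label: math.dist(centroids[label], point))

def k_means(points, k, iterations=10):
    """Unsupervised: group points into k clusters without using any labels."""
    centroids = list(points[:k])  # naive initialisation from the first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda i: math.dist(centroids[i], point))
            clusters[nearest].append(point)
        # Keep the old centroid if a cluster happens to be empty.
        centroids = [mean_point(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return [min(range(k), key=lambda i: math.dist(centroids[i], point))
            for point in points]

# Hypothetical data: two invented descriptors per compound.
compounds = [(1.0, 1.2), (5.0, 5.2), (0.8, 1.0), (5.1, 4.8), (1.1, 0.9), (4.9, 5.0)]
labels = ["inactive", "active", "inactive", "active", "inactive", "active"]

# Supervised: labels are required for training, then used to classify a new compound.
centroids = fit_centroids(compounds, labels)
predicted_label = predict_label(centroids, (4.7, 5.1))

# Unsupervised: the same compounds are grouped with no labels at all.
cluster_ids = k_means(compounds, k=2)

print(predicted_label)
print(cluster_ids)
```

The supervised model can name the class of a new compound because it was shown labelled examples. The clustering step only reports which compounds belong together, and it is left to the analyst to interpret what each group means.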
Target identification and validation
The identification and validation of a therapeutic target requires the analysis of vast datasets. Genetic screening and high-content imaging are examples of techniques that produce large datasets that can be exploited for early target identification and validation. However, analysing such data requires appropriate mathematical methods to construct valid statistical models, and this is where ML can be exploited.
As early as 2010, ML was applied to target validation in a study that used a “decision tree-based meta classifier”. In this study, the ML platform was proposed as a computational approach to predicting morbid and druggable genes. Morbid genes are genes in which mutations are associated with hereditary human disease. The tree-based meta-classifier was used to predict targets on a genome-wide scale. It managed to correctly recover “65% of known morbid genes with a precision of 66% and correctly recovered 78% of known druggable genes with a precision of 75%”. The ability of ML to predict specific genes on a genome-wide scale is a significant step forward in optimising target identification. Predicting therapeutic targets computationally saves pharma companies time and resources, and the mathematical approach could potentially yield more reliable targets.
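The figures quoted from the 2010 study pair a recovery rate with a precision. Assuming that "recovered" corresponds to what is usually called recall (an interpretation, not something the source states), the two measures can be computed as in the sketch below. The gene identifiers are placeholders invented for illustration and are not data from the study.

```python
def precision_recall(predicted, known):
    """Compare a set of predicted genes with a set of known genes.

    precision: fraction of the predictions that are known positives
    recall:    fraction of the known positives that were predicted (recovered)
    """
    predicted, known = set(predicted), set(known)
    true_positives = len(predicted & known)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(known) if known else 0.0
    return precision, recall

# Hypothetical example: five known druggable genes, four predictions.
known_genes = {"G1", "G2", "G3", "G4", "G5"}
predicted_genes = {"G1", "G2", "G3", "G8"}

precision, recall = precision_recall(predicted_genes, known_genes)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

In this toy case three of the four predictions are correct and three of the five known genes are recovered. The two numbers answer different questions, which is why the study reports both.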
The Generative Adversarial Network (GAN) is an example of a recent innovation in deep learning for drug discovery. Deep learning is a specialised area of ML that attempts to model abstraction from large-scale data using multi-layered deep neural networks (DNNs). Abstraction is a computer science term that refers to the process of filtering out irrelevant data in order to focus on the desired information.
As an unsupervised ML method, the GAN has been shown to address some of the challenges of supervised ML, primarily the need to train on large data sets, which is often expensive and time-consuming. In a 2017 study, GAN-based frameworks were applied to chemical and biological datasets to develop and identify novel compounds for anticancer therapy.
This study emphasised how the productivity of pharmaceutical research is limited by inefficient early lead discovery processes. It also highlighted how in silico-based approaches like deep learning models can generate reliable data at a reduced cost and time scale relative to current screening methods.
In research, a pathologist interprets the presentation of tissue and cells on a glass slide. The spatial relationships between cells, their size and their general structure can indicate changes caused by drug interaction. Computational pathology is becoming an important part of drug development. It has been suggested that this method could allow pharmaceutical companies to discover and generate novel biomarkers in a more precise, reproducible and high-throughput manner.
ML allows for high-throughput generation of features for thousands of cells, a task that is not feasible for pathologists to perform manually. Immuno-oncology is a particular therapeutic area which has benefitted from computational pathology. A 2017 study found that computational analysis of tumour-adjacent benign tissue in prostate cancer revealed information that is typically ignored by pathologists but has been associated with progression-free survival.
One of the main concerns with ML predictions is overfitting or underfitting. Overfitting is described as a model which consists of “lower quality information/technique but generates higher quality performance. In contrast, underfitting models fail to recognize the data sets’ underlying trend and generalize the new data inputted”. Put simply, an overfitted model fits its training data too closely, noise included, and so performs worse on new data, while an underfitted model is too simple to capture the underlying trend at all. Both errors produce inaccurate results which compromise the reliability of predicted drug targets. Increasing the sample size and cross-validation are often used to address these problems. Cross-validation is a technique that repeatedly holds out part of the data, so that the accuracy of an ML model is estimated on examples it was not trained on.
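A small sketch can show how cross-validation exposes overfitting. It assumes a deliberately overfitting model, a one-nearest-neighbour classifier that simply memorises its training examples, and a made-up one-dimensional data set in which two labels have been flipped to act as noise. None of this comes from the studies discussed above; it only illustrates the general idea.

```python
def one_nn_fit(xs, ys):
    """'Training' just memorises every example, noise included."""
    return list(zip(xs, ys))

def one_nn_predict(model, x):
    """Return the label of the memorised example closest to x."""
    _, nearest_label = min(model, key=lambda pair: abs(pair[0] - x))
    return nearest_label

def accuracy(model, xs, ys):
    hits = sum(one_nn_predict(model, x) == y for x, y in zip(xs, ys))
    return hits / len(xs)

def k_fold_accuracy(xs, ys, k):
    """Average accuracy over k folds, each scored on data held out of training."""
    scores = []
    for fold in range(k):
        test_idx = [i for i in range(len(xs)) if i % k == fold]
        train_idx = [i for i in range(len(xs)) if i % k != fold]
        model = one_nn_fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(accuracy(model,
                               [xs[i] for i in test_idx],
                               [ys[i] for i in test_idx]))
    return sum(scores) / len(scores)

# Hypothetical data: the true rule is "label 1 when x >= 6",
# but the labels at x=2 and x=9 have been flipped to simulate noise.
xs = list(range(12))
ys = [0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1]

# Scoring on the training data itself looks perfect because the model memorised it.
train_accuracy = accuracy(one_nn_fit(xs, ys), xs, ys)

# Scoring on held-out folds gives a more honest, lower estimate.
cv_accuracy = k_fold_accuracy(xs, ys, k=4)

print(f"training accuracy: {train_accuracy:.2f}")
print(f"cross-validated accuracy: {cv_accuracy:.2f}")
```

The gap between the two numbers is the warning sign: a model that scores much better on the data it was trained on than on held-out data has probably learned the noise rather than the trend.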
Another challenge for the pharmaceutical industry is the shortage of personnel able to operate AI/ML-based platforms. Furthermore, there is often scepticism about the quality of data generated by AI. Small organisations often have limited budgets and cannot afford to invest in AI/ML technology.
Despite the improvements needed to refine ML applications, the potential they bring to drug development is significant. In addition to reducing human error, automated ML software can analyse data from many sources more accurately and in less time. The advancement of AI and ML will continue to reduce the challenges faced by the pharmaceutical industry.
Charlotte Di Salvo, Lead Medical Writer
PharmaFeatures