Early drug discovery in the pharmaceutical industry has driven the development of technology to create large libraries of compounds for screening of chemical targets. Innovations in DNA-encoded libraries have been critical in optimising drug discovery, demonstrating greater selectivity of hit molecules for chemical targets. The latest advancements have seen the integration of machine learning techniques with DNA-encoded libraries. 

DNA-encoded libraries in discovery chemistry

High-throughput screening (HTS) is the conventional method of screening compounds in academia and the pharmaceutical industry. One advantage of this method is the ability to screen more than 100,000 compounds per day. Unfortunately, HTS is a costly process that is limited by a low hit rate resulting from false positives and incompatible libraries.

DNA-encoded libraries (DELs) represent a modern and versatile tool used to better identify a greater range of novel biological compounds. These libraries are capable of screening drug targets with an extensive number of compounds with great efficiency. DELs are an assembly of small molecules attached to DNA tags which carry unique information about the structure of each member within the library. Three main factors determine the quality of a DEL compound collection: “Variety of reliable chemistries that are DNA compatible, accessibility to a diverse and large set of BBs, and experience of the designer”.

When identifying biological targets, one of the main selection techniques used with DELs is the affinity-based selection method:

1.     The molecular library is first incubated with the immobilised protein target.

2.     Bound ligands are then separated from unbound ligands by washing before they are eluted (removing an adsorbed substance by washing with a solvent).

3.     The ‘eluted binders’ are amplified by polymerase chain reactions (PCR) before undergoing DNA sequencing.

4.     PCR and DNA sequencing allow the identification of the structure of the binding compounds.
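The decoding step at the end of this workflow can be illustrated with a minimal sketch. It assumes that sequencing reads have already been translated into building-block tags, and it ranks compounds by how much more frequent they are after selection than in a no-target control. The tag names, the control sample, and the pseudocount are hypothetical choices made for illustration; they are not taken from any specific DEL pipeline.

```python
from collections import Counter


def enrichment(selected_reads, control_reads, pseudocount=1.0):
    """Return {tag: fold enrichment} of selected over control read frequencies.

    A pseudocount is added to every tag so that tags missing from one
    sample do not cause a division by zero (an illustrative assumption).
    """
    selected = Counter(selected_reads)
    control = Counter(control_reads)
    tags = set(selected) | set(control)
    n_selected = sum(selected.values()) + pseudocount * len(tags)
    n_control = sum(control.values()) + pseudocount * len(tags)
    return {
        tag: ((selected[tag] + pseudocount) / n_selected)
        / ((control[tag] + pseudocount) / n_control)
        for tag in tags
    }


# Hypothetical decoded tags: one entry per sequencing read.
selected_reads = ["A1-B2-C3"] * 8 + ["A2-B1-C1"] * 1 + ["A3-B3-C2"] * 1
control_reads = ["A1-B2-C3"] * 3 + ["A2-B1-C1"] * 4 + ["A3-B3-C2"] * 3

scores = enrichment(selected_reads, control_reads)
# Tags sorted from most to least enriched after selection.
ranked = sorted(scores, key=scores.get, reverse=True)
```

Here the first entry of `ranked` is the tag that gained the most read share after incubation with the target, which is the kind of signal that would prompt resynthesis and validation of the corresponding compound.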

Despite these systematic steps, this selection method is poorly suited to targets that are difficult to purify, such as protein complexes and membrane proteins. As a result, the field of discovery chemistry has seen substantial development to optimise the screening process.

Success of the latest modalities

The last few years have seen the application of DELs to identify chemical matter for challenging targets, including G protein-coupled receptors (GPCRs). GPCRs are transmembrane proteins to which specific ligands bind, for example adrenaline to adrenoreceptors. They are particularly challenging proteins to purify in the context of DELs due to their low solubility and stability. An early report by GSK demonstrated success in overcoming some of these limitations in efforts to discover antagonists of a GPCR known as NK3. Instead of purifying the receptor, the researchers expressed it at high concentrations in cells using viral vectors. Affinity selection was performed directly on the cells, and according to the report “several families of antagonists were discovered, with potencies down to single-digit nanomolar.” This represents a significant step forward in hit-to-lead technology regarding the application of DELs to previously ‘undruggable targets’. A 2021 study demonstrated the first successful screening of a multimillion-membered DEL inside a living cell.

In vivo screening using DNA libraries is often limited by the size of normal body cells. The study overcame this issue with a novel approach of using oocytes (egg cells), which are 100,000 times larger. The target protein was fused with the prey and expressed in the oocytes. In molecular biology, a prey is a potential interacting protein fused to an activation domain. This process allowed specific DNA labelling and discrimination between DEL members bound to the target protein and those bound to endogenous cell proteins. The results from this study represent a powerful modern approach to DEL screening. The main advantage of this method is that it eliminates the need for extensive purification of a target protein.

Strategies to address current and future challenges

Despite the recent innovations in conventional DEL techniques, constraints on the technology remain. Screening noise (molecules in the hit group that turn out to be poor ligands upon validation) has yet to be eliminated and remains a common problem. Conventional methods could also be extended to engage challenging targets such as RNAs and multi-protein complexes.

Selectivity, in terms of distinguishing between members of the same protein family, can be a pitfall for DELs. This has been a particular problem in developing inhibitors for enzymes such as kinases. One solution is to screen against a single family member, after which the hits are resynthesised and tested against the others. Unfortunately, resynthesis and validation are both time-consuming and costly processes. One strategy suggested in a 2015 study was to “carry out parallel screens against several different family members, then identify selective ligands informatically by comparing the hit pools.” This method proved successful in identifying selective ligands for animal and human albumins (proteins made in the liver).
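The informatic comparison of hit pools can be pictured as a simple set operation. The sketch below assumes that each parallel screen has already been reduced to a set of hit identifiers, and it keeps only the compounds that appear in one pool and in none of the others. The target names and compound identifiers are hypothetical, and the 2015 study may have used a more nuanced comparison, for example one based on enrichment thresholds.

```python
def selective_hits(hit_pools):
    """Map each target to the compounds found in its hit pool only."""
    result = {}
    for target, hits in hit_pools.items():
        # Union of every hit pool except the current target's.
        others = set().union(
            *(pool for name, pool in hit_pools.items() if name != target)
        )
        result[target] = set(hits) - others
    return result


# Hypothetical hit pools from three parallel screens.
hit_pools = {
    "human_albumin": {"cpd_1", "cpd_2", "cpd_3"},
    "bovine_albumin": {"cpd_2", "cpd_4"},
    "rat_albumin": {"cpd_2", "cpd_3", "cpd_5"},
}

selective = selective_hits(hit_pools)
```

In this toy example, `cpd_2` binds every family member and is discarded as non-selective, while `cpd_1` is retained as a candidate selective ligand for the hypothetical human target without any resynthesis having taken place.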

The integration of machine learning (ML) with DELs appears to be the next logical step. ML allows the identification of important features and patterns from a small dataset and uses this information to make predictions for larger datasets. Two examples of the machine-learning approaches used are the random forest and the ‘graph convolutional neural network’ (GCNN). The random forest is an algorithm that creates a predictive model comprising a large number of individual ‘decision trees’, which operate as a group. Each tree in the random forest produces a class prediction, and the class with the most votes becomes the model’s prediction. These methods have already demonstrated success in a study which reported that ML models produced verified hit rates of up to 29% at one micromolar. The ability to identify molecules that are active at micromolar concentrations is critical for creating a larger hit pool. In comparison, HTS typically produces a hit rate of 1%. This promising collaboration between computational and molecular biology shows the potential of creating DELs with greater selectivity and quantitative output.
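The voting step of a random forest can be shown with a minimal sketch. It covers only how individual tree predictions are combined, not how the trees are trained, and each ‘tree’ here is a hand-written single rule standing in for a learned decision tree. The descriptor names, thresholds, and class labels are hypothetical and are not drawn from the study mentioned above.

```python
from collections import Counter


def forest_predict(trees, features):
    """Return the class predicted by the largest number of trees."""
    votes = Counter(tree(features) for tree in trees)
    return votes.most_common(1)[0][0]


# Hypothetical stand-ins for trained decision trees: each maps a
# compound's descriptors to a class label.
trees = [
    lambda f: "binder" if f["mol_weight"] < 500 else "non-binder",
    lambda f: "binder" if f["h_bond_donors"] <= 5 else "non-binder",
    lambda f: "binder" if f["ring_count"] >= 4 else "non-binder",
]

# Hypothetical descriptors for one library member.
compound = {"mol_weight": 420.0, "h_bond_donors": 2, "ring_count": 3}

prediction = forest_predict(trees, compound)
```

Two of the three rules vote for the same class, so that class becomes the prediction. Using an odd number of trees avoids ties in a two-class problem; a real forest would contain many more trees, each learned from a random subset of the training data.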

Constantly evolving technology allows medicinal chemistry to improve methods such as DELs, helping to optimise drug discovery for the pharmaceutical industry.

Charlotte Di Salvo, Lead Medical Writer
PharmaFeatures
