What Design Experts Don't Want You To Know: Evolutionary design of molecules based on deep learning and a genetic algorithm Scientific Reports

Table of Contents

Baseline methods
Prospective de novo drug design with deep interactome learning
SMILES-based models
Energy-based model
Generative molecular design in low data regimes

The hybrid QM/MM approach uses quantum mechanical (QM) calculations to simulate the ligand and the region of the protein where it docks, while using molecular mechanics (MM) to simulate the rest of the protein structure. This provides improved accuracy over classical MM/docking simulations. However, performing QM simulations, even for only the ligand and its protein vicinity, is computationally very expensive compared to relatively quick docking simulations. To speed things up, the QM simulations for the ligand and protein vicinity can be replaced with state-of-the-art ML-based predictive models, which have recently achieved chemical accuracy in predicting several properties of small molecules.

Breaking boundaries in protein design with a new AI model that understands interactions with any… - Towards Data Science

Posted: Wed, 21 Jun 2023 07:00:00 GMT

Baseline methods

Olivecrona et al. [101] trained a policy-based RL model to generate bioactive molecules against dopamine receptor type 2; more than 95% of the generated molecules were active. Taking the drug Celecoxib as an example, they also demonstrated that RL can generate structures similar to Celecoxib even when Celecoxib itself is not included in the training set. De novo drug design has so far focused only on generating structures that satisfy one of the several criteria required of a drug. Stahl et al. [102] proposed a fragment-based RL approach that employs an actor-critic model to generate more than 90% valid molecules while optimizing multiple properties.

Prospective de novo drug design with deep interactome learning

The molecular graph is usually represented by features at the atomic level, the bond level, and the global state, which together capture its key properties. Each of these features is iteratively updated during the representation learning phase, and the updated features are subsequently used in the predictive part of the model.
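
The iterative update described above can be illustrated with a minimal sketch. It assumes a deliberately simple rule that is not taken from any of the cited models: each atom adds the mean of its neighbours' feature vectors to its own, with no learned weights and no bond or global features. The function name `message_passing_step` and the two-number atom features are hypothetical.

```python
from typing import Dict, List, Tuple


def message_passing_step(
    atom_features: Dict[int, List[float]],
    bonds: List[Tuple[int, int]],
) -> Dict[int, List[float]]:
    """One toy update: own features plus the mean of the neighbours' features."""
    # Build an undirected adjacency list from the bond list.
    neighbours: Dict[int, List[int]] = {i: [] for i in atom_features}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)

    updated: Dict[int, List[float]] = {}
    for i, feat in atom_features.items():
        if neighbours[i]:
            mean = [
                sum(atom_features[j][k] for j in neighbours[i]) / len(neighbours[i])
                for k in range(len(feat))
            ]
        else:
            mean = [0.0] * len(feat)  # isolated atom: nothing to aggregate
        updated[i] = [f + m for f, m in zip(feat, mean)]
    return updated


# Heavy atoms of ethanol (C-C-O); assumed features: [atomic number, attached hydrogens].
atoms = {0: [6.0, 3.0], 1: [6.0, 2.0], 2: [8.0, 1.0]}
bonds = [(0, 1), (1, 2)]
updated = message_passing_step(atoms, bonds)
```

Real graph neural networks replace the fixed mean-and-add rule with learned functions and repeat the step several times, so that each atom's representation reflects a progressively larger neighbourhood.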

SMILES-based models

The closed-loop evolutionary workflow guided by deep learning automatically and effectively derived target molecules and discovered rational design paths by elucidating the relationship between the structural features and their effect on the molecular properties. Furthermore, owing to the inherent nature of data-driven methodologies, the molecular design performance can be influenced by the characteristics of the training data. Therefore, the training data should be prepared carefully according to the design purpose and situation. Unlike the test cases used illustratively in this study, the data to train the RNN and DNN need not be the same and could perhaps be configured differently depending on the design target.
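
A closed evolutionary loop of this kind can be sketched in a few lines. This is a toy illustration under stated assumptions, not the workflow from the study: molecules are plain token strings over a three-letter alphabet, `toy_property` is a hypothetical stand-in for a trained property predictor, and no validity check is performed.

```python
import random

ALPHABET = "CNO"  # assumed toy token set, not a real SMILES vocabulary


def mutate(tokens: str, rng: random.Random) -> str:
    """Replace one randomly chosen position with a random token."""
    pos = rng.randrange(len(tokens))
    return tokens[:pos] + rng.choice(ALPHABET) + tokens[pos + 1:]


def toy_property(tokens: str) -> int:
    """Hypothetical property score: the number of N tokens."""
    return tokens.count("N")


def evolve(seed_molecule: str, generations: int = 20, offspring: int = 10, seed: int = 0):
    """Mutate, score, and select repeatedly, always keeping the current best."""
    rng = random.Random(seed)
    best = seed_molecule
    history = [toy_property(best)]
    for _ in range(generations):
        candidates = [mutate(best, rng) for _ in range(offspring)]
        candidates.append(best)  # elitism: the parent competes with its offspring
        best = max(candidates, key=toy_property)
        history.append(toy_property(best))
    return best, history


best, history = evolve("CCCCCC")
```

Because the parent is always kept among the candidates, the best score can never decrease from one generation to the next. In a real setting the scoring function would be a model trained on data chosen for the design target, which is why the composition of the training data matters.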

Popova et al. [100] recently used deep RL for the de novo design of molecules with desired hydrophobicity or inhibitory activity against Janus protein kinase 2. They first trained a generative model and a predictive model separately, then trained both together using an RL approach that biases the generator toward molecules with the desired properties. In RL, an agent, which here is a neural network, explores the chemical space and chooses its actions according to the rewards, penalties, and policies that have been set up, with the aim of maximizing the desired outcome.
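
The reward-driven biasing described above can be sketched with a basic REINFORCE update. This is a minimal toy, not the method of Popova et al.: the "generator" is a single softmax over three tokens rather than a neural network, and `toy_reward` (the fraction of O tokens) is a hypothetical stand-in for a predictive model.

```python
import math
import random

TOKENS = ["C", "N", "O"]  # assumed toy vocabulary


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_reward(sequence):
    """Hypothetical reward: fraction of oxygen tokens in the sequence."""
    return sequence.count("O") / len(sequence)


def train_policy(episodes=2000, length=4, lr=0.1, seed=0):
    rng = random.Random(seed)
    logits = [0.0] * len(TOKENS)
    baseline = 0.0  # running average of past rewards, reduces variance
    for _ in range(episodes):
        probs = softmax(logits)
        actions = rng.choices(range(len(TOKENS)), weights=probs, k=length)
        reward = toy_reward([TOKENS[a] for a in actions])
        advantage = reward - baseline
        # REINFORCE: d log pi(a) / d logits = onehot(a) - probs
        for a in actions:
            for i in range(len(logits)):
                grad = (1.0 if i == a else 0.0) - probs[i]
                logits[i] += lr * advantage * grad
        baseline += 0.05 * (reward - baseline)
    return dict(zip(TOKENS, softmax(logits)))


final_probs = train_policy()
```

After training, the policy should place most of its probability on the rewarded token, which is the same mechanism, at a much smaller scale, by which a generative model is pushed toward molecules with a desired property.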

Energy-based model

Interestingly, the use of constraints does not significantly affect the rate at which S1 is decreased. Under the constraints, LUMO is still assigned a maximum value of 0.0 eV, as delineated in Fig. Moreover, although the maximum HOMO is limited to −5.0 eV, the distributions of the training data in Fig. S2 and indicate that the constraints allow sufficient room in which to decrease the energy gap between HOMO and LUMO. The constraints in the form of the HOMO and LUMO energies thus have the opposite effect on the increasing and decreasing S1 energy.

Figure caption: Average rates of change of S1 for the 50 seed molecules during the evolutionary design involving increasing and decreasing S1.

The averages in the second and third phases are 2.22 and 2.31, with variances of 1.38 and 1.36, respectively. Although the average value rises slightly as the number of phases increases, the variance gradually decreases. This intensifies the creation of molecular structures with the desired physical properties. The effectiveness of the deep learning-based evolutionary design was verified by applying it to real-world problems. The aim was to change the maximum light-absorbing wavelengths in terms of the S1 energy.

Recently, Lim et al. [128] used a distance-aware GNN that incorporates the 3D coordinates of both ligand and protein structures to study protein-ligand interactions (PLI), outperforming existing models for pose prediction. This is important for accurately predicting the desired PLI and biophysical parameters while designing novel molecules at high throughput. It will help to narrow down the candidates efficiently during lead optimization; these candidates will ultimately be subjected to further experimental characterization before they can be used in pre-clinical studies. To achieve the long overdue goals of exploring a large chemical space, accelerating molecular design, and generating molecules with desired properties, inverse design is unavoidable. It is generally known that a molecule should have specific functionalities to be an effective therapeutic candidate against a particular disease, but in many cases new molecules that host such functionalities cannot easily be found with a direct approach. Furthermore, the pool in which such molecules may exist is astronomically large [81,82,83] (approximately 10^60 molecules), making it impossible to explore each of them by quantum mechanics-based simulations or experiments.
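
A quick back-of-the-envelope calculation shows why exhaustive exploration is out of reach. The sketch below takes the size estimate as 10^60 molecules and assumes a purely hypothetical screening rate of one billion molecules per second, far faster than any quantum mechanical calculation.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_enumerate(n_molecules: float, molecules_per_second: float) -> float:
    """Time in years to evaluate every molecule once at a fixed rate."""
    return n_molecules / molecules_per_second / SECONDS_PER_YEAR


# Assumed rate: 1e9 molecules per second (hypothetical, chosen to be generous).
years = years_to_enumerate(10**60, 1e9)
print(f"{years:.2e} years")
```

Even at this rate the enumeration would take on the order of 10^43 years, which is why generative and inverse-design methods sample the space selectively instead.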

Generative molecular design in low data regimes

In this process, chemists select a target (“lead”) molecule with known potential to interact with a specific biological target, then tweak its chemical properties to improve potency and other factors. The researchers next aim to test the model on properties beyond solubility that are more therapeutically relevant. “Pharmaceutical companies are more interested in properties that fight against biological targets, but they have less data on those.”

The model takes molecular structure data as input and directly creates molecular graphs: detailed representations of a molecular structure, with nodes representing atoms and edges representing bonds. It breaks those graphs down into smaller clusters of valid functional groups, which it uses as “building blocks” to reconstruct molecules more accurately and to modify them more effectively.

Figure caption: KDE plots of the density of the partition coefficient (LogP) and the quantitative estimate of drug-likeness (QED) for (a) the molecules in the training set and (b) the generated molecules. Violin plots show the distribution of synthetic accessibility scores (SAS) for the generated molecules under target conditions on (c) QED and (d) LogP. Examples of generated molecular structures conditioned on restrictions on (e) QED and (f) LogP are also provided.

Figure caption: (a) Molecular representations and their relationships with the encoding, decoding, and property prediction functions.

CRBM is a nonlinear generative model that can capture the conditional probability of observed data and has previously been applied to time series generation [55]. Although energy-based models are typically used for generative modeling, they can also be used for classification tasks [56]. Owing to their rich expressivity of latent variables and their modeling flexibility, we use a conditional energy-based model for the supervised learning task of molecular property prediction. We use compounds from the ZINC database [39] to train and validate the performance of the proposed methods. A subset of ZINC comprising 12,000 molecules, which is commonly used for benchmarking purposes, is collected for our computational study [40]. The collected SMILES identifiers of the molecules are converted to graph-structured data by identifying the node features and edge features using the RDKit package [38].
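
To illustrate how an energy-based model can act as a classifier, the sketch below converts per-label energies into probabilities with a Boltzmann (softmax) rule, so that lower energy means higher probability. It is a generic illustration, not the CRBM used in the study; the label names and energy values are hypothetical.

```python
import math
from typing import Dict


def energies_to_probabilities(energies: Dict[str, float]) -> Dict[str, float]:
    """Map energies E(x, y) to p(y | x) proportional to exp(-E(x, y))."""
    low = min(energies.values())  # shift by the minimum for numerical stability
    weights = {label: math.exp(-(e - low)) for label, e in energies.items()}
    z = sum(weights.values())  # normalizing constant over the candidate labels
    return {label: w / z for label, w in weights.items()}


# Hypothetical energies assigned by some model to three property classes
# for a single input molecule.
energies = {"low_logP": 2.0, "mid_logP": 0.5, "high_logP": 3.0}
probs = energies_to_probabilities(energies)
predicted = max(probs, key=probs.get)
```

The predicted label is simply the one with the lowest energy; the probabilities add a measure of how decisively the model prefers it over the alternatives.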

For the next phase, we selected the 30 molecules with the lowest S1 values as seeds, including the new molecules created in the previous phase. This yielded an average S1 value of 4.91 eV for the training data, with a variance of 2.11. However, the aforementioned process produces new molecules whose S1 distribution is shifted lower than that of the training data, as shown in Fig. The average S1 of the molecules produced in the first phase is 2.20, and the variance is 1.40.
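
The seed selection step can be sketched as a simple ranking. The molecule names and S1 values below are invented for illustration, the pool is cut to three seeds rather than 30 to keep the example short, and the use of population variance is an assumption because the text does not say which variance was reported.

```python
from statistics import mean, pvariance
from typing import Dict, List, Tuple


def select_seeds(s1_by_molecule: Dict[str, float], n_seeds: int = 30) -> List[Tuple[str, float]]:
    """Return the n_seeds molecules with the lowest S1 values, lowest first."""
    ranked = sorted(s1_by_molecule.items(), key=lambda item: item[1])
    return ranked[:n_seeds]


# Hypothetical pool: earlier seeds plus molecules created in the previous phase.
pool = {"mol_a": 4.9, "mol_b": 2.1, "mol_c": 3.3, "mol_d": 1.8, "mol_e": 2.6}
seeds = select_seeds(pool, n_seeds=3)

seed_names = [name for name, _ in seeds]
seed_values = [value for _, value in seeds]
average_s1 = mean(seed_values)
variance_s1 = pvariance(seed_values)  # assumption: population variance
```

Repeating this selection after every phase keeps the search anchored on the lowest-S1 candidates found so far, which is what drives the distribution of generated molecules downward over successive phases.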

Monday, April 29, 2024
