The systematic analysis of chemical transformations (SACT) of the data retrieved from the literature consisted of four steps: (1) literature search, (2) literature data curation and evaluation, (3) methodology extraction and (4) reaction data curation and analysis. All details of the literature analysis are provided in Supplementary Section 2. The literature analysis identified 38 publications describing relevant borylation methods, from which the reaction data were manually extracted to obtain a high-quality dataset containing 1,301 chemical transformations. Meta-analysis of these data provided a foundation for an informed plate design.
LSF informer library. The concept of chemical informer libraries, initially reported by Merck48,61, served as the basis for developing the LSF informer library. Applying a clustering method based on structural features to a dataset containing 1,174 approved small-molecule drugs yielded eight structurally diverse groups of molecules. Details of the applied clustering and a visualization of the clusters via principal component analysis are provided in Supplementary Section 3. Three molecules were selected from each cluster based on their distance from the cluster centre, price and availability, and were subjected to borylation screening. To complement the library with fragments relevant to Roche’s chemical space, the 100 most popular ring assemblies found in the Roche corporate compound collection were identified. For these ring assemblies, substructure searches were performed over the entire database. The resulting compounds were retained if (1) the structures had a molecular weight below 300 g mol⁻¹ or fewer than 20 non-hydrogen atoms, (2) at least 1 g of powder stock was available and (3) the structures were not used in any internal project or subject to legal restrictions. From this pool of candidates, 12 fragments were manually selected. Further details on the determination and constitution of the LSF informer library are described in Supplementary Section 3.
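The three fragment retention rules can be sketched as a simple filter. This is a minimal sketch: the record fields (`mol_weight`, `powder_stock_g`, etc.) and the example compounds are hypothetical stand-ins, not the actual corporate database schema.

```python
# Hedged sketch of the three fragment retention criteria; field names and
# example records are hypothetical, not the actual compound collection.

def retain_fragment(rec):
    """Keep a compound only if it satisfies all three retention criteria."""
    size_ok = rec["mol_weight"] < 300 or rec["heavy_atoms"] < 20   # criterion 1
    stock_ok = rec["powder_stock_g"] >= 1.0                        # criterion 2
    free_ok = not rec["in_project"] and not rec["restricted"]      # criterion 3
    return size_ok and stock_ok and free_ok

candidates = [
    {"mol_weight": 250.3, "heavy_atoms": 18, "powder_stock_g": 5.0,
     "in_project": False, "restricted": False},   # passes all criteria
    {"mol_weight": 310.4, "heavy_atoms": 22, "powder_stock_g": 2.0,
     "in_project": False, "restricted": False},   # too large on both counts
    {"mol_weight": 180.2, "heavy_atoms": 13, "powder_stock_g": 0.2,
     "in_project": False, "restricted": False},   # insufficient powder stock
]

retained = [r for r in candidates if retain_fragment(r)]
print(len(retained))  # → 1
```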
Screening plate design. Following the SACT approach that delivered a curated, high-quality literature dataset, a meta-analysis was conducted to define a clear rationale for the conditions of the 24-well borylation screening plate used for the LSF informer library. This analysis covered the temperature (T), time (t), reaction concentration (c) and scale (n), which were set to the median values of the dataset for our screening plate (T = 80 °C, t = 16 h, c = 0.2 M, n = 100 µmol). Subsequently, the number of reaction components generally used for borylation reactions (catalyst, ligand, boron source and solvent) was determined. Owing to the limited space on the 24-well plate and the high occurrence of [Ir(COD)(OMe)]2 (2), 2 was chosen as the catalyst. Analysis of the reagents used in combination with 2 provided the rationale for choosing B2Pin2 (3) as the boron source. This selection made it possible to screen a set of six ligands and four solvents. Six rather than four ligands were used because the dataset showed a greater variety of ligands than solvents. The ligands were assessed based on the chemical diversity of the converted starting materials and their commercial availability; on this basis, six ligands from four chemical classes were selected. Although the meta-analysis revealed that low-boiling solvents predominate in borylation chemistry, their higher-boiling analogues (for example, Me-THF instead of tetrahydrofuran, THF) were selected to avoid solvent evaporation at 80 °C and reduce the risk of cross-contamination. The detailed meta-analysis results leading to the final plate design are described in Supplementary Section 4.
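The median-based choice of plate parameters can be sketched with the standard library; the five example reaction records below are illustrative, not rows from the actual curated dataset.

```python
# Sketch of deriving plate parameters as medians over the curated literature
# data. The example records are illustrative, not the actual dataset.
from statistics import median

literature = [
    {"T_C": 80,  "t_h": 16, "conc_M": 0.2},
    {"T_C": 100, "t_h": 24, "conc_M": 0.1},
    {"T_C": 80,  "t_h": 12, "conc_M": 0.3},
    {"T_C": 60,  "t_h": 16, "conc_M": 0.2},
    {"T_C": 80,  "t_h": 16, "conc_M": 0.2},
]

plate = {key: median(row[key] for row in literature)
         for key in ("T_C", "t_h", "conc_M")}
print(plate)  # → {'T_C': 80, 't_h': 16, 'conc_M': 0.2}
```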
HTE borylation screening. Using a 24-well plate design (Fig. 3), all drug molecules from the LSF informer library and the selected fragments (Supplementary Section 3 and Supplementary Figs. 3–5) were screened. The reaction set-up (automated solid dosing and solvent addition) and execution (heating and stirring) in glass vials on a parallel screening plate were conducted in a glove box under a nitrogen atmosphere. Upon completion of the reaction, the solvents were removed by evaporation, followed by automated resuspension of the residues in MeCN/H2O and dilution to a defined concentration for LCMS analysis using a liquid handler. The samples were then analysed by LCMS, and the resulting data were processed by an automated reaction data analysis pipeline (Supplementary Fig. 6) to rapidly determine all components within the mixture. The standardized reaction data output (SURF; Supplementary Section 7) allowed direct visualization of the reaction outcomes with TIBCO Spotfire software, as well as direct loading into machine learning models. The general screening procedure, including detailed information on the hardware and software used, is provided in Supplementary Sections 5 and 6.
Scaled-up reactions. Selected molecules (three drugs, 1, 25 and 29; and four fragments, 37, 38, 39 and 45) showing substantial conversion to the respective borylation products were scaled up using the most promising conditions. All reactions were conducted under a nitrogen atmosphere in a glove box using glass reaction vessels with pressure-release caps and standard stirring bars. Purification was performed by flash chromatography or reversed-phase high-performance liquid chromatography. In selected cases, where separation of the borylated species could not be achieved, the boronic ester moiety was converted into a hydroxyl group. Structural elucidation was performed by NMR and HRMS. The full analytical results and spectra for all compounds are provided in Supplementary Sections 11 and 12.
Deep learning. Graph neural network architecture. The following paragraphs describe the neural network architectures of the three introduced GNNs (that is, GNN, GTNN and aGNN). GNN and GTNN were trained to learn the two reaction properties (that is, binary reaction outcome and reaction yield), and aGNN was trained to learn regioselectivity. Details about dataset splitting are provided in Supplementary Section 1.
Molecular graph. For each of the three GNNs (that is, GNN, GTNN and aGNN), four input molecular graph representations were investigated, combining steric (3D) and electronic (QM) features in different ways: 2D, 2DQM, 3D and 3DQM.
E(3)-invariant message passing. The atomic features and, optionally, the DFT-level partial charges were embedded and transformed using an MLP, resulting in atomic features \({\mathbf{h}}_{i}^{0}\). E(3)-invariant message passing, in a similar fashion to that suggested by Satorras et al.62, was applied over \(l\) layers to all atomic representations \({\mathbf{h}}_{i}^{l}\) and their edges. Edges were defined by covalent bonds for the 2D graphs and by all atom pairs within a radius of 4 Å for the 3D graphs. All networks contained three message-passing layers. In each message-passing layer, the atomic representations were transformed via equation (1)
$${\mathbf{h}}_{i}^{l+1}=\phi \left({\mathbf{h}}_{i}^{l},\mathop{\sum }\limits_{j\in {\mathcal{N}}(i)}\psi \left({\mathbf{h}}_{i}^{l},{\mathbf{h}}_{j}^{l}\right)\right),$$
(1)
for 2D graph structures, and equation (2)
$${\mathbf{h}}_{i}^{l+1}=\phi \left({\mathbf{h}}_{i}^{l},\mathop{\sum }\limits_{j\in {\mathcal{N}}(i)}\psi \left({\mathbf{h}}_{i}^{l},{\mathbf{h}}_{j}^{l},{\mathbf{r}}_{i,j}\right)\right),$$
(2)
for 3D graph structures.
In equations (1) and (2), \({\mathbf{h}}_{i}^{l}\) is the atomic representation of the ith atom at the lth layer; \(j\in {\mathcal{N}}(i)\) is the set of neighbouring nodes connected via edges; \({\mathbf{r}}_{i,j}\) denotes the interatomic distance features (Methods, “Atom featurization” for details); ψ is an MLP transforming node features into message features \({\mathbf{m}}_{ij}\) as \({\mathbf{m}}_{ij}=\psi ({\mathbf{h}}_{i}^{l},{\mathbf{h}}_{j}^{l},{\mathbf{r}}_{i,j})\) for 3D graphs and \({\mathbf{m}}_{ij}=\psi ({\mathbf{h}}_{i}^{l},{\mathbf{h}}_{j}^{l})\) for 2D graphs; ∑ denotes the permutation-invariant pooling operator (that is, the sum) transforming \({\mathbf{m}}_{ij}\) into \({\mathbf{m}}_{i}\) as \({\mathbf{m}}_{i}={\sum }_{j\in {\mathcal{N}}(i)}{\mathbf{m}}_{ij}\); and ϕ is an MLP transforming \({\mathbf{h}}_{i}^{l}\) and \({\mathbf{m}}_{i}\) into \({\mathbf{h}}_{i}^{l+1}\). The atomic features from all layers \([{\mathbf{h}}_{i}^{1},{\mathbf{h}}_{i}^{2},{\mathbf{h}}_{i}^{3}]\) were concatenated and transformed via an MLP, resulting in the final atomic features H. H was then transformed differently by the three GNNs, using sum pooling (GNN) or multi-head attention-based pooling (GTNN) to obtain molecular outputs (that is, reaction yield and binary reaction outcome), or no pooling (aGNN) for regioselectivity prediction.
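A minimal numpy sketch of one such layer for a 3D graph may clarify the scheme. Single linear maps stand in for the MLPs ψ and ϕ, the toy features, coordinates and weights are random placeholders, and the check at the end illustrates why the sum aggregation makes the pooled output invariant to atom ordering; this is not the trained network.

```python
# Sketch of one E(3)-invariant message-passing layer (equation (2)).
# Linear maps stand in for the MLPs psi and phi; all values are toy data.
import numpy as np

rng = np.random.default_rng(0)
d = 3
W_psi = rng.normal(size=(d, 2 * d + 1))   # linear stand-in for the MLP psi
W_phi = rng.normal(size=(d, 2 * d))       # linear stand-in for the MLP phi

def message_pass(h, xyz, cutoff=4.0):
    """One layer: messages depend only on node features and interatomic distances."""
    n = len(h)
    h_new = np.zeros_like(h)
    for i in range(n):
        m_i = np.zeros(d)
        for j in range(n):
            r = np.linalg.norm(xyz[i] - xyz[j])
            if i != j and r <= cutoff:               # 3D graph: edges within 4 A
                m_ij = W_psi @ np.concatenate([h[i], h[j], [r]])
                m_i += m_ij                          # permutation-invariant sum
        h_new[i] = W_phi @ np.concatenate([h[i], m_i])
    return h_new

h = rng.normal(size=(4, d))               # toy atomic features
xyz = rng.normal(size=(4, 3)) * 2.0       # toy 3D coordinates

out = message_pass(h, xyz).sum(axis=0)    # sum pooling -> molecular vector

perm = [2, 0, 3, 1]                       # relabel the atoms
out_perm = message_pass(h[perm], xyz[perm]).sum(axis=0)
assert np.allclose(out, out_perm)         # invariant to atom ordering
```

Because only pairwise distances enter ψ, the output is also unchanged under rotations and translations of the coordinates, which is the E(3) invariance referred to above.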
GNN. Atom features H were pooled via sum pooling, transformed via an additional MLP, concatenated to a learned representation of the reaction conditions (Methods, “Condition featurization” for details) and transformed to the desired output via a final MLP.
GTNN. A graph multiset transformer49 was incorporated into the GTNN architecture for pooling the atomic features into a molecular feature. The nodes H were transformed using the attention function \({\rm{Attn}}({\mathbf{Q}},{\mathbf{K}},{\mathbf{V}})={\mathbf{Q}}{\mathbf{K}}^{\rm{T}}{\mathbf{V}}\), where the query Q, key K and value V are learned features from the node representations H. Q is learned via individual embedding vectors per attention head, whereas K and V are learned via individual GNNs, \({\rm{GNN}}^{K}\) and \({\rm{GNN}}^{V}\), resulting in the overall graph attention head via equation (3):
$${\mathbf{o}}_{i}={\rm{Attn}}\left({\mathbf{H}}{\mathbf{W}}^{Q},{\rm{GNN}}_{i}^{K}({\mathbf{H}},{\mathbf{A}}),{\rm{GNN}}_{i}^{V}({\mathbf{H}},{\mathbf{A}})\right)$$
(3)
where \({\mathbf{o}}_{i}\) denotes the weighted pooling vector from one attention head, \({\mathbf{W}}^{Q}\) is a linear layer that learns the query vectors from H, and \({\mathbf{A}}\) is the graph connectivity. Herein, four attention heads are incorporated, yielding the graph multi-head attention block GMH: \({\rm{GMH}}({\mathbf{Q}},{\mathbf{H}},{\mathbf{A}})=[{\mathbf{o}}_{1},{\mathbf{o}}_{2},{\mathbf{o}}_{3},{\mathbf{o}}_{4}]{\mathbf{W}}^{O}\). This learned molecular representation was transformed via an additional MLP, concatenated with a learned representation of the reaction conditions (Methods, “Condition featurization” for details) and transformed to the desired output via a final MLP.
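The pooling scheme can be sketched in numpy. Linear maps stand in for \({\rm{GNN}}^{K}\) and \({\rm{GNN}}^{V}\), per-head query seeds stand in for the learned embedding vectors, and all weights are random toy values; the check confirms that the linear attention Attn(Q, K, V) = QKᵀV yields a pooled vector that does not depend on atom ordering.

```python
# Sketch of graph multiset-transformer-style pooling with four heads.
# Linear maps stand in for GNN_K / GNN_V; all weights are toy values.
import numpy as np

rng = np.random.default_rng(1)
n, d, heads = 5, 8, 4

H = rng.normal(size=(n, d))                  # final atomic features

Q = rng.normal(size=(heads, 1, d))           # per-head learned query seeds
W_K = rng.normal(size=(heads, d, d))         # stand-in for GNN_K per head
W_V = rng.normal(size=(heads, d, d))         # stand-in for GNN_V per head
W_O = rng.normal(size=(heads * d, d))        # output projection

def attn(q, k, v):
    """Attn(Q, K, V) = Q K^T V, as defined for the graph multiset transformer."""
    return q @ k.T @ v

o = [attn(Q[i], H @ W_K[i], H @ W_V[i]) for i in range(heads)]  # one o_i per head
molecular = np.concatenate(o, axis=1) @ W_O  # GMH: [o1, ..., o4] W_O
print(molecular.shape)  # → (1, 8)
```

Because KᵀV sums over nodes, permuting the rows of H leaves the pooled molecular vector unchanged, which is the property required of a graph pooling operator.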
aGNN. No pooling of atom features was applied, and H was directly transformed to the desired atomic output via a final MLP with a sigmoid activation function.
Training details. PyTorch Geometric (v.2.0.2)63 and PyTorch (v.1.10.1+cu102)64 functionalities were used for neural network training. Training was performed on a graphics processing unit (GPU; Nvidia GeForce GTX 1080 Ti) for four hours, using a batch size of 16 samples. The Adam stochastic gradient descent optimizer65 was employed with a learning rate of 10⁻⁴, a mean squared error (m.s.e.) loss on the training set, a decay factor of 0.5 applied after 100 epochs and an exponential smoothing factor of 0.9. Early stopping retained the model that achieved the lowest validation mean absolute error (m.a.e.) within 1,000 epochs. All models considered in this study were trained on the Euler computing cluster at ETH Zurich, Switzerland.
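The learning-rate schedule and model-selection rule can be sketched without any deep learning framework; the validation curve below is synthetic, and the exponential smoothing factor is not modelled.

```python
# Sketch of the schedule: lr 1e-4 with a single 0.5 decay after 100 epochs,
# keeping the checkpoint with the lowest validation m.a.e. within 1,000
# epochs. The validation curve here is synthetic toy data.

def lr_at(epoch, base=1e-4, decay=0.5, step=100):
    """Learning rate with a one-off decay factor applied after `step` epochs."""
    return base * decay if epoch >= step else base

best_mae, best_epoch = float("inf"), -1
for epoch in range(1000):
    val_mae = 1.0 / (1 + epoch) + (0.01 if epoch > 500 else 0.0)  # toy curve
    if val_mae < best_mae:                     # early stopping: keep the best
        best_mae, best_epoch = val_mae, epoch  # checkpoint within 1,000 epochs

print(lr_at(0), lr_at(200))  # → 0.0001 5e-05
```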
Atom featurization. Atomic properties were encoded via the following atomic one-hot-encoding scheme: twelve atom types (H, C, N, O, F, P, S, Cl, Br, I, Si, Se), two ring types (true, false), two aromaticity types (true, false) and four hybridization types (sp3, sp2, sp, s). Additionally, for molecular graphs that contained electronic features, the atomic partial charges were calculated on the fly using the DelFTa software66,67,68, obtaining DFT-level (ωB97X-D/def2-SVP (refs. 69,70)) Mulliken partial charges71. For molecular graphs that contained 3D information, the interatomic distances were represented in terms of Fourier features, using a sine-based and cosine-based encoding as previously shown in ref. 66.
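The one-hot scheme above yields a 12 + 2 + 2 + 4 = 20-dimensional vector per atom, which can be sketched directly; partial-charge and distance features are omitted here.

```python
# Sketch of the 20-dimensional one-hot atom featurization described above.
ATOM_TYPES = ["H", "C", "N", "O", "F", "P", "S", "Cl", "Br", "I", "Si", "Se"]
HYBRIDIZATIONS = ["sp3", "sp2", "sp", "s"]

def one_hot(value, choices):
    return [1 if value == c else 0 for c in choices]

def featurize_atom(symbol, in_ring, aromatic, hybridization):
    return (one_hot(symbol, ATOM_TYPES)          # twelve atom types
            + one_hot(in_ring, [True, False])    # two ring types
            + one_hot(aromatic, [True, False])   # two aromaticity types
            + one_hot(hybridization, HYBRIDIZATIONS))  # four hybridizations

feats = featurize_atom("N", True, True, "sp2")   # e.g. an aromatic ring nitrogen
print(len(feats), sum(feats))  # → 20 4
```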
Condition featurization. Molecular reaction conditions (that is, solvents, ligands, catalysts and reagents) were one-hot encoded. Whereas the experimental dataset covered six ligands and four solvent types (that is, 24 possible conditions per substrate), the literature dataset covered twelve ligands, nine solvents, two reagents and four catalyst types (that is, 864 possible conditions per substrate). Supplementary Section 4 gives a detailed description of the structures covered by these one-hot encodings.
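For the experimental dataset, this encoding can be sketched as follows; the generic names `L1`–`L6` and `S1`–`S4` are placeholders, not the actual ligands and solvents.

```python
# Sketch of the condition one-hot encoding for the experimental dataset
# (six ligands x four solvents = 24 possible conditions per substrate).
# Placeholder names stand in for the actual ligands and solvents.
from itertools import product

LIGANDS = [f"L{i}" for i in range(1, 7)]     # six ligands
SOLVENTS = [f"S{i}" for i in range(1, 5)]    # four solvents

def encode_condition(ligand, solvent):
    return ([1 if ligand == l else 0 for l in LIGANDS]
            + [1 if solvent == s else 0 for s in SOLVENTS])

conditions = list(product(LIGANDS, SOLVENTS))
print(len(conditions), len(encode_condition("L3", "S2")))  # → 24 10
```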
Conformer generation. The 3D conformers were calculated using RDKit (AllChem.EmbedMolecule (ref. 72)), followed by energy minimization with the universal force field (UFF) method73. For each molecule, ten different conformers were calculated for training and testing. A conformer was randomly selected at each training step. For testing, the final predictions were obtained by averaging the individual predictions calculated for each of the ten conformers.
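Assuming RDKit is available, this step can be sketched as follows; ethanol is a stand-in molecule, and the batched `EmbedMultipleConfs` call replaces a loop over `AllChem.EmbedMolecule` for brevity.

```python
# Sketch of the conformer pipeline with RDKit: embed ten conformers and
# minimize each with UFF. Ethanol is a stand-in for a library compound.
from rdkit import Chem
from rdkit.Chem import AllChem

mol = Chem.AddHs(Chem.MolFromSmiles("CCO"))
AllChem.EmbedMultipleConfs(mol, numConfs=10, randomSeed=42)
AllChem.UFFOptimizeMoleculeConfs(mol)      # UFF energy minimization per conformer

print(mol.GetNumConformers())  # → 10
```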
Baseline model. The ECFP4NN baseline model combined three MLPs for input transformation: one for the ECFP4 fingerprint and two for the embedded reaction conditions (that is, solvent and ligand). The ECFP4 feature dimension was set to 256 after screening feature dimensions in the range of 2⁷–2¹⁰. Additional baseline experiments using binary reaction fingerprints with two popular decision tree algorithms, gradient boosting and extreme gradient boosting (XGBoost), can be found in Supplementary Section 10.
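The three-branch input path of this baseline can be sketched in numpy. Single ReLU layers stand in for the MLPs, and a random bit vector stands in for an actual 256-bit ECFP4 fingerprint; the branch outputs are concatenated before the output head.

```python
# Sketch of the ECFP4NN input path: three "MLPs" (here single ReLU layers)
# transform the fingerprint and the two one-hot condition vectors, and the
# results are concatenated. All weights and the fingerprint are toy values.
import numpy as np

rng = np.random.default_rng(2)
fp = rng.integers(0, 2, size=256).astype(float)   # stand-in for an ECFP4 bit vector
solvent = np.eye(4)[1]                            # one-hot solvent (four types)
ligand = np.eye(6)[3]                             # one-hot ligand (six types)

W_fp = rng.normal(size=(256, 64))
W_sol = rng.normal(size=(4, 64))
W_lig = rng.normal(size=(6, 64))

x = np.concatenate([np.maximum(fp @ W_fp, 0),     # one ReLU branch per input
                    np.maximum(solvent @ W_sol, 0),
                    np.maximum(ligand @ W_lig, 0)])
print(x.shape)  # → (192,)
```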
Number of hyperparameters. The feature dimension of the GNN internal representation was set to 128, except for (1) the embedding dimension of the reaction and atomic properties, which was set to 64, and (2) the first MLP layer after the graph multiset transformer-based pooling, which was set to 256. This setting resulted in network sizes of ~2.0 million trainable parameters for the GNN and aGNN models and ~3.0 million for the GTNN. The dimensions within ECFP4NN were maintained at 128, yielding a network size of ~2.0 million trainable parameters.
Dataset filtering and reaction yield. From the total of 1,301 reactions in the literature dataset, 492 were used for yield prediction. Two filtering criteria were applied to obtain these training data: (1) duplicate reactions (that is, reactions with identical annotations for starting material, catalyst, solvent, reagent and product) were removed, and (2) only reactions whose catalysts, solvents and reagents each occurred at least four times in the whole dataset were included (in line with the one-hot encoding described in Methods, “Condition featurization”).
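The two filters can be sketched with the standard library; the toy reaction records below are placeholders, not rows from the curated dataset, and the frequency count is taken over the deduplicated records for simplicity.

```python
# Sketch of the two yield-dataset filters: drop duplicate reaction tuples,
# then keep only reactions whose conditions occur at least four times.
# The toy records are placeholders for the curated literature data.
from collections import Counter

reactions = (
    [{"sm": "A", "cat": "Ir", "solv": "THF", "reag": "B2Pin2", "prod": "A-Bpin"}] * 3
    + [{"sm": s, "cat": "Ir", "solv": "THF", "reag": "B2Pin2", "prod": s + "-Bpin"}
       for s in "BCDE"]
    + [{"sm": "F", "cat": "Rh", "solv": "dioxane", "reag": "HBpin", "prod": "F-Bpin"}]
)

# (1) remove duplicates: identical starting material/catalyst/solvent/reagent/product
unique = list({tuple(sorted(r.items())): r for r in reactions}.values())

# (2) keep reactions whose catalyst, solvent and reagent each occur >= 4 times
counts = {k: Counter(r[k] for r in unique) for k in ("cat", "solv", "reag")}
kept = [r for r in unique
        if all(counts[k][r[k]] >= 4 for k in ("cat", "solv", "reag"))]
print(len(unique), len(kept))  # → 6 5
```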
Dataset filtering and regioselectivity. From the total of 1,301 reactions in the literature dataset, 656 were used for regioselectivity prediction. Three filtering criteria were applied to obtain these training data: (1) duplicate products (reactions with identical products) were removed, (2) only reactions using B2Pin2 (that is, bis(pinacolato)diboron) as the borylation reagent were kept and (3) an annotated yield of ≥30% was required.