Application Note

Improving NMR Shift Predictions with Database Training

ACD/HNMR and ACD/CNMR [1,2]

Ryan Sasaki and Brent Lefebvre
Advanced Chemistry Development, Inc.
Toronto, ON, Canada
www.acdlabs.com

Introduction

ACD/Labs has improved the accuracy of its NMR predictions every year by refining the algorithms and adding the most recently published experimental NMR data to the prediction databases. However, further improvements are possible for certain types of compounds, because chemists in chemical and pharmaceutical research push synthetic boundaries every day. With the synthesis of novel compounds and the introduction of new families of compounds, the molecule being predicted sometimes has structural characteristics that are not well represented in the databases. Because there will always be proprietary structures that are not well characterized in the literature, ACD/Labs lets users train the predictors by adding their own chemical shifts. This application note shows some examples of how predictions can be improved by creating a user database.

Cutting Down Assignment Time

NMR prediction software can help a user assign experimental NMR shifts in order to confirm the presence of a particular structural fragment. This process of checking the consistency of a structure with an experimental spectrum is often referred to as verification. Verification can be carried out manually, using the professional knowledge of a chemist or an NMR expert to create and evaluate the structure-to-spectrum correspondence. Viewing the predicted and experimental spectra side by side and objectively evaluating the quality of the match while making the assignments can result in substantial time savings as well as increased confidence and accuracy (see the section: Improving 1H Predictions Through Training).

If an incorrect structure is assigned to a spectrum, the adverse consequences can range from additional investment of an expert's time to lost opportunities due to incorrect synthesis. The accuracy of the verification process must therefore be a top priority. To accelerate interpretation without sacrificing the quality of assignment, an NMR predictor must produce accurate predictions for the compounds under study. Furthermore, the software should be able to build knowledge as new and novel compounds are synthesized in the research lab. With this functionality in place, one scientist can improve the prediction accuracy for an entire organization by building a user training database that continually improves the NMR predictions for all chemists.

The remaining sections of this application note will highlight the benefits of building a knowledge base of NMR legacy data to improve NMR predictions throughout the organization.

Improving 1H Predictions Through Training

Figure 1 shows the five structures used for the prediction training study in ACD/HNMR. We first predicted the 1H NMR chemical shifts for all the protons in compounds 2, 3, 4, and 5 without any training and compared them to the published experimental chemical shifts of these compounds. We then added compound 1 to the user database in order to “train” the predictions and repeated the comparison [3].


Figure 1: The five compounds used for this study. (1) (1Z)-1,4-diethoxy-3-(3-methyl-2,4,5-trioxoimidazolidin-1-yl)-4-oxo-2-(triphenylphosphonio)but-1-en-1-olate, (2) (1Z)-1,4-dimethoxy-3-(3-methyl-2,4,5-trioxoimidazolidin-1-yl)-4-oxo-2-(triphenylphosphonio)but-1-en-1-olate, (3) (1E)-1,4-dimethoxy-3-(3-methyl-2,4,5-trioxoimidazolidin-1-yl)-4-oxo-2-(triphenylphosphonio)but-1-en-1-olate, (4) (1E)-1,4-diethoxy-3-(3-methyl-2,4,5-trioxoimidazolidin-1-yl)-4-oxo-2-(triphenylphosphonio)but-1-en-1-olate, (5) (1Z)-1,4-di-tert-butoxy-3-(3-methyl-2,4,5-trioxoimidazolidin-1-yl)-4-oxo-2-(triphenylphosphonio)but-1-en-1-olate.

Figure 2a is a screenshot of the predicted and experimental shifts of compound 2 side by side for easy evaluation of the untrained dataset. This screenshot illustrates the case where the methine proton in the structure was predicted poorly. Further evaluation of the calculation protocol shows that this specific family of compounds is not represented well in the prediction database, resulting in a poor prediction for this chemical shift. Figure 2b is a screenshot of the predicted and experimental shifts of compound 2 side by side after entering the experimental chemical shifts of compound 1 into the user database. This figure clearly shows the benefit of adding just one compound from this family into the user database.


Figure 2a: Comparison of the experimental (top) and predicted (bottom) shifts in an untrained dataset. Note the disagreement between the experimental and predicted chemical shift for the methine proton (highlighted with the asterisk).

Figure 2b: Comparison of the experimental (top) and predicted (bottom) shifts in a trained dataset. Note the agreement between the experimental and predicted chemical shift for the methine proton (highlighted with the asterisk).


To quantify the accuracy improvement of the predictions through training, the following formula was used to calculate the standard error between the experimental and predicted values:

$$\text{Standard Error (ppm)} = \frac{\sum \left(\delta_{\text{exp}} - \delta_{\text{calc}}\right)^{2}}{n^{2}}$$

where δ is the chemical shift in ppm (experimental or calculated, as indicated by the subscript) and n is the number of NMR chemical shifts in the dataset. Given the structural similarity of compound 1 to compounds 2, 3, 4, and 5, we would expect the standard error to improve, because the predictions can now draw on the published chemical shifts of compound 1, a very similar structure with likely similar chemical shifts. The results of the study are shown in Table 1.
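As a rough illustration of this calculation, the Python sketch below reads the expression as the sum of squared differences between experimental and calculated shifts, divided by n squared. That reading, the function name `standard_error`, and the shift values are assumptions made here for illustration; none of them are taken from ACD/HNMR or from the study data.

```python
from typing import Sequence


def standard_error(exp_shifts: Sequence[float], calc_shifts: Sequence[float]) -> float:
    """Sum of squared (experimental - calculated) differences divided by n**2.

    This follows one reading of the formula in the text; it is a sketch,
    not the software's own implementation.
    """
    if len(exp_shifts) != len(calc_shifts):
        raise ValueError("experimental and calculated shift lists must match in length")
    n = len(exp_shifts)
    if n == 0:
        raise ValueError("at least one chemical shift is required")
    squared_sum = sum((e - c) ** 2 for e, c in zip(exp_shifts, calc_shifts))
    return squared_sum / n ** 2


# Hypothetical shifts in ppm, made up for this example only.
experimental = [1.0, 2.0, 3.0, 4.0]
calculated = [1.5, 2.0, 3.0, 4.5]
print(standard_error(experimental, calculated))
```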

            Standard Error (ppm)
Compound    Untrained Shifts    Trained Shifts
2           0.60                0.09
3           1.18                0.18
4           0.61                0.02
5           0.64                0.08

Table 1: Standard error of the untrained and trained NMR predictions for each structure, calculated using ACD/HNMR.

Table 1 illustrates a remarkable improvement in prediction from creating a database with only one related structure (compound 1). Without training, the prediction for compound 2 has a standard error of 0.60 ppm. With a user database containing one related compound (a manual process that takes approximately 1 minute), the standard error drops to 0.09 ppm, an improvement of 85%. Training with many related structures is possible, but not necessary to realize substantial increases in accuracy. As this example shows, in many cases a single structure is enough.
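The 85% figure can be checked against the Table 1 values. The short Python sketch below assumes that "improved by" means the relative reduction in standard error; the helper name `relative_improvement` is introduced here for illustration only.

```python
# Standard errors in ppm from Table 1: compound -> (untrained, trained)
TABLE_1 = {
    2: (0.60, 0.09),
    3: (1.18, 0.18),
    4: (0.61, 0.02),
    5: (0.64, 0.08),
}


def relative_improvement(untrained: float, trained: float) -> float:
    """Percentage reduction in standard error after training."""
    if untrained <= 0:
        raise ValueError("untrained standard error must be positive")
    return (untrained - trained) / untrained * 100.0


for compound, (untrained, trained) in TABLE_1.items():
    pct = relative_improvement(untrained, trained)
    print(f"Compound {compound}: {pct:.1f}% lower standard error after training")
```

For compound 2 this gives (0.60 - 0.09) / 0.60, or 85%, in agreement with the text.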

Improving 13C Predictions Through Training

Figure 3 below shows the four imidazole compounds used for this study. Following the same methodology, the 13C chemical shifts of compounds 1, 2, and 3 were predicted with and without compound 4 in the user database. The prediction results were then compared to the published experimental values using the same standard error calculation described above [4]. The results are shown in Table 2 below.


Figure 3: The four compounds used for this study. (1) 4-benzyl-5-ethyl-1H-imidazole-2-sulfinic acid, (2) 4-(cyclohexylmethyl)-5-ethyl-1H-imidazole-2-sulfinic acid, (3) 4-benzyl-5-isopropyl-1H-imidazole-2-sulfinic acid, (4) 4-(cyclohexylmethyl)-5-isopropyl-1H-imidazole-2-sulfinic acid.

            Standard Error (ppm)
Compound    Untrained Shifts    Trained Shifts
1           4.25                0.86
2           3.44                0.82
3           4.84                0.48

Table 2: Standard error between predicted and experimental data of the trained and untrained datasets in ACD/CNMR.

Table 2 also illustrates a remarkable improvement in prediction from creating a database with only one of the related structures (compound 4). Without training, the prediction for compound 2 has a reasonably good standard error of 3.44 ppm. However, with a user database containing one related compound, the prediction accuracy for compound 2 improves by 76%.
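The 76% figure follows from the Table 2 values for compound 2, assuming the improvement is the relative reduction in standard error (the symbols SE_untrained and SE_trained are introduced here for illustration):

$$\frac{SE_{\text{untrained}} - SE_{\text{trained}}}{SE_{\text{untrained}}} = \frac{3.44 - 0.82}{3.44} \approx 0.76$$

The same calculation gives approximately 80% for compound 1 and 90% for compound 3.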

Conclusion

HNMR and CNMR prediction software is remarkably accurate for small molecules in well-known chemical classes. Our most recent accuracy study reveals considerable increases in accuracy in the latest version of the software, version 8 [5]. However, because of the number of new classes of compounds synthesized each year in R&D facilities worldwide, and because many of these classes go unpublished, it is impossible to keep up with the ever-evolving chemical diversity of these structural families. For this reason, ACD/HNMR and ACD/CNMR contain training capabilities to ensure accurate prediction for structural families not well represented in the internal prediction database. In addition, ACD/NNMR, FNMR, and PNMR all offer the same training capability, and the same practice can be followed to improve X-nuclei predictions. The study above illustrates how effective training the predictors can be. For information on how training can be implemented in an NMR verification workflow, refer to the Application Note: Sharing the Workload of NMR Interpretation Efficiently [6].


References

1. ACD/HNMR Predictor. http://www.acdlabs.com/hnmr/. 25 June 2004.
2. ACD/CNMR Predictor. http://www.acdlabs.com/cnmr/. 25 June 2004.
3. Yavari, Issa; Zabarjad-Shiraz, Nader. Monatshefte für Chemie. 2003, 134, 445.
4. Loksha, Yasser M.; El-Barbary, Ahmed A.; El-Badawi, Mahmoud A.; Nielsen, Claus; Pedersen, Erik B. Synthesis. 2004, 01, 116-120.
5. NMR Predictor Comparison. http://www.acdlabs.com/products/spec_lab/predict_nmr/chemnmr/. 9 November 2004.
6. ACD/Labs Application Note, “Sharing the Workload of NMR Interpretation Efficiently”. http://www.acdlabs.com/download/app/nmr/casv.pdf
