Application of Deep Learning Neural Networks in

3rd International School-Seminar “From Empirical to Predictive Chemistry” (E2PC2018)

Application of Deep Learning Neural Networks in Chemoinformatics: Advantages and Prospects

Igor I. Baskin M. V. Lomonosov Moscow State University

McСulloch-Pitts Model of Neuron O2 O2

x3

x2

x1 w1i

w2i

w3i

Activation function

x4

x5 w4i

zi = Σwjixj-θi

w5i

1, x   f ( x)   0, x  

yi = f(zi)

oi

f ( x)  1 (1  e x )

Four Major Classes of Neural Networks

Multilayer feed-forward

Recurrent

Recurrent with symmetrical weights (energybased)

Self-organized (with lateral weights)

Multilayer Feed-Forward Neural Network Input layer (descriptors)

Hidden layer (non-linear latent variable)

Output layer (properties being predicted)

Learning with Backpropagation: Phase 1

Learning with Backpropagation: Phase 2

𝜕𝐸 𝜕𝐸 = 𝑦𝑖 𝜕𝑤𝑖𝑗 𝜕𝑧𝑗

Vanishing Gradient Problem

𝑑𝑦𝑖 𝛿𝑖 (𝑙) = 𝑑𝑧𝑖

𝑤𝑖𝑗 𝛿𝑗 (𝑙 + 1)

Deep Learning Deep learning is the application to learning tasks of artificial neural networks with multiple hidden layers that form multiple levels of representations corresponding to different levels of abstraction • • • •

New activation functions (ReLU, etc) New regularization techniques (dropout, etc) New learning techniques (SGD, Adam) Unsupervised representation learning and autoencoders • Convolutional neural networks (CNN) • Recurrent neural networks (RNN) • Generative adversarial networks (GAN)

Jeffrey Hinton

Yoshua Begnio

Yann LeCun

Unsupervised Pre-training for Deep Learning PCA

DL

PCA

DL

1. Unsupervised ore-training

2. Building autoencoder by

using RBM networks

stacking pre-trained RBMs

3. Fine-tuning with

backpropagation

G.E.Hinton, R.R.Salakhutdinov, R. R. Science 2006, 313 (5786), 504-507

10

Performance of Deep Autoencoders PCA

DL images

PCA

DL texts

G.E.Hinton, R.R.Salakhutdinov, R. R. Science 2006, 313 (5786), 504-507

11

Activation Functions

Dropout - New Regularization Technique

New Optimization Algorithms Stochastic Gradient Descent (SGD)

w  w  f (w ) • Adaptive Gradient Algorithm (AdaGrad) maintains a per-parameter learning rate • Root Mean Square Propagation (RMSProp) maintains a per-parameter learning rate that are adapted based on the average of recent magnitudes of gradients • Adaptive Moment Estimation (Adam) computes individual adaptive learning rates for different parameters from estimates of first and second moments of the gradients

Evolution of Image Recognition Performance

Deep learning

http://houseofbots.com/news-detail/2575-4-deep-learning-vs-classical-machine-learning

Successful Applications of Deep Learning • • • • • • •

Image recognition Speach recognition Natural language processing Playing games (Go, etc) Self-driving cars Medical diagnostics etc

Deep learning beats humans

Revolution in the field of artificial intelligence

Big hype in media

The Mystery of Deep Learning No one knows why deep learning neural networks are so efficient and adding new layers improves models • Surprisingly high efficiency of deep learning sharply contradicts all established theories • According to Kolmogorov-Arnold theorem: one hidden layer is sufficient • According to the PAC learning theory of L. Valiant and the statistical learning theory of V. Vapnik: no generalization • Deep neural network does not converge to any definite minimum – the number of local minima and saddle points can exceed the number of particles in the Universe! • Only the Information theory based on statistical physics can be used to analyze such Universe • The most promising approach seems to be the Information Bottleneck Theory by N. Tishby et al (arXiv:1703:00810v3, Apr. 2017)

Because of the complexity of the problem, a purely engineering trial and error approach dominates in the field of deep learning

Deep Learning vs Classical Machine Learning Deep learning is preferable: • Very high predictive performance in domains with huge amounts of labelled data (vision, speech, texts, etc) • Scales effectively with data • No need for feature engineering (can handle unstructured data) • Adaptable and transferable

Classical machine learning is preferable: • Works better on small and/or noisy data • Computationally and financially cheaper • Easier to interpret

http://houseofbots.com/news-detail/2575-4-deep-learning-vs-classical-machine-learning

Deep learning neural networks construct high-level features useful for toxicity prediction

Feature Construction by Deep Learning

First layers

Higher-level layer

Compounds leading to highest activation of neurons in different layers. Red – known toxicophores

Random split validation

Time split validation

Autoencoders

Autoencoder encodes an initial object to its higher-level compressed form (code) using encoder and then tries to recover the initial object using decoder.

The first use of autoencoders in chemoinformatics: virtual screening based on one-class classification

P.V.Karpov et al. Bioorganic and Medicinal Chemistry Letters 21 (2011) 6728-6731

Variational Autoencoder (VAE)

ACS Cent. Sci., 2018

Convolutional Neural Networks, CNN Convolution Layer

• A neuron in the hidden layer, called feature net, is connected to a patch of pixels • This neurons computed result of convolution between a weight matrix, called 29 the kernel, and the patch, called the receptive field

Convolutional Neural Networks, CNN Pooling Layer

• Reduces the size of feature map Max-pooling • Extracts the most important features • Makes the network invariant to small transformations 30

2D Convolutional Neural Networks: Image Recognition

31

1D Convolutional Neural Networks

3D Convolutional Neural Networks

32

33

34

arXiv:1703.10603

Neural Device for Searching Direct Correlations between Structures and Properties of Chemical Compounds Neural device in application to the propane molecule :

BRAIN EYE 1

EYE 2

("looks" at atoms)

(1)

(2)

("looks" at bonds) (1,2)

(3)

1

CH3 CH3

3 1

2

(2,3)

(3,2)

SENSOR FIELD

CH2

2

(2,1)

(each sensor detects the number of the attached hydrogen atoms)

3

36

Baskin I.I. et al, J. Chem. Inf. Comput. Sci, 1997, 37, 715-

Convolutional Neural Network on Graphs for Learning Molecular Fingerprints • Produces fingerprint of fixed size for molecular graph • Computational graph is derived from the computational graph of circular fingerprints • All weights are learned from data

D.K. Duvenaud, D. Maclaurin, J. Iparraguirre, R. Bombarell, T. Hirzel, A. Aspuru-Guzik, R.P. Adams. Advances in Neural Information Processing Systems, 2015, 2215-2223.

Recurrent Neural Networks (RNN) Unfolding RNN in Time

39

Long Short Term Memory (LSTM) Network LSTM are a special kind of RNN, capable of learning long-term dependencies

The key to LSTMs is the cell state, the horizontal line running through the top of the 40 diagram. Writing to it is regulated by memory and forget gates.

What can RNNs do? • • • • • • • • • • •

Language modeling and generating text Machine translation Speech recognition and generation Generating image descriptions Time series processing Movies and video clips processing Music classification and generation Choreography …. Bioinformatics (DNA, RNA, proteins, peptides) Chemoinformatics (SMILES strings) 41

Processing SMILES Strings with LSTM

Model of the RNN–LSTM producing SMILES strings, token by token. During training, the model predicts the next token for each input token in the sequence. 42

1. LST-RNN network Is trained to generate valid SMILES strings with high accuracy using 541,555 valid SMIILES strings 2. The model is fine-tuned to specific ligand subsets with certain biological activity 3. The model is applied to fragment-based drug discovery by growing a library of leads starting from a known active fragment 43

Zheng Xu, Sheng Wang, Feiyun Zhu, Junzhou Huang

Encoder

Fingerprint extraction

Decoder

44

Zheng Xu, Sheng Wang, Feiyun Zhu, Junzhou Huang

45

Reinforcement Learning Reinforcement learning (RL) is an area of machine learning concerned with how software agents ought to take actions in an environment so as to maximize reward (or minimize penalty)

• Control • Games

Picture from: https://www.quora.com/What-is-reinforcement-learning

arXiv:1711.10907

Generative Adversarial Networks (GANs)

Generative adversarial networks (GANs) are a class of artificial intelligence algorithms used in unsupervised machine learning, implemented by a system of two neural networks contesting with each other in a zero-sum game framework Wikipedia

Generative Adversarial Networks (GANs) for Image Generation

• Generator tries to fool discriminator with fake images • Discriminator tries to improve its ability to detect fake images

The generator G is trained via policy gradient to maximize two rewards at the same time: (1) one that improves the objective (activity) and (2) another that tries to mimic real structures by fooling the discriminator D in a GAN setting D – convolutional neural network (CNN) G – long short term memory recurrent neural network (LSTM-RNN) arXiv:1705.10843v1

The Most Promising Areas of the Use of Deep Learning in Chemoinformatics and Related Domains • Drug and material design by generating new molecules with required properties • Structure optimization • Toxicity prediction • Computer-aided synthesis design • Protein-ligand binding prediction • Mining of chemical texts and images • Analytical chemistry and spectroscopy • Force-field approximation, molecular modeling, docking • Acceleration of quantum-chemistry calculations • Multi-scale modeling material properties • etc

Review Articles • I.I. Baskin, D. Winkler, I.V. Tetko. “A Renaissance of Neural Networks in Drug Discovery”, Expert Opinion on Drug Discovery, 2016, vol. 11, no. 8, pp. 785-795 • E. Gawehn, J.A. Hiss, G. Schneider. “Deep Learning in Drug Discovery”, Molecular Informatics, 2016, vol. 35, no. 1, pp. 3-14 • G.B. Goh, N.O. Hodas, A. Vishnu. “Deep Learning for Computational Chemistry”, Journal of Computational Chemistry, 2017, vol. 38, no. 16, pp. 1291-1307. • H. Chen, O. Engkvist, Y. Wang, M. Olivecrona, T. Blaschke. “The Rise of Deep Learning in Drug Discovery”, Drug Discovery Today, 2018, in press. • Jing, Y., Bian, Y., Hu, Z. et al. “Deep Learning for Drug Design: an Artificial Intelligence Paradigm for Drug Discovery in the Big Data Era”, The AAPS Journal (2018) 20: 58

Application of Deep Learning Neural Networks in

Application of Deep Learning Neural Networks in

Suggest Documents

Learning Structured Sparsity in Deep Neural Networks

Deep Learning in Spiking Neural Networks

Learning Adversary-Resistant Deep Neural Networks - arXiv

Deep Learning with Dense Random Neural Networks

Deep Learning and Neural Networks - Google Sites

Deep neural networks for learning classification ...

Learning from LDA using Deep Neural Networks

Deep learning in neural networks: An overview - Department of ...

Deep Neural Networks

Deep Neural Networks

Why Deep Neural Networks?

Random feedback weights support learning in deep neural networks

Deep learning in neural networks: An overview - Iowa State University ...

Degrees of Freedom in Deep Neural Networks

Application of Deep Convolutional Neural Networks for Detecting ...

Application of Pretrained Deep Neural Networks ... - Vincent Vanhoucke

Application of Pretrained Deep Neural Networks to Large - Google Sites

Application of Deep Convolutional Neural Networks for Detecting ...

Mosquito Detection with Neural Networks: The Buzz of Deep Learning

Mosquito Detection with Neural Networks: The Buzz of Deep Learning

A history of Bayesian neural networks - Bayesian Deep Learning ...

Compressed Learning of Deep Neural Networks for ... - MDPIwww.researchgate.net › 5cc0a32f4585156cd7afa122 › C

APPLICATION OF NEURAL NETWORKS TO

An Application of Neural Networks