3rd International School-Seminar “From Empirical to Predictive Chemistry” (E2PC2018)
Application of Deep Learning Neural Networks in Chemoinformatics: Advantages and Prospects
Igor I. Baskin M. V. Lomonosov Moscow State University
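McCulloch-Pitts Model of Neuron
[Figure: neuron i with inputs x1 ... x5, weights w1i ... w5i, an activation function, and output oi]
Net input: $z_i = \sum_j w_{ji} x_j - \theta_i$
Output: $y_i = f(z_i)$
Threshold activation: $f(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases}$
Logistic activation: $f(x) = 1 / (1 + e^{-x})$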
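The neuron model on this slide (a weighted sum of the inputs minus a threshold, passed through an activation function) can be sketched in a few lines. The input values, weights and threshold below are illustrative assumptions, not values from the talk, and the step function is assumed to switch at x = 0.

```python
import numpy as np

def threshold(x):
    """Step activation: 1 where x >= 0, else 0."""
    return np.where(x >= 0, 1.0, 0.0)

def logistic(x):
    """Logistic (sigmoid) activation 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def neuron_output(x, w, theta, f):
    """Compute y_i = f(z_i) with z_i = sum_j w_ji * x_j - theta_i."""
    z = np.dot(w, x) - theta
    return f(z)

# Illustrative values (assumptions, not taken from the slides)
x = np.array([1.0, 0.0, 1.0, 1.0, 0.0])    # inputs x1..x5
w = np.array([0.5, -0.3, 0.2, 0.4, 0.1])   # weights w1i..w5i
theta = 1.0                                 # threshold theta_i

y_step = neuron_output(x, w, theta, threshold)  # hard 0/1 decision
y_soft = neuron_output(x, w, theta, logistic)   # smooth value between 0 and 1
```

The logistic variant is the one that makes gradient-based training possible, because it is differentiable everywhere.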
Four Major Classes of Neural Networks
Multilayer feed-forward
Recurrent
Recurrent with symmetrical weights (energy-based)
Self-organized (with lateral weights)
Multilayer Feed-Forward Neural Network
Input layer (descriptors)
Hidden layer (non-linear latent variable)
Output layer (properties being predicted)
Learning with Backpropagation: Phase 1
Learning with Backpropagation: Phase 2
$\frac{\partial E}{\partial w_{ij}} = y_i \frac{\partial E}{\partial z_j}$
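The two phases can be illustrated on a tiny network: a forward pass that stores the activations, then a backward pass that forms each weight gradient as the activation of the sending neuron times the error derivative at the receiving neuron. This is a minimal sketch, assuming one sigmoid hidden layer, a linear output layer and a squared-error loss; the layer sizes and random values are arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, W2):
    """Phase 1: propagate the input forward, keeping intermediate values."""
    z1 = x @ W1          # z_j = sum_i w_ij * y_i, with y = x at the input layer
    y1 = sigmoid(z1)     # hidden activations
    z2 = y1 @ W2         # linear output layer
    return z1, y1, z2

def loss(x, t, W1, W2):
    """Squared error E = 0.5 * sum((output - target)^2)."""
    return 0.5 * np.sum((forward(x, W1, W2)[2] - t) ** 2)

def backward(x, t, W1, W2):
    """Phase 2: propagate dE/dz backward and form dE/dw_ij = y_i * dE/dz_j."""
    z1, y1, z2 = forward(x, W1, W2)
    dE_dz2 = z2 - t                             # linear output, squared error
    dE_dz1 = y1 * (1.0 - y1) * (W2 @ dE_dz2)    # (dy/dz) * sum_j w_ij * dE/dz_j
    grad_W2 = np.outer(y1, dE_dz2)
    grad_W1 = np.outer(x, dE_dz1)
    return grad_W1, grad_W2

rng = np.random.default_rng(0)
x = rng.normal(size=4)
t = rng.normal(size=2)
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(3, 2))
grad_W1, grad_W2 = backward(x, t, W1, W2)

# Finite-difference check of a single first-layer weight
eps = 1e-6
W1_shift = W1.copy()
W1_shift[1, 2] += eps
numeric = (loss(x, t, W1_shift, W2) - loss(x, t, W1, W2)) / eps
```

Comparing `numeric` with `grad_W1[1, 2]` is a quick way to confirm that the backward pass is consistent with the loss.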
Vanishing Gradient Problem
$\delta_i(l) = \frac{dy_i}{dz_i} \sum_j w_{ij} \delta_j(l+1)$
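The backpropagated error is multiplied by an activation derivative dy/dz at every layer. For the logistic function this derivative never exceeds 0.25, so the error signal can shrink geometrically with depth. The sketch below is a deliberately simplified case, assuming a chain with one sigmoid unit per layer, unit weights and pre-activations of zero (the most favourable point for the derivative).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backpropagated_deltas(n_layers, w=1.0, z=0.0, delta_out=1.0):
    """Apply delta(l) = (dy/dz) * w * delta(l+1) down a chain of single sigmoid units."""
    dy_dz = sigmoid(z) * (1.0 - sigmoid(z))   # at most 0.25, reached at z = 0
    deltas = [delta_out]
    for _ in range(n_layers):
        deltas.append(dy_dz * w * deltas[-1])
    return deltas

deltas = backpropagated_deltas(n_layers=10)
# After 10 layers the error signal has been scaled by 0.25 ** 10
```

With larger weights the product can grow instead of shrinking, which is the mirror-image exploding gradient case.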
Deep Learning
Deep learning is the application to learning tasks of artificial neural networks with multiple hidden layers that form multiple levels of representations corresponding to different levels of abstraction
• New activation functions (ReLU, etc)
• New regularization techniques (dropout, etc)
• New learning techniques (SGD, Adam)
• Unsupervised representation learning and autoencoders
• Convolutional neural networks (CNN)
• Recurrent neural networks (RNN)
• Generative adversarial networks (GAN)
Geoffrey Hinton
Yoshua Bengio
Yann LeCun
Unsupervised Pre-training for Deep Learning
[Figure: panels labelled PCA and DL]
1. Unsupervised pre-training using RBM networks
2. Building autoencoder by stacking pre-trained RBMs
3. Fine-tuning with backpropagation
G. E. Hinton, R. R. Salakhutdinov. Science 2006, 313 (5786), 504-507
Performance of Deep Autoencoders
[Figure: PCA vs. DL for images; PCA vs. DL for texts]
G. E. Hinton, R. R. Salakhutdinov. Science 2006, 313 (5786), 504-507
Activation Functions
Dropout - New Regularization Technique
New Optimization Algorithms
• Stochastic Gradient Descent (SGD): $w \leftarrow w - \eta \nabla f(w)$, with learning rate $\eta$
• Adaptive Gradient Algorithm (AdaGrad) maintains a per-parameter learning rate
• Root Mean Square Propagation (RMSProp) maintains a per-parameter learning rate that is adapted based on the average of recent magnitudes of the gradients
• Adaptive Moment Estimation (Adam) computes individual adaptive learning rates for different parameters from estimates of the first and second moments of the gradients
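The four update rules differ only in how the raw gradient is rescaled before it is subtracted from the weights. The sketch below writes each rule as a single step in its commonly published form; the hyperparameter defaults and the toy objective f(w) = sum(w^2) are assumptions chosen for illustration. In SGD proper the gradient would be estimated on a random mini-batch, which is omitted here.

```python
import numpy as np

def sgd_step(w, grad, lr=0.1):
    """Plain gradient step: w <- w - lr * grad."""
    return w - lr * grad

def adagrad_step(w, grad, cache, lr=0.1, eps=1e-8):
    """AdaGrad: per-parameter rate from the accumulated squared gradients."""
    cache = cache + grad ** 2
    return w - lr * grad / (np.sqrt(cache) + eps), cache

def rmsprop_step(w, grad, cache, lr=0.01, decay=0.9, eps=1e-8):
    """RMSProp: per-parameter rate from a running average of squared gradients."""
    cache = decay * cache + (1.0 - decay) * grad ** 2
    return w - lr * grad / (np.sqrt(cache) + eps), cache

def adam_step(w, grad, m, v, t, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    """Adam: bias-corrected estimates of the first and second gradient moments."""
    m = b1 * m + (1.0 - b1) * grad
    v = b2 * v + (1.0 - b2) * grad ** 2
    m_hat = m / (1.0 - b1 ** t)
    v_hat = v / (1.0 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

def grad_f(w):
    """Gradient of the toy objective f(w) = sum(w^2)."""
    return 2.0 * w

w0 = np.array([1.0, -2.0])
g0 = grad_f(w0)
zeros = np.zeros_like(w0)

w_sgd = sgd_step(w0, g0)
w_ada, _ = adagrad_step(w0, g0, zeros)
w_rms, _ = rmsprop_step(w0, g0, zeros)
w_adam, _, _ = adam_step(w0, g0, zeros, zeros, t=1)
```

On the very first step the adaptive methods move every parameter by roughly the same amount regardless of the gradient magnitude, whereas plain SGD moves the parameter with the larger gradient further.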
Evolution of Image Recognition Performance
Deep learning
http://houseofbots.com/news-detail/2575-4-deep-learning-vs-classical-machine-learning
Successful Applications of Deep Learning
• Image recognition
• Speech recognition
• Natural language processing
• Playing games (Go, etc)
• Self-driving cars
• Medical diagnostics
• etc
Deep learning beats humans
Revolution in the field of artificial intelligence
Big hype in media
The Mystery of Deep Learning
No one knows why deep learning neural networks are so efficient or why adding new layers improves models
• The surprisingly high efficiency of deep learning sharply contradicts the established theories
• According to the Kolmogorov-Arnold theorem, one hidden layer is sufficient
• According to the PAC learning theory of L. Valiant and the statistical learning theory of V. Vapnik, no generalization should be expected
• A deep neural network does not converge to any definite minimum: the number of local minima and saddle points can exceed the number of particles in the Universe
• Only information theory based on statistical physics can be used to analyze such a universe
• The most promising approach seems to be the Information Bottleneck Theory of N. Tishby et al. (arXiv:1703.00810v3, Apr. 2017)
Because of the complexity of the problem, a purely engineering trial and error approach dominates in the field of deep learning
Deep Learning vs Classical Machine Learning
Deep learning is preferable:
• Very high predictive performance in domains with huge amounts of labelled data (vision, speech, texts, etc)
• Scales effectively with data
• No need for feature engineering (can handle unstructured data)
• Adaptable and transferable
Classical machine learning is preferable:
• Works better on small and/or noisy data
• Computationally and financially cheaper
• Easier to interpret
http://houseofbots.com/news-detail/2575-4-deep-learning-vs-classical-machine-learning
Deep learning neural networks construct high-level features useful for toxicity prediction
Feature Construction by Deep Learning
First layers
Higher-level layer
Compounds leading to highest activation of neurons in different layers. Red – known toxicophores
Random split validation
Time split validation
Autoencoders
An autoencoder encodes an initial object into a higher-level compressed form (the code) using an encoder and then tries to recover the initial object using a decoder.
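The encode-then-recover idea can be shown with the simplest possible case. The sketch below assumes a purely linear encoder and decoder trained by plain gradient descent on the mean squared reconstruction error; real autoencoders add non-linear layers. The toy data (100 objects with 6 descriptors driven by 2 hidden factors) and all sizes are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data (an assumption): 6 descriptors that depend on only 2 hidden factors
X = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 6))

n_code = 2
W_enc = 0.1 * rng.normal(size=(6, n_code))   # encoder weights
W_dec = 0.1 * rng.normal(size=(n_code, 6))   # decoder weights

def reconstruct(X, W_enc, W_dec):
    code = X @ W_enc             # encoder: object -> compressed code
    return code, code @ W_dec    # decoder: code -> recovered object

losses = []
lr = 0.01
for step in range(2000):
    code, X_hat = reconstruct(X, W_enc, W_dec)
    err = X_hat - X
    losses.append(np.mean(err ** 2))
    grad_out = 2.0 * err / err.size           # d(mean squared error) / d(X_hat)
    grad_dec = code.T @ grad_out
    grad_enc = X.T @ (grad_out @ W_dec.T)     # uses W_dec before it is updated
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc
```

The list `losses` records the reconstruction error at each step, and `code` holds the two-dimensional compressed representation of every object.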
The first use of autoencoders in chemoinformatics: virtual screening based on one-class classification
P.V.Karpov et al. Bioorganic and Medicinal Chemistry Letters 21 (2011) 6728-6731
Variational Autoencoder (VAE)
ACS Cent. Sci., 2018
Convolutional Neural Networks, CNN
Convolution Layer
• A neuron in the hidden layer, called the feature map, is connected to a patch of pixels
• This neuron computes the convolution between a weight matrix, called the kernel, and the patch, called the receptive field
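What a single convolution layer computes can be written as a pair of loops: every output neuron multiplies its patch of pixels (the receptive field) element-wise by the shared kernel and sums the result. This is a minimal sketch, assuming a single-channel image, stride 1 and no padding. As in most deep learning libraries, the kernel is not flipped, so strictly speaking this is a cross-correlation. The example image and kernel are arbitrary.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image; each output neuron sees one patch."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            patch = image[r:r + kh, c:c + kw]          # receptive field
            feature_map[r, c] = np.sum(patch * kernel)  # shared weights
    return feature_map

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1.0, 0.0],
                   [0.0, -1.0]])   # difference along the main diagonal
feature_map = conv2d_valid(image, kernel)
# Every entry is -5 here, because the image grows by 5 along each diagonal step
```

Because the same kernel is reused at every position, the layer has only four trainable weights in this example, whatever the image size.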
Convolutional Neural Networks, CNN
Pooling Layer
• Reduces the size of the feature map
• Extracts the most important features
• Makes the network invariant to small transformations
[Figure: max-pooling]
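Max-pooling keeps only the largest value in each small window of the feature map. The sketch below assumes non-overlapping square windows; the 4 x 4 feature map is an arbitrary example.

```python
import numpy as np

def max_pool2d(feature_map, size=2):
    """Non-overlapping max-pooling; rows/columns that do not fill a window are dropped."""
    h = feature_map.shape[0] // size
    w = feature_map.shape[1] // size
    trimmed = feature_map[:h * size, :w * size]
    windows = trimmed.reshape(h, size, w, size)
    return windows.max(axis=(1, 3))

feature_map = np.array([[1.0, 3.0, 2.0, 0.0],
                        [4.0, 2.0, 1.0, 5.0],
                        [0.0, 1.0, 7.0, 2.0],
                        [2.0, 6.0, 3.0, 4.0]])
pooled = max_pool2d(feature_map)
# pooled is [[4, 5], [6, 7]]: one maximum per 2 x 2 window
```

Shifting a strong activation by one pixel inside its window leaves `pooled` unchanged, which is the source of the invariance to small transformations.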
2D Convolutional Neural Networks: Image Recognition
1D Convolutional Neural Networks
3D Convolutional Neural Networks
arXiv:1703.10603
Neural Device for Searching Direct Correlations between Structures and Properties of Chemical Compounds
Neural device in application to the propane molecule:
[Figure: BRAIN; EYE 1 ("looks" at atoms): (1), (2), (3); EYE 2 ("looks" at bonds): (1,2), (2,1), (2,3), (3,2); SENSOR FIELD: propane CH3-CH2-CH3 with atoms numbered 1, 2, 3 (each sensor detects the number of the attached hydrogen atoms)]
Baskin I.I. et al, J. Chem. Inf. Comput. Sci, 1997, 37, 715-
Convolutional Neural Network on Graphs for Learning Molecular Fingerprints
• Produces fingerprint of fixed size for molecular graph
• Computational graph is derived from the computational graph of circular fingerprints
• All weights are learned from data
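The idea of replacing the discrete steps of a circular fingerprint with differentiable ones can be sketched as follows. At each radius every atom adds up its own features and those of its neighbours, a learned layer replaces the hash function, and a softmax replaces the setting of a single bit. This is a simplified illustration rather than the published algorithm: it assumes one hidden weight matrix per layer (not one per atom degree), a tanh non-linearity, random untrained weights, and atom features of the form [1, number of attached hydrogens] for the three heavy atoms of propane.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def neural_fingerprint(atom_features, adjacency, hidden_weights, output_weights):
    """Sum soft 'bit' contributions of every atom at every radius into a fixed-size vector."""
    fp = np.zeros(output_weights[0].shape[1])
    r = atom_features
    for H, W in zip(hidden_weights, output_weights):
        v = r + adjacency @ r              # each atom plus the sum over its neighbours
        r = np.tanh(v @ H)                 # smooth replacement for hashing
        fp += softmax(r @ W).sum(axis=0)   # smooth replacement for setting one bit
    return fp

# Propane CH3-CH2-CH3: heavy atoms only, features [1, attached hydrogens]
atoms = np.array([[1.0, 3.0], [1.0, 2.0], [1.0, 3.0]])
bonds = np.array([[0.0, 1.0, 0.0],
                  [1.0, 0.0, 1.0],
                  [0.0, 1.0, 0.0]])

rng = np.random.default_rng(0)
hidden = [rng.normal(size=(2, 4)), rng.normal(size=(4, 4))]   # two layers (radius 2)
output = [rng.normal(size=(4, 8)), rng.normal(size=(4, 8))]   # fingerprint length 8

fp = neural_fingerprint(atoms, bonds, hidden, output)

# Renumbering the atoms must not change the fingerprint
perm = [2, 0, 1]
fp_perm = neural_fingerprint(atoms[perm], bonds[perm][:, perm], hidden, output)
```

The fingerprint length is fixed by the output weight matrices, so molecules of any size map to vectors of the same dimension.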
D.K. Duvenaud, D. Maclaurin, J. Aguilera-Iparraguirre, R. Gómez-Bombarelli, T. Hirzel, A. Aspuru-Guzik, R.P. Adams. Advances in Neural Information Processing Systems, 2015, 2215-2223.
Recurrent Neural Networks (RNN)
Unfolding RNN in Time
Long Short Term Memory (LSTM) Network
LSTMs are a special kind of RNN, capable of learning long-term dependencies
The key to LSTMs is the cell state, the horizontal line running through the top of the diagram. Writing to it is regulated by memory and forget gates.
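How the gates regulate the cell state can be seen in a single time step of the commonly used LSTM formulation. In the sketch below the gate that writes new content is called the input gate; it is assumed to correspond to what the slide calls the memory gate. The sizes and random weights are arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One time step of a standard LSTM cell; W maps [h_prev, x] to four gate pre-activations."""
    n = h_prev.size
    z = W @ np.concatenate([h_prev, x]) + b
    f = sigmoid(z[0:n])            # forget gate: how much of the old cell state to keep
    i = sigmoid(z[n:2 * n])        # input gate: how much new content to write
    g = np.tanh(z[2 * n:3 * n])    # candidate content
    o = sigmoid(z[3 * n:4 * n])    # output gate: how much of the cell state to expose
    c = f * c_prev + i * g         # cell state update
    h = o * np.tanh(c)             # hidden state passed to the next step
    return h, c

rng = np.random.default_rng(0)
n_in, n_hidden = 3, 4
W = rng.normal(scale=0.5, size=(4 * n_hidden, n_hidden + n_in))
b = np.zeros(4 * n_hidden)
h = np.zeros(n_hidden)
c = np.zeros(n_hidden)
for x in rng.normal(size=(5, n_in)):   # a sequence of five input vectors
    h, c = lstm_step(x, h, c, W, b)
```

When the forget gate is close to 1 and the input gate close to 0, the cell state is carried forward almost unchanged, which is what lets the network retain information over many steps.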
What can RNNs do?
• Language modeling and generating text
• Machine translation
• Speech recognition and generation
• Generating image descriptions
• Time series processing
• Movies and video clips processing
• Music classification and generation
• Choreography
• …
• Bioinformatics (DNA, RNA, proteins, peptides)
• Chemoinformatics (SMILES strings)
Processing SMILES Strings with LSTM
Model of the RNN–LSTM producing SMILES strings, token by token. During training, the model predicts the next token for each input token in the sequence.
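Next-token training amounts to pairing each SMILES string with a copy of itself shifted by one position. The sketch below shows how such pairs could be built; the `<start>` and `<end>` marker tokens and the simple tokenizer (single characters, with Cl and Br kept together) are hypothetical choices for illustration, not details taken from the cited work.

```python
def tokenize(smiles):
    """Character-level tokenization; two-letter elements Cl and Br are kept together."""
    tokens = []
    i = 0
    while i < len(smiles):
        if smiles[i:i + 2] in ("Cl", "Br"):
            tokens.append(smiles[i:i + 2])
            i += 2
        else:
            tokens.append(smiles[i])
            i += 1
    return tokens

def next_token_pairs(smiles, start="<start>", end="<end>"):
    """Return (input, target) token sequences, the target shifted by one position."""
    seq = [start] + tokenize(smiles) + [end]
    return seq[:-1], seq[1:]

inputs, targets = next_token_pairs("CCCl")
# inputs  -> ['<start>', 'C', 'C', 'Cl']
# targets -> ['C', 'C', 'Cl', '<end>']
```

At generation time the same model is fed its own previous output, starting from the start marker and stopping when it emits the end marker.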
1. An LSTM-RNN network is trained to generate valid SMILES strings with high accuracy using 541,555 valid SMILES strings
2. The model is fine-tuned to specific ligand subsets with certain biological activity
3. The model is applied to fragment-based drug discovery by growing a library of leads starting from a known active fragment
Zheng Xu, Sheng Wang, Feiyun Zhu, Junzhou Huang
[Figure: encoder, fingerprint extraction, decoder]
Reinforcement Learning
Reinforcement learning (RL) is an area of machine learning concerned with how software agents ought to take actions in an environment so as to maximize reward (or minimize penalty)
• Control
• Games
Picture from: https://www.quora.com/What-is-reinforcement-learning
arXiv:1711.10907
Generative Adversarial Networks (GANs)
Generative adversarial networks (GANs) are a class of artificial intelligence algorithms used in unsupervised machine learning, implemented by a system of two neural networks contesting with each other in a zero-sum game framework (Wikipedia)
Generative Adversarial Networks (GANs) for Image Generation
• Generator tries to fool discriminator with fake images
• Discriminator tries to improve its ability to detect fake images
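The two opposing goals can be expressed as two loss functions of the discriminator's outputs. The sketch below assumes the discriminator returns a probability that a sample is real, and uses the common binary cross-entropy form with the non-saturating generator loss; the probability values are invented to show how the losses respond.

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy: real samples labelled 1, generated samples labelled 0."""
    return -np.mean(np.log(d_real)) - np.mean(np.log(1.0 - d_fake))

def generator_loss(d_fake):
    """Non-saturating generator loss: low when the discriminator rates fakes as real."""
    return -np.mean(np.log(d_fake))

d_real = np.array([0.9, 0.8])          # discriminator outputs on real images
d_fake_weak = np.array([0.1, 0.2])     # fakes that are easily detected
d_fake_strong = np.array([0.6, 0.7])   # fakes that fool the discriminator

g_weak = generator_loss(d_fake_weak)
g_strong = generator_loss(d_fake_strong)
d_easy = discriminator_loss(d_real, d_fake_weak)
d_hard = discriminator_loss(d_real, d_fake_strong)
```

Better fakes lower the generator loss and raise the discriminator loss, which is the tug-of-war that drives training.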
The generator G is trained via policy gradient to maximize two rewards at the same time: (1) one that improves the objective (activity) and (2) another that tries to mimic real structures by fooling the discriminator D in a GAN setting
D: convolutional neural network (CNN)
G: long short term memory recurrent neural network (LSTM-RNN)
arXiv:1705.10843v1
The Most Promising Areas of the Use of Deep Learning in Chemoinformatics and Related Domains
• Drug and material design by generating new molecules with required properties
• Structure optimization
• Toxicity prediction
• Computer-aided synthesis design
• Protein-ligand binding prediction
• Mining of chemical texts and images
• Analytical chemistry and spectroscopy
• Force-field approximation, molecular modeling, docking
• Acceleration of quantum-chemistry calculations
• Multi-scale modeling of material properties
• etc
Review Articles
• I.I. Baskin, D. Winkler, I.V. Tetko. “A Renaissance of Neural Networks in Drug Discovery”, Expert Opinion on Drug Discovery, 2016, vol. 11, no. 8, pp. 785-795
• E. Gawehn, J.A. Hiss, G. Schneider. “Deep Learning in Drug Discovery”, Molecular Informatics, 2016, vol. 35, no. 1, pp. 3-14
• G.B. Goh, N.O. Hodas, A. Vishnu. “Deep Learning for Computational Chemistry”, Journal of Computational Chemistry, 2017, vol. 38, no. 16, pp. 1291-1307
• H. Chen, O. Engkvist, Y. Wang, M. Olivecrona, T. Blaschke. “The Rise of Deep Learning in Drug Discovery”, Drug Discovery Today, 2018, in press
• Y. Jing, Y. Bian, Z. Hu et al. “Deep Learning for Drug Design: an Artificial Intelligence Paradigm for Drug Discovery in the Big Data Era”, The AAPS Journal, 2018, vol. 20, 58