Molecular generation using gated graph convolutional neural networks and reinforcement learning
Date of Issue: 2019
School of Computer Science and Engineering
The design of molecules with bespoke chemical properties has wide-ranging applications in materials science, chemistry and drug discovery. This can be formulated as a supervised learning problem, in which we first encode discrete molecular graphs as continuous latent representations and then apply gradient-based optimization to these representations to optimize the desired chemical properties. Recently, techniques such as Graph Convolutional Neural Networks (G-CNNs) and Message Passing Neural Networks (MPNNs) have been developed, which use deep learning to encode discrete graphs as continuous latent representations. In this project, we attack the problem of molecular optimization with a twin-pronged approach that improves both the encoding technique and the optimization method. To this end, we build upon an existing state-of-the-art architecture, the Junction Tree Variational Autoencoder (JT-VAE), which learns continuous latent vector representations of molecular graphs. These latent vector representations are then used, indirectly, for gradient-based optimization of chemical properties. JT-VAE uses MPNNs in its architecture. For the first part of our approach, we use a powerful variant of G-CNNs called Gated Graph Convolutional Neural Networks (GG-CNNs). We demonstrate the efficacy of GG-CNNs over existing MPNN architectures in producing smooth and meaningful representations of molecular graphs. We do so by replacing the MPNN with a GG-CNN in the JT-VAE architecture and benchmarking the two variants on various tasks. For the second part of our approach, we incorporate the reinforcement learning method Deep Deterministic Policy Gradient (DDPG) into the combined JT-VAE and GG-CNN architecture. Thus, we present a novel architecture that incorporates GG-CNNs and DDPG on top of JT-VAE for molecular optimization.
We perform all our experiments on the benchmark QM9 dataset, which contains 133,885 organic compounds having up to 9 heavy atoms.
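As an illustration of the kind of encoder layer the abstract refers to, the following is a minimal sketch of one gated graph convolution step with a residual connection. The abstract does not give the project's layer equations, so the update rule here, h_i' = h_i + ReLU(U h_i + sum over neighbours j of sigmoid(A h_i + B h_j) * V h_j), is an assumption based on one common residual gated formulation. The function names, the weight matrices U, V, A and B, and the toy three-atom graph are all hypothetical.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gated_graph_conv(h, neighbors, U, V, A, B):
    """One gated graph convolution step with a residual connection.

    h:         list of node feature vectors, one per atom
    neighbors: neighbors[i] lists the nodes bonded to node i
    U, V, A, B: square weight matrices (hypothetical; an assumed
                formulation, not taken from the project itself)
    """
    out = []
    for i, h_i in enumerate(h):
        agg = matvec(U, h_i)          # self contribution
        a_i = matvec(A, h_i)          # gate term from the receiving node
        for j in neighbors[i]:
            b_j = matvec(B, h[j])     # gate term from the sending node
            v_j = matvec(V, h[j])     # message from the sending node
            gate = [sigmoid(a + b) for a, b in zip(a_i, b_j)]
            agg = [s + g * v for s, g, v in zip(agg, gate, v_j)]
        # ReLU non-linearity followed by the residual connection
        out.append([x + max(0.0, s) for x, s in zip(h_i, agg)])
    return out


def random_matrix(dim, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(dim)] for _ in range(dim)]


rng = random.Random(0)
dim = 4
# Toy chain of three heavy atoms (for example C-C-O); features are placeholders.
h = [[1.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
neighbors = [[1], [0, 2], [1]]
U, V, A, B = (random_matrix(dim, rng) for _ in range(4))
h_next = gated_graph_conv(h, neighbors, U, V, A, B)
```

In this sketch each neighbour's message is scaled element-wise by a learned gate before aggregation, which is the feature that distinguishes a gated layer from a plain sum over neighbours. A practical encoder would stack several such layers and learn the weights by backpropagation.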
DRNTU::Engineering::Computer science and engineering
Final Year Project (FYP)
Nanyang Technological University