Challenges and solutions in drug-target interaction prediction
Date of Issue: 2018
School of Computer Science and Engineering
Bioinformatics Research Centre
When a drug is developed, it is designed to interact with a specific target of interest in order to achieve the desired therapeutic effect. However, it is quite common to find later that the drug also interacts with other targets that were not intended during its development. This is of interest because a drug that interacts with multiple targets may have more than one therapeutic effect, which provides a clear motivation for discovering new interactions for existing drugs. In drug discovery, the task of drug-target interaction prediction detects such interactions on a large scale by screening many drugs and targets simultaneously. While there are wet-lab techniques for discovering these interactions, the focus of this thesis is on computational drug-target interaction prediction. Specifically, we investigate methods that discover new interactions based on prior knowledge of existing drugs and their experimentally confirmed targets (i.e. machine learning). In this thesis, we identified and addressed four outstanding problems in drug-target interaction (DTI) prediction. Having addressed these problems, we were able to enhance prediction performance and outperform relevant state-of-the-art methods. Firstly, DTI prediction methods have difficulty predicting interactions involving new drugs or targets for which there are no known interactions. To predict such interactions, we developed two matrix factorization methods that utilize graph regularization. In addition, considering that many of the non-occurring edges in the bipartite DTI network are actually unknown or missing cases, we developed a preprocessing step that enhances predictions in the “new drug” and “new target” cases by adding edges with intermediate interaction likelihood scores. In our experiments, our methods performed better than the state-of-the-art methods and were found to predict interactions reasonably well.
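The abstract does not state how the intermediate interaction likelihood scores are computed. The following is a minimal sketch of one possible neighbor-based scheme, assuming that a drug-drug similarity matrix is available: a drug with no known interactions receives a similarity-weighted average of the interaction profiles of its most similar drugs. The function name `impute_new_drug_profiles`, the parameters `k` and `decay`, and the toy data are hypothetical and are not taken from the thesis.

```python
def impute_new_drug_profiles(Y, S, k=2, decay=0.5):
    """Fill empty drug rows of a 0/1 interaction matrix with soft scores.

    Y: list of rows, one per drug, one column per target (1 = known interaction).
    S: square drug-drug similarity matrix with values in [0, 1].
    k, decay: hypothetical knobs (neighbor count, down-weighting of farther neighbors).
    """
    n = len(Y)
    out = [[float(v) for v in row] for row in Y]
    for i in range(n):
        if any(Y[i]):
            continue  # this drug already has known interactions; leave it untouched
        # The k drugs most similar to drug i, most similar first.
        neighbors = sorted((j for j in range(n) if j != i),
                           key=lambda j: S[i][j], reverse=True)[:k]
        norm = sum(S[i][j] for j in neighbors)
        if norm == 0:
            continue  # no informative neighbors; keep the row at zero
        weights = [(decay ** rank) * S[i][j] for rank, j in enumerate(neighbors)]
        for t in range(len(Y[i])):
            out[i][t] = sum(w * Y[j][t] for w, j in zip(weights, neighbors)) / norm
    return out


# Toy data (illustrative only): drug 2 is a "new drug" with no known targets.
Y = [[1, 0],
     [0, 1],
     [0, 0]]
S = [[1.0, 0.2, 0.8],
     [0.2, 1.0, 0.4],
     [0.8, 0.4, 1.0]]
scores = impute_new_drug_profiles(Y, S)
```

In this toy case the new drug's row moves from all zeros to values strictly between 0 and 1, with the larger score on the target of its most similar neighbor. The same idea could be applied column-wise for new targets, given a target-target similarity matrix.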
Secondly, class imbalance is an issue that is prevalent across all DTI datasets. Class imbalance can be divided into two sub-problems, namely between-class and within-class imbalance. Between-class imbalance refers to the imbalance ratio between interacting and non-interacting drug-target pairs; this degrades prediction performance because it biases the prediction results towards the majority class (i.e. the non-interacting pairs), leading to more prediction errors in the minority class (i.e. the interacting pairs). Within-class imbalance refers to the imbalance between the sizes of sub-groups (types) of interactions; this biases the predictions towards the bigger and better-represented sub-groups, leading to more errors in the smaller groups. Here, we developed an ensemble learning method that incorporates techniques to address both between-class imbalance and within-class imbalance. Experiments show that the proposed method improves results over four state-of-the-art methods. Thirdly, there are DTI datasets where the feature sets for representing the drugs and targets (and, by extension, the drug-target pairs) are of high dimensionality. High dimensionality of the data may lead to much longer running times for the prediction models. Furthermore, there may be redundancy in the features, which may also degrade prediction performance. In this work, we used dimensionality reduction to deal with both of these issues, and we additionally used ensemble learning to improve the prediction performance further. As base learners for the ensemble, we selected two classifiers, namely Decision Tree and Kernel Ridge Regression, resulting in two variants of ensemble models, EnsemDT and EnsemKRR, respectively. Experimental results show that our proposed methods are indeed successful. Lastly, a concept called differential representation bias has an impact on the prediction performance of DTI prediction methods.
Specifically, differential representation bias refers to how often a drug (or target) appears in the positive training data as opposed to the negative training data. Bearing this concept in mind, we experimented with the way that the negative training data is sampled prior to training the prediction model. We found that our modified sampling procedure produced significant improvements in DTI prediction performance.
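To make the notion of differential representation concrete, the sketch below counts how often each drug appears in the positive and in the negative training pairs and reports the difference. Using the raw count difference is an assumption made here for illustration; the abstract gives no formula, and the function name `representation_bias` and the toy pairs are hypothetical. The same computation would apply to targets by counting the second element of each pair.

```python
from collections import Counter


def representation_bias(positive_pairs, negative_pairs):
    """Per-drug count in the positive pairs minus count in the negative pairs.

    Each pair is a (drug, target) tuple. A value above zero means the drug is
    seen more often as interacting than as non-interacting in the training data.
    """
    pos = Counter(drug for drug, _ in positive_pairs)
    neg = Counter(drug for drug, _ in negative_pairs)
    # Counter returns 0 for drugs missing from one of the two sets.
    return {drug: pos[drug] - neg[drug] for drug in set(pos) | set(neg)}


# Toy training data (illustrative only).
positives = [("d1", "t1"), ("d1", "t2"), ("d2", "t1")]
negatives = [("d2", "t2"), ("d3", "t1"), ("d3", "t2")]
bias = representation_bias(positives, negatives)
```

Here `d1` appears only in positive pairs and `d3` only in negative pairs, so a model could learn to score pairs by drug identity alone; a measurement of this kind shows how the choice of sampled negatives shifts that balance.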
DRNTU::Engineering::Computer science and engineering::Computer applications::Life and medical sciences