Online Android malware family detection approaches based on progressive learning
Date of Issue: 2018-09-10
School of Electrical and Electronic Engineering
In the past decade, smartphones have become an important part of people's lives. Android is the most widely used smartphone operating system, and applications running on it serve users' needs such as online shopping, social networking, entertainment and mobile payment. In recent years, Android malware has developed rapidly, and Android malware detection has been widely studied by researchers. Most machine-learning-based Android malware detection research relies on batch learning. However, batch learning implicitly assumes that malicious applications do not evolve over time, which does not hold in the real world: because the model is trained on an existing malware dataset, its performance degrades when it predicts forthcoming samples. In this dissertation, we first verify the reproducibility of two existing batch-learning-based malware detection methods, DREBIN and CSBD, by conducting multiclass classification with a Support Vector Machine on features extracted by DREBIN and with a Random Forest on features extracted by CSBD. The results show that high accuracy and acceptable efficiency can be reproduced on a different dataset. Then, to enable the models to learn new malware classes, we retrain them annually and semi-annually. Compared with the experiments without retraining, the results indicate that retraining does enable the models to learn new malware families as they stream in. However, retraining becomes increasingly costly because the training dataset grows over time. Lastly, Progressive Learning is applied so that the models adapt to new malware families as each new sample emerges. The accuracy improves significantly compared with the retraining experiments. However, both retraining and progressive learning can be time-consuming, since the models are adjusted as each new data sample emerges.
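To make the contrast between periodic retraining and per-sample adaptation concrete, here is a minimal Python sketch under stated assumptions. It does not reproduce the Progressive Learning algorithm, the SVM or Random Forest models, or the DREBIN/CSBD feature extraction used in this work. Instead it swaps in a much simpler incremental nearest-centroid classifier over fixed-length numeric feature vectors (assumed to be something like DREBIN-style binary indicators). A new family is added the first time a labelled sample of it arrives, and accuracy is measured test-then-train (prequential): each sample is predicted before the model learns from it. A helper also counts how many training samples are processed when the model is retrained from scratch on the full, growing dataset after each period. The names `IncrementalCentroidClassifier`, `prequential_accuracy` and `retraining_cost` are hypothetical.

```python
import math


class IncrementalCentroidClassifier:
    """Hypothetical online classifier: one running-mean centroid per malware family.

    Assumption: every sample is a fixed-length numeric feature vector.
    A family unseen so far is added on its first labelled sample, and each
    update costs O(d) no matter how many samples have already been seen.
    """

    def __init__(self):
        self.sums = {}    # family -> per-feature running sum
        self.counts = {}  # family -> number of samples seen for that family

    def predict(self, x):
        """Return the family whose centroid is closest (squared Euclidean), or None if untrained."""
        best, best_dist = None, math.inf
        for family, total in self.sums.items():
            n = self.counts[family]
            dist = sum((xi - ti / n) ** 2 for xi, ti in zip(x, total))
            if dist < best_dist:  # strict '<': ties go to the earliest-added family
                best, best_dist = family, dist
        return best

    def partial_fit(self, x, family):
        """Absorb one labelled sample, creating a new class if the family is new."""
        if family not in self.sums:
            self.sums[family] = [0.0] * len(x)
            self.counts[family] = 0
        self.sums[family] = [s + xi for s, xi in zip(self.sums[family], x)]
        self.counts[family] += 1


def prequential_accuracy(stream, model):
    """Test-then-train: predict each sample first, then learn from it."""
    if not stream:
        return 0.0
    correct = 0
    for x, family in stream:
        if model.predict(x) == family:
            correct += 1
        model.partial_fit(x, family)
    return correct / len(stream)


def retraining_cost(batch_sizes):
    """Samples processed if the model is retrained from scratch on ALL data after each period."""
    total, seen = 0, 0
    for size in batch_sizes:
        seen += size      # the training set keeps growing
        total += seen     # a full retrain touches every sample seen so far
    return total
```

For example, with three periods of 100 samples each, `retraining_cost([100, 100, 100])` returns 600 (100 + 200 + 300), whereas per-sample updates touch each of the 300 samples only once. The incremental model still pays for every new sample, which mirrors the observation that per-sample adaptation can itself be time-consuming on a fast stream.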
DRNTU::Engineering::Electrical and electronic engineering::Computer hardware, software and systems