A semantic-based analysis of Android malware for detection, generation, and trend analysis
Date of Issue: 2017-05-26
School of Computer Science and Engineering
Android has grown to be the most popular mobile operating system since its release in 2008. Its openness and ease of use attract thousands of vendors and developers to Android application development, and millions of apps provide Android users with a variety of functionality, such as online shopping, instant messaging, gaming, and map services. However, this prevalence has also made Android a prime target for cybercriminals. According to Symantec's 2016 security report, the number of Android malware reached 13 million in 2015. Cybercriminals upload Android malware to both the official Google market and unofficial markets every day, which puts users at high risk. Malware may steal users' sensitive information, escalate privileges, remotely control devices, and encrypt users' files for ransom. Understanding these risks and developing effective mitigations against them is non-trivial, and malware remains a critical issue in Android security. To prevent malware from attacking users, we need a better understanding of Android malware and its behaviors, which facilitates the extraction of representative features from malware and thereby enhances malware detection. Malware and anti-malware tools keep evolving as they compete with each other. It is therefore valuable to learn the characteristics of evolving malware and the weaknesses of existing anti-malware tools. Moreover, sustained malware analysis and security assessment are lacking in the Android world. To address these problems, this thesis proposes a semantic-based malware analysis with the following achievements:
1. We propose a precise semantic model of Android malware based on the Deterministic Symbolic Automaton (DSA) for the purpose of malware comprehension, detection, and classification.
Based on DSA, we develop an automatic analysis framework named SMART, which learns DSA by detecting and summarizing semantic clones from malware families, and then extracts semantic features from the learned DSA to classify malware according to attack patterns. We conduct experiments on both a malware benchmark and 223,170 real-world apps. The results show that SMART builds meaningful semantic models and outperforms both state-of-the-art approaches and anti-virus tools in malware detection. SMART identifies 4,583 new malware samples among the real-world apps that are missed by most anti-virus tools. The classification step further identifies new malware variants and unknown families.
2. We first propose a meta-model for Android malware that captures the common attack features and evasion features in malware. Based on this model, we develop a framework, Mystique, that automatically generates malware covering four attack features and two evasion features by adopting the software product line engineering approach. With the help of Mystique, we conduct experiments to (1) understand Android malware and the associated attack features and evasion techniques, and (2) evaluate and compare 57 off-the-shelf anti-malware tools, 9 academic solutions, and 4 Android market vetting processes in terms of their accuracy in detecting attack features and their capability in addressing evasion. Last but not least, we provide a benchmark of Android malware in which the contained attack and evasion features are properly labeled. Moreover, we extend this work to Mystique-S to explore the capabilities of anti-malware tools in detecting malware that uses dynamic code loading. Mystique-S automatically selects attack features under various user scenarios and delivers the corresponding malicious payloads at runtime. Relying on dynamic code binding (via service) and loading (via reflection) techniques, Mystique-S enables the dynamic execution of payloads on user devices at runtime.
Experimental results on real-world devices show that existing Anti-Malware Tools (AMTs) are incapable of detecting most of our generated malware. Finally, we propose some enhancements for existing anti-malware tools.
3. We propose a systematic approach to study Android malware, unveil security issues, draw insightful conclusions, and predict future trends for research. We have collected 4,267,178 Android apps from a variety of Android marketplaces, among which 1,004,550 malware variants are identified and analyzed. Unlike previous work, this work focuses on the differences and evolution of apps' characteristics, and identifies multiple security-related issues of concern to both academia and industry. To provide a comprehensive view of these issues, we propose four analyses, at the level of the individual app, malware family, malware author, and market, to conduct our study and guide the analysis. Furthermore, we propose six dimensions along which to cluster apps for different analysis tasks, to achieve efficiency and accuracy in the large-scale analysis. Some of the key findings reflect the characteristics of attacks and the weaknesses in protection, which can benefit all stakeholders.
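To make the symbolic-automaton idea behind contribution 1 more concrete, the following is a minimal, generic sketch of a deterministic automaton whose transitions are guarded by predicates over symbols rather than by concrete symbols. It is not the DSA definition or the SMART implementation from the thesis, which the abstract does not specify. The class `SymbolicAutomaton`, the helper `build_leak_pattern`, the state names, and the API-call strings used as trace symbols are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set, Tuple

# A guard is a predicate over one trace symbol (here: an API-call name).
Predicate = Callable[[str], bool]


@dataclass
class SymbolicAutomaton:
    """Hypothetical sketch: states plus predicate-guarded transitions."""
    start: str
    accepting: Set[str]
    # state -> list of (guard, next_state)
    transitions: Dict[str, List[Tuple[Predicate, str]]] = field(default_factory=dict)

    def step(self, state: str, symbol: str) -> Optional[str]:
        """Return the unique successor state, or None if no guard matches."""
        matches = [nxt for guard, nxt in self.transitions.get(state, []) if guard(symbol)]
        if len(matches) > 1:
            # Determinism: at most one guard may hold for any symbol.
            raise ValueError("overlapping guards: automaton is not deterministic")
        return matches[0] if matches else None

    def accepts(self, trace: List[str]) -> bool:
        """Strict matching: every symbol in the trace must fire a transition."""
        state = self.start
        for symbol in trace:
            nxt = self.step(state, symbol)
            if nxt is None:
                return False
            state = nxt
        return state in self.accepting


def build_leak_pattern() -> SymbolicAutomaton:
    """Illustrative pattern (an assumption): read a device identifier, then use the network."""
    return SymbolicAutomaton(
        start="q0",
        accepting={"q2"},
        transitions={
            "q0": [(lambda call: call.startswith("TelephonyManager.get"), "q1")],
            "q1": [(lambda call: call.startswith("HttpURLConnection."), "q2")],
        },
    )
```

Because each guard is a predicate, a single transition covers a whole family of concrete calls, which is one plausible reason a symbolic model can summarize behavior shared across variants of a malware family. A usage example would be `build_leak_pattern().accepts(["TelephonyManager.getDeviceId", "HttpURLConnection.connect"])`.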