Network intrusion detection via distributed machine learning on smart gateway network
Suleman Khalid Rai
Date of Issue: 2017-07-26
School of Electrical and Electronic Engineering
The future smart building requires the deployment of a smart grid, not only to provide physical power supply but also to enable cyber control with energy savings. A smart home is a typical cyber-physical system that links the physical power grid to the cyber information network. The link is usually a data server or a gateway holding collected personal data on energy, temperature, humidity, Wi-Fi and so on. This raises a serious security concern if the data server or gateway is compromised by malicious attacks. Vulnerabilities inherent to the communication and computing of a cyber-physical system such as a smart home can affect the entire infrastructure. Various methods have been proposed to detect these intrusions and prevent them from harming the smart home system. This thesis studies a machine learning approach to identifying malicious intrusions and explores the improvement in time complexity and classification accuracy achievable through distributed machine learning algorithms. The supervised machine learning algorithms studied are the support vector machine (SVM) and the extreme learning machine (ELM). Both are evaluated and compared on two publicly available datasets, NSL-KDD and ISCX 2012. Experiments cover both binary and multi-class classification on these benchmark datasets. For SVM, four different kernel functions are used for binary and multi-class classification. On the NSL-KDD benchmark, the sigmoid kernel function gives the best classification accuracy with reasonable time complexity compared with the other kernel functions. On the ISCX 2012 dataset, the linear and sigmoid kernel functions give comparable performance metrics. Experimental results also show that the classification performance of SVM is superior to that of ELM for most of the intrusion classes. ELM, on the other hand, offers better time complexity.
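As a point of reference for the kernel comparison above, the sketch below writes out four kernel functions in plain Python. The abstract names only the linear and sigmoid kernels. The polynomial and RBF kernels, the function names and the default values of `gamma`, `coef0` and `degree` are assumptions chosen for illustration, and may differ from what the thesis actually used.

```python
import math


def dot(x, z):
    """Inner product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(x, z))


def linear_kernel(x, z):
    # K(x, z) = <x, z>
    return dot(x, z)


def polynomial_kernel(x, z, gamma=1.0, coef0=1.0, degree=3):
    # K(x, z) = (gamma * <x, z> + coef0) ** degree  (assumed, not named in the abstract)
    return (gamma * dot(x, z) + coef0) ** degree


def rbf_kernel(x, z, gamma=1.0):
    # K(x, z) = exp(-gamma * ||x - z||^2)  (assumed, not named in the abstract)
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def sigmoid_kernel(x, z, gamma=1.0, coef0=0.0):
    # K(x, z) = tanh(gamma * <x, z> + coef0)
    return math.tanh(gamma * dot(x, z) + coef0)
```

Each function maps a pair of feature vectors to a similarity score; an SVM trained with a different kernel differs only in which of these functions it uses in place of the inner product.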
To further improve the time complexity of ELM, an incremental least-squares matrix solver based on Cholesky decomposition is utilized. Because of data imbalance, ELM does not give good performance metrics on the Remote to Local (R2L) and User to Root (U2R) classes. To resolve this data imbalance, weighted ELM is proposed to improve the classification performance for these classes. Experimental results show that, compared with basic ELM, weighted ELM improves the sensitivity from 0 % to 77.6 % for the U2R class and from 0 % to 32.9 % for the R2L class. The precision improves slightly from 0 % to 2 % for the U2R class and from 23.3 % to 46.8 % for the R2L class. Conventional machine learning approaches usually follow a centralized data analytics architecture in which all computation is performed on a single central server. In the offline training phase, depending on the amount of data to be processed, this can be a heavy workload, especially for resource-limited smart gateways. To overcome this issue, a distributed machine learning algorithm, Sequential Distributed SVM (SQ-DSVM), is proposed to reduce the offline training time for intrusion detection. Experimental results show that, with 5 smart gateways working in parallel, SQ-DSVM achieves a run-time improvement of up to 5x over centralized SVM. Apart from the Distributed Support Vector Machine (DSVM), a computationally efficient Distributed Neuron Network (DNN) for intrusion detection is also considered. Experimental results show that, for the NSL-KDD dataset, a DNN with 5 subsystems improves the server processing time by a factor of 12x compared with SVM and by up to 5x compared with a centralized network intrusion detection system (NIDS) using only a single neural network.
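The weighted ELM and Cholesky-based solver described earlier in this paragraph can be illustrated with a minimal sketch. It solves the weighted, regularized least-squares problem (I/C + HᵀWH) β = HᵀWt with a plain Cholesky factorization. Several things here are assumptions rather than the thesis's method: the per-sample weights are inverse class frequencies (a scheme common in the weighted ELM literature), the solver is the ordinary and not the incremental variant, the hidden-layer output matrix `h` is supplied directly, and all function names and the regularization constant `c` are hypothetical.

```python
import math


def cholesky(a):
    """Return lower-triangular L with L * L^T == a (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def cholesky_solve(a, b):
    """Solve a x = b by forward substitution (L y = b), then back substitution (L^T x = y)."""
    n = len(a)
    low = cholesky(a)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def inverse_frequency_weights(labels):
    """Assumed weighting scheme: each sample is weighted by 1 / (size of its class)."""
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return [1.0 / counts[label] for label in labels]


def weighted_output_weights(h, t, w, c=1.0):
    """Output weights beta solving (I/c + H^T W H) beta = H^T W t for one output column."""
    n_samples, n_hidden = len(h), len(h[0])
    a = [[sum(w[n] * h[n][i] * h[n][j] for n in range(n_samples))
          + (1.0 / c if i == j else 0.0)
          for j in range(n_hidden)]
         for i in range(n_hidden)]
    b = [sum(w[n] * h[n][i] * t[n] for n in range(n_samples)) for i in range(n_hidden)]
    return cholesky_solve(a, b)
```

With one hidden unit that always outputs 1 and targets `[1, 1, 1, -1]` (three majority samples, one minority sample), uniform weights pull the solution toward the majority target, whereas inverse-frequency weights give both classes equal total influence. This is the effect that weighting is meant to have on rare classes such as U2R and R2L.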
For the ISCX dataset, the server processing time improves by a factor of 20x compared with SVM and by a factor of 5x compared with a centralized NIDS using only a single neural network. For real-time intrusion detection, latency is important. Queuing delay analysis is done using an M/M/1 queuing model. Experimental results show that, with a DNN, the queuing delay and total delay are reduced in proportion to the number of subsystems used. To improve the classification performance for different classes, a hybrid classification scheme is used that combines SVM and ELM into a single classification system. Experimental results show that SVM should be used for detecting the Denial of Service (DOS) and U2R classes, whereas ELM should be used for detecting the Probe and R2L classes. In practice, labeling data as normal or intrusion instances is an expensive human operation. To overcome this issue, an unsupervised machine learning algorithm, Unsupervised Extreme Learning Machine (US-ELM), is used to group instances into 2 clusters, one for normal and one for intrusive instances. Experimental results show that clustering accuracy increases with the embedding dimension and reaches a peak of 86.9 % with an embedding dimension of 12. Using SVM in the proposed hybrid classification model after clustering, the f-measure reaches 94.9 % for the Normal class, 99.5 % for DOS, 55.4 % for U2R and 98.4 % for R2L, all better than the corresponding metrics using ELM. Using ELM in the proposed hybrid classification model after clustering, the Probe class achieves 89.7 % accuracy, which is better than SVM. After clustering, it is thus proposed to use SVM for detecting the DOS, U2R, R2L and Normal classes, while ELM should be used for detecting the Probe class.
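The latency analysis mentioned earlier in this paragraph uses an M/M/1 queue, whose steady-state delays follow from the arrival rate λ and service rate μ: the mean queuing delay is ρ/(μ − λ) with ρ = λ/μ, and the mean total delay is 1/(μ − λ). The sketch below evaluates these standard formulas. The function name and the idea of comparing two service rates are illustrative assumptions; how the thesis maps DNN subsystems onto arrival and service rates is not stated in this abstract.

```python
def mm1_delays(arrival_rate, service_rate):
    """Steady-state M/M/1 delays; rates are in requests per unit time."""
    if arrival_rate <= 0 or service_rate <= 0:
        raise ValueError("rates must be positive")
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable unless arrival_rate < service_rate")
    utilization = arrival_rate / service_rate            # rho = lambda / mu
    queuing_delay = utilization / (service_rate - arrival_rate)  # mean wait before service
    service_time = 1.0 / service_rate                    # mean time in service
    return {
        "utilization": utilization,
        "queuing_delay": queuing_delay,
        "total_delay": queuing_delay + service_time,     # equals 1 / (mu - lambda)
    }
```

For example, `mm1_delays(8.0, 10.0)` and `mm1_delays(8.0, 40.0)` compare the same hypothetical traffic against a slower and a faster classifier, and show how strongly the delay depends on how close the arrival rate is to the service rate.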
The main contributions of this thesis are as follows. Firstly, the machine learning algorithms SVM and ELM are utilized for binary and multi-class intrusion classification on two benchmark datasets. Secondly, sequential distributed SVM is utilized to reduce the time complexity of the SVM classifier. Thirdly, a Distributed Neuron Network (DNN) is proposed to reduce server processing time and queuing delay for the ELM classifier. Fourthly, a hybrid scheme consisting of SVM and ELM is utilized to improve classification across all the intrusion classes. Lastly, an unsupervised machine learning algorithm based on ELM is utilized to classify unlabeled test data, and the improvement in classification of the various classes using the hybrid classification method is presented.
DRNTU::Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence