Efficient and effective extreme learning machine with minimal user intervention
Date of Issue: 2016
School of Electrical and Electronic Engineering
Artificial Neural Networks (ANNs) are a dominant machine learning technique inspired by biological neural networks. Today's data explosion offers great opportunities for algorithms like ANNs that can uncover sophisticated relationships in various applications. However, traditional ANN training algorithms like Back-Propagation (BP) are known to face challenging issues such as specifying the learning rate, the number of epochs and the stopping criterion, and running into local minima. In recent years, an emerging algorithm named the Extreme Learning Machine (ELM) has attracted a significant amount of research attention. ELM is a variant of ANN with a single hidden layer. In contrast to the usual tedious tuning procedures, its input weights and biases are randomly generated. Therefore, considerable computing resources are saved and high learning efficiency is achieved, which is also ELM's most salient feature. However, some issues remain. For example, although the non-analytical parameter determination procedure improves learning efficiency, it also causes the performance of ELM to fluctuate on the same problem with different initial parameters, so the algorithm may appear less stable. To tackle this issue, regularization approaches have been applied to ELM; the most common is the Regularized Extreme Learning Machine (RELM), which employs ridge regression. Better generalization performance and higher stability have been reported with RELM, but at the cost of lower learning efficiency. The original ELM uses a batch learning strategy, which faces difficulties when dealing with sequential data. Many ELM variants have therefore been created to address this issue, the Online Sequential ELM (OS-ELM) being the most notable. However, one inevitable procedure is still faced by nearly all ELM and other ANN algorithms: parameter tuning.
Although ELM enjoys easy implementation because of its randomly generated initial parameters, its performance is still greatly influenced by the size of the hidden layer, i.e., the number of hidden neurons $L$. The optimal choice of $L$ typically ranges from several to hundreds across tasks, and the issues of underfitting and overfitting are commonly associated with it. Constructive approaches have been extensively proposed to select $L$ automatically; notable ones include the Incremental ELM (I-ELM), Convex Incremental ELM (CI-ELM), Error-Minimized ELM (EM-ELM) and Dynamic ELM (D-ELM). However, the aforementioned constructive ELMs share one critical flaw: they all use the expected learning accuracy (training error) as the architecture selection criterion. The training error keeps decreasing towards zero as more and more hidden neurons are added, so specifying an appropriate expected learning accuracy is merely another way of choosing $L$, and the performance is highly sensitive to that choice. In this thesis, we adopt the Leave-One-Out Cross-Validation (LOO-CV) error as the performance metric instead, which has rarely been used for this purpose because of its extremely slow execution when computed naively. Various techniques are implemented to make the calculation of the LOO-CV error highly efficient. We are therefore able to propose a pair of algorithms that achieve desirable performance with no or minimal user intervention, which is also the central theme of this thesis. Specifically, the Efficient Leave-One-Out Cross-Validation Based Extreme Learning Machine (ELOO-ELM) is proposed to automatically select the optimal $L$ in ELM. To handle the regularization parameter in RELM, the Efficient Leave-One-Out Cross-Validation Based Regularized Extreme Learning Machine (ELOO-RELM) is also proposed. Both ELOO-ELM and ELOO-RELM use the LOO-CV error as the selection criterion, and thus stability can be guaranteed.
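One standard route to such efficiency, sketched here for illustration (the thesis develops its own derivation), is the closed-form PRESS statistic: for any model linear in its output weights, the leave-one-out residual of sample $i$ equals the ordinary residual divided by $1 - h_{ii}$, where $h_{ii}$ is the $i$-th diagonal entry of the hat matrix $H(H^{\top}H + \lambda I)^{-1}H^{\top}$. All $N$ leave-one-out errors thus follow from a single fit, with no retraining:

```python
import numpy as np

def press_loo_error(H, T, lam=0.0):
    """LOO-CV mean squared error via the PRESS statistic.

    For a linear model T ~ H @ beta fitted by ridge regression with
    fixed lam, the leave-one-out residual is e_i / (1 - hat_ii),
    where HAT = H (H^T H + lam I)^{-1} H^T.
    """
    N, L = H.shape
    A = np.linalg.solve(H.T @ H + lam * np.eye(L), H.T)  # (L, N)
    hat_diag = np.einsum("ij,ji->i", H, A)               # diag of H @ A
    residuals = T - H @ (A @ T)                          # ordinary residuals
    loo_residuals = residuals / (1.0 - hat_diag)[:, None]
    return np.mean(loo_residuals ** 2)
```

The identity is exact for ridge regression (and for ordinary least squares when `lam = 0`), which is what makes the LOO-CV error usable as a cheap model selection criterion.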
Both employ a highly efficient formula to calculate the LOO-CV error, so the speed advantage of ELM is retained. The complexity of these two algorithms scales linearly with the number of training samples and is very similar to that of the original ELM; they are therefore able to handle large amounts of data. To obtain the optimal ridge parameter directly rather than by search and comparison, the Automatic Regularized ELM (AR-ELM) is also introduced. It achieves even faster learning than ELOO-RELM without requiring model candidates beforehand, but its stability may not be guaranteed because of the adoption of the Lawless and Wang formula. The proposed ELOO-ELM, ELOO-RELM and AR-ELM are all batch learning algorithms, so the feasibility of extending them to online scenarios is also discussed, and a regularized version of the OS-ELM is proposed.
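As a rough illustration of the online setting, OS-ELM keeps the random hidden layer fixed and updates only the output weights by recursive least squares as data chunks arrive. The sketch below is an assumption-laden outline, not the exact proposed algorithm; the optional `lam` term in the initialization stands in for a regularized variant.

```python
import numpy as np

def os_elm_init(H0, T0, lam=0.0):
    """Initialize from an initial batch of hidden outputs H0 and targets T0.

    lam > 0 adds a ridge term, sketching a regularized variant.
    """
    L = H0.shape[1]
    P = np.linalg.inv(H0.T @ H0 + lam * np.eye(L))
    beta = P @ H0.T @ T0
    return P, beta

def os_elm_update(P, beta, H, T):
    """Recursive least-squares update for one incoming chunk (H, T)."""
    K = P @ H.T @ np.linalg.inv(np.eye(H.shape[0]) + H @ P @ H.T)
    P = P - K @ H @ P                       # updated inverse covariance
    beta = beta + P @ H.T @ (T - H @ beta)  # correct by chunk residual
    return P, beta
```

With `lam = 0` the recursion reproduces the batch least-squares solution exactly, which is why sequential and batch ELM agree on the same data.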