Statistical and data mining approach for the prediction of solar radiation
Date of Issue: 2013
School of Electrical and Electronic Engineering
This thesis was initially proposed in 2009 as part of the Singapore National Research Foundation-Competitive Research Program (NRF-CRP) project entitled “Combined Cycle Solar Energy Self-sustaining Membrane Distillation (MD) and Membrane Distillation Bioreactor (MDBR) Water Production and Recycling System”. Because the proposed bioreactor is very sensitive to temperature changes, its temperature must be kept stable. Since most of the energy is supplied by a set of solar panels, the ability to predict solar radiation is crucial to maintaining that stability. Hence, there is a need for a system that predicts solar radiation accurately and consistently. We first present our research on some fundamental issues encountered in solar radiation time series prediction. We examine how statistical models and data mining approaches can be used to predict solar time series. Our aim is to find an approach that can model the complex nonlinear relationships in the time series data, effectively remove noise and outliers, and provide accurate and consistent predictions. The autoregressive integrated moving average (ARIMA) model is a widely studied time series prediction approach with a strong foundation in statistics. It is popular because, with a properly identified order, it can model many different kinds of time series, and because there is a widely accepted methodology for developing a model for a specific time series. Its disadvantage, however, is its inability to model nonlinear time series. The time delay neural network (TDNN), which is based on the artificial neural network (ANN), is also studied in this thesis. TDNN is capable of capturing the nonlinear relationships in a data set. However, like other data-driven algorithms, it is prone to over-fitting, and when the training data set contains a lot of noise or many outliers, TDNN may also produce gross errors.
As ARIMA and TDNN each have their own advantages, we propose a hybrid model that combines them. The model uses ARIMA to model the linear component of the time series and TDNN to model the nonlinear component. We also use a novel detrending method, rather than traditional differencing, to generate the stationary series for ARIMA. Because our experiments use solar radiation time series, several meteorological models are used as detrending models. Experimental results show that the proposed hybrid model outperforms both ARIMA and TDNN used alone. To further improve prediction performance, we propose a novel multi-model framework (MMF). In this framework, we assume that several different patterns occur repeatedly in the time series. Our aim is to develop a prediction model for each pattern and to use the appropriate model to predict future values during the prediction phase. It is therefore necessary to segment the time series and then group the subsequences into clusters. Initially, we adopted a fixed-length segmentation scheme and found the optimal subsequence length through cross-validation experiments. As TDNN has been shown to model the nonlinear relationships in solar radiation, it is adopted as the prediction model. When predicting a future value of the time series, the pattern to which the current time series belongs is first identified. The testing data are then fed to the chosen model to make the prediction. The experimental results show that the proposed MMF achieves better prediction performance than the other models. Next, we sought to improve the clustering. The genetic algorithm and multi-model framework (GAMMF) is developed by combining a genetic algorithm with MMF. GAMMF uses a dynamic segmentation scheme instead of the fixed-length segmentation of MMF. To find the optimal segmentation scheme, a genetic algorithm is combined with the K-means clustering algorithm.
This segmentation scheme is expected to achieve better clustering performance. TDNN is then used to model the different patterns. Support vector regression (SVR) is used alongside TDNN as an additional prediction model: when none of the patterns adequately describes the current time series, the SVR model is used to make the prediction. The experimental results show that GAMMF outperforms the other prediction algorithms in both accuracy and consistency.
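As an illustration only, the pattern-routing idea described above (segment the series, identify the pattern the current subsequence belongs to, then apply that pattern's model, with a fallback model when no pattern fits) might be sketched as follows. This is not the thesis implementation. The function names, the Euclidean nearest-centroid rule, and the `max_distance` threshold used to decide that "no pattern fits" are all assumptions made for this sketch; the cluster centroids are taken as given (for example, from K-means), and the per-pattern and fallback models are arbitrary callables standing in for trained TDNN and SVR models.

```python
import math


def segment(series, length):
    """Split a series into consecutive, non-overlapping windows of a fixed length.

    Trailing values that do not fill a whole window are dropped.
    """
    return [series[i:i + length]
            for i in range(0, len(series) - length + 1, length)]


def distance(a, b):
    """Euclidean distance between two equal-length windows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def route(window, centroids, max_distance):
    """Return the index of the nearest centroid, or None if none is close enough.

    The max_distance cut-off is an assumption of this sketch, not a rule
    stated in the abstract.
    """
    dists = [distance(window, c) for c in centroids]
    best = min(range(len(centroids)), key=dists.__getitem__)
    return best if dists[best] <= max_distance else None


def predict(window, centroids, pattern_models, fallback_model, max_distance):
    """Predict with the model of the matched pattern, else with the fallback."""
    idx = route(window, centroids, max_distance)
    model = fallback_model if idx is None else pattern_models[idx]
    return model(window)


# Toy usage with hypothetical stand-in models (not trained TDNN or SVR models).
centroids = [[0.0, 0.0], [1.0, 1.0]]
pattern_models = [
    lambda w: w[-1],             # stand-in for the model of pattern 0
    lambda w: sum(w) / len(w),   # stand-in for the model of pattern 1
]
fallback_model = lambda w: -1.0  # stand-in for the fallback model

windows = segment([0.1, 0.1, 5.0, 5.0, 0.9], 2)
forecasts = [predict(w, centroids, pattern_models, fallback_model, 0.5)
             for w in windows]
```

In this toy run the first window lies close to centroid 0 and is handled by that pattern's model, while the second window is far from both centroids and is passed to the fallback model.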
DRNTU::Engineering::Computer science and engineering::Mathematics of computing