Divisible load scheduling algorithm in a virtual distributed computing system
Date of Issue2016-06-03
School of Computer Engineering
Divisible Load Theory (DLT) was introduced to resolve scheduling problems in a Distributed Computing System (DCS). A divisible load (job) is one that is arbitrarily divided into a number of fractions among the processors and links in a system, and each load fraction is processed independently. The main objective of DLT is to optimize the scheduling process with a minimum processing time for the entire load. Recent advancements in information technology have lead to a Virtual Distributed Computing System (VDCS) with dynamic resources and multiple data banks with a loosely connected structure. This thesis addresses the problem of load scheduling in a VDCS with dynamic resources and multiple data banks under a DLT framework. We first propose a DLT solution to solve the problem of optimal load scheduling with fixed resources. The optimal load scheduling with fixed resources is formulated as a linear programming problem with total processing time, release time, continuity and total load constraint equations. The proposed DLT solution for load scheduling in VDCS with fixed resources is studied using four numerical examples. Next, we analyze the effect of processing the load in heterogeneous and homogeneous environments. Finally, we present an experimental study by solving a satellite image processing problem in a compute cloud environment. From the experimental study, we can see that the result is close to the analytical processing time obtained using the proposed load scheduling algorithm. Based on the proposed DLT solution for a VDCS with fixed resources, we extend the framework to address the dynamic resource allocation problem. The release time of the worker role and the modified load distribution sequence are additional constraints in a dynamic resource handling environment. Therefore, we formulate the problem of dynamic resource handling as a linear programming problem with these additional constraints. We propose a rescheduling strategy for worker role release time and modified load distribution sequence. Using five numerical examples, we show that the DLT-based scheduling algorithm with the new rescheduling strategy can accomplish the load processing tasks in a dynamic resource environment. Another important problem with load scheduling is the idle time of resources during each process, which is usually reduced through multiple installments. In this thesis, we propose a method to solve the load scheduling problem with multiple installments using genetic algorithms. First, we present a hybrid genetic algorithm approach for load scheduling in VDCS with a single data bank. Next, we extend it to multiple data banks. The hybrid genetic algorithm integrates the equality constraint during optimization and hence provides better convergence compared to other search-based algorithms. A mean and standard deviation reduction of greater than half is observed using the hybrid genetic algorithm compared with a general real-coded genetic algorithm and a particle swarm optimization algorithm. In the future, we plan to implement the proposed DLT-based scheduling algorithm as a load manager in a real VDCS with multiple data banks. Further, the DLT-based scheduling algorithm will be extended from the single-installment case to a multi-installment case in VDCS with multiple data banks. Moreover, the task queue will be explored by considering the task with different levels of priorities.