Self-Adjusting Slot Configurations for Homogeneous and Heterogeneous Hadoop Clusters
The MapReduce framework and its open-source implementation Hadoop have become the de facto platform for scalable analysis of large data sets in recent years. One of the primary concerns in Hadoop is how to minimize the completion length (i.e., makespan) of a set of MapReduce jobs. Current Hadoop only allows a static slot configuration, i.e., fixed numbers of map slots and reduce slots throughout the lifetime of a cluster. However, we found that such a static configuration may lead to low system resource utilization as well as long completion length. Motivated by this, we propose simple yet effective schemes that use the slot ratio between map and reduce tasks as a tunable knob for reducing the makespan of a given set of jobs. By leveraging the workload information of recently completed jobs, our schemes dynamically allocate resources (or slots) to map and reduce tasks. We implemented the presented schemes in Hadoop V0.20.2 and evaluated them with representative MapReduce benchmarks on Amazon EC2. The experimental results demonstrate the effectiveness and robustness of our schemes under both simple workloads and more complex mixed workloads.
EXISTING SYSTEM:
A classic Hadoop cluster includes a single master node and multiple slave nodes. The master node runs the JobTracker daemon, which is responsible for scheduling jobs and coordinating the execution of the tasks of each job. Each slave node runs the TaskTracker daemon, which hosts the execution of MapReduce tasks. A “slot” indicates a node's capacity for accommodating tasks. In a Hadoop system, each slot is assigned as either a map slot or a reduce slot, serving map tasks or reduce tasks, respectively.
At any given time, only one task can run per slot, so the number of available slots per node determines the maximum degree of parallelism in Hadoop. Our experiments have shown that the slot configuration has a significant impact on system performance. By default, however, the Hadoop framework uses fixed numbers of map slots and reduce slots at each node throughout the lifetime of a cluster. The values in this fixed configuration are usually heuristic numbers chosen without considering job characteristics. As a result, the static setting is not well optimized and can limit the performance of the entire cluster.
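The effect of a fixed split can be illustrated with a toy model in Java. This is only a sketch under strong assumptions that are not from the paper: every task takes the same time, the reduce phase starts only after the map phase ends, and the class and method names are hypothetical rather than Hadoop APIs.

```java
/** Toy model of a static slot split; all names are illustrative, not Hadoop APIs. */
final class StaticSlotToyModel {

    /** Number of task "waves" needed when only `slots` tasks can run at once. */
    static int waves(int tasks, int slots) {
        return (tasks + slots - 1) / slots; // integer ceiling of tasks / slots
    }

    /** Job time, assuming the reduce phase starts only after the map phase ends. */
    static int jobSeconds(int mapTasks, int reduceTasks,
                          int mapSlots, int reduceSlots, int taskSeconds) {
        return (waves(mapTasks, mapSlots) + waves(reduceTasks, reduceSlots)) * taskSeconds;
    }

    public static void main(String[] args) {
        int mapTasks = 12, reduceTasks = 2, taskSeconds = 10; // a map-heavy job
        System.out.println("2 map + 2 reduce slots: "
                + jobSeconds(mapTasks, reduceTasks, 2, 2, taskSeconds) + " s"); // 70 s
        System.out.println("3 map + 1 reduce slots: "
                + jobSeconds(mapTasks, reduceTasks, 3, 1, taskSeconds) + " s"); // 60 s
    }
}
```

Under these assumptions the map-heavy job finishes sooner with a 3:1 split, while a reduce-heavy job would favor the opposite split, so no single fixed ratio suits every workload.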
Quincy addressed the scheduling problem with locality and fairness constraints.
Zaharia et al. proposed delay scheduling to further improve the performance of the Fair scheduler by increasing data locality.
Verma et al. introduced a heuristic to minimize the makespan of a set of independent MapReduce jobs by applying the classic Johnson’s algorithm.
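For readers unfamiliar with Johnson's algorithm, the following Java sketch shows the classic two-stage rule. It assumes each job is reduced to one map duration and one reduce duration processed on two dedicated stages; the class names and the job durations are hypothetical, and this is not the heuristic of Verma et al. itself.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Sketch of Johnson's rule for a two-stage (map, then reduce) flow shop. */
final class JohnsonRuleSketch {

    static final class Job {
        final String name;
        final int mapSeconds;
        final int reduceSeconds;
        Job(String name, int mapSeconds, int reduceSeconds) {
            this.name = name;
            this.mapSeconds = mapSeconds;
            this.reduceSeconds = reduceSeconds;
        }
    }

    /** Jobs with map <= reduce go first (ascending map); the rest go last (descending reduce). */
    static List<Job> order(List<Job> jobs) {
        List<Job> front = new ArrayList<>();
        List<Job> back = new ArrayList<>();
        for (Job j : jobs) {
            if (j.mapSeconds <= j.reduceSeconds) front.add(j); else back.add(j);
        }
        front.sort(Comparator.comparingInt((Job j) -> j.mapSeconds));
        back.sort(Comparator.comparingInt((Job j) -> j.reduceSeconds).reversed());
        front.addAll(back);
        return front;
    }

    /** Makespan when each stage runs one job at a time, in the given order. */
    static int makespan(List<Job> ordered) {
        int mapEnd = 0, reduceEnd = 0;
        for (Job j : ordered) {
            mapEnd += j.mapSeconds;
            reduceEnd = Math.max(reduceEnd, mapEnd) + j.reduceSeconds;
        }
        return reduceEnd;
    }

    public static void main(String[] args) {
        List<Job> jobs = List.of(new Job("A", 4, 5), new Job("B", 4, 1),
                new Job("C", 30, 4), new Job("D", 6, 30), new Job("E", 2, 3));
        System.out.println("submission order: " + makespan(jobs) + " s");        // 77 s
        System.out.println("Johnson order:    " + makespan(order(jobs)) + " s"); // 47 s
    }
}
```

Note that the ordering is computed for a fixed amount of capacity per stage, which matches the static slot setting discussed in this section.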
DISADVANTAGES OF EXISTING SYSTEM:
A fixed slot configuration may lead to low resource utilization and poor performance, especially when the system is processing varying workloads.
The existing techniques are still based on static slot configurations, i.e., a fixed number of map slots and reduce slots per node throughout the lifetime of a cluster.
PROPOSED SYSTEM:
In this paper, we aim to develop algorithms for adjusting a basic system parameter, the slot configuration, with the goal of improving the performance (i.e., reducing the makespan) of a batch of MapReduce jobs.
In this work, we propose and implement a new mechanism to dynamically allocate slots for map and reduce tasks. The primary goal of the new mechanism is to improve the completion time (i.e., the makespan) of a batch of MapReduce jobs while retaining the simplicity in implementation and management of the slot-based Hadoop design. The key idea of this new mechanism, named TuMM, is to automatically adjust the slot assignment ratio between map and reduce tasks in a cluster, using it as a tunable knob for reducing the makespan of MapReduce jobs.
The Workload Monitor (WM) and the Slot Assigner (SA) are the two major components introduced by TuMM. The WM, which resides in the JobTracker, periodically collects the execution times of recently finished tasks and estimates the present map and reduce workloads in the cluster. The SA module uses these estimates to decide and adjust the slot ratio between map and reduce tasks for each slave node.
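The interaction between the two components can be sketched in Java as follows. This is a minimal illustration and not the TuMM implementation: the proportional split rule, the workload estimate (pending tasks times the mean duration of recently finished tasks), and all class and method names are assumptions made for the example.

```java
import java.util.List;

/** Hypothetical sketch of a WM-style estimate feeding an SA-style decision. */
final class SlotRatioSketch {

    /** WM side (assumed): pending tasks x mean duration of recently finished tasks. */
    static double estimateWorkload(int pendingTasks, List<Double> recentTaskSeconds) {
        if (recentTaskSeconds.isEmpty()) return 0.0;
        double sum = 0.0;
        for (double s : recentTaskSeconds) sum += s;
        return pendingTasks * (sum / recentTaskSeconds.size());
    }

    /** SA side (assumed): split slots in proportion to workload, at least one slot per type. */
    static int mapSlots(int totalSlots, double mapWorkload, double reduceWorkload) {
        if (totalSlots < 2) throw new IllegalArgumentException("need at least 2 slots");
        double total = mapWorkload + reduceWorkload;
        if (total <= 0.0) return totalSlots / 2; // no information yet: split evenly
        int m = (int) Math.round(totalSlots * mapWorkload / total);
        return Math.max(1, Math.min(totalSlots - 1, m));
    }

    public static void main(String[] args) {
        double mapWork = estimateWorkload(120, List.of(20.0, 22.0, 18.0)); // 2400 s
        double reduceWork = estimateWorkload(10, List.of(75.0, 85.0));     // 800 s
        int m = mapSlots(4, mapWork, reduceWork);
        System.out.println("map slots = " + m + ", reduce slots = " + (4 - m)); // 3 and 1
    }
}
```

Re-running such a computation periodically, as new task completion times arrive, is what lets the ratio follow the workload instead of staying fixed.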
With TuMM, the map and reduce phases of jobs can be better pipelined under priority-based schedulers, and thus the makespan is reduced. We further investigate dynamic slot assignment in heterogeneous environments and propose a new version of TuMM, named H_TuMM, which sets the slot configuration of each individual node to reduce the makespan of a batch of jobs.
ADVANTAGES OF PROPOSED SYSTEM:
Reduces the makespan of multiple jobs by separately setting the slot assignments for each node in a heterogeneous cluster.
The experimental results demonstrate up to a 28% reduction in makespan and a 20% increase in resource utilization. The effectiveness and robustness of the new slot management schemes are validated in both homogeneous and heterogeneous cluster environments.
Minimizes the completion time of the two phases (map and reduce).
HARDWARE REQUIREMENTS:
System : Pentium IV 2.4 GHz.
Hard Disk : 40 GB.
Floppy Drive : 1.44 MB.
Monitor : 15" VGA Colour.
RAM : 512 MB.
SOFTWARE REQUIREMENTS:
Operating System : Windows 7 / Ubuntu
Coding Language : Java 1.7, Hadoop 0.8.1
IDE : Eclipse
Database : MySQL
REFERENCE:
Yi Yao, Jiayin Wang, Bo Sheng, Chiu C. Tan, Ningfang Mi, “Self-Adjusting Slot Configurations for Homogeneous and Heterogeneous Hadoop Clusters”, IEEE Transactions on Cloud Computing, 2015.