H2Hadoop: Improving Hadoop Performance using the Metadata of Related Jobs
Cloud Computing leverages Hadoop framework for processing BigData in parallel. Hadoop has certain limitations that could be exploited to execute the job efficiently. These limitations are mostly because of data locality in the cluster, jobs and tasks scheduling, and resource allocations in Hadoop. Efficient resource allocation remains a challenge in Cloud Computing MapReduce platforms. We propose H2Hadoop, which is an enhanced Hadoop architecture that reduces the computation cost associated with BigData analysis. The proposed architecture also addresses the issue of resource allocation in native Hadoop. H2Hadoop provides a better solution for “text data”, such as finding DNA sequence and the motif of a DNA sequence. Also, H2Hadoop provides an efficient Data Mining approach for Cloud Computing environments. H2Hadoop architecture leverages on NameNode’s ability to assign jobs to the TaskTrakers (DataNodes) within the cluster. By adding control features to the NameNode, H2Hadoop can intelligently direct and assign tasks to the DataNodes that contain the required data without sending the job to the whole cluster. Comparing with native Hadoop, H2Hadoop reduces CPU time, number of read operations, and another Hadoop factors.