Cross-cloud MapReduce for Big Data


MapReduce is a leading framework for big data analytics. In this paper, we consider a geo-distributed cloud architecture that provides MapReduce services based on big data collected from end users all over the world. Existing work handles MapReduce jobs with a traditional computation-centric approach, in which all input data distributed across multiple clouds are aggregated into a virtual cluster that resides in a single cloud. The poor efficiency and high cost of this approach for big data motivate us to propose a novel data-centric architecture with three key techniques: cross-cloud virtual cluster, data-centric job placement, and network-coding-based traffic routing. Our design leads to an optimization framework with the objective of minimizing both the computation and transmission costs of running a set of MapReduce jobs in geo-distributed clouds. We further design a parallel algorithm by decomposing the original large-scale problem into several distributively solvable subproblems that are coordinated by a high-level master problem. Finally, we conduct real-world experiments and extensive simulations to show that our proposal significantly outperforms existing work.
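To make the contrast between the computation-centric and data-centric approaches concrete, the following is a minimal sketch of a toy transmission-cost comparison. It is not the optimization framework from the paper: the cloud names, data sizes, per-GB prices, the map output ratio, and the function names are all hypothetical, and computation cost is ignored entirely.

```python
# Toy cost model. Every number and name below is an illustrative assumption,
# not a value or formula taken from the paper.

def aggregation_cost(input_gb, price, target):
    """Computation-centric: ship every cloud's raw input to one target cloud."""
    return sum(size * price[src][target]
               for src, size in input_gb.items() if src != target)


def data_centric_cost(input_gb, price, reducer, map_output_ratio):
    """Data-centric: run map tasks where the data live and ship only the
    intermediate (map output) data to the cloud hosting the reducer."""
    return sum(size * map_output_ratio * price[src][reducer]
               for src, size in input_gb.items() if src != reducer)


def best_site(cost_fn, clouds):
    """Return the (cloud, cost) pair with the lowest cost."""
    return min(((c, cost_fn(c)) for c in clouds), key=lambda pair: pair[1])


# Hypothetical input sizes (GB) held in each cloud.
input_gb = {"us": 100.0, "eu": 60.0, "asia": 40.0}

# Hypothetical inter-cloud transfer prices (cost units per GB).
price = {
    "us":   {"us": 0.00, "eu": 0.02, "asia": 0.05},
    "eu":   {"us": 0.02, "eu": 0.00, "asia": 0.04},
    "asia": {"us": 0.05, "eu": 0.04, "asia": 0.00},
}

# Assumed ratio of intermediate data size to input size.
ratio = 0.3

agg_site, agg = best_site(
    lambda c: aggregation_cost(input_gb, price, c), input_gb)
dc_site, dc = best_site(
    lambda c: data_centric_cost(input_gb, price, c, ratio), input_gb)

print("aggregate at", agg_site, "transmission cost", round(agg, 2))
print("data-centric, reduce at", dc_site, "transmission cost", round(dc, 2))
```

In this toy model the data-centric option is cheaper only because the assumed map output ratio is below 1. For jobs whose intermediate data are larger than their input, the comparison can reverse, which is one reason a placement decision per job is needed rather than a fixed rule.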



  • System : i3 Processor
  • Hard Disk : 500 GB
  • Monitor : 15-inch LED
  • Input Devices : Keyboard, Mouse
  • RAM : 1 GB


  • Operating System : Windows 7/Ubuntu
  • Coding Language : Java 1.7, Hadoop 0.8.1
  • IDE : Eclipse
  • Database : MySQL


Peng Li, Member, IEEE, Song Guo, Senior Member, IEEE, Shui Yu, Member, IEEE, and Weihua Zhuang, Fellow, IEEE, “Cross-cloud MapReduce for Big Data”, IEEE Transactions on Cloud Computing, 2019.
