Handling Big Data Using a Data-Aware HDFS and Evolutionary Clustering Technique
The increased use of cyber-enabled systems and the Internet of Things (IoT) has led to a massive amount of data with different structures. Most big data solutions are built on top of the Hadoop ecosystem or use its distributed file system (HDFS). However, studies have shown that such systems are inefficient when dealing with today’s data. Some research has overcome these problems for specific types of graph data, but today’s data comprise more than one type. According to researchers, these inefficiencies lead to large-scale problems, including more space required in data centers and wasted resources (such as power), which in turn lead to environmental problems (such as higher carbon emissions). We propose a data-aware module for the Hadoop ecosystem, together with a distributed encoding technique for Genetic Algorithms. Our framework allows Hadoop to manage the distribution and placement of data based on cluster analysis of the data itself. It handles a broad range of data types and optimizes query time and resource usage. We performed our experiments on multiple datasets generated with LUBM (the Lehigh University Benchmark).
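To make the idea of cluster-driven placement more concrete, the following is a minimal single-machine sketch in Java. It is not the authors' implementation and does not reproduce their distributed encoding; it only assumes that each record is assigned a cluster id by a small genetic algorithm, and that the cluster id is then used as the index of the data node that stores the record. The class name `PlacementSketch`, the methods `fitness`, `tournament` and `evolve`, the capacity penalty, and all parameter values are hypothetical choices made for illustration.

```java
import java.util.Arrays;
import java.util.Random;

public class PlacementSketch {

    // Fitness (assumed for this sketch): count edges whose endpoints share a
    // cluster, then subtract a penalty for every record above a cluster's capacity.
    public static int fitness(int[] assignment, int[][] edges, int clusters, int capacity) {
        int score = 0;
        for (int[] e : edges) {
            if (assignment[e[0]] == assignment[e[1]]) score++;
        }
        int[] load = new int[clusters];
        for (int c : assignment) load[c]++;
        for (int l : load) {
            if (l > capacity) score -= (l - capacity) * edges.length;
        }
        return score;
    }

    // Binary tournament: pick two random individuals and return the fitter one.
    static int[] tournament(int[][] pop, int[][] edges, int clusters, int capacity, Random rnd) {
        int[] a = pop[rnd.nextInt(pop.length)];
        int[] b = pop[rnd.nextInt(pop.length)];
        return fitness(a, edges, clusters, capacity) >= fitness(b, edges, clusters, capacity) ? a : b;
    }

    // Evolve a chromosome where gene i is the cluster (data node) of record i.
    public static int[] evolve(int items, int[][] edges, int clusters, int capacity, long seed) {
        Random rnd = new Random(seed);
        int popSize = 30, generations = 200; // illustrative values only
        int[][] pop = new int[popSize][items];
        for (int[] ind : pop) {
            for (int i = 0; i < items; i++) ind[i] = rnd.nextInt(clusters);
        }
        int[] best = pop[0].clone();
        int bestFit = fitness(best, edges, clusters, capacity);
        for (int g = 0; g <= generations; g++) {
            // Track the best individual seen so far.
            for (int[] ind : pop) {
                int f = fitness(ind, edges, clusters, capacity);
                if (f > bestFit) { bestFit = f; best = ind.clone(); }
            }
            if (g == generations) break;
            int[][] next = new int[popSize][];
            next[0] = best.clone(); // elitism: carry the best forward unchanged
            for (int k = 1; k < popSize; k++) {
                int[] a = tournament(pop, edges, clusters, capacity, rnd);
                int[] b = tournament(pop, edges, clusters, capacity, rnd);
                int cut = rnd.nextInt(items); // one-point crossover
                int[] child = new int[items];
                for (int i = 0; i < items; i++) child[i] = i < cut ? a[i] : b[i];
                if (rnd.nextDouble() < 0.3) { // occasional single-gene mutation
                    child[rnd.nextInt(items)] = rnd.nextInt(clusters);
                }
                next[k] = child;
            }
            pop = next;
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy graph: two triangles joined by one edge; two data nodes, three records each.
        int[][] edges = {{0, 1}, {1, 2}, {0, 2}, {3, 4}, {4, 5}, {3, 5}, {2, 3}};
        int[] placement = evolve(6, edges, 2, 3, 42L);
        System.out.println("record -> data node: " + Arrays.toString(placement));
        System.out.println("co-located edges: " + fitness(placement, edges, 2, 3));
    }
}
```

The capacity penalty is there only because, without it, placing every record on one node would trivially maximize co-location; a real deployment would presumably balance load in a more principled way.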
- System : i3 Processor
- Hard Disk : 500 GB
- Monitor : 15" LED
- Input Devices : Keyboard, Mouse
- RAM : 1 GB
- Operating system : Windows 7/Ubuntu
- Coding Language : Java 1.7, Hadoop 0.8.1
- IDE : Eclipse
- Database : MySQL
Mustafa Hajeer, Member, IEEE, and Dipankar Dasgupta, Fellow, IEEE, “Handling Big Data Using a Data-Aware HDFS and Evolutionary Clustering Technique”, IEEE Transactions on Big Data, 2019.