Service Rating Prediction by Exploring Social Mobile Users’ Geographical Locations

ABSTRACT:

Recent advances in intelligent mobile devices and positioning techniques have fundamentally enhanced social networks, allowing users to share their experiences, reviews, ratings, photos, check-ins, etc. The geographical information captured by smartphones bridges the gap between the physical and digital worlds: location data connects a user’s physical behaviors with the virtual social networks built by smartphone or web services. We refer to such social networks involving geographical information as location-based social networks (LBSNs). This information brings both opportunities and challenges for recommender systems, particularly for cold start, dataset sparsity, and rating prediction. In this paper, we make full use of mobile users’ location-sensitive characteristics to carry out rating prediction. We mine: 1) the relevance between users’ ratings and user-item geographical location distances, which we call the user-item geographical connection, and 2) the relevance between users’ rating differences and user-user geographical location distances, which we call the user-user geographical connection. We find that human rating behavior is significantly affected by geographical location. Moreover, three factors, user-item geographical connection, user-user geographical connection, and interpersonal interest similarity, are fused into a unified rating prediction model. We conduct a series of experiments on Yelp, a real-world social rating network dataset. Experimental results demonstrate that the proposed approach outperforms existing models.

EXISTING SYSTEM:

  • Fortunately, with the popularity and rapid development of social networks, more and more users enjoy sharing their experiences, reviews, ratings, photos, and moods with their friends. Many social-based models have been proposed to improve the performance of recommender systems.
  • Yang et al. propose to use the concept of an ‘inferred trust circle’, based on domain-specific circles of friends on social networks, to recommend users’ favorite items.
  • Jiang et al. prove that individual preference is also an important factor in social networks. In their Context Model, a user’s latent features should be similar to his/her friends’ according to preference similarity.
  • Hu et al. and Lei et al. utilize the power of semantic knowledge bases to handle textual messages and recommendations.

DISADVANTAGES OF EXISTING SYSTEM:

  • The first generation of recommender systems, built on traditional collaborative filtering algorithms, faces great challenges of cold start for users (new users in the recommender system with little historical record) and the sparsity of datasets.
  • They rely only on a biases-based matrix factorization model.

PROPOSED SYSTEM:

  • The goals of this paper are:
  • To mine the relevance between users’ ratings and user-item geographical location distances, which we call the user-item geographical connection,
  • To mine the relevance between users’ rating differences and user-user geographical location distances, which we call the user-user geographical connection, and
  • To find people whose interests are similar to the user’s. In this paper, three factors are taken into consideration for rating prediction: user-item geographical connection, user-user geographical connection, and interpersonal interest similarity. These factors are fused into a location-based rating prediction model.
  • The novelties of this paper are the user-item and user-user geographical connections, i.e., we explore users’ rating behaviors through their geographical location distances. The main contributions of this paper are summarized as follows:
  • We mine the relevance between ratings and user-item geographical location distances. It is discovered that users usually give high scores to items (or services) that are very far away from their activity centers. This helps us understand users’ rating behaviors for recommendation. A minimal sketch of the distance computation follows this list.
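
To make the user-item geographical connection concrete, the sketch below computes the great-circle (haversine) distance between a user’s activity center and an item, then folds it into a baseline rating prediction. The class, the log-shaped weighting, and the parameter beta are hypothetical illustrations; the actual model fuses the three factors through its unified rating prediction model rather than this simple bias term.

    // Sketch: distance-aware rating adjustment (illustrative only).
    public class GeoRatingSketch {

        // Haversine distance in kilometers between two (lat, lon) points.
        static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
            double r = 6371.0; // mean Earth radius in km
            double dLat = Math.toRadians(lat2 - lat1);
            double dLon = Math.toRadians(lon2 - lon1);
            double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                     + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                     * Math.sin(dLon / 2) * Math.sin(dLon / 2);
            return 2 * r * Math.asin(Math.sqrt(a));
        }

        // Hypothetical fusion: nudge a baseline prediction upward with user-item
        // distance, reflecting the observation that users tend to rate far-away
        // items higher.
        static double predict(double baseline, double userLat, double userLon,
                              double itemLat, double itemLon, double beta) {
            double d = haversineKm(userLat, userLon, itemLat, itemLon);
            double geoBias = beta * Math.log1p(d); // assumed log-shaped distance effect
            return Math.min(5.0, baseline + geoBias);
        }

        public static void main(String[] args) {
            // Example: baseline 3.5 stars, item roughly 900 km from the activity center.
            System.out.println(predict(3.5, 30.0, 120.0, 35.0, 128.0, 0.05));
        }
    }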

ADVANTAGES OF PROPOSED SYSTEM:

  • We mine the relevance between users’ rating differences and user-user geographical distances. It is discovered that users and their geographically far-away friends usually give similar scores to the same item. This helps us understand users’ rating behaviors for recommendation.
  • We integrate three factors: user-item geographical connection, user-user geographical connection, and interpersonal interest similarity, into a Location-Based Rating Prediction (LBRP) model.
  • The proposed model is evaluated by extensive experiments on the Yelp dataset. Experimental results show significant improvement over existing approaches.

SYSTEM ARCHITECTURE:

[Figure: system architecture of the location-based rating prediction model]

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS:

  • System : i3 Processor
  • Hard Disk : 500 GB
  • Monitor : 15’’ LED
  • Input Devices : Keyboard, Mouse
  • RAM : 4 GB

SOFTWARE REQUIREMENTS:

  • Operating system : Windows 7 / Ubuntu
  • Coding Language : Java 1.7, Hadoop 0.8.1
  • IDE : Eclipse
  • Database : MySQL

REFERENCE:

Guoshuai Zhao, Xueming Qian, Chen Kang, “Service Rating Prediction by Exploring Social Mobile Users’ Geographical Locations”, IEEE Transactions on Big Data, 2017.

Self-Adjusting Slot Configurations for Homogeneous and Heterogeneous Hadoop Clusters

ABSTRACT:

The MapReduce framework and its open-source implementation Hadoop have become the de facto platform for scalable analysis of large data sets in recent years. One of the primary concerns in Hadoop is how to minimize the completion length (i.e., makespan) of a set of MapReduce jobs. Current Hadoop only allows a static slot configuration, i.e., fixed numbers of map slots and reduce slots throughout the lifetime of a cluster. However, we found that such a static configuration may lead to low system resource utilization as well as long completion length. Motivated by this, we propose simple yet effective schemes which use the slot ratio between map and reduce tasks as a tunable knob for reducing the makespan of a given job set. By leveraging the workload information of recently completed jobs, our schemes dynamically allocate resources (or slots) to map and reduce tasks. We implemented the presented schemes in Hadoop V0.20.2 and evaluated them with representative MapReduce benchmarks on Amazon EC2. The experimental results demonstrate the effectiveness and robustness of our schemes under both simple workloads and more complex mixed workloads.

EXISTING SYSTEM:

  • A classic Hadoop cluster includes a single master node and multiple slave nodes. The master node runs the JobTracker routine, which is responsible for scheduling jobs and coordinating the execution of each job’s tasks. Each slave node runs the TaskTracker daemon for hosting the execution of MapReduce tasks. The concept of a “slot” is used to indicate the capacity for accommodating tasks on each node. In a Hadoop system, a slot is assigned as a map slot or a reduce slot, serving map tasks or reduce tasks, respectively.
  • At any given time, only one task can be running per slot. The number of available slots per node indeed provides the maximum degree of parallelization in Hadoop. Our experiments have shown that the slot configuration has a significant impact on system performance. The Hadoop framework, however, uses fixed numbers of map slots and reduce slots at each node as the default setting throughout the lifetime of a cluster. The values in this fixed configuration are usually heuristic numbers chosen without considering job characteristics. Therefore, this static setting is not well optimized and may hinder the performance improvement of the entire cluster.
  • Quincy addressed the scheduling problem with locality and fairness constraints.
  • Zaharia et al. proposed a delay scheduling to further improve the performance of the Fair scheduler by increasing data locality.
  • Verma et al. introduced a heuristic to minimize the makespan of a set of independent MapReduce jobs by applying the classic Johnson’s algorithm.

DISADVANTAGES OF EXISTING SYSTEM:

  • A fixed slot configuration may lead to low resource utilization and poor performance, especially when the system is processing varying workloads.
  • Their techniques are still based on static slot configurations, i.e., having a fixed number of map slots and reduce slots per node throughout the lifetime of a cluster.

PROPOSED SYSTEM:

  • In this paper, we aim to develop algorithms for adjusting a basic system parameter with the goal of improving the performance (i.e., reducing the makespan) of a batch of MapReduce jobs.
  • In this work, we propose and implement a new mechanism to dynamically allocate slots for map and reduce tasks. The primary goal of the new mechanism is to improve the completion time (i.e., the makespan) of a batch of MapReduce jobs while retaining the simplicity of implementation and management in the slot-based Hadoop design. The key idea of this new mechanism, named TuMM, is to automate the slot assignment ratio between map and reduce tasks in a cluster as a tunable knob for reducing the makespan of MapReduce jobs.
  • The Workload Monitor (WM) and the Slot Assigner (SA) are the two major components introduced by TuMM. The WM, which resides in the JobTracker, periodically collects the execution time information of recently finished tasks and estimates the present map and reduce workloads in the cluster. The SA module takes this estimate to decide and adjust the slot ratio between map and reduce tasks for each slave node; a minimal sketch of such a proportional split follows this list.
  • With TuMM, the map and reduce phases of jobs can be better pipelined under priority-based schedulers, and thus the makespan is reduced. We further investigate dynamic slot assignment in heterogeneous environments, and propose a new version of TuMM, named H_TuMM, which sets the slot configurations for each individual node to reduce the makespan of a batch of jobs.
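
As a rough illustration of the tunable-knob idea, the sketch below derives a per-node map/reduce slot split from the estimated pending workloads. The class name and the proportional-split rule are simplified assumptions; the actual Slot Assigner operates inside the Hadoop V0.20.2 JobTracker and is driven by the Workload Monitor’s estimates.

    // Sketch: proportional map/reduce slot split (illustrative only).
    public class SlotAssignerSketch {

        // Estimated remaining work, in task-seconds, from the workload monitor.
        private final double pendingMapWork;
        private final double pendingReduceWork;

        SlotAssignerSketch(double mapWork, double reduceWork) {
            this.pendingMapWork = mapWork;
            this.pendingReduceWork = reduceWork;
        }

        // Split a node's slots in proportion to the two workloads,
        // always keeping at least one slot of each type.
        int mapSlotsFor(int totalSlotsPerNode) {
            double ratio = pendingMapWork / (pendingMapWork + pendingReduceWork);
            int mapSlots = (int) Math.round(totalSlotsPerNode * ratio);
            return Math.max(1, Math.min(totalSlotsPerNode - 1, mapSlots));
        }

        public static void main(String[] args) {
            SlotAssignerSketch sa = new SlotAssignerSketch(600.0, 200.0);
            int total = 4;
            int map = sa.mapSlotsFor(total);
            System.out.println("map slots = " + map + ", reduce slots = " + (total - map));
        }
    }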

ADVANTAGES OF PROPOSED SYSTEM:

  • Reduces the makespan of multiple jobs by separately setting the slot assignments for each node in a heterogeneous cluster.
  • The experimental results demonstrate up to a 28% reduction in makespan and a 20% increase in resource utilization. The effectiveness and robustness of our new slot management schemes are validated under both homogeneous and heterogeneous cluster environments.
  • Minimizes the completion time of the map and reduce phases by better pipelining them.

SYSTEM ARCHITECTURE:

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS:

  • System : Pentium IV 2.4 GHz
  • Hard Disk : 40 GB
  • Floppy Drive : 1.44 MB
  • Monitor : 15’’ VGA Colour
  • Mouse : Logitech
  • RAM : 512 MB

SOFTWARE REQUIREMENTS:

  • Operating system : Windows 7 / Ubuntu
  • Coding Language : Java 1.7, Hadoop 0.8.1
  • IDE : Eclipse
  • Database : MySQL

REFERENCE:

Yi Yao, Jiayin Wang, Bo Sheng, Chiu C. Tan, Ningfang Mi, “Self-Adjusting Slot Configurations for Homogeneous and Heterogeneous Hadoop Clusters”, IEEE Transactions on Cloud Computing, 2017.

Scalable Uncertainty-Aware Truth Discovery in Big Data Social Sensing Applications for Cyber-Physical Systems

ABSTRACT:

Social sensing is a new big data application paradigm for Cyber-Physical Systems (CPS), where a group of individuals volunteer (or are recruited) to report measurements or observations about the physical world at scale. A fundamental challenge in social sensing applications lies in discovering the correctness of reported observations and the reliability of data sources without prior knowledge of either. We refer to this problem as truth discovery. While prior studies have made progress on this challenge, two important limitations exist: (i) current solutions do not fully explore the uncertainty aspect of human-reported data, which leads to sub-optimal truth discovery results; (ii) current truth discovery solutions are mostly designed as sequential algorithms that do not scale well to large-scale social sensing events. In this paper, we develop a Scalable Uncertainty-Aware Truth Discovery (SUTD) scheme to address these limitations. The SUTD scheme solves a constraint estimation problem to jointly estimate the correctness of reported data and the reliability of data sources while explicitly considering the uncertainty in the reported data. To address the scalability challenge, SUTD is designed to run on a Graphics Processing Unit (GPU) with thousands of cores, and is shown to run two to three orders of magnitude faster than sequential truth discovery solutions. In our evaluation, we compare the SUTD scheme to state-of-the-art solutions using three real-world datasets collected from Twitter: Paris Attack, Oregon Shooting, and Baltimore Riots, all from 2015. The evaluation results show that our new scheme significantly outperforms the baselines in terms of both truth discovery accuracy and execution time.

EXISTING SYSTEM:

  • The Maximum Likelihood Estimation (MLE) approach is commonly used in cyber-physical systems and sensor networks for various estimation and data fusion tasks. For example, an MLE-based location estimation scheme has been developed to locate multiple sources based on acoustic signal measurements from individual sensors.
  • Xiao et al. presented a distributed consensus-based MLE approach to compute the unknown parameters of sensory measurements corrupted by Gaussian noise. The MLE framework has also been applied to address clock synchronization, target tracking, and compressive sensing in WSNs. However, the estimation variables in the above works are mainly continuous variables that represent measurements of physical sensors.

DISADVANTAGES OF EXISTING SYSTEM:

  • Current solutions do not fully explore the uncertainty aspect of the claims generated by human sensors and assume all claims are affirmative. However, such an assumption does not hold in real-world social sensing applications.
  • Current truth discovery solutions are mostly designed as sequential algorithms that cannot easily run on parallel computing platforms (e.g., cloud, GPU). Such scalability deficiency greatly limits the application of current truth discovery solutions to large-scale social sensing events.
  • It is challenging to model and quantify the degrees of uncertainty human sensors express in their claims and incorporate such uncertainty feature into a rigorous truth discovery solution.
  • It is not a simple task to accurately assess the quality of the truth discovery results without knowing the ground truth information on either source reliability or claim correctness.
  • It is nontrivial to design a parallel truth discovery solution that can run much faster than its sequential counterpart without sacrificing the truth discovery accuracy.

PROPOSED SYSTEM:

  • This paper presents a scalable uncertainty-aware estimation approach to address the truth discovery problem in social sensing applications for Cyber-Physical Systems (CPS).
  • This paper develops a Scalable Uncertainty-Aware Truth Discovery (SUTD) scheme. The SUTD scheme solves a constraint estimation problem to jointly estimate the correctness of reported data and the reliability of data sources while explicitly exploring the uncertainty feature of claims; the generic iterative structure is sketched after this list.
  • Rigorous confidence bounds have been derived to assess the quality of the truth discovery results output by the SUTD scheme, using well-grounded results from estimation theory.
  • In this paper, we primarily focus on disaster and emergency response scenarios, since the amount of factual and verifiable information is more significant there than in other social events (e.g., presidential elections, protests).
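
For intuition, the sketch below shows the generic fixed-point structure behind truth discovery: claim credibility and source reliability are alternately re-estimated until they stabilize. The simple averaging updates are illustrative stand-ins for the paper’s constraint-estimation formulation; note that each inner loop is independent per claim or per source, which is exactly the structure that maps well onto a GPU with thousands of cores.

    // Sketch: alternating credibility/reliability estimation (illustrative only).
    import java.util.Arrays;

    public class TruthDiscoverySketch {
        public static void main(String[] args) {
            // report[s][c] = 1 if source s asserts claim c.
            int[][] report = { {1, 1, 0}, {1, 0, 1}, {0, 1, 0} };
            int nSources = report.length, nClaims = report[0].length;
            double[] reliability = new double[nSources];
            double[] credibility = new double[nClaims];
            Arrays.fill(reliability, 0.5); // uninformed prior

            for (int iter = 0; iter < 20; iter++) {
                // Claim credibility: average reliability of its supporters.
                for (int c = 0; c < nClaims; c++) {
                    double sum = 0, n = 0;
                    for (int s = 0; s < nSources; s++)
                        if (report[s][c] == 1) { sum += reliability[s]; n++; }
                    credibility[c] = n > 0 ? sum / n : 0.0;
                }
                // Source reliability: average credibility of the claims it asserts.
                for (int s = 0; s < nSources; s++) {
                    double sum = 0, n = 0;
                    for (int c = 0; c < nClaims; c++)
                        if (report[s][c] == 1) { sum += credibility[c]; n++; }
                    reliability[s] = n > 0 ? sum / n : 0.5;
                }
            }
            System.out.println("claim credibility: " + Arrays.toString(credibility));
            System.out.println("source reliability: " + Arrays.toString(reliability));
        }
    }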

ADVANTAGES OF PROPOSED SYSTEM:

  • The evaluation results demonstrate that our new scheme significantly improves both truth discovery accuracy and execution time compared to the baselines.
  • We explicitly address the uncertainty and scalability challenges of the truth discovery problem in social sensing.
  • We developed a new analytical framework, SUTD, that solves the uncertainty-aware truth discovery problem using an estimation-theoretic approach in the context of big data social sensing applications.
  • We implemented a parallel SUTD scheme on a GPU that was shown to run a few orders of magnitude faster than sequential truth discovery solutions.
  • We evaluated the performance of the SUTD scheme using three real-world datasets collected from recent events. The evaluation results demonstrate the significant performance gain achieved by our scheme compared to other baselines.

SYSTEM ARCHITECTURE:

[Figure: system architecture of the SUTD scheme]

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS:

  • System : i3 Processor
  • Hard Disk : 500 GB
  • Monitor : 15’’ LED
  • Input Devices : Keyboard, Mouse
  • RAM : 4 GB

SOFTWARE REQUIREMENTS:

  • Operating system : Windows 7 / Ubuntu
  • Coding Language : Java 1.7, Hadoop 0.8.1
  • IDE : Eclipse
  • Database : MySQL

REFERENCE:

Chao Huang, Dong Wang, Nitesh Chawla, “Scalable Uncertainty-Aware Truth Discovery in Big Data Social Sensing Applications for Cyber-Physical Systems”, IEEE Transactions on Big Data, 2017.

Robust Big Data Analytics for Electricity Price Forecasting in the Smart Grid

ABSTRACT:

Electricity price forecasting is a significant part of the smart grid because it makes the smart grid cost-efficient. Nevertheless, existing forecasting methods may struggle with the huge volume of price data in the grid, since redundancy from feature selection cannot be avoided and an integrated infrastructure for coordinating the procedures in electricity price forecasting is lacking. To solve this problem, a novel electricity price forecasting model is developed. Specifically, three modules are integrated in the proposed model. First, by merging the Random Forest (RF) and Relief-F algorithms, we propose a hybrid feature selector based on Grey Correlation Analysis (GCA) to eliminate feature redundancy. Second, an integration of Kernel functions and Principal Component Analysis (KPCA) is used in the feature extraction process to realize dimensionality reduction. Finally, to forecast price classification, we put forward a differential evolution (DE) based Support Vector Machine (SVM) classifier. Our proposed electricity price forecasting model is realized via these three parts. Numerical results show that our proposal outperforms other methods.

EXISTING SYSTEM:

  • Varshney et al. developed a hybrid model to predict day ahead electricity market according to temperature and load information, with the utilization of neural network structure and analysis of singular spectrum.
  • Mousavian et al. put forward a probabilistic methodology to forecast per hour electricity price, where the bootstrapping technology is utilized for studying uncertainty and a generalized extreme learning machine method is proposed for wavelet neural networks.
  • Kobayashi et al. developed a switched Markov chain model for solving the optimal electricity pricing problem in real time based on a welfare function, which considers a trade-off between users’ utility and power conservation.
  • Mosbah et al. used multilayer neural networks in composite topology to enhance per hour electricity price forecasting accuracy.
  • Previous studies mainly focus on feature selection algorithms or classifier design, where traditional classifiers, e.g., Decision Tree (DT) and Artificial Neural Network (ANN), are very popular.

DISADVANTAGES OF EXISTING SYSTEM:

  • Decision Trees usually face the overfitting problem, meaning the DT performs well in training but not in prediction.
  • Artificial Neural Networks have limited generalization capability and their convergence cannot be easily controlled.
  • Also, these learning-based methods do not take big data into consideration, and the evaluation of performance is conducted only on price data that are not very large. Hence, price forecasting accuracy could still be improved with the help of big data.

PROPOSED SYSTEM:

  • In this paper, we investigate the electricity price forecasting problem. Our objective is to predict the electricity price accurately by using big data from the grid. To overcome this challenge, we propose a Support Vector Machine (SVM) underpinned framework that can predict the price efficiently.
  • SVM is a classifier that tries to find a hyperplane which can divide data into the correct classes. The support vectors are the part of the data that helps determine the hyperplane.
  • We propose a parallelized electricity forecasting framework, called Hybrid Selection, Extraction and Classification (HSEC). The three components of HSEC are a parallelized Hybrid Feature Selector (HFS) based on Grey Correlation Analysis (GCA), a feature extraction process based on Kernel Principal Component Analysis (KPCA), and a differential evolution (DE) based SVM classifier.
  • The HSEC performs feature engineering by selecting features corresponding to the time sequence and by reducing the dimensionality of the electricity price data features. The HFS uses a fusion of two feature selectors based on GCA, rather than one, to give an appropriate selection of features; a minimal sketch of the GCA computation follows this list.
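
As an illustration of the GCA step, the sketch below computes the grey relational grade between a normalized candidate feature sequence and the normalized price sequence; features with higher grades track the price more closely and are more likely to survive selection. The distinguishing coefficient rho = 0.5 follows common GCA practice, and this is only one ingredient: the full HFS additionally fuses Relief-F and RF importance scores.

    // Sketch: grey relational grade of a feature against the price sequence.
    public class GcaSketch {

        // Both sequences are assumed to be normalized to comparable scales.
        static double grade(double[] reference, double[] feature, double rho) {
            int n = reference.length;
            double[] delta = new double[n];
            double dMin = Double.MAX_VALUE, dMax = 0;
            for (int k = 0; k < n; k++) {
                delta[k] = Math.abs(reference[k] - feature[k]);
                dMin = Math.min(dMin, delta[k]);
                dMax = Math.max(dMax, delta[k]);
            }
            double sum = 0;
            for (int k = 0; k < n; k++) {
                // Grey relational coefficient at point k.
                sum += (dMin + rho * dMax) / (delta[k] + rho * dMax);
            }
            return sum / n; // closer to 1 = stronger correlation with the reference
        }

        public static void main(String[] args) {
            double[] price = {0.2, 0.5, 0.9, 0.4};    // normalized price sequence
            double[] load  = {0.25, 0.45, 0.85, 0.5}; // normalized candidate feature
            System.out.println("grey relational grade: " + grade(price, load, 0.5));
        }
    }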

ADVANTAGES OF PROPOSED SYSTEM:

  • We propose an integrated electricity price forecasting framework to make accurate big data forecasts in the smart grid. To the best of our knowledge, this paper is the first attempt to integrate feature selection, extraction, and classification into one framework design for the studied problem.
  • To achieve this framework, we first propose a GCA-based HFS, combining the Relief-F algorithm and Random Forest (RF), to calculate feature importance and control the feature selection. For feature extraction, we use KPCA to further reduce the redundancy among the selected features.
  • We are the first to study the redundancy among selected features in the electricity price forecasting field. We also design a DE-SVM algorithm to tune the hyperparameters of SVM, which achieves higher accuracy than existing classifiers.
  • The performance of our proposal is evaluated by several extensive simulations based on real-world data traces of grid prices and workloads. The numerical results show that our proposal performs better than benchmark approaches.

SYSTEM ARCHITECTURE:

[Figure: system architecture of the HSEC framework]

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS:

  • System : i3 Processor
  • Hard Disk : 500 GB
  • Monitor : 15’’ LED
  • Input Devices : Keyboard, Mouse
  • RAM : 4 GB

SOFTWARE REQUIREMENTS:

  • Operating system : Windows 7 / Ubuntu
  • Coding Language : Java 1.7, Hadoop 0.8.1
  • IDE : Eclipse
  • Database : MySQL

REFERENCE:

Kun Wang, Chenhan Xu, Yan Zhang, Song Guo, Albert Y. Zomaya, “Robust Big Data Analytics for Electricity Price Forecasting in the Smart Grid”, IEEE Transactions on Big Data, 2017.

Ring: Real-Time Emerging Anomaly Monitoring System over Text Streams

ABSTRACT:

Microblog platforms have been extremely popular in the big data era due to their real-time diffusion of information. It is important to know what anomalous events are trending on the social network and to be able to monitor their evolution and find related anomalies. In this paper we demonstrate RING, a real-time emerging anomaly monitoring system over microblog text streams. RING integrates our efforts on both emerging anomaly monitoring research and systems research. From the anomaly monitoring perspective, RING proposes a graph analytic approach such that (1) RING is able to detect emerging anomalies at an earlier stage compared to existing methods, (2) RING is among the first to discover emerging anomaly correlations in a streaming fashion, and (3) RING is able to monitor anomaly evolution in real time at different time scales, from minutes to months. From the systems perspective, RING (1) optimizes the time-ranged keyword query performance of a full-text search engine to improve the efficiency of monitoring anomaly evolution, and (2) improves the dynamic graph processing performance of Spark and implements our graph stream model on it. As a result, RING is able to scale to the entire Weibo or Twitter text stream with linear horizontal scalability. The system clearly presents its advantages over existing systems and methods, from both the event monitoring and system perspectives, for the emerging event monitoring task.

EXISTING SYSTEM:

  • TwitterMonitor provides online detection for general emerging anomalous events but could not reveal multiple aspects of the events nor track the evolution of them.
  • CLEar provides real-time anomalous event detection and tracking but could not provide correlation analysis of them. Neither of them could identify potential anomalous events before they are popular and spreading at scale.
  • Signitrend could detect potential anomalous events at small scale but is confined to only detection, and could not provide tracking and correlation analysis.
  • Cai et al. combine indexing system optimization with evolution tracking, but do not provide much detailed monitoring analysis.
  • Existing systems take a long time to update the graph structure. They also cannot rebalance the workload when some nodes in the graph are updated more frequently than others, which is a common case when processing popular words in microblog texts.

DISADVANTAGES OF EXISTING SYSTEM:

  • It is slower than LDA and hence cannot be applied in a real-time scenario.
  • The method also tends to generate anomalous events described by only a single keyword, which is hard for users to comprehend.
  • None of the existing methods provide horizontal scalability with distributed implementations of their algorithms, nor do they investigate system optimizations for their applications.

PROPOSED SYSTEM:

  • In this paper, we present RING, a real-time emerging anomaly monitoring system over microblog text streams. Emerging anomaly monitoring has attracted much attention from the research domain. Here we aim to monitor emerging anomalous events on microblog platforms.
  • Our emerging anomaly monitoring methods are based on graph mining techniques, which provide unique opportunities to integrate our emerging anomaly monitoring research and system optimizations. In the RING system, emerging anomaly monitoring includes early detection, correlation analysis, and temporal evolution tracking of anomalous events. Early detection captures emerging events before they go viral; a simplified detection sketch follows this list. Correlation analysis automatically reveals multiple aspects of an anomalous event, the causality between anomalous events, or the categorical structure of related anomalies.
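
As a simplified stand-in for RING’s graph-based detector, the sketch below flags a keyword whose per-minute frequency jumps well above its recent moving average. The sliding-window z-score rule and thresholds are illustrative assumptions only; the actual system detects emerging anomalies over a distributed keyword graph maintained on the stream.

    // Sketch: burst detection on a single keyword's frequency stream.
    import java.util.ArrayDeque;
    import java.util.Deque;

    public class KeywordAnomalySketch {
        private final Deque<Double> window = new ArrayDeque<Double>();
        private final int size;

        KeywordAnomalySketch(int windowSize) { this.size = windowSize; }

        // Returns true if the new count is anomalously high relative to the window.
        boolean observe(double count, double zThreshold) {
            boolean anomalous = false;
            if (window.size() == size) {
                double mean = 0;
                for (double v : window) mean += v;
                mean /= size;
                double var = 0;
                for (double v : window) var += (v - mean) * (v - mean);
                double std = Math.sqrt(var / size) + 1e-9; // guard against zero variance
                anomalous = (count - mean) / std > zThreshold;
                window.pollFirst(); // slide the window forward
            }
            window.addLast(count);
            return anomalous;
        }

        public static void main(String[] args) {
            KeywordAnomalySketch detector = new KeywordAnomalySketch(5);
            double[] counts = {10, 12, 9, 11, 10, 48}; // sudden burst at the end
            for (double c : counts) {
                System.out.println(c + " -> " + detector.observe(c, 3.0));
            }
        }
    }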

ADVANTAGES OF PROPOSED SYSTEM:

  • Such monitoring happens in real-time and provides valuable intelligence for government agencies, news groups and marketing agencies, etc.
  • RING has a distributed graph processing engine specifically optimized for our anomaly detection method. Algorithms are implemented to have linear horizontal scalability to handle big data.
  • The full-text indexing engine is optimized for efficient time range queries, which benefits our evolution tracking algorithms and queries over events and tweets.
  • A user friendly interface is also provided to facilitate the analysis of emerging events with visualization.
  • We adopt anomaly detection method to monitor each keyword for early detection of trends. The proposed graph stream model is fully distributed with an efficient context statistics maintenance strategy and linear scalability.
  • We provide a scalable anomaly monitoring approach meeting all the listed requirements. Especially, we are among the first to provide detailed correlation analysis of anomalies under the real-time emerging anomaly monitoring scenario.
  • RING is among the first systems to enjoy such a rich set of anomaly monitoring features with dedicated system optimization efforts. The system optimizations of RING greatly improve the performance of emerging event monitoring.

SYSTEM ARCHITECTURE:

[Figure: system architecture of the RING system]

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS:

  • System : i3 Processor
  • Hard Disk : 500 GB
  • Monitor : 15’’ LED
  • Input Devices : Keyboard, Mouse
  • RAM : 4 GB

SOFTWARE REQUIREMENTS: 

  • Operating system : Windows 7 / Ubuntu
  • Coding Language : Java 1.7, Hadoop 0.8.1
  • IDE : Eclipse
  • Database : MySQL

REFERENCE:

Weiren Yu, Jianxin Li, Md Zakirul Alam Bhuiyan, Richong Zhang, Jinpeng Huai, “Ring: Real-Time Emerging Anomaly Monitoring System over Text Streams”, IEEE Transactions on Big Data, 2017.

Public Interest Analysis Based on Implicit Feedback of IPTV Users

ABSTRACT:

Modern information systems make it increasingly easy to gain insight into the public interest, which is becoming more and more important in diverse public and corporate activities and processes. The disadvantage of existing research that focuses on mining information from social networks and online communities is that it does not uniformly represent all population groups and that the content can be subject to self-censoring or curation. In this paper we propose and describe a framework and a method for estimating public interest from the implicit negative feedback collected from the IPTV audience. Our research focuses primarily on channel change events and their match with the content information obtained from closed captions. The presented framework is based on concept modeling and viewership profiling, and combines the implicit viewer reactions (channel changes) into an interest score. The proposed framework addresses both of the above-mentioned disadvantages: it is able to cover a much broader population, and it can detect even minor variations in user behavior. We demonstrate our approach on a large pseudonymized real-world IPTV dataset provided by an ISP, and show how the results correlate with different trending topics and with parallel classical long-term population surveys.

EXISTING SYSTEM:

  • There are various approaches to determining semantic relatedness. Prior work in the field pursued three main directions: comparing text fragments as bags of words (BoW) in vector space, using lexical resources, and using Latent Semantic Analysis (LSA).
  • Other existing work describes the functions of a system designed for behavior analysis of e-commerce clients. It enables user identification and client behavior extraction for interacting with website customers, and carries out evaluation and rating of opinions. In our scenario such identification is neither desired nor applicable, because users have pseudonymous IDs. On the other hand, we also analyze user behavior to discover knowledge; however, domain specifics need to be taken into account, since user behavior on the Internet differs from behavior in the context of watching TV.

DISADVANTAGES OF EXISTING SYSTEM:

  • The collection of data has long been problematic and has required specialized systems (e.g., customized set-top boxes) deployed with a limited number of users.
  • Such systems have high deployment and maintenance costs.

PROPOSED SYSTEM:

  • In this paper the focus is on channel change events (CCE) generated by the viewers. CCE data can be represented by a time series vector; it hides a wealth of user behavior information, as each channel change event is motivated by a combination of viewer’s interests and content context.
  • The key challenge addressed in the paper is to demonstrate how users’ interactions with the IPTV service can be efficiently used to gauge the public interest in a specific topic at a large scale; a minimal scoring sketch follows this list.
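
The sketch below illustrates one way CCE dwell times could be folded into a per-topic interest score: staying on a program whose captions match the topic counts as positive implicit feedback, while switching away quickly counts as negative. The class, thresholds, and weights are hypothetical; the actual framework combines concept modeling and viewership profiling into its interest score.

    // Sketch: interest score from channel change events (illustrative only).
    import java.util.Arrays;
    import java.util.List;

    public class InterestScoreSketch {

        // One viewing interval between two consecutive channel changes.
        static class Interval {
            final boolean topicMatched; // closed captions matched the topic
            final double dwellSeconds;  // time until the next channel change
            Interval(boolean m, double d) { topicMatched = m; dwellSeconds = d; }
        }

        static double interestScore(List<Interval> intervals) {
            double score = 0;
            int matched = 0;
            for (Interval iv : intervals) {
                if (!iv.topicMatched) continue;
                matched++;
                if (iv.dwellSeconds >= 120) score += 1.0;      // stayed: positive feedback
                else if (iv.dwellSeconds <= 15) score -= 1.0;  // zapped away: negative feedback
            }
            return matched == 0 ? 0 : score / matched; // averaged into [-1, 1]
        }

        public static void main(String[] args) {
            List<Interval> session = Arrays.asList(
                new Interval(true, 300), new Interval(false, 40),
                new Interval(true, 8), new Interval(true, 600));
            System.out.println("interest score: " + interestScore(session));
        }
    }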

ADVANTAGES OF PROPOSED SYSTEM:

  • The proposal of a framework for assessing the user’s implicit positive and negative feedback with respect to the content being watched, using the available forms of metadata.
  • Presentation and analysis of a prototype implementation of the described framework. The implemented hybrid method relies on supervised and unsupervised learning and allows the estimation of public interest in a particular topic, and comparison of interest between topics.
  • First results are shown for a national Slovenian example, where the metadata used is in the form of closed captions. We present long-term interest variations and short-term to medium-term changes in interest that coincide with various important events.
  • The proposed approach can also be extended in a number of ways to allow more elaborate use cases.

SYSTEM ARCHITECTURE:

[Figure: system architecture of the IPTV public interest analysis framework]

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS:

  • System : i3 Processor
  • Hard Disk : 500 GB
  • Monitor : 15’’ LED
  • Input Devices : Keyboard, Mouse
  • RAM : 4 GB

SOFTWARE REQUIREMENTS:

  • Operating system : Windows 7 / Ubuntu
  • Coding Language : Java 1.7, Hadoop 0.8.1
  • IDE : Eclipse
  • Database : MySQL

REFERENCE:

Matej Kren, Andrej Kos, Yuan Zhang, Anton Kos, Urban Sedlar, “Public Interest Analysis Based on Implicit Feedback of IPTV Users”, IEEE Transactions on Industrial Informatics, 2017.

Practical Privacy-Preserving MapReduce Based K-means Clustering over Large-scale Dataset

ABSTRACT:

Clustering techniques have been widely adopted in many real-world data analysis applications, such as customer behavior analysis, targeted marketing, and digital forensics. With the explosion of data in today’s big data era, a major trend for handling clustering over large-scale datasets is outsourcing it to public cloud platforms, because cloud computing offers not only reliable services with performance guarantees but also savings on in-house IT infrastructure. However, as datasets used for clustering may contain sensitive information, e.g., patient health information, commercial data, and behavioral data, directly outsourcing them to public cloud servers inevitably raises privacy concerns.

In this paper, we propose a practical privacy-preserving K-means clustering scheme that can be efficiently outsourced to cloud servers. Our scheme allows cloud servers to perform clustering directly over encrypted datasets, while achieving comparable computational complexity and accuracy to clustering over unencrypted ones. We also investigate the secure integration of MapReduce into our scheme, which makes it extremely suitable for the cloud computing environment. Thorough security analysis and numerical analysis demonstrate the performance of our scheme in terms of security and efficiency. Experimental evaluation over a dataset of 5 million objects further validates the practical performance of our scheme.

EXISTING SYSTEM:

  • In recent years, a number of schemes have been proposed to outsource clustering tasks in a privacy-preserving manner. Existing distance-preserving data perturbation or data transformation techniques are adopted to protect the privacy of the dataset, while keeping the distance comparison property for clustering purposes.
  • These perturbation-based techniques are very efficient and even achieve the same computational cost as the original clustering algorithm. This is because data perturbation based encryption makes the ciphertext the same size as the original data, and uses the same clustering operations as the original clustering algorithm. However, as has been shown, these data perturbation based solutions do not provide enough privacy guarantees.
  • Specifically, once adversaries obtain a small set of unencrypted data objects in the dataset from background analysis, they will be able to recover the remaining objects.

DISADVANTAGES OF EXISTING SYSTEM:

  • Existing multi-party designs always rely on powerful but expensive cryptographic primitives (e.g., secure circuit evaluation, homomorphic encryption and oblivious transfer) to achieve collaborative secure computation among multiple parties, and are inefficient for large-scale datasets. Thus, these multi-party designs are not practical for privacy-preserving outsourcing of clustering.
  • Utilizing data perturbation and data transformation for privacy-preserving clustering may not achieve enough privacy and accuracy guarantee.
  • The homomorphic encryption utilized is not secure.
  • Unfortunately, these privacy-preserving KNN search schemes are limited by vulnerability to linear analysis attacks, support for at most two-dimensional data, or accuracy loss.

PROPOSED SYSTEM:

  • In this work, we propose a practical privacy-preserving K-means clustering scheme for large-scale datasets, which can be efficiently outsourced to public cloud servers.
  • Our proposed scheme simultaneously meets the privacy, efficiency, and accuracy requirements discussed above. In particular, we propose a novel encryption scheme based on the Learning with Errors (LWE) hard problem, which achieves privacy-preserving similarity measurement of data objects directly over ciphertexts.
  • Based on our encryption scheme, we further construct the whole K-means clustering process in a privacy-preserving manner, in which cloud servers only have access to encrypted datasets and perform all operations without any decryption; the plaintext skeleton of one iteration is sketched after this list.
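
For orientation, the sketch below shows the plaintext skeleton of one MapReduce-style K-means iteration: the map step assigns each point to its nearest center, and the reduce step averages each cluster. In the proposed scheme the same structure operates over encrypted objects; the encryption and secure similarity measurement layers are omitted here.

    // Sketch: one plaintext k-means iteration in map/reduce form.
    import java.util.Arrays;

    public class KMeansIterationSketch {

        static double dist2(double[] a, double[] b) {
            double s = 0;
            for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
            return s;
        }

        // One iteration: returns the updated centers.
        static double[][] iterate(double[][] points, double[][] centers) {
            int k = centers.length, d = centers[0].length;
            double[][] sums = new double[k][d];
            int[] counts = new int[k];
            // "Map" phase: assign each point to its nearest center.
            for (double[] p : points) {
                int best = 0;
                for (int c = 1; c < k; c++)
                    if (dist2(p, centers[c]) < dist2(p, centers[best])) best = c;
                counts[best]++;
                for (int i = 0; i < d; i++) sums[best][i] += p[i];
            }
            // "Reduce" phase: new center = mean of its assigned points.
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue; // empty cluster keeps its old center
                for (int i = 0; i < d; i++) centers[c][i] = sums[c][i] / counts[c];
            }
            return centers;
        }

        public static void main(String[] args) {
            double[][] pts = {{1, 1}, {1.2, 0.8}, {8, 8}, {8.2, 7.9}};
            double[][] ctr = {{0, 0}, {5, 5}};
            for (int i = 0; i < 10; i++) ctr = iterate(pts, ctr);
            System.out.println(Arrays.deepToString(ctr));
        }
    }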

ADVANTAGES OF PROPOSED SYSTEM:

  • We provide thorough analysis for our scheme in terms of security and efficiency. Our extensive evaluation results show that our privacy-preserving clustering is efficient, scalable, and accurate. Specifically, compared with the K-means clustering over unencrypted datasets, our scheme achieves the same accuracy as well as comparable computational performance and scalability.
  • Moreover, we uniquely incorporate MapReduce into our scheme with privacy protection, and thus significantly improving the clustering performance in cloud computing environment.

SYSTEM ARCHITECTURE:

[Figure: system architecture of the privacy-preserving MapReduce based K-means scheme]

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS:

  • System : i3 Processor
  • Hard Disk : 500 GB
  • Monitor : 15’’ LED
  • Input Devices : Keyboard, Mouse
  • RAM : 4 GB

SOFTWARE REQUIREMENTS:

  • Operating system : Windows 7 / Ubuntu
  • Coding Language : Java 1.7, Hadoop 0.8.1
  • IDE : Eclipse
  • Database : MySQL

REFERENCE:

Jiawei Yuan, Yifan Tian, “Practical Privacy-Preserving MapReduce Based K-means Clustering over Large-scale Dataset”, IEEE Transactions on Cloud Computing, 2017.

PPHOPCM: Privacy-preserving High-order Possibilistic c-Means Algorithm for Big Data Clustering with Cloud Computing

ABSTRACT:

As an important fuzzy clustering technique in data mining and pattern recognition, the possibilistic c-means algorithm (PCM) has been widely used in image analysis and knowledge discovery. However, it is difficult for PCM to produce good results when clustering big data, especially heterogeneous data, since it was initially designed for small structured datasets only. To tackle this problem, the paper proposes a high-order PCM algorithm (HOPCM) for big data clustering by optimizing the objective function in the tensor space. Further, we design a distributed HOPCM method based on MapReduce for very large amounts of heterogeneous data. Finally, we devise a privacy-preserving HOPCM algorithm (PPHOPCM) to protect the private data on the cloud by applying the BGV encryption scheme to HOPCM. In PPHOPCM, the functions for updating the membership matrix and clustering centers are approximated as polynomial functions to support secure computing under the BGV scheme. Experimental results indicate that PPHOPCM can effectively cluster large amounts of heterogeneous data using cloud computing without disclosure of private data.

EXISTING SYSTEM:

  • Gao et al. designed a graph-based co-clustering algorithm for big data by generalizing their previous image-text clustering method.
  • Chen et al. designed a nonnegative matrix tri-factorization algorithm to cluster big datasets by capturing the correlation over multiple modalities.
  • Zhang et al. proposed a high-order clustering algorithm for big data by using the tensor vector space to model the correlations over multiple modalities.

DISADVANTAGES OF EXISTING SYSTEM:

  • It is difficult for these methods to cluster big data effectively, especially heterogeneous data, due to the following two reasons.
  • First, they concatenate the features from different modalities linearly and ignore the complex correlations hidden in heterogeneous data sets, so they are not able to produce the desired results.
  • Second, they often have a high time complexity, making them applicable only to small data sets. Thus, they cannot cluster large amounts of heterogeneous data efficiently.

PROPOSED SYSTEM:

  • This paper proposes a privacy-preserving high-order PCM scheme (PPHOPCM) for big data clustering. PCM is an important fuzzy clustering scheme: it reflects the typicality of each object to different clusters effectively and avoids the corruption of noise in the clustering process. However, PCM cannot be applied to big data clustering directly, since it was initially designed for small structured datasets. In particular, it cannot capture the complex correlation over the multiple modalities of a heterogeneous data object.
  • The paper proposes a high-order PCM algorithm (HOPCM) by extending the conventional PCM algorithm into the tensor space. A tensor, called a multidimensional array in mathematics, is widely used to represent heterogeneous data in big data analysis and mining.
  • The proposed HOPCM algorithm represents each object by a tensor to reveal the correlation over the multiple modalities of the heterogeneous data object; the underlying PCM objective is recalled after this list. To increase the efficiency of clustering big data, we design a distributed HOPCM algorithm based on MapReduce that employs cloud servers to perform HOPCM.
  • To protect the private data on the cloud, we propose a privacy-preserving HOPCM scheme using the BGV technique, which is highly efficient.
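
For reference, the standard PCM objective and typicality update that HOPCM lifts into the tensor space are (in the usual Krishnapuram-Keller form, with the distance d_{ij} replaced by a tensor distance in HOPCM):

    J_m(T, V) = \sum_{i=1}^{c} \sum_{j=1}^{n} t_{ij}^{m}\, d_{ij}^{2}
              + \sum_{i=1}^{c} \eta_i \sum_{j=1}^{n} \left(1 - t_{ij}\right)^{m},
    \qquad
    t_{ij} = \frac{1}{1 + \left( d_{ij}^{2} / \eta_i \right)^{1/(m-1)}}

where t_{ij} is the typicality of object j to cluster i, d_{ij} is the distance from object j to center v_i, \eta_i is a per-cluster bandwidth parameter, and m > 1 is the fuzzifier. It is update functions of this kind that PPHOPCM approximates by polynomials so that the BGV scheme can evaluate them over ciphertexts.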

ADVANTAGES OF PROPOSED SYSTEM:

  • Results demonstrate that HOPCM outperforms other algorithms in clustering accuracy for big data, especially for heterogeneous data.
  • Furthermore, PPHOPCM can use cloud servers to cluster big data efficiently without disclosure of the private data. The conventional PCM algorithm cannot cluster heterogeneous data; aiming at this problem, the paper proposes a high-order PCM algorithm by optimizing the objective function in the high-order tensor space for heterogeneous data clustering.
  • To employ cloud servers to improve clustering efficiency, we design a distributed high-order possibilistic algorithm based on MapReduce.
  • To protect sensitive data when performing HOPCM on the cloud platform, we develop a privacy-preserving high-order possibilistic c-means scheme using the BGV encryption method.

SYSTEM ARCHITECTURE:

[Figure: system architecture of PPHOPCM]

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS:

  • System : i3 Processor
  • Hard Disk : 500 GB
  • Monitor : 15’’ LED
  • Input Devices : Keyboard, Mouse
  • RAM : 4 GB

SOFTWARE REQUIREMENTS:

  • Operating system : Windows 7 / Ubuntu
  • Coding Language : Java 1.7, Hadoop 0.8.1
  • IDE : Eclipse
  • Database : MySQL

REFERENCE:

Qingchen Zhang, Laurence T. Yang, Zhikui Chen, Peng Li, “PPHOPCM: Privacy-preserving High-order Possibilistic c-Means Algorithm for Big Data Clustering with Cloud Computing”, IEEE Transactions on Big Data, 2017.

Mining Human Activity Patterns from Smart Home Big Data for Healthcare Applications

ABSTRACT:

Nowadays, there is an ever-increasing migration of people to urban areas. Health care is one of the most challenging services affected by this vast influx of people to city centers. Consequently, cities around the world are investing heavily in digital transformation in an effort to provide a healthier ecosystem for people. In this transformation, millions of homes are being equipped with smart devices (e.g., smart meters and sensors) which generate massive volumes of fine-grained and indexical data that can be analyzed to support smart city services. In this paper, we propose a model that utilizes smart home big data as a means of learning and discovering human activity patterns for health care applications. We propose the use of frequent pattern mining, cluster analysis, and prediction to measure and analyze energy usage changes sparked by occupants’ behavior. Since people’s habits are mostly identified by everyday routines, discovering these routines allows us to recognize anomalous activities that may indicate people’s difficulties in taking care of themselves, such as not preparing food or not using the shower/bath. Our work addresses the need to analyze temporal energy consumption patterns at the appliance level, which is directly related to human activities. For the evaluation of the proposed mechanism, this research uses the UK Domestic Appliance-Level Electricity dataset (UK-DALE): time series power consumption data collected from 2012 to 2015, at a time resolution of six seconds, for five houses with 109 appliances in Southern England. The data from smart meters are recursively mined in a quantum/data slice of 24 hours, and the results are maintained across successive mining exercises. The results of identifying human activity patterns from appliance usage are presented in detail in this paper, along with the accuracy of short- and long-term predictions.

EXISTING SYSTEM:

  • One existing approach uses a Semi-Markov Model (SMM) for data training and detecting individual habits, while another introduces an impulse-based method to detect Activities of Daily Living (ADL), focusing on temporal analysis of activities that happen simultaneously.
  • Similarly, other work proposes human activity detection for wellness monitoring of elderly people using classification of sensors related to the main activities in the smart home.
  • Smart meter data are also used for activity recognition via Non-Intrusive Appliance Load Monitoring (NALM) and the Dempster-Shafer (D-S) theory of evidence. The study collects pre-processed data from homes to determine electrical appliance usage patterns and then employs a machine learning based algorithm to isolate the major activities inside the home.

DISADVANTAGES OF EXISTING SYSTEM:

  • It is not easy to detect usage dependencies among various appliances when their operations overlap or occur at the same time.
  • Deriving accurate predictions of human activity patterns is influenced by the probabilistic relationships of appliance usage events that have dynamic time intervals.
  • The study does not consider appliance-level usage details.
  • This might not be applicable to human activity recognition, since specific activities require individual and multiple appliance-to-appliance and appliance-to-time associations.

PROPOSED SYSTEM:

  • This paper proposes the use of energy data from smart meters installed in homes to unveil important activities of inhabitants. Our study assumes that mechanisms are in place to protect people’s privacy from being shared or measured for unlawful uses.
  • The proposed model observes and analyzes readings from smart meters to recognize activities and changes in behavior. Disaggregated power consumption readings are directly related to the activities performed at home.
  • This paper proposes a frequent-pattern mining and prediction model to measure and analyze energy usage changes sparked by occupants’ behavior. The data from smart meters are recursively mined in a quantum/data slice of 24 hours, and the results are maintained across successive mining exercises.
  • We also utilize a Bayesian network, a probabilistic graphical model, to predict the use of multiple appliances and household energy consumption; a minimal association sketch follows this list. The proposed model is capable of short-term predictions ranging from the next hour up to 24 hours, and long-term predictions for days, weeks, months, or seasons.
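
The sketch below gives a minimal flavor of appliance-to-appliance association: it counts which appliance tends to follow which and predicts the most likely next one. The class and the counting rule are illustrative stand-ins; the proposed model learns such associations incrementally with FP-growth and predicts with a Bayesian network over appliances and time slots.

    // Sketch: appliance-to-appliance association counts (illustrative only).
    import java.util.HashMap;
    import java.util.Map;

    public class ApplianceAssociationSketch {

        // counts.get(a).get(b) = times appliance b was used right after appliance a.
        private final Map<String, Map<String, Integer>> counts =
            new HashMap<String, Map<String, Integer>>();

        void record(String current, String next) {
            Map<String, Integer> row = counts.get(current);
            if (row == null) {
                row = new HashMap<String, Integer>();
                counts.put(current, row);
            }
            Integer c = row.get(next);
            row.put(next, c == null ? 1 : c + 1);
        }

        // Most likely next appliance given the current one (highest count).
        String predictNext(String current) {
            Map<String, Integer> row = counts.get(current);
            if (row == null) return null;
            String best = null;
            int bestCount = -1;
            for (Map.Entry<String, Integer> e : row.entrySet()) {
                if (e.getValue() > bestCount) { best = e.getKey(); bestCount = e.getValue(); }
            }
            return best;
        }

        public static void main(String[] args) {
            ApplianceAssociationSketch m = new ApplianceAssociationSketch();
            m.record("kettle", "toaster");
            m.record("kettle", "toaster");
            m.record("kettle", "microwave");
            System.out.println("after kettle: " + m.predictNext("kettle"));
        }
    }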

ADVANTAGES OF PROPOSED SYSTEM:

  • We propose a human activity pattern mining model based on appliance usage variations in smart homes. The model, which utilizes the FP-growth pattern mining and k-means clustering algorithms, is capable of identifying appliance-to-appliance and appliance-to-time associations through incremental mining of energy consumption data. This is important not only for determining activity routines; when utilized by a health care application, it can also detect sudden changes in human activities that require attention from a health provider.
  • We apply a Bayesian network for activity prediction based on individual and multiple appliance usage. This is significant for health applications that incorporate reminders for patients to perform certain activities based on historical data. For added accuracy, the prediction model integrates probabilities of appliance-to-appliance and appliance-to-time associations, thus recognizing activities that occur in certain patterns more accurately.

SYSTEM ARCHITECTURE:

[Figure: system architecture of the smart home activity mining model]

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS:

  • System : i3 Processor
  • Hard Disk : 500 GB
  • Monitor : 15’’ LED
  • Input Devices : Keyboard, Mouse
  • RAM : 4 GB

SOFTWARE REQUIREMENTS:

  • Operating system : Windows 7 / Ubuntu
  • Coding Language : Java 1.7, Hadoop 0.8.1
  • IDE : Eclipse
  • Database : MySQL

REFERENCE:

Abdulsalam Yassine, Shailendra Singh, Atif Alamri, “Mining Human Activity Patterns from Smart Home Big Data for Healthcare Applications”, IEEE Access, 2017.

Hadoop MapReduce for Mobile Clouds

ABSTRACT:

The new generations of mobile devices have high processing power and storage, but they lag behind in terms of software systems for big data storage and processing. Hadoop is a scalable platform that provides distributed storage and computational capabilities on clusters of commodity hardware. Building Hadoop on a mobile network enables the devices to run data-intensive computing applications without direct knowledge of the underlying distributed systems’ complexities. However, these applications have severe energy and reliability constraints (e.g., caused by unexpected device failures or topology changes in a dynamic network). As mobile devices are more susceptible to unauthorized access than traditional servers, security is also a concern for sensitive data. Hence, it is paramount to consider reliability, energy efficiency, and security for such applications. The MDFS (Mobile Distributed File System) [1] addresses these issues for big data processing in mobile clouds. We have developed the Hadoop MapReduce framework over MDFS and have studied its performance by varying input workloads in a real heterogeneous mobile cluster. Our evaluation shows that the implementation addresses all constraints in processing large amounts of data in mobile clouds. Thus, our system is a viable solution to meet the growing demands of data processing in a mobile environment.

EXISTING SYSTEM:

  • Current mobile applications that perform massive computing tasks (big data processing) offload data and tasks to data centers or powerful servers in the cloud. There are several cloud services that offer computing infrastructure to end users for processing large datasets.
  • The previous research focused only on the parallel processing of tasks on mobile devices using the MapReduce framework without addressing the real challenges that occur when these devices are deployed in the mobile environment.
  • Huchton et al. proposed a k-Resilient Mobile Distributed File System (MDFS) for mobile devices targeted primarily for military operations.
  • Chen et al. proposed a new resource allocation scheme based on the k-out-of-n framework and implemented a more reliable and energy-efficient Mobile Distributed File System for Mobile Ad Hoc Networks (MANETs), with significant improvements in energy consumption over the traditional MDFS architecture.

DISADVANTAGES OF EXISTING SYSTEM:

  • These approaches fail in the absence of external network connectivity, as is the case in military or disaster response operations.
  • This architecture is also avoided in emergency response scenarios where there is limited connectivity to the cloud, leading to expensive data upload and download operations.
  • Traditional security mechanisms tailored for static networks are inadequate for dynamic networks.
  • Existing work ignores energy efficiency. Mobile devices have limited battery power and can easily fail due to energy depletion.
  • HDFS needs better reliability schemes for data in the mobile environment.

PROPOSED SYSTEM:

  • In this paper, we implement the Hadoop MapReduce framework over MDFS and evaluate its performance on a general heterogeneous cluster of devices. We implement Hadoop’s generic file system interface for MDFS, which makes our system interoperable with other Hadoop frameworks such as HBase. No changes are required for existing HDFS applications to be deployed over MDFS.
  • We propose the notion of blocks, which was missing in the traditional MDFS architecture. In our approach, files are split into blocks based on the block size, and these blocks are then split into fragments that are stored across the cluster. Each block is a normal Unix file with a configurable block size. Block size has a direct impact on performance, as it affects read and write sizes; the layout is sketched after this list.
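
The sketch below illustrates the block/fragment layout just described: a file is first cut into fixed-size blocks, and each block is then cut into n fragments for distribution across devices. The sizes and the plain byte splitting are illustrative; real MDFS erasure-codes and encrypts the fragments under the k-out-of-n framework so that any k of n fragments suffice to rebuild a block.

    // Sketch: file -> blocks -> fragments (illustrative only).
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    public class BlockFragmentSketch {

        // Split data into chunks of at most chunkSize bytes.
        static List<byte[]> split(byte[] data, int chunkSize) {
            List<byte[]> chunks = new ArrayList<byte[]>();
            for (int off = 0; off < data.length; off += chunkSize) {
                int end = Math.min(data.length, off + chunkSize);
                chunks.add(Arrays.copyOfRange(data, off, end));
            }
            return chunks;
        }

        public static void main(String[] args) {
            byte[] file = new byte[1000]; // stand-in for file contents
            int blockSize = 256;          // configurable block size
            int n = 4;                    // fragments per block
            List<byte[]> blocks = split(file, blockSize);
            System.out.println("blocks: " + blocks.size());
            for (byte[] block : blocks) {
                List<byte[]> fragments = split(block, (block.length + n - 1) / n);
                System.out.println("  block of " + block.length + " bytes -> "
                    + fragments.size() + " fragments");
            }
        }
    }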

ADVANTAGES OF PROPOSED SYSTEM:

  • To the best of our knowledge, this is the first work to bring Hadoop MapReduce framework for mobile cloud that truly addresses the challenges of the dynamic network environment.
  • Our system provides a distributed computing model for processing of large datasets in mobile environment while ensuring strong guarantees for energy efficiency, data reliability and security.

SYSTEM ARCHITECTURE:

[Figure: system architecture of Hadoop MapReduce over MDFS]

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS:

  • System : i3 Processor
  • Hard Disk : 500 GB
  • Monitor : 15’’ LED
  • Input Devices : Keyboard, Mouse
  • RAM : 4 GB

SOFTWARE REQUIREMENTS:

  • Operating system : Windows 7 / Ubuntu
  • Coding Language : Java 1.7, Hadoop 0.8.1
  • IDE : Eclipse
  • Database : MySQL

REFERENCE:

Johnu George, Chien-An Chen, Radu Stoleru, Geoffrey G. Xie, “Hadoop MapReduce for Mobile Clouds”, IEEE Transactions on Cloud Computing, 2017.