t-Closeness through Microaggregation: Strict Privacy with Enhanced Utility Preservation

ABSTRACT:

Microaggregation is a technique for disclosure limitation aimed at protecting the privacy of data subjects in microdata releases. It has been used as an alternative to generalization and suppression to generate k-anonymous data sets, where the identity of each subject is hidden within a group of k subjects. Unlike generalization, microaggregation perturbs the data, and this additional masking freedom makes it possible to improve data utility in several ways, such as increasing data granularity, reducing the impact of outliers and avoiding the discretization of numerical data. k-Anonymity, on the other hand, does not protect against attribute disclosure, which occurs if the variability of the confidential values in a group of k subjects is too small. To address this issue, several refinements of k-anonymity have been proposed, among which t-closeness stands out as providing one of the strictest privacy guarantees. Existing algorithms to generate t-close data sets are based on generalization and suppression (they are extensions of k-anonymization algorithms based on the same principles). This paper proposes and shows how to use microaggregation to generate k-anonymous t-close data sets. The advantages of microaggregation are analyzed, and then several microaggregation algorithms for k-anonymous t-closeness are presented and empirically evaluated.

EXISTING SYSTEM:

  • As with k-anonymity, the most common way to attain t-closeness is to use generalization and suppression. In fact, the algorithms for k-anonymity based on those principles can be adapted to yield t-closeness by adding the t-closeness constraint to the search for a feasible minimal generalization; for example, the Incognito and the Mondrian algorithms have both been adapted to t-closeness.
  • SABRE is another interesting approach specifically designed for t-closeness. In SABRE the data set is first partitioned into a set of buckets and then the equivalence classes are generated by taking an appropriate number of records from each of the buckets.

DISADVANTAGES OF EXISTING SYSTEM:

  • The buckets in SABRE are generated in an iterative greedy manner which may yield more buckets than our algorithm (which analytically determines the minimal number of required buckets). A greater number of buckets leads to equivalence classes with more records and, thus, to more information loss.
  • It is not known how to optimally combine generalization and local suppression.
  • There is no agreement in the literature on how suppression should be performed: one can suppress at the record level (the entire record is suppressed) or suppress particular attributes in some records; furthermore, suppression can be done either by blanking a value or by replacing it with a neutral value (i.e., some kind of average).
  • Last but not least, and no matter how suppression is performed, it complicates data analysis (users need to resort to software dealing with censored data).

PROPOSED SYSTEM:

  • A first contribution of this paper is to identify the strong points of microaggregation to achieve k-anonymous t-closeness. The second contribution consists of three new microaggregation-based algorithms for t-closeness, which are presented and evaluated.
  • We propose three different algorithms to reconcile the conflicting goals of within-cluster homogeneity (which drives utility) and t-closeness (which requires the confidential values in each cluster to resemble the global distribution).
  • The first algorithm performs microaggregation in the usual way and then merges clusters as much as needed to satisfy the t-closeness condition. It is simple and can be combined with any microaggregation algorithm, yet it may perform poorly in terms of utility because clusters may end up being quite large (a minimal sketch of this variant follows the list).
  • The other algorithms modify the microaggregation algorithm for it to take t-closeness into account, in an attempt to improve the utility of the anonymized data set.
  • Two variants are proposed: k-anonymity-first (which generates each cluster based on the quasi-identifiers and then refines it to satisfy t-closeness) and t-closeness-first (which generates each cluster based on both quasi-identifier and confidential attributes, so that it satisfies t-closeness by design from the very beginning).
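
A minimal Java sketch of the first, merge-based algorithm, under simplifying assumptions: one numeric quasi-identifier, one numeric confidential attribute, and the ordered-domain Earth Mover's Distance (EMD) used in the t-closeness definition. Class and method names are illustrative, not the paper's.

```java
import java.util.*;

/** Merge-based t-closeness sketch: build k-anonymous groups by 1-D
 *  microaggregation on the quasi-identifier, then merge any group whose
 *  confidential-value distribution is farther than EMD t from the
 *  global one. Single numeric attributes are an assumption. */
public class TClosenessMergeSketch {

    static class Row {
        final double quasiId, secret;
        Row(double quasiId, double secret) { this.quasiId = quasiId; this.secret = secret; }
    }

    /** EMD between a group and the global distribution over the ordered
     *  domain of distinct confidential values. */
    static double emd(List<Row> group, double[] domain, double[] globalCdf) {
        if (domain.length < 2) return 0;
        double[] cdf = new double[domain.length];
        for (Row r : group)
            for (int i = 0; i < domain.length; i++)
                if (r.secret <= domain[i]) cdf[i] += 1.0 / group.size();
        double dist = 0;
        for (int i = 0; i < domain.length - 1; i++)   // last cumulative diff is 0
            dist += Math.abs(cdf[i] - globalCdf[i]);
        return dist / (domain.length - 1);
    }

    static List<List<Row>> anonymize(List<Row> data, int k, double t) {
        List<Row> sorted = new ArrayList<>(data);
        sorted.sort(Comparator.comparingDouble(r -> r.quasiId)); // 1-D microaggregation
        SortedSet<Double> dom = new TreeSet<>();
        for (Row r : sorted) dom.add(r.secret);
        double[] domain = new double[dom.size()];
        int j = 0; for (double v : dom) domain[j++] = v;
        double[] globalCdf = new double[domain.length];
        for (Row r : sorted)
            for (int i = 0; i < domain.length; i++)
                if (r.secret <= domain[i]) globalCdf[i] += 1.0 / sorted.size();

        List<List<Row>> groups = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i += k)    // quasi-identifier groups of >= k
            groups.add(new ArrayList<>(sorted.subList(i, Math.min(i + k, sorted.size()))));

        // Merge a violating group with a neighbour and re-check until t-close.
        for (int i = 0; i < groups.size(); ) {
            if (groups.size() > 1 && emd(groups.get(i), domain, globalCdf) > t) {
                int nb = (i + 1 < groups.size()) ? i + 1 : i - 1;
                int lo = Math.min(i, nb), hi = Math.max(i, nb);
                groups.get(lo).addAll(groups.remove(hi));
                i = lo;                               // re-check the merged group
            } else i++;
        }
        return groups;  // publish each group with its quasi-identifier centroid
    }
}
```

Merging monotonically drives each group toward the global distribution (a single remaining group has EMD 0), so the loop terminates; the price is larger clusters and hence lower utility, which is exactly the trade-off the two refined variants address.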

ADVANTAGES OF PROPOSED SYSTEM:

  • Microaggregation has several advantages over generalization/recoding for k-anonymity that are mostly related to data utility preservation:
  • Global recoding may recode some records that do not need it, hence causing extra information loss. Local recoding, on the other hand, makes data analysis more complex, as values corresponding to different levels of generalization may co-exist in the anonymized data. Microaggregation is free from both drawbacks.
  • Data generalization usually results in a significant loss of granularity, because input values can only be replaced by a reduced set of generalizations, which are more constrained as one moves up in the hierarchy. Microaggregation, on the other hand, does not reduce the granularity of values, because they are replaced by numerical or categorical averages.
  • If outliers are present in the input data, the need to generalize them results in very coarse generalizations and, thus, in a high loss of information. For microaggregation, the influence of an outlier in the calculation of averages/centroids is restricted to the outlier’s equivalence class and hence is less noticeable.
  • For numerical attributes, generalization discretizes input numbers to numerical ranges and thereby changes the nature of data from continuous to discrete. In contrast, microaggregation maintains the continuous nature of numbers.
  • In microaggregation one seeks to maximize the homogeneity of records within a cluster, which is beneficial for the utility of the resultant k-anonymous data set.

SYSTEM ARCHITECTURE:

[System architecture diagram]

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS:

  • System : Pentium IV 2.4 GHz.
  • Hard Disk : 40 GB.
  • Floppy Drive : 1.44 MB.
  • Monitor : 15-inch VGA colour.
  • Mouse :
  • RAM : 512 MB.

SOFTWARE REQUIREMENTS:

  • Operating system : Windows XP/7.
  • Coding Language : JAVA/J2EE
  • IDE : NetBeans 7.4
  • Database : MySQL

REFERENCE:

Jordi Soria-Comas, Josep Domingo-Ferrer, Fellow, IEEE, David Sánchez and Sergio Martínez, “t-Closeness through Microaggregation: Strict Privacy with Enhanced Utility Preservation”, IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2015.

Real-Time Detection of Traffic From Twitter Stream Analysis

ABSTRACT:

Social networks have recently been employed as a source of information for event detection, with particular reference to road traffic congestion and car accidents. In this paper, we present a real-time monitoring system for traffic event detection from Twitter stream analysis. The system fetches tweets from Twitter according to several search criteria; processes tweets by applying text mining techniques; and finally performs the classification of tweets. The aim is to assign the appropriate class label to each tweet, i.e., whether or not it relates to a traffic event. The traffic detection system was employed for real-time monitoring of several areas of the Italian road network, allowing for detection of traffic events almost in real time, often before online traffic news websites. We employed the support vector machine as a classification model, and we achieved an accuracy value of 95.75% by solving a binary classification problem (traffic versus non-traffic tweets). We were also able to discriminate whether or not traffic is caused by an external event, by solving a multiclass classification problem and obtaining an accuracy value of 88.89%.

EXISTING SYSTEM:

  • Recently, social networks and media platforms have been widely used as a source of information for the detection of events, such as traffic congestion, incidents, natural disasters (earthquakes, storms, fires, etc.), or other events.
  • Sakaki et al. use Twitter streams to detect earthquakes and typhoons, by monitoring special trigger-keywords, and by applying an SVM as a binary classifier of positive events (earthquakes and typhoons) and negative events (non-events or other events).
  • Agarwal et al. focus on the detection of fires in a factory from Twitter stream analysis, by using standard NLP techniques and a Naive Bayes (NB) classifier.
  • Li et al. propose a system, called TEDAS, to retrieve incident-related tweets. The system focuses on Crime and Disaster-related Events (CDE) such as shootings, thunderstorms, and car accidents, and aims to classify tweets as CDE events by exploiting a filtering based on keywords, spatial and temporal information, number of followers of the user, number of retweets, hashtags, links, and mentions.

DISADVANTAGES OF EXISTING SYSTEM:

  • Event detection from social networks analysis is a more challenging problem than event detection from traditional media like blogs, emails, etc., where texts are well formatted.
  • Status update messages (SUMs) are unstructured and irregular texts: they contain informal or abbreviated words, misspellings, and grammatical errors.
  • SUMs contain a huge amount of useless or meaningless information.

PROPOSED SYSTEM:

  • In this paper, we propose an intelligent system, based on text mining and machine learning algorithms, for real-time detection of traffic events from Twitter stream analysis.
  • The system, after a feasibility study, has been designed and developed from the ground up as an event-driven infrastructure, built on a Service Oriented Architecture (SOA).
  • The system exploits available technologies based on state-of-the-art techniques for text analysis and pattern classification. These technologies and techniques have been analyzed, tuned, adapted, and integrated in order to build the intelligent system.
  • In particular, we present an experimental study, which has been performed for determining the most effective among different state-of-the-art approaches for text classification. The chosen approach was integrated into the final system and used for the on-the-field real-time detection of traffic events.
  • In this paper, we focus on a particular small-scale event, i.e., road traffic, and we aim to detect and analyze traffic events by processing users’ SUMs belonging to a certain area and written in the Italian language. To this aim, we propose a system able to fetch, process, and classify SUMs as related to a road traffic event or not (a minimal sketch of the classification step follows the list).
  • To the best of our knowledge, few works have addressed traffic detection using Twitter stream analysis. With respect to this work, all of them focus on languages other than Italian, employ different input features and/or feature selection algorithms, and consider only binary classification.
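
The following Java fragment sketches only the classification step described above: linear SVM inference over a bag-of-words representation of a tweet. The tokenizer is a crude stand-in for the paper's full text-mining pipeline (tokenization, stop-word removal, stemming), and the vocabulary weights are hypothetical; in the real system they would come from offline training on labelled Italian tweets.

```java
import java.util.*;

/** Sketch of linear SVM inference for the binary traffic / non-traffic
 *  classification. Weights and vocabulary are made-up placeholders. */
public class TrafficTweetClassifier {

    private final Map<String, Double> weights; // one learned weight per term
    private final double bias;

    TrafficTweetClassifier(Map<String, Double> weights, double bias) {
        this.weights = weights;
        this.bias = bias;
    }

    /** Crude preprocessing: case folding and punctuation stripping;
     *  stemming and stop-word removal omitted for brevity. */
    static List<String> tokenize(String tweet) {
        List<String> tokens = new ArrayList<>();
        for (String s : tweet.toLowerCase().split("[^\\p{L}]+"))
            if (!s.isEmpty()) tokens.add(s);
        return tokens;
    }

    /** Linear SVM inference: the sign of w·x + b decides the label. */
    boolean isTrafficRelated(String tweet) {
        double score = bias;
        for (String term : tokenize(tweet))
            score += weights.getOrDefault(term, 0.0);
        return score > 0;
    }

    public static void main(String[] args) {
        Map<String, Double> w = new HashMap<>();   // hypothetical weights
        w.put("coda", 1.4); w.put("incidente", 1.7); w.put("traffico", 1.2);
        TrafficTweetClassifier clf = new TrafficTweetClassifier(w, -0.5);
        System.out.println(clf.isTrafficRelated("Coda per incidente sulla A14"));
    }
}
```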

ADVANTAGES OF PROPOSED SYSTEM:

  • Tweets are up to 140 characters, enhancing the real-time and news-oriented nature of the platform. In fact, the life-time of tweets is usually very short, thus Twitter is the social network platform that is best suited to study SUMs related to real-time events.
  • Each tweet can be directly associated with meta-information that provides additional context.
  • Twitter messages are public, i.e., they are directly available with no privacy limitations. For all of these reasons, Twitter is a good source of information for real-time event detection and analysis.
  • Moreover, the proposed system could work together with other traffic sensors (e.g., loop detectors, cameras, infrared cameras) and ITS monitoring systems for the detection of traffic difficulties, providing a low-cost wide coverage of the road network, especially in those areas (e.g., urban and suburban) where traditional traffic sensors are missing.
  • It performs a multiclass classification, which distinguishes non-traffic, traffic due to congestion or a crash, and traffic due to external events.
  • It detects traffic events in real time and is developed as an event-driven infrastructure built on an SOA.

SYSTEM ARCHITECTURE:

[System architecture diagram]

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS:

  • System : Pentium IV 2.4 GHz.
  • Hard Disk : 40 GB.
  • Floppy Drive : 1.44 MB.
  • Monitor : 15-inch VGA colour.
  • Mouse :
  • RAM : 512 MB.

SOFTWARE REQUIREMENTS:

  • Operating system : Windows XP/7.
  • Coding Language : JAVA/J2EE
  • IDE : NetBeans 7.4
  • Database : MySQL

REFERENCE:

Eleonora D’Andrea, Pietro Ducange, Beatrice Lazzerini, Member, IEEE, and Francesco Marcelloni, Member, IEEE, “Real-Time Detection of Traffic From Twitter Stream Analysis”, IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, VOL. 16, NO. 4, AUGUST 2015.

Prediction of Atomic Web Services Reliability for QoS-Aware Recommendation

ABSTRACT:

While constructing QoS-aware composite workflows based on service-oriented systems, it is necessary to assess the nonfunctional properties of potential service selection candidates. In this paper, we present CLUS, a model for reliability prediction of atomic web services that estimates the reliability of an ongoing service invocation based on data assembled from previous invocations. With the aim of improving the accuracy of current state-of-the-art prediction models, we incorporate user-, service-, and environment-specific parameters of the invocation context. To reduce the scalability issues present in state-of-the-art approaches, we aggregate the past invocation data using the K-means clustering algorithm. In order to evaluate different quality aspects of our model, we conducted experiments on services deployed in different regions of the Amazon cloud. The evaluation results confirm that our model produces more scalable and accurate predictions when compared to the current state-of-the-art approaches.

EXISTING SYSTEM:

  • Researchers have proposed several prediction models based on the collaborative filtering technique often used in modern recommendation systems.
  • A myriad of different approaches for modeling the reliability of traditional software systems have been proposed in the literature.
  • Various approaches for predicting the reliability of composite services have been proposed. All these approaches usually assume the atomic service reliability values are already known, and rarely suggest how they can be acquired.
  • The most successful approaches for prediction of atomic service reliability are based on the collaborative filtering technique. According to the related literature, the basic types of collaborative filtering are: memory-based, model-based and hybrid.
  • Memory-based collaborative filtering is a commonly used technique in today’s state-of-the-art recommendation systems. This filtering technique extracts information or patterns by statistically correlating data obtained from multiple entities such as agents, viewpoints or data sources.
  • Model-based collaborative filtering approaches are known to be more computationally complex and difficult to implement. These approaches often employ more complex techniques such as machine learning or data mining algorithms to learn the prediction model by recognizing complex patterns in the training data, and then use the model to make predictions on the real data.
  • Hybrid collaborative filtering approaches can be very effective in addressing the disadvantages of basic memory-based collaborative filtering.

DISADVANTAGES OF EXISTING SYSTEM:

  • Acquiring a comprehensive past invocation sample proves to be a very challenging task for several reasons.
  • Even though the existing collaborative filtering based approaches achieve promising performance, they demonstrate disadvantages primarily related to the prediction accuracy in dynamic environments and scalability issues caused by the invocation sample size.
  • Collecting a comprehensive sample of reliability values is a very difficult task in practice.
  • Their prediction capability often relies on additional domain-specific data describing the internals of a system, and obtaining such data proves to be a challenging task in practice.
  • The existing approaches implicitly consider only user- and service-specific parameters of the prediction context.

PROPOSED SYSTEM:

  • This paper focuses on atomic service reliability as one of the most important nonfunctional properties. We define service reliability as the probability that a service invocation completes successfully, i.e., that the correct response to the invocation is retrieved under the specified conditions and time constraints.
  • A model-based collaborative filtering approach, CLUS (CLUStering), is introduced. The model considers user-, service- and environment-specific parameters to provide a more accurate description of the service invocation context. The environment-specific parameters, not present in the related approaches, are used to model the effect of varying load conditions on service reliability. Such an approach results in more accurate reliability predictions. Furthermore, the model addresses scalability issues by aggregating users and services into respective user and service clusters according to their reliability values using K-means clustering (a minimal sketch follows the list).
  • A novel strategy for assembly of most recent service usage feedback is presented. The strategy enables discovery of deviations from the presumed load distributions and is applied to increase CLUS accuracy.
  • A novel model-based collaborative filtering approach that utilizes linear regression, a supervised machine learning technique, is also presented.
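
A rough Java sketch of the clustering-based prediction idea, under stated assumptions: reliability is averaged per user and per service, users and services are grouped with a plain 1-D K-means, and a new invocation is predicted by the average of past invocations falling in the same (user cluster, service cluster, load bucket) cell. The data layout and the discrete load buckets are illustrative choices, not the paper's exact formulation.

```java
import java.util.*;

/** CLUS-style prediction sketch: cluster users/services by average
 *  reliability, then predict a new invocation by its cell average. */
public class ClusSketch {

    static class Invocation {
        final int user, service, loadBucket; final double reliability;
        Invocation(int u, int s, int l, double r) {
            user = u; service = s; loadBucket = l; reliability = r;
        }
    }

    /** Basic 1-D K-means; returns the cluster index of each input value. */
    static int[] kmeans(double[] xs, int k, int iters) {
        double[] c = new double[k];
        for (int j = 0; j < k; j++) c[j] = xs[j * xs.length / k];  // spread-out init
        int[] assign = new int[xs.length];
        for (int it = 0; it < iters; it++) {
            for (int i = 0; i < xs.length; i++) {
                int best = 0;
                for (int j = 1; j < k; j++)
                    if (Math.abs(xs[i] - c[j]) < Math.abs(xs[i] - c[best])) best = j;
                assign[i] = best;
            }
            double[] sum = new double[k]; int[] n = new int[k];
            for (int i = 0; i < xs.length; i++) { sum[assign[i]] += xs[i]; n[assign[i]]++; }
            for (int j = 0; j < k; j++) if (n[j] > 0) c[j] = sum[j] / n[j];
        }
        return assign;
    }

    /** Mean reliability observed per user (byUser) or per service. */
    static double[] means(List<Invocation> log, int n, boolean byUser) {
        double[] sum = new double[n]; int[] cnt = new int[n];
        for (Invocation v : log) {
            int id = byUser ? v.user : v.service;
            sum[id] += v.reliability; cnt[id]++;
        }
        for (int i = 0; i < n; i++) if (cnt[i] > 0) sum[i] /= cnt[i];
        return sum;
    }

    /** Predict reliability of (user, service) under the given load bucket
     *  as the average over the matching cluster cell. */
    static double predict(List<Invocation> log, int users, int services, int k,
                          int user, int service, int loadBucket) {
        int[] uc = kmeans(means(log, users, true), k, 20);
        int[] sc = kmeans(means(log, services, false), k, 20);
        double sum = 0; int n = 0;
        for (Invocation v : log)
            if (uc[v.user] == uc[user] && sc[v.service] == sc[service]
                    && v.loadBucket == loadBucket) { sum += v.reliability; n++; }
        return n > 0 ? sum / n : 0.5;   // fall back to an uninformative prior
    }
}
```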

ADVANTAGES OF PROPOSED SYSTEM:

  • Improved prediction accuracy.
  • The evaluation results confirm that the CLUS model provides the best prediction performance among the competing approaches, considering both prediction accuracy and computational performance.
  • In the proposed system, we incorporate the environment-specific parameters into the prediction, which in turn significantly reduces the RMSE value. This can especially be observed for the CLUS approach at higher data densities (e.g., it achieves a 17 percent lower RMSE than LUCS). At lower densities LUCS outperforms CLUS and LinReg, but LinReg achieves a 21 percent lower RMSE than CLUS.
  • An additional advantage of our approaches is their flexibility, which manifests as a tunable compromise between accuracy and scalability: increasing the number of clusters in CLUS or the complexity of the hypothesis function in LinReg yields more accurate predictions at the cost of computational performance.

SYSTEM ARCHITECTURE:

[System architecture diagram]

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS:

  • System : Pentium IV 2.4 GHz.
  • Hard Disk : 40 GB.
  • Floppy Drive : 1.44 MB.
  • Monitor : 15-inch VGA colour.
  • Mouse :
  • RAM : 512 MB.

SOFTWARE REQUIREMENTS:

  • Operating system : Windows XP/7.
  • Coding Language : JAVA/J2EE
  • IDE : NetBeans 7.4
  • Database : MySQL

REFERENCE:

Marin Silic, Student Member, IEEE, Goran Delac, Student Member, IEEE, and Sinisa Srbljic, Senior Member, IEEE, “Prediction of Atomic Web Services Reliability for QoS-Aware Recommendation”, IEEE TRANSACTIONS ON SERVICES COMPUTING, VOL. 8, NO. 3, MAY/JUNE 2015.

A Fuzzy Preference Tree-Based Recommender System for Personalized Business-to-Business E-Services

ABSTRACT:

The Web creates excellent opportunities for businesses to provide personalized online services to their customers. Recommender systems aim to automatically generate personalized suggestions of products/services to customers (businesses or individuals). Although recommender systems have been well studied, there are still two challenges in the development of a recommender system, particularly in real-world B2B e-services: 1) items or user profiles often present complicated tree structures in business applications, which cannot be handled by normal item similarity measures and 2) online users’ preferences are often vague and fuzzy, and cannot be dealt with by existing recommendation methods. To handle both these challenges, this study first proposes a method for modeling fuzzy tree-structured user preferences, in which fuzzy set techniques are used to express user preferences. A recommendation approach to recommending tree-structured items is then developed. The key technique in this study is a comprehensive tree matching method, which can match two tree-structured data and identify their corresponding parts by considering all the information on tree structures, node attributes, and weights. Importantly, the proposed fuzzy preference tree-based recommendation approach is tested and validated using an Australian business dataset and the MovieLens dataset. Experimental results show that the proposed fuzzy tree-structured user preference profile reflects user preferences effectively and the recommendation approach demonstrates excellent performance for tree-structured items, especially in e-business applications. This study also applies the proposed recommendation approach to the development of a web-based business partner recommender system.

PROJECT OUTPUT VIDEO: (Click the below link to see the project output video):

EXISTING SYSTEM:

  • The three main recommendation techniques are collaborative filtering (CF), content-based (CB) and knowledge-based (KB) techniques.
  • The CF technique is currently the most successful and widely used technique for recommender systems.
  • CB recommendation techniques recommend items that are similar to those previously preferred by a specific user.
  • The KB recommender systems offer items to users based on knowledge about the users and items.

DISADVANTAGES OF EXISTING SYSTEM:

  • The fuzzy preference models mentioned previously, which are represented as vectors, are not suitable for dealing with the tree-structured data in a Web-based B2B environment.
  • Excessive amounts of information on the Web create a severe information overload problem.
  • When the number of rated items for a cold-start (CS) user is small, the CF-based approach cannot accurately find user neighbors using rating similarity; therefore, it fails to generate accurate recommendations.
  • The major limitations of CB approaches are the item content dependence problem, the overspecialization problem, and the new user problem.
  • The KB approach also has some limitations: for instance, it needs to retain some information about items and users, as well as functional knowledge, to make recommendations. It also suffers from a scalability problem, because it requires more time and effort to calculate similarities over a large case base than other recommendation techniques.

PROPOSED SYSTEM:

  • This study proposes a method for modeling fuzzy tree-structured user preferences, presents a tree matching method, and, based on the previous methods, develops an innovative fuzzy preference tree-based recommendation approach. The developed new approach has been implemented and applied in a business partner recommender system.
  • This paper has three main contributions. From the theoretical aspect, a tree matching method, which comprehensively considers tree structures, node attributes, and weights, is developed (a minimal sketch follows the list).
  • From the technical aspect, a fuzzy tree-structured user preference modeling method is developed, as well as a fuzzy preference tree-based recommendation approach for tree-structured items. From the practical aspect, the proposed methods/approaches are used to develop a Web-based B2B recommender system software known as Smart BizSeeker, with effective results.
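
A compact Java sketch of the tree-matching intuition: node similarity (concept match scaled by closeness of fuzzy membership degrees) is blended with a weighted, greedily paired similarity of child subtrees. The greedy pairing and the fixed 50/50 blend are simplifications; the paper's comprehensive method is more elaborate.

```java
import java.util.*;

/** Fuzzy tree-matching sketch: recursive similarity over
 *  tree-structured items with fuzzy preference degrees. */
public class FuzzyTreeMatch {

    static class Node {
        final String concept; final double membership, weight;
        final List<Node> children;
        Node(String concept, double membership, double weight, List<Node> children) {
            this.concept = concept; this.membership = membership;
            this.weight = weight; this.children = children;
        }
    }

    /** Node similarity: concept match scaled by how close the fuzzy
     *  membership degrees (preference intensities) are. */
    static double nodeSim(Node a, Node b) {
        if (!a.concept.equals(b.concept)) return 0;
        return 1 - Math.abs(a.membership - b.membership);
    }

    /** Recursive similarity: each child of a is greedily paired with its
     *  most similar unmatched child of b, weighted by the child's weight. */
    static double treeSim(Node a, Node b) {
        double sim = nodeSim(a, b);
        if (a.children.isEmpty() || b.children.isEmpty()) return sim;
        List<Node> unmatched = new ArrayList<>(b.children);
        double childSum = 0, weightSum = 0;
        for (Node ca : a.children) {
            Node best = null; double bestSim = 0;
            for (Node cb : unmatched) {
                double s = treeSim(ca, cb);
                if (s > bestSim) { bestSim = s; best = cb; }
            }
            if (best != null) unmatched.remove(best);
            childSum += ca.weight * bestSim;
            weightSum += ca.weight;
        }
        return 0.5 * sim + 0.5 * (weightSum > 0 ? childSum / weightSum : 0);
    }
}
```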

ADVANTAGES OF PROPOSED SYSTEM:

  • On the Australian business dataset, the proposed recommendation approach achieves the lowest MAE, the highest precision, high recall, and the highest F1 measure among the compared methods.
  • The results indicate that the fuzzy tree-structured user preference profile effectively reflects business users’ preferences, and the proposed approach is well-suited to the business application environment.

SYSTEM ARCHITECTURE:

[System architecture diagram]

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS:

  • System : Pentium IV 2.4 GHz.
  • Hard Disk : 40 GB.
  • Floppy Drive : 1.44 MB.
  • Monitor : 15-inch VGA colour.
  • Mouse :
  • RAM : 512 MB.

SOFTWARE REQUIREMENTS:

  • Operating system : Windows XP/7.
  • Coding Language : JAVA/J2EE
  • IDE : NetBeans 7.4
  • Database : MySQL

REFERENCE:

Dianshuang Wu, Guangquan Zhang, and Jie Lu, Senior Member, IEEE, “A Fuzzy Preference Tree-Based Recommender System for Personalized Business-to-Business E-Services”, IEEE TRANSACTIONS ON FUZZY SYSTEMS, VOL. 23, NO. 1, FEBRUARY 2015.

Location-Aware and Personalized Collaborative Filtering for Web Service Recommendation

ABSTRACT:

Collaborative Filtering (CF) is widely employed for making Web service recommendations. CF-based Web service recommendation aims to predict missing QoS (Quality-of-Service) values of Web services. Although several CF-based Web service QoS prediction methods have been proposed in recent years, their performance still needs significant improvement. Firstly, existing QoS prediction methods seldom consider the personalized influence of users and services when measuring the similarity between users and between services. Secondly, Web service QoS factors, such as response time and throughput, usually depend on the locations of Web services and users; however, existing Web service QoS prediction methods have seldom taken this observation into consideration. In this paper, we propose a location-aware personalized CF method for Web service recommendation. The proposed method leverages the locations of both users and Web services when selecting similar neighbors for the target user or service. The method also includes an enhanced similarity measurement for users and Web services that takes their personalized influence into account. To evaluate the performance of our proposed method, we conduct a set of comprehensive experiments using a real-world Web service dataset. The experimental results indicate that our approach significantly improves QoS prediction accuracy and computational efficiency compared to previous CF-based methods.

EXISTING SYSTEM:

  • QoS is usually defined as a set of non-functional properties, such as response time, throughput, reliability, and so on. Due to the paramount importance of QoS in building successful service-oriented applications, QoS-based Web service discovery and selection has garnered much attention from both academia and industry.
  • Typically, a user prefers to select the Web service with the best QoS performance, provided that a set of Web service candidates satisfying his/her functional requirements has been discovered. In reality, however, it is neither easy nor practical for a user to acquire the QoS of all Web service candidates, because Web service QoS is highly dependent on both the users’ and the Web services’ circumstances.
  • Therefore, the observed QoS of the same Web service may differ from user to user, and conducting real-world Web service evaluations to obtain the QoS of candidate services is both time-consuming and resource-consuming.

DISADVANTAGES OF EXISTING SYSTEM:

  • It is impractical for a user to acquire QoS information by invoking all of the service candidates. And some QoS properties (e.g., reputation and reliability) are difficult to be evaluated, since they require both long observation duration and a large number of invocations. These challenges call for more effective approaches to acquire service QoS information.
  • Previous CF-based Web service recommendation methods have rarely taken into account the peculiar characteristics of Web service QoS when making QoS predictions.
  • QoS attributes of Web services such as response time and throughput highly depend on the underlying network conditions, which, however, are usually ignored by the previous work.

PROPOSED SYSTEM:

  • We propose an enhanced measurement for computing QoS similarity between different users and between different services. The measurement takes into account the personalized deviation of Web services’ QoS and users’ QoS experiences, in order to improve the accuracy of similarity computation. Based on this enhanced similarity measurement, we propose a location-aware CF-based Web service QoS prediction method for service recommendation (a minimal sketch follows the list).
  • We also incorporate the locations of both Web services and users into similar-neighbor selection. A set of comprehensive experiments on a real-world Web service dataset demonstrates that the proposed QoS prediction method significantly outperforms previous well-known CF-based recommendation methods.
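
A minimal Java sketch of the prediction step under stated assumptions: a UPCC-style user-based predictor whose neighbor search is restricted to users in the same region, with Pearson correlation over co-invoked services as the similarity measure. The flat region labels and the missing-value handling are illustrative simplifications of the paper's location-aware scheme.

```java
import java.util.*;

/** Location-aware user-based QoS prediction sketch (UPCC style). */
public class LocationAwareCF {

    /** qos[u][s] is the observed value, or NaN if u never invoked s. */
    static double predict(double[][] qos, String[] region, int u, int s, int topK) {
        final double[] sims = new double[qos.length];
        List<Integer> cands = new ArrayList<>();
        for (int v = 0; v < qos.length; v++)
            if (v != u && region[v].equals(region[u])) {        // location filter
                sims[v] = pearson(qos[u], qos[v]);
                if (!Double.isNaN(sims[v]) && sims[v] > 0) cands.add(v);
            }
        cands.sort((a, b) -> Double.compare(sims[b], sims[a]));  // most similar first
        double num = 0, den = 0; int used = 0;
        for (int v : cands) {                 // similarity-weighted deviations
            if (used == topK) break;
            if (Double.isNaN(qos[v][s])) continue;
            num += sims[v] * (qos[v][s] - rowMean(qos[v]));
            den += sims[v]; used++;
        }
        return den > 0 ? rowMean(qos[u]) + num / den : rowMean(qos[u]);
    }

    static double rowMean(double[] row) {
        double sum = 0; int n = 0;
        for (double x : row) if (!Double.isNaN(x)) { sum += x; n++; }
        return n > 0 ? sum / n : 0;
    }

    /** Pearson correlation over the services both users invoked. */
    static double pearson(double[] a, double[] b) {
        double sa = 0, sb = 0; int n = 0;
        for (int i = 0; i < a.length; i++)
            if (!Double.isNaN(a[i]) && !Double.isNaN(b[i])) { sa += a[i]; sb += b[i]; n++; }
        if (n < 2) return Double.NaN;
        double ma = sa / n, mb = sb / n, cov = 0, va = 0, vb = 0;
        for (int i = 0; i < a.length; i++)
            if (!Double.isNaN(a[i]) && !Double.isNaN(b[i])) {
                cov += (a[i] - ma) * (b[i] - mb);
                va  += (a[i] - ma) * (a[i] - ma);
                vb  += (b[i] - mb) * (b[i] - mb);
            }
        return (va > 0 && vb > 0) ? cov / Math.sqrt(va * vb) : Double.NaN;
    }
}
```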

ADVANTAGES OF PROPOSED SYSTEM:

  • Our location-aware QoS prediction method has a solid basis, because of the strong relation between the locations of users (or Web services) and the Web services’ QoS perceived by the users.
  • We conducted an experiment to evaluate the impact of data sparseness on the prediction coverage, in which, our proposed methods (including ULACF, ILACF and HLACF) were compared with the traditional CF methods such as UPCC and IPCC. We find that, our methods can always achieve nearly 100% prediction coverage, when the matrix density varies from 5% to 30%. By contrast, the traditional CF methods have significantly lower prediction coverage, especially when K is small.
  • Aiming to improve QoS prediction performance, we take into account the personalized QoS characteristics of both Web services and users when computing the similarity between them.

SYSTEM ARCHITECTURE:

[System architecture diagram]

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS:

  • System : Pentium IV 2.4 GHz.
  • Hard Disk : 40 GB.
  • Floppy Drive : 1.44 MB.
  • Monitor : 15-inch VGA colour.
  • Mouse :
  • RAM : 512 MB.

SOFTWARE REQUIREMENTS:

  • Operating system : Windows XP/7.
  • Coding Language : JAVA/J2EE
  • IDE : NetBeans 7.4
  • Database : MySQL

REFERENCE:

Jianxun Liu, Mingdong Tang, Member, IEEE, Zibin Zheng, Member, IEEE, Xiaoqing (Frank) Liu, Member, IEEE, and Saixia Lyu, “Location-Aware and Personalized Collaborative Filtering for Web Service Recommendation”, IEEE TRANSACTIONS ON SERVICES COMPUTING, 2015.

Learning to Rank Image Tags With Limited Training Examples

ABSTRACT:

With an increasing number of images that are available in social media, image annotation has emerged as an important research topic due to its application in image matching and retrieval. Most studies cast image annotation into a multilabel classification problem. The main shortcoming of this approach is that it requires a large number of training images with clean and complete annotations in order to learn a reliable model for tag prediction. We address this limitation by developing a novel approach that combines the strength of tag ranking with the power of matrix recovery. Instead of having to make a binary decision for each tag, our approach ranks tags in the descending order of their relevance to the given image, significantly simplifying the problem. In addition, the proposed method aggregates the prediction models for different tags into a matrix, and casts tag ranking into a matrix recovery problem. It introduces the matrix trace norm to explicitly control the model complexity, so that a reliable prediction model can be learned for tag ranking even when the tag space is large and the number of training images is limited. Experiments on multiple well-known image data sets demonstrate the effectiveness of the proposed framework for tag ranking compared with the state-of-the-art approaches for image annotation and tag ranking.

EXISTING SYSTEM:

  • Most automatic image annotation algorithms can be classified into three categories: (i) generative models that model the joint distribution between tags and visual features, (ii) discriminative models that view image annotation as a classification problem, and (iii) search-based approaches.
  • In one existing system, a Gaussian mixture model is used to model the dependence between keywords and visual features.
  • In another system, kernel density estimation is applied to model the distribution of visual features and to estimate the conditional probability of keyword assignments given the visual features. Topic models annotate images as samples from a specific mixture of topics, in which each topic is a joint distribution between image features and annotation keywords.

DISADVANTAGES OF EXISTING SYSTEM:

  • Although multiple algorithms have been developed for tag ranking, they tend to perform poorly when the number of training images is limited compared to the number of tags, a scenario often encountered in real world applications.
  • Another limitation of these approaches is that they are unable to capture the correlation among classes, which is known to be important in multi-label learning.
  • Most of the existing algorithms for tag ranking tend to perform poorly when the tag space is large and the number of training images is limited.

PROPOSED SYSTEM:

  • In this work, we propose a novel tag ranking scheme for automatic image annotation.
  • We first present the proposed framework for tag ranking, which is explicitly designed for a large tag space with a limited number of training images.
  • The proposed scheme casts tag ranking as a matrix recovery problem and introduces trace norm regularization to control the model complexity (a generic form of the objective is sketched below).
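
The paper's exact objective is not reproduced here, but a generic form consistent with the description above, with the per-tag linear prediction models stacked into a single matrix W, would be:

```latex
\min_{W} \; \sum_{i=1}^{n} \ell\left(x_i, T_i; W\right) \;+\; \lambda \, \lVert W \rVert_{*}
```

where x_i is an image feature vector, T_i its set of observed tags, ℓ a ranking loss that pushes observed tags above absent ones in the ordering induced by W, λ a trade-off parameter, and ||W||_* the trace norm (the sum of singular values of W), whose minimization keeps W low-rank and hence learnable even when the tag space is large and training images are few.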

ADVANTAGES OF PROPOSED SYSTEM:

  • Trace norm regularization explicitly controls the model complexity, so a reliable prediction model can be learned even when the tag space is large and the number of training images is limited.
  • Extensive experiments on image annotation and tag ranking demonstrate that the proposed method significantly outperforms several state-of-the-art annotation methods, especially when the number of training images is limited and when many of the assigned image tags are missing.

SYSTEM ARCHITECTURE:

[System architecture diagram]

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS:

  • System : Pentium IV 2.4 GHz.
  • Hard Disk : 40 GB.
  • Floppy Drive : 1.44 MB.
  • Monitor : 15-inch VGA colour.
  • Mouse :
  • RAM : 512 MB.

SOFTWARE REQUIREMENTS:

  • Operating system : Windows XP/7.
  • Coding Language : JAVA/J2EE
  • IDE : NetBeans 7.4
  • Database : MySQL

REFERENCE:

Songhe Feng, Zheyun Feng, and Rong Jin, “Learning to Rank Image Tags With Limited Training Examples”, IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 24, NO. 4, APRIL 2015.

Designing High Performance Web-Based Computing Services to Promote Telemedicine Database Management System

ABSTRACT:

Many web computing systems run real-time database services whose information changes continuously and expands incrementally. In this context, web data services play a major role and bring significant improvements in monitoring and controlling information truthfulness and data propagation. Currently, web telemedicine database services are of central importance to distributed systems. However, the increasing complexity and rapid growth of real-world healthcare applications make them hard for database administrative staff to manage. In this paper, we build integrated web data services that satisfy fast response times for large-scale telehealth database management systems. Our focus is on database management, with application scenarios in dynamic telemedicine systems, to increase care admissions and decrease care difficulties such as distance, travel, and time limitations. We propose a three-fold approach based on data fragmentation, database website clustering and intelligent data distribution. This approach reduces the amount of data migrated between websites during applications’ execution, achieves cost-effective communications during applications’ processing, and improves applications’ response time and throughput. The proposed approach is validated internally by measuring the impact of our computing services’ techniques on various performance features such as communications cost, response time, and throughput. External validation is achieved by comparing the performance of our approach to that of other techniques in the literature. The results show that our integrated approach significantly improves the performance of web database systems and outperforms its counterparts.

EXISTING SYSTEM:

  • Recently, many researchers have focused on designing web medical database management systems that satisfy certain performance levels. Such performance is evaluated by measuring the amount of relevant and irrelevant data accessed and the amount of transferred medical data during transactions’ processing time.
  • Several techniques have been proposed to improve telemedicine database performance, optimize medical data distribution, and control medical data proliferation. These techniques share the premise that high performance for such systems can be achieved by improving at least one of the database web management services, namely database fragmentation, data distribution, website clustering, distributed caching, and database scalability.

DISADVANTAGES OF EXISTING SYSTEM:

  • Some of these data records may be overlapping or even redundant, which increases the I/O transactions’ processing time and thus the system’s communications overhead.
  • These works have mostly investigated fragmentation, allocation and sometimes clustering problems.
  • The transactions should be executed very fast in a flexible, load-balanced database environment. When the number of sites in a web database system grows large, the intractable time complexity of processing large numbers of medical transactions and managing huge numbers of communications makes the design of such methods a non-trivial task.

PROPOSED SYSTEM:

  • Our approach integrates three enhanced computing services’ techniques, namely database fragmentation, network site clustering and fragment allocation.
  • We propose an estimation model to compute communications cost, which helps in finding cost-effective data allocation solutions. We perform both external and internal evaluation of our integrated approach.
  • We develop a fragmentation computing service technique that splits telemedicine database relations into small disjoint fragments. This technique generates the minimum number of disjoint fragments that would be allocated to the web servers in the data distribution phase, which in turn reduces the data transferred and accessed through different websites and accordingly reduces the communications cost.
  • We introduce a high-speed clustering service technique that groups the web telemedicine database sites into sets of clusters according to their communications cost (a minimal sketch follows the list). This helps in grouping the websites that are most suitable to be in one cluster, minimizing data allocation operations and thereby avoiding the allocation of redundant data.
  • We propose a new computing service technique for telemedicine data allocation and redistribution services based on transactions’ processing cost functions.
  • We develop a user-friendly experimental tool to perform the services of telemedicine data fragmentation, website clustering, and fragment allocation, as well as to assist database administrators in measuring WTDS performance.
  • We integrate telemedicine database fragmentation, website clustering, and data fragment allocation into one scenario to accomplish the ultimate web telemedicine system throughput in terms of concurrency, reliability, and data availability.
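
A small Java sketch of the clustering step only, under assumptions: websites are grouped greedily so that every pair inside a cluster communicates below a cost threshold. The grouping rule, the threshold, and the cost values are illustrative; the paper's clustering is driven by its own cost model.

```java
import java.util.*;

/** Greedy website clustering sketch: every pair inside a cluster must
 *  communicate below the given cost threshold. */
public class SiteClustering {

    /** cost[i][j]: communication cost (e.g., ms per kB) between sites. */
    static List<List<Integer>> cluster(double[][] cost, double threshold) {
        List<List<Integer>> clusters = new ArrayList<>();
        for (int site = 0; site < cost.length; site++) {
            List<Integer> home = null;
            for (List<Integer> c : clusters) {          // first cluster that fits
                boolean fits = true;
                for (int member : c)
                    if (cost[site][member] > threshold) { fits = false; break; }
                if (fits) { home = c; break; }
            }
            if (home == null) { home = new ArrayList<>(); clusters.add(home); }
            home.add(site);
        }
        return clusters;
    }

    public static void main(String[] args) {
        double[][] cost = {            // symmetric, illustrative numbers
            {0, 2, 9, 8}, {2, 0, 8, 9}, {9, 8, 0, 3}, {8, 9, 3, 0}};
        System.out.println(cluster(cost, 5.0));   // -> [[0, 1], [2, 3]]
    }
}
```

Fragments would then be allocated within each cluster, so that frequently co-accessed data stays on cheaply connected sites.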

ADVANTAGES OF PROPOSED SYSTEM:

  • Our integrated approach significantly improves service requirement satisfaction in web systems, though this conclusion requires further investigation and experiments.
  • The fragmentation technique generates the minimum number of disjoint fragments that would be allocated to the web servers in the data distribution phase.
  • The high-speed clustering service technique groups the web telemedicine database sites into sets of clusters according to their communications cost.

SYSTEM ARCHITECTURE:

[System architecture diagram]

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS:

  • System : Pentium IV 2.4 GHz.
  • Hard Disk : 40 GB.
  • Floppy Drive : 1.44 MB.
  • Monitor : 15-inch VGA colour.
  • Mouse :
  • RAM : 512 MB.

SOFTWARE REQUIREMENTS:

  • Operating system : Windows XP/7.
  • Coding Language : JAVA/J2EE
  • IDE : NetBeans 7.4
  • Database : MySQL

REFERENCE:

Ismail Hababeh, Issa Khalil, and Abdallah Khreishah, “Designing High Performance Web-Based Computing Services to Promote Telemedicine Database Management System”, IEEE TRANSACTIONS ON SERVICES COMPUTING, VOL. 8, NO. 1, JANUARY/FEBRUARY 2015.

EMR: A Scalable Graph-based Ranking Model for Content-based Image Retrieval

ABSTRACT:

Graph-based ranking models have been widely applied in the information retrieval area. In this paper, we focus on a well-known graph-based model: the Ranking on Data Manifold model, or Manifold Ranking (MR). It has been successfully applied to content-based image retrieval because of its outstanding ability to discover the underlying geometrical structure of a given image database. However, manifold ranking is computationally very expensive, which significantly limits its applicability to large databases, especially in cases where the queries are out of the database (new samples). We propose a novel scalable graph-based ranking model called Efficient Manifold Ranking (EMR), which addresses the shortcomings of MR from two main perspectives: scalable graph construction and efficient ranking computation. Specifically, we build an anchor graph on the database instead of a traditional k-nearest-neighbor graph, and design a new form of adjacency matrix utilized to speed up the ranking. An approximate method is adopted for efficient out-of-sample retrieval. Experimental results on large-scale image databases demonstrate that EMR is a promising method for real-world retrieval applications.

EXISTING SYSTEM:

  • Most traditional methods focus too much on data features and ignore the underlying structure information, which is of great importance for semantic discovery, especially when the label information is unknown.
  • Many databases have an underlying cluster or manifold structure. Under such circumstances, the assumption of label consistency is reasonable: nearby data points, or points belonging to the same cluster or manifold, are very likely to share the same semantic label. This phenomenon is extremely important for exploring semantic relevance when the label information is unknown. In our opinion, a good CBIR system should consider images’ low-level features as well as the intrinsic structure of the image database.

DISADVANTAGES OF EXISTING SYSTEM:

  • Manifold ranking has an expensive computational cost, in both the graph construction and the ranking computation stages.
  • In particular, it is unknown how to handle an out-of-sample query efficiently under the existing framework.
  • It is unacceptable to recompute the model for each new query. In other words, the original manifold ranking is inadequate for a real-world CBIR system, in which the user-provided query is usually an out-of-sample one.

PROPOSED SYSTEM:

  • In this paper, we extend the original manifold ranking and propose a novel framework named Efficient Manifold Ranking (EMR).
  • We try to address the shortcomings of manifold ranking from two perspectives: the first is scalable graph construction; and the second is efficient computation, especially for out-of-sample retrieval.
  • Specifically, we build an anchor graph on the database instead of the traditional k-nearest-neighbor graph, and design a new form of adjacency matrix utilized to speed up the ranking computation (a minimal sketch of the anchor-graph step follows the list).
  • The model has two separate stages: an offline stage for building (or learning) the ranking model and an online stage for handling a new query.
  • With EMR, we can handle a database with a very large number of images and do the online retrieval in a short time. To the best of our knowledge, no previous manifold-ranking-based algorithm has run out-of-sample retrieval on a database at this scale.
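
A minimal Java sketch of the anchor-graph construction named above: each image is linked to its s nearest anchors with Gaussian kernel weights, producing a sparse matrix Z from which EMR derives a low-rank adjacency (of the form W = Z Λ⁻¹ Zᵀ), so the dense kNN graph is never built. Anchor selection (e.g., by K-means) and the ranking solve are assumed to happen elsewhere; the parameter names are illustrative.

```java
import java.util.*;

/** Anchor-graph construction sketch for EMR-style ranking. */
public class AnchorGraphSketch {

    /** Returns z[i][a]: weight of image i on anchor a (zero outside the
     *  s nearest anchors; rows normalised to sum to 1). */
    static double[][] anchorWeights(double[][] images, double[][] anchors,
                                    int s, double sigma) {
        double[][] z = new double[images.length][anchors.length];
        for (int i = 0; i < images.length; i++) {
            final double[] d = new double[anchors.length];
            Integer[] order = new Integer[anchors.length];
            for (int a = 0; a < anchors.length; a++) {
                order[a] = a;
                d[a] = dist(images[i], anchors[a]);
            }
            Arrays.sort(order, Comparator.comparingDouble(a -> d[a]));
            double sum = 0;
            for (int r = 0; r < Math.min(s, anchors.length); r++) {
                int a = order[r];                       // s nearest anchors only
                z[i][a] = Math.exp(-d[a] * d[a] / (2 * sigma * sigma)); // Gaussian kernel
                sum += z[i][a];
            }
            for (int a = 0; a < anchors.length; a++) z[i][a] /= sum;    // row-normalise
        }
        return z;
    }

    static double dist(double[] x, double[] y) {
        double s = 0;
        for (int j = 0; j < x.length; j++) s += (x[j] - y[j]) * (x[j] - y[j]);
        return Math.sqrt(s);
    }
}
```

An out-of-sample query only needs its own row of Z (weights to its nearest anchors), which is why a new query can be ranked without rebuilding the model.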

ADVANTAGES OF PROPOSED SYSTEM:

  • We present several experimental results and comparisons to evaluate the effectiveness and efficiency of the proposed EMR method on real-world image databases.
  • We can run out-of-sample retrieval on a large-scale database in a short time.
  • EMR can efficiently handle a new sample as a query for retrieval with light-weight computation. This is a big improvement over the previous conference version of this work, and it makes EMR scalable for large-scale image databases.

SYSTEM ARCHITECTURE:

[System architecture diagram]

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS:

  • System : Pentium IV 2.4 GHz.
  • Hard Disk : 40 GB.
  • Floppy Drive : 1.44 MB.
  • Monitor : 15-inch VGA colour.
  • Mouse :
  • RAM : 512 MB.

SOFTWARE REQUIREMENTS:

  • Operating system : Windows XP/7.
  • Coding Language : JAVA/J2EE
  • IDE : NetBeans 7.4
  • Database : MySQL

REFERENCE:

Bin Xu, Jiajun Bu, Chun Chen, Can Wang, Deng Cai, and Xiaofei He, “EMR: A Scalable Graph-based Ranking Model for Content-based Image Retrieval”, IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 27, NO. 1, JANUARY 2015.

An Attribute-Assisted Reranking Model for Web Image Search

ABSTRACT:

Image search reranking is an effective approach to refining text-based image search results. Most existing reranking approaches are based on low-level visual features. In this paper, we propose to exploit semantic attributes for image search reranking. Based on classifiers for all the predefined attributes, each image is represented by an attribute feature consisting of the responses from these classifiers. A hypergraph is then used to model the relationships between images by integrating low-level visual features and attribute features. Hypergraph ranking is then performed to order the images; its basic principle is that visually similar images should have similar ranking scores. In this paper, we propose a visual-attribute joint hypergraph learning approach to simultaneously explore the two information sources. A hypergraph is constructed to model the relationships of all images. We conduct experiments on more than 1,000 queries in the MSRA-MM V2.0 data set. The experimental results demonstrate the effectiveness of our approach.

EXISTING SYSTEM:

  • Many image search engines such as Google and Bing have relied on matching textual information of the images against queries given by users. However, text-based image retrieval suffers from essential difficulties that are caused mainly by the incapability of the associated text to appropriately describe the image content.
  • The existing visual reranking methods can typically be categorized into three classes: clustering-based, classification-based and graph-based methods.
  • The clustering-based reranking methods stem from the key observation that a wealth of visual characteristics can be shared by relevant images.
  • In the classification-based methods, visual reranking is formulated as a binary classification problem aiming to identify whether each search result is relevant or not.
  • Graph-based methods have been proposed recently and have received increasing attention, as they have been demonstrated to be effective. The multimedia entities in the top ranks and their visual relationships can be represented as a collection of nodes and edges.

DISADVANTAGES OF EXISTING SYSTEM:

  • The classification-based methods reduce visual reranking to a binary decision of whether each search result is relevant or not.
  • The existing graph-based framework casts the reranking problem as a random walk on an affinity graph and reorders images according to low-level visual similarities only.

PROPOSED SYSTEM:

  • We propose a new attribute-assisted reranking method based on hypergraph learning. We first train classifiers for all the predefined attributes, and each image is then represented by an attribute feature consisting of the responses from these classifiers (a minimal sketch of the hypergraph construction follows the list).
  • We improve the hypergraph learning approach by adding a regularizer on the hyperedge weights, which performs an implicit selection of the semantic attributes.
  • This paper serves as a first attempt to include attributes in the reranking framework. We observe that semantic attributes are expected to narrow down the semantic gap between low-level visual features and high-level semantic meanings.
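
A short Java sketch of the hypergraph construction described above, under an assumed thresholding rule: each image carries a vector of attribute-classifier responses, and every attribute induces one hyperedge over the images whose response exceeds a threshold. In the paper, the hyperedge weights are then learned with the regularizer mentioned above; that step is omitted here.

```java
import java.util.*;

/** Attribute-induced hyperedge construction sketch: one hyperedge per
 *  attribute, spanning the images that respond strongly to it. */
public class AttributeHypergraph {

    /** responses[i][a]: classifier score of attribute a on image i.
     *  Returns, per attribute index, the vertex set of its hyperedge. */
    static Map<Integer, Set<Integer>> buildHyperedges(double[][] responses,
                                                      double threshold) {
        Map<Integer, Set<Integer>> edges = new HashMap<>();
        int numAttrs = responses[0].length;
        for (int a = 0; a < numAttrs; a++) {
            Set<Integer> vertices = new HashSet<>();
            for (int i = 0; i < responses.length; i++)
                if (responses[i][a] > threshold) vertices.add(i);
            if (vertices.size() > 1)        // a hyperedge links several images
                edges.put(a, vertices);
        }
        return edges;
    }
}
```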

ADVANTAGES OF PROPOSED SYSTEM:

  • We propose a novel attribute-assisted retrieval model for reranking images, based on the classifiers for all the predefined attributes.
  • We perform hypergraph ranking to re-order the images, using the hypergraph constructed to model the relationships of all images.
  • Our proposed iterative regularization framework could further explore the semantic similarity between images by aggregating their local
  • Compared with the previous method, a hypergraph is constructed to model the relationships of all the images, in which each vertex denotes an image and each hyperedge represents an attribute, connecting multiple vertices.

SYSTEM ARCHITECTURE:

[System architecture diagram]

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS:

  • System : Pentium IV 2.4 GHz.
  • Hard Disk : 40 GB.
  • Floppy Drive : 1.44 MB.
  • Monitor : 15-inch VGA colour.
  • Mouse :
  • RAM : 512 MB.

SOFTWARE REQUIREMENTS:

  • Operating system : Windows XP/7.
  • Coding Language : JAVA/J2EE
  • IDE : NetBeans 7.4
  • Database : MySQL

REFERENCE:

Junjie Cai, Zheng-Jun Zha, Member, IEEE, Meng Wang, Shiliang Zhang, and Qi Tian, Senior Member, IEEE, “An Attribute-Assisted Reranking Model for Web Image Search”, IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 24, NO. 1, JANUARY 2015.

Software Puzzle: A Countermeasure to Resource-Inflated Denial-of-Service Attacks

ABSTRACT:

Denial-of-service (DoS) and distributed DoS (DDoS) are among the major threats to cyber-security, and client puzzles, which demand that a client perform computationally expensive operations before being granted services from a server, are a well-known countermeasure to them. However, an attacker can inflate its capability of DoS/DDoS attacks with fast puzzle-solving software and/or built-in graphics processing unit (GPU) hardware to significantly weaken the effectiveness of client puzzles. In this paper, we study how to prevent DoS/DDoS attackers from inflating their puzzle-solving capabilities. To this end, we introduce a new client puzzle referred to as a software puzzle. Unlike the existing client puzzle schemes, which publish their puzzle algorithms in advance, a puzzle algorithm in the present software puzzle scheme is randomly generated only after a client request is received at the server side, and the algorithm is generated such that: 1) an attacker is unable to prepare an implementation to solve the puzzle in advance and 2) the attacker needs considerable effort to translate a central processing unit puzzle software to its functionally equivalent GPU version, such that the translation cannot be done in real time. Moreover, we show how to implement the software puzzle in the generic server-browser model.

EXISTING SYSTEM:

  • DoS and DDoS are effective if attackers spend much less resources than the victim server or are much more powerful than normal users. For example, an attacker may spend negligible effort producing a request, while the server has to spend much more computational effort on the HTTPS handshake (e.g., for RSA decryption). In this case, conventional cryptographic tools do not enhance the availability of the services; in fact, they may degrade service quality due to expensive cryptographic operations.
  • The seriousness of the DoS/DDoS problem and their increased frequency has led to the advent of numerous defense mechanisms.
  • As the present browsers such as Microsoft Internet Explorer and Firefox do not explicitly support client puzzle schemes, Kaiser and Feng developed a web-based client puzzle scheme which focuses on transparency and backwards compatibility for incremental deployment. The scheme dynamically embeds client-specific challenges in webpages, transparently delivers server challenges and client responses.

DISADVANTAGES OF EXISTING SYSTEM:

  • If the puzzle were designed based on the client’s GPU capability, GPU-inflation DoS would not work at all. However, this is not recommended for massive deployment because (1) not all clients have GPU-enabled devices; and (2) an extra runtime environment would have to be installed in order to run the GPU kernel.
  • The web-based client puzzle scheme above is vulnerable to DoS attackers who can implement the puzzle function in real time.
  • Existing schemes are not dynamic: their puzzle algorithms are published in advance.

PROPOSED SYSTEM:

  • In this paper, a software puzzle scheme is proposed for defeating GPU-inflated DoS attacks. It adopts software protection technologies to ensure challenge-data confidentiality and code security for an appropriate time period, e.g., 1-2 seconds. Hence, it has a different security requirement from a conventional cipher, which demands long-term confidentiality only, and from code protection, which focuses only on long-term robustness against reverse engineering.
  • Since the software puzzle may be built upon a data puzzle, it can be integrated with any existing server-side data puzzle scheme, and easily deployed as the present client puzzle schemes do. Although this paper focuses on GPU-inflation attack, its idea can be extended to thwart DoS attackers which exploit other inflation resources such as Cloud Computing.
  • By exploiting the architectural difference between CPU and GPU, this paper presents a new type of client puzzle, called a software puzzle, to defend against GPU-inflated DoS and DDoS attacks (a toy sketch of the idea follows the list).
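
A toy Java sketch of the core idea, under loose assumptions: the server assembles a puzzle function at random only after a request arrives (here, a random chain of mixing steps followed by a hash pre-image search with a tunable difficulty), so no solver, CPU or GPU, can be prepared in advance. The real scheme assembles and obfuscates code blocks delivered to the browser; that machinery is omitted here.

```java
import java.security.MessageDigest;
import java.util.Random;

/** Toy software-puzzle sketch: the puzzle algorithm itself is composed
 *  at random per request, then the client brute-forces a nonce. */
public class SoftwarePuzzleSketch {

    interface Step { long apply(long x); }

    static final Step[] POOL = {
        x -> x * 0x9E3779B97F4A7C15L,        // multiply-mix
        x -> Long.rotateLeft(x, 17),         // rotate
        x -> x ^ (x >>> 31),                 // xor-shift
        x -> x + 0xBF58476D1CE4E5B9L         // add constant
    };

    /** Server side: pick a random composition of steps per request. */
    static Step[] generatePuzzle(Random rng, int length) {
        Step[] chain = new Step[length];
        for (int i = 0; i < length; i++) chain[i] = POOL[rng.nextInt(POOL.length)];
        return chain;
    }

    /** Client side: brute-force a nonce whose mixed hash has d leading
     *  zero bits; the expected cost of 2^d tunes the work factor. */
    static long solve(Step[] chain, long challenge, int d) throws Exception {
        MessageDigest sha = MessageDigest.getInstance("SHA-256");
        for (long nonce = 0; ; nonce++) {
            long x = challenge ^ nonce;
            for (Step s : chain) x = s.apply(x);    // run the random algorithm
            byte[] h = sha.digest(Long.toString(x).getBytes());
            long head = ((long) (h[0] & 0xFF) << 24) | ((h[1] & 0xFF) << 16)
                      | ((h[2] & 0xFF) << 8) | (h[3] & 0xFF);
            if (Long.numberOfLeadingZeros(head) - 32 >= d) return nonce;
        }
    }

    public static void main(String[] args) throws Exception {
        Step[] puzzle = generatePuzzle(new Random(), 6);
        System.out.println("nonce = " + solve(puzzle, 0xCAFEBABEL, 12));
    }
}
```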

ADVANTAGES OF PROPOSED SYSTEM:

  • The SSL/TLS protocol is the most popular online transaction protocol, and an SSL/TLS server performs an expensive RSA decryption operation for each client connection request; it is thus vulnerable to DoS attack.
  • Our objective is to protect the SSL/TLS server with software puzzles against computational DoS attacks, particularly GPU-inflated DoS attacks. As a complete SSL/TLS protocol includes many rounds, for simplicity we use the RSA decryption step to evaluate the defense effectiveness in terms of the server’s time cost.
  • The software puzzle scheme dynamically generates the puzzle function, so an attacker cannot prepare a solver in advance.

SYSTEM ARCHITECTURE:

[System architecture diagram]

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS:

  • System : Pentium IV 2.4 GHz.
  • Hard Disk : 40 GB.
  • Floppy Drive : 1.44 MB.
  • Monitor : 15-inch VGA colour.
  • Mouse :
  • RAM : 512 MB.

SOFTWARE REQUIREMENTS:

  • Operating system : Windows XP/7.
  • Coding Language : JAVA/J2EE
  • IDE : NetBeans 7.4
  • Database : MySQL

REFERENCE:

Yongdong Wu, Zhigang Zhao, Feng Bao, and Robert H. Deng, “Software Puzzle: A Countermeasure to Resource-Inflated Denial-of-Service Attacks”, IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 10, NO. 1, JANUARY 2015.