Visual Analysis of Multiple Route Choices based on General GPS Trajectories

ABSTRACT:

There are often multiple routes between regions. Drivers choose different routes with different considerations. Such considerations have always been a point of interest in the transportation area. Studies of route choice behavior are usually based on small-range experiments with a group of volunteers. However, the experiment data is quite limited in its spatial and temporal scale as well as its practical reliability. In this work, we explore the possibility of studying route choice behavior based on a general trajectory dataset, which is more realistic and covers a wider scale. We develop a visual analytics system to help users handle the large-scale trajectory data, compare different route choices, and explore the underlying reasons. Specifically, the system consists of: 1. interactive trajectory filtering, which supports graphical trajectory queries; 2. a spatial view, which gives an overview of all feasible routes extracted from the filtered trajectories; 3. factor visualizations, which support the exploration and hypothesis construction of different factors' impact on route choice behavior, together with verification by an integrated route choice model. Applied to a real taxi GPS dataset, we report the system's performance and demonstrate its effectiveness with three cases.

EXISTING SYSTEM:

Studies of route choice behavior are usually based on small-range experiments with a group of volunteers. However, the experiment data is quite limited in its spatial and temporal scale as well as its practical reliability.

DISADVANTAGES OF EXISTING SYSTEM:

  • Studies of route choice behavior are based on small-range experiments with a group of volunteers.
  • The experiment data is quite limited in its spatial and temporal scale as well as its practical reliability.

PROPOSED SYSTEM:

  • Visual analytics has been proposed as the science of analytical reasoning facilitated by interactive visual interfaces. By integrating computational and theory-based tools with innovative interaction techniques and visual representations, visual analytics enables humans to participate in problem solving.
  • From this perspective, we propose a visual analytics system that leverages human interaction and judgment in the trajectory data mining process to tackle the above challenges: with a suite of graphical filters, trajectories between regions of interest are queried interactively; from the filtered trajectories, feasible routes are constructed automatically; with a list of factors derived from general GPS trajectory data, route choice distributions over those factors are visualized, which supports exploring and raising hypotheses on potential influences; the hypotheses are then verified with a statistical model to draw reliable conclusions (a minimal filtering sketch follows this list).
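
The sketch below illustrates the kind of query the interactive trajectory filter answers: keep only the trajectories that start in one region of interest and end in another. It is a minimal Java sketch under simplifying assumptions (axis-aligned rectangular regions; the GpsPoint, Region, and TrajectoryFilter names are ours, not the paper's); the actual system supports richer graphical filters drawn on the map.

    import java.util.ArrayList;
    import java.util.List;

    // Minimal sketch: select trajectories whose first point lies in an
    // origin region and whose last point lies in a destination region.
    class GpsPoint {
        final double lat, lon;
        GpsPoint(double lat, double lon) { this.lat = lat; this.lon = lon; }
    }

    class Region {
        final double minLat, maxLat, minLon, maxLon;
        Region(double minLat, double maxLat, double minLon, double maxLon) {
            this.minLat = minLat; this.maxLat = maxLat;
            this.minLon = minLon; this.maxLon = maxLon;
        }
        boolean contains(GpsPoint p) {
            return p.lat >= minLat && p.lat <= maxLat
                && p.lon >= minLon && p.lon <= maxLon;
        }
    }

    class TrajectoryFilter {
        static List<List<GpsPoint>> between(List<List<GpsPoint>> trajectories,
                                            Region origin, Region destination) {
            List<List<GpsPoint>> result = new ArrayList<List<GpsPoint>>();
            for (List<GpsPoint> t : trajectories) {
                if (t.isEmpty()) continue;
                if (origin.contains(t.get(0))
                        && destination.contains(t.get(t.size() - 1))) {
                    result.add(t);
                }
            }
            return result;
        }
    }

Feasible routes would then be extracted and grouped from the filtered set.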

ADVANTAGES OF PROPOSED SYSTEM:

  • We explore the possibility of analyzing multiple route choice behavior based on general GPS data.
  • We develop a visual analytics system to explore route choice behavior with real GPS data.

SYSTEM ARCHITECTURE:

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS: 

  • System : Pentium Dual Core.
  • Hard Disk : 120 GB.
  • Monitor : 15’’ LED
  • Input Devices : Keyboard, Mouse
  • Ram : 1 GB

SOFTWARE REQUIREMENTS: 

  • Operating system : Windows 7.
  • Coding Language : JAVA/J2EE
  • Tool : Netbeans 7.2.1
  • Database : MYSQL

REFERENCE:

Min Lu, Chufan Lai, Tangzhi Ye, Jie Liang, Member, IEEE, and Xiaoru Yuan, Senior Member, IEEE, “Visual Analysis of Multiple Route Choices based on General GPS Trajectories”, IEEE TRANSACTIONS ON BIG DATA, 2017

Nopol: Automatic Repair of Conditional Statement Bugs in Java Programs

ABSTRACT:

We propose NOPOL, an approach to automatic repair of buggy conditional statements (i.e., if-then-else statements). This approach takes a buggy program as well as a test suite as input and generates a patch with a conditional expression as output. The test suite is required to contain passing test cases to model the expected behavior of the program and at least one failing test case that reveals the bug to be repaired. The process of NOPOL consists of three major phases. First, NOPOL employs angelic fix localization to identify expected values of a condition during the test execution. Second, runtime trace collection is used to collect variables and their actual values, including primitive data types and object-oriented features (e.g., nullness checks), to serve as building blocks for patch generation. Third, NOPOL encodes these collected data into an instance of a Satisfiability Modulo Theory (SMT) problem; then a feasible solution to the SMT instance is translated back into a code patch. We evaluate NOPOL on 22 real-world bugs (16 bugs with buggy IF conditions and 6 bugs with missing preconditions) on two large open-source projects, namely Apache Commons Math and Apache Commons Lang. Empirical analysis on these bugs shows that our approach can effectively fix bugs with buggy IF conditions and missing preconditions. We illustrate the capabilities and limitations of NOPOL using case studies of real bug fixes.
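
The toy Java sketch below makes the second and third phases concrete: given runtime snapshots of integer variables and the "angelic" value the buggy condition should have taken in each test, it searches a tiny space of predicates of the form v <= c. The brute-force enumeration is only our stand-in; NOPOL itself encodes this search as an SMT instance over a much richer space of expressions.

    import java.util.List;
    import java.util.Map;

    // Toy stand-in for NOPOL's patch synthesis: find "v <= c" consistent
    // with every (snapshot, expected condition value) pair collected at
    // runtime. NOPOL solves this with an SMT solver, not enumeration.
    class ConditionSynthesizer {
        static String synthesize(List<Map<String, Integer>> snapshots,
                                 List<Boolean> expected) {
            for (String var : snapshots.get(0).keySet()) {
                for (int c = -100; c <= 100; c++) {   // tiny constant range
                    boolean consistent = true;
                    for (int i = 0; i < snapshots.size(); i++) {
                        Integer value = snapshots.get(i).get(var);
                        if (value == null || (value <= c) != expected.get(i)) {
                            consistent = false;
                            break;
                        }
                    }
                    if (consistent) return var + " <= " + c;  // candidate patch
                }
            }
            return null;  // no patch exists in this toy predicate space
        }
    }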

SYSTEM ARCHITECTURE:

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS: 

  • System : Pentium Dual Core.
  • Hard Disk : 120 GB.
  • Monitor : 15’’ LED
  • Input Devices : Keyboard, Mouse
  • Ram : 1 GB

SOFTWARE REQUIREMENTS: 

  • Operating system : Windows 7.
  • Coding Language : JAVA/J2EE
  • Tool : Netbeans 7.2.1
  • Database : MYSQL

REFERENCE:

Jifeng Xuan, Member, IEEE, Matias Martinez, Favio DeMarco, Maxime Clément, Sebastian Lamelas Marcote, Thomas Durieux, Daniel Le Berre, and Martin Monperrus, Member, IEEE, “Nopol: Automatic Repair of Conditional Statement Bugs in Java Programs”, IEEE Transactions on Software Engineering, 2017.

Transactional Behavior Verification in Business Process as a Service Configuration

ABSTRACT:

Business Process as a Service (BPaaS) is an emerging type of cloud service that offers configurable and executable business processes to clients over the Internet. As BPaaS is still in its early years of research, many open issues remain. Managing the configuration of BPaaS builds on areas such as software product lines and configurable business processes. The problem involves concerns from several perspectives, such as the different types of variable features, constraints between configuration options, and satisfying the requirements provided by the client. In our approach, we use temporal logic templates to elicit transactional requirements from clients that the configured service must adhere to. For formalizing constraints over configuration, feature models are used. To manage all these concerns during BPaaS configuration, we develop a structured process that applies formal methods while directing clients through specifying transactional requirements and selecting configurable features. Binary Decision Diagram (BDD) analysis is then used to verify that the selected configurable features do not violate any constraints. Finally, model checking is applied to verify the configured service against the transactional requirement set. We demonstrate the feasibility of our approach with several validation scenarios and performance evaluations.

EXISTING SYSTEM:

  • Existing approaches in managing business process configuration ensure domain constraints over configuration choices, while allowing basic client requirements such as selected features or control flow variations. One area that has yet to receive research attention is ensuring both domain constraints and client transactional requirements during BPaaS configuration.
  • These requirements can include conditions for acceptable process commit or abortion, required recovery operations for key activities, or valid forms of process compensation, and are difficult to verify in a cloud based scenario where multiple stakeholders are involved.
  • A configuration method that ensures complex requirements within a feasible runtime will be able to provide service clients with increased trust for outsourcing potentially sensitive business operations.

DISADVANTAGES OF EXISTING SYSTEM:

  • The configuration problem involves concerns from several perspectives, such as the different types of variable features, the constraints between configuration options, and satisfying the requirements provided by the client.

PROPOSED SYSTEM:

  • We propose a three-step configuration and verification process which relies on a modeling paradigm. Such a paradigm allows us to capture transactional requirements and subsequently verify them. Our approach is expressive and relatively easy for stakeholders to use, while at the same time being sufficiently rigorous to allow us to apply formal methods for verification.
  • We propose a BPaaS configuration process that applies formal methods to ensure that i) the configuration is valid with respect to provider domain constraints, and ii) the process satisfies transactional requirements drawn from the business rules of the client.
  • First, we provide an overview of the process which guides clients through BPaaS configuration, then we provide details on how Binary Decision Diagram (BDD) analysis and model checking are used at certain steps.
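
As a toy illustration of check i), the Java sketch below evaluates "requires" and "excludes" constraints over a set of selected features. All names are our own, and a real implementation would encode the whole feature model symbolically in a BDD library rather than test one selection at a time; transactional requirements are verified separately by model checking, as described above.

    import java.util.Arrays;
    import java.util.HashSet;
    import java.util.Set;

    // Toy feature-model constraint check for one candidate configuration.
    class FeatureModelChecker {
        // requires: selecting 'a' forces 'b' to be selected as well.
        static boolean requires(Set<String> sel, String a, String b) {
            return !sel.contains(a) || sel.contains(b);
        }
        // excludes: 'a' and 'b' must never be selected together.
        static boolean excludes(Set<String> sel, String a, String b) {
            return !(sel.contains(a) && sel.contains(b));
        }
        public static void main(String[] args) {
            Set<String> selection =
                new HashSet<String>(Arrays.asList("Payment", "CreditCard"));
            boolean valid = requires(selection, "CreditCard", "Payment")
                         && excludes(selection, "CreditCard", "Invoice");
            System.out.println("Configuration valid: " + valid);  // true
        }
    }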

ADVANTAGES OF PROPOSED SYSTEM:

  • To the best of our knowledge, transactional requirements important to clients, such as those supported by our template set, are not yet supported by any business process configuration method, and this is one of the major contributions of this work compared to existing works.
  • This increases client trust that the service will behave in a manner consistent with internal business policies and requirements, without having to perform their own analysis of the service behavior.
  • Our BPaaS model enables configuration from numerous perspectives important to BPaaS clients, namely, activities, resources, and data objects.
  • Our configuration method aims to elicit and ensure complex transactional requirements from clients, by adapting the temporal logic template set.
  • It has the advantage of a reduced runtime when configuring services with many configuration options and values.

SYSTEM ARCHITECTURE:

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS: 

  • System : Pentium Dual Core.
  • Hard Disk : 120 GB.
  • Monitor : 15’’ LED
  • Input Devices : Keyboard, Mouse
  • Ram : 1 GB

SOFTWARE REQUIREMENTS: 

  • Operating system : Windows 7.
  • Coding Language : JAVA/J2EE
  • Tool : ECLIPSE
  • Database : MYSQL

REFERENCE:

Scott Bourne, Claudia Szabo, Member, IEEE, and Quan Z. Sheng, Member, IEEE, “Transactional Behavior Verification in Business Process as a Service Configuration”, IEEE TRANSACTIONS ON SERVICES COMPUTING, 2017.

Collaborative Filtering Service Recommendation Based on a Novel Similarity Computation Method

ABSTRACT:

Recently, collaborative filtering-based methods have been widely used for service recommendation. QoS attribute value based collaborative filtering service recommendation includes two important steps. One is the similarity computation, and the other is the prediction of the QoS attribute value that the user has not experienced. In some previous studies, the similarity computation methods and prediction methods are not accurate. The performance of those methods needs to be improved. In this paper, we propose a ratio-based method to calculate the similarity. We can get the similarity between users or between items by comparing the attribute values directly. Based on our similarity computation method, we propose a new method to predict the unknown value. By comparing the values of a similar service and the current service that are invoked by common users, we can obtain the final prediction result. The performance of the proposed method is evaluated through a large data set of real web services. Experimental results show that our method obtains better prediction precision, lower mean absolute error (MAE) and faster computation time than the various reference schemes considered.

EXISTING SYSTEM:

  • Presently, the Pearson correlation coefficient (PCC) and cosine (COS) methods are commonly applied to calculate the similarity.

DISADVANTAGES OF EXISTING SYSTEM:

  • Pearson correlation coefficient (PCC) and cosine (COS) methods have limited accuracy.
  • The PCC method does not take into account the differences in QoS attribute values given by different users. Although the COS method can measure the angles of the vectors composed by the users or services, it neglects the lengths of the vectors.

PROPOSED SYSTEM:

  • In this paper, we propose a new method to calculate the similarity.
  • Generally, the QoS attributes experienced by the user are given in the form of numerical values, and these values are non-negative. The similarity represents the degree of two objects’ consistency. We can use the ratio of two values to express the consistency.
  • The ratio of two attribute values, which are the results of two users invoking the same item, reflects the users' consistency on this item, i.e., the single similarity. Summing up all the single similarities and taking the average, we obtain the final similarity between two users.
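
The Java sketch below is one minimal reading of that computation, assuming non-negative QoS values stored in parallel arrays and min/max as the per-item ratio, so that each single similarity falls in [0, 1]; consult the paper for the exact formulation.

    // Ratio-based user similarity: for every item both users invoked,
    // the single similarity is min(a, b) / max(a, b); the final
    // similarity is the average over all co-invoked items.
    class RatioSimilarity {
        static double similarity(double[] u, double[] v) {
            double sum = 0.0;
            int common = 0;
            for (int i = 0; i < u.length; i++) {
                if (u[i] > 0 && v[i] > 0) {      // both users invoked item i
                    sum += Math.min(u[i], v[i]) / Math.max(u[i], v[i]);
                    common++;
                }
            }
            return common == 0 ? 0.0 : sum / common;
        }
    }

The same ratio idea then drives prediction: the values of a similar service and the current service invoked by common users are compared to scale known values into a prediction for the missing one.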

ADVANTAGES OF PROPOSED SYSTEM:

  • Our method is applicable to all kinds of QoS attributes that are given as numerical values. Some qualitative and subjective QoS attributes are expressed in non-numerical values, such as “very good” and “good”; according to certain rules, these evaluations can be transformed into numerical values, and then our method can be used.
  • The recommendation system can recommend appropriate service(s) to the user according to given conditions. Here, the specific condition given by a user may be constrained by multiple objectives.
  • Saves a lot of time and energy.

SYSTEM ARCHITECTURE:

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS: 

  • System : Pentium Dual Core.
  • Hard Disk : 120 GB.
  • Monitor : 15’’ LED
  • Input Devices : Keyboard, Mouse
  • Ram : 1 GB

SOFTWARE REQUIREMENTS: 

  • Operating system : Windows 7.
  • Coding Language : JAVA/J2EE
  • Tool : Netbeans 7.2.1
  • Database : MYSQL

REFERENCE:

Xiaokun Wu, Bo Cheng, and Junliang Chen, “Collaborative Filtering Service Recommendation Based on a Novel Similarity Computation Method”, IEEE TRANSACTIONS ON SERVICES COMPUTING, VOL.10, NO.3, May-June 2017.

Improving Automated Bug Triaging with Specialized Topic Model

ABSTRACT:

Bug triaging refers to the process of assigning a bug to the most appropriate developer to fix. It becomes more and more difficult and complicated as the size of software and the number of developers increase. In this paper, we propose a new framework for bug triaging, which maps the words in the bug reports (i.e., the term space) to their corresponding topics (i.e., the topic space). We propose a specialized topic modeling algorithm named multi-feature topic model (MTM) which extends Latent Dirichlet Allocation (LDA) for bug triaging. MTM considers product and component information of bug reports to map the term space to the topic space. Finally, we propose an incremental learning method named TopicMiner which considers the topic distribution of a new bug report to assign an appropriate fixer based on the affinity of the fixer to the topics. We pair TopicMiner with MTM (TopicMinerMTM). We have evaluated our solution on 5 large bug report datasets including GCC, OpenOffice, Mozilla, Netbeans, and Eclipse containing a total of 227,278 bug reports. We show that TopicMinerMTM can achieve top-1 and top-5 prediction accuracies of 0.4831 – 0.6868 and 0.7686 – 0.9084, respectively. We also compare TopicMinerMTM with Bugzie, LDA-KL, SVM-LDA, LDA-Activity, and Yang et al.’s approach. The results show that TopicMinerMTM on average improves the top-1 and top-5 prediction accuracies of Bugzie by 128.48% and 53.22%, LDA-KL by 262.91% and 105.97%, SVM-LDA by 205.89% and 110.48%, LDA-Activity by 377.60% and 176.32%, and Yang et al.’s approach by 59.88% and 13.70%, respectively.

EXISTING SYSTEM:

  • To aid in finding appropriate developers, automatic bug triaging approaches have been proposed in the literature. Many of these approaches use the vector space model (VSM) to represent a bug report, i.e., a bug report is treated as a vector of terms (words) and their counts. However, developers often use various terms to express the same meaning. The same term can also carry different meanings depending on the context. Such synonymous and polysemous words cannot be captured by VSM.
  • Various topic modeling algorithms are proposed in the literature including Latent Semantic Indexing/Analysis (LSA), probabilistic LSA (pLSA), and Latent Dirichlet Allocation (LDA). Among the three, LDA is the most recently proposed and it addresses the limitations of LSA and pLSA.

DISADVANTAGES OF EXISTING SYSTEM:

  • LDA considers a document as a random mixture of latent topics, where a topic is a random mixture of terms.
  • Only one or a few features can be taken into consideration.
  • Lower accuracy.
  • Higher complexity.
  • Longer triaging time.

PROPOSED SYSTEM:

  • We extend LDA and propose a new topic model named the multi-feature topic model (MTM) for the bug triaging problem. Since a bug report has multiple features (e.g., the product affected by the bug, the component affected by the bug, etc.), MTM considers the features of a bug report when it converts terms in the textual description of the report (i.e., the texts in the summary and description fields) to their corresponding topics in the topic space. Given a bug report with a particular feature combination (i.e., product-component combination), MTM converts a word in the bug report to a topic.
  • We refer to a feature as a categorical field in a bug report that a bug reporter can fill in when submitting the report. These fields include the product, component, reporter, priority, severity, OS, version, and platform fields. We exclude the natural language descriptions in the bug reports, which include the contents of the summary and description fields, as features since they are not categorical in nature.
  • In this paper, we use the product-component combination as the input feature combination, since product and component are two of the most important features that describe a bug. Given a bug report with a particular feature combination, MTM converts a term in the bug report to a topic by putting special emphasis on the appearances of the word in bug reports with the same feature combination, without ignoring the word appearances in all other bug reports.
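
Once MTM has mapped a new report into the topic space, triaging reduces to scoring developers by their affinity to the report's topics. The Java sketch below shows one plausible shape of that step; the dot-product scoring and all names are our illustration, not TopicMiner's exact affinity measure.

    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;
    import java.util.Map;

    // Rank developers by the affinity between the new report's topic
    // distribution and each developer's aggregated topic profile.
    class FixerRecommender {
        static List<String> topK(final double[] reportTopics,
                                 final Map<String, double[]> devProfiles,
                                 int k) {
            List<String> devs = new ArrayList<String>(devProfiles.keySet());
            devs.sort(Comparator.comparingDouble(
                    (String d) -> -score(reportTopics, devProfiles.get(d))));
            return devs.subList(0, Math.min(k, devs.size()));
        }
        static double score(double[] t, double[] d) {
            double s = 0.0;
            for (int i = 0; i < t.length; i++) s += t[i] * d[i];
            return s;
        }
    }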

ADVANTAGES OF PROPOSED SYSTEM:

  • MTM considers each combination of features as a random mixture of latent topics, where a topic is a random mixture of terms.
  • MTM is an extensible topic model, where one or more features can be taken into consideration.
  • We propose a new approach for bug triaging which leverages MTM. We take as input a training set of bug reports (whose fixers are known) and a new bug report whose fixer is to be predicted.
  • Our approach, named TopicMinerMTM, computes the affinity of a developer to a new bug report based on the reports that the developer fixed before. To do this, we compare the topics that appear in the new bug report with those in the old reports that the developer has fixed.

SYSTEM ARCHITECTURE:

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS: 

  • System : Pentium Dual Core.
  • Hard Disk : 120 GB.
  • Monitor : 15’’ LED
  • Input Devices : Keyboard, Mouse
  • Ram : 1 GB

SOFTWARE REQUIREMENTS: 

  • Operating system : Windows 7.
  • Coding Language : JAVA/J2EE
  • Tool : ECLIPSE
  • Database : MYSQL

REFERENCE:

Xin Xia, Member, IEEE, David Lo, Member, IEEE, Ying Ding, Jafar M. Al-Kofahi, Tien N. Nguyen, Member, IEEE, Xinyu Wang, Member, IEEE, “Improving Automated Bug Triaging with Specialized Topic Model”, IEEE Transactions on Software Engineering, 2017.

A System for Profiling and Monitoring Database Access Patterns by Application Programs for Anomaly Detection

ABSTRACT:

Database Management Systems (DBMSs) provide access control mechanisms that allow database administrators (DBAs) to grant application programs access privileges to databases. Though such mechanisms are powerful, in practice a finer-grained access control mechanism tailored to the semantics of the data stored in the DBMS is required as a first-class defense mechanism against smart attackers. Hence, custom-written applications which access databases implement an additional layer of access control. Therefore, securing a database alone is not enough for such applications, as attackers aiming at stealing data can take advantage of vulnerabilities in the privileged applications and make these applications issue malicious database queries. An access control mechanism can only prevent application programs from accessing data to which the programs are not authorized, but it is unable to prevent misuse of data to which application programs are authorized for access. Hence, we need a mechanism able to detect malicious behavior resulting from previously authorized applications. In this paper, we present the architecture of an anomaly detection mechanism, DetAnom, that aims to solve this problem. Our approach is based on the analysis and profiling of the application in order to create a succinct representation of its interaction with the database. Such a profile keeps a signature for every submitted query and also the corresponding constraints that the application program must satisfy to submit the query. Later, in the detection phase, whenever the application issues a query, a module captures the query before it reaches the database and verifies the corresponding signature and constraints against the current context of the application. If there is a mismatch, the query is marked as anomalous. The main advantage of our anomaly detection mechanism is that, in order to build the application profiles, we need neither any previous knowledge of application vulnerabilities nor any examples of possible attacks. As a result, our mechanism is able to protect the data from attacks tailored to database applications, such as code modification attacks and SQL injections, as well as from other data-centric attacks. We have implemented our mechanism with a software testing technique called concolic testing and the PostgreSQL DBMS. Experimental results show that our profiling technique is close to accurate, requires an acceptable amount of time, and that the detection mechanism incurs low run-time overhead.

EXISTING SYSTEM:

  • Several approaches have been proposed to protect databases against malicious application programs. DIDAFIT is an intrusion detection system that works at the application level. Like our system, DIDAFIT works in two phases: a training phase and a detection phase. During the training phase, database logs are analyzed to generate fingerprints of the queries found in the log. Fingerprints are regular expressions of queries with constants in the WHERE clause replaced by placeholders that reflect the data types of the constants (a simplified fingerprinting sketch follows this list). During the detection phase, input queries are checked against these fingerprints. Queries that match some expression in the profiles are considered benign, and anomalous otherwise.
  • The approaches proposed by Bertino et al. and Valeur et al. also analyze training logs for creating profiles of queries. Therefore they have the same drawbacks mentioned earlier. These approaches focus on the detection of web-based attacks, like SQL Injection and Cross-Site Scripting (XSS) attacks, and fail to detect other attacks performed through application programs, e.g., code modification attacks.
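
For intuition, the Java sketch below mimics DIDAFIT-style fingerprinting: literals in the SQL text are replaced by typed placeholders, and incoming queries are matched against the stored fingerprints. The two regular expressions are deliberately simplified for illustration and would not cover full SQL.

    import java.util.HashSet;
    import java.util.Set;

    // Simplified fingerprinting: replace literals with placeholders,
    // then check incoming queries against the known fingerprints.
    class QueryFingerprint {
        static String fingerprint(String sql) {
            return sql.replaceAll("'[^']*'", "%s")      // string literals
                      .replaceAll("\\b\\d+\\b", "%d");  // numeric literals
        }
        public static void main(String[] args) {
            Set<String> profile = new HashSet<String>();
            profile.add(fingerprint("SELECT name FROM users WHERE id = 42"));
            String incoming = "SELECT name FROM users WHERE id = 7";
            System.out.println("benign: "
                + profile.contains(fingerprint(incoming)));  // true
        }
    }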

DISADVANTAGES OF EXISTING SYSTEM:

  • First, insiders are allowed to access resources, such as data and computer systems, and services inside the organization networks as they possess valid credentials.
  • Second, the actions of insiders originate at a trusted domain within the network, and thus are not subject to thorough security checks in the same way as external actions are.
  • Third, insiders are often highly trained computer experts, who have knowledge about the internal configuration of the network and the security and auditing control deployed. Therefore, they may be able to circumvent conventional security mechanisms.
  • DIDAFIT has however some major drawbacks. First, the system relies only on logs to create program profiles. There is therefore no guarantee that the log would contain all legitimate queries. Another problem is that DIDAFIT does not take into account the control flow and data flow of the program, i.e., the algorithm neither checks the correct order of the queries, nor the constraints that have to be verified for a query to be executed.

PROPOSED SYSTEM:

  • First, the profiling algorithm is used to create a runtime behavior profile of the custom-written application. The anomaly detection algorithm is then used to detect, based on the application profile, anomalies present in database queries originating from the trusted or internal network.
  • In this paper, we propose DetAnom, an anomaly detection mechanism able to identify malicious database transactions that addresses the above requirements. DetAnom consists of two phases: the profile creation phase and the anomaly detection phase.
  • In the first phase, we create a profile of the application program that can succinctly represent the application’s normal behavior in terms of its interaction (i.e., submission of SQL queries) with the database. For each query, we create a signature and also capture the corresponding preconditions that the application program must satisfy to submit the query. Note that an application program may execute different query sequences depending on different values of the input parameters. Hence, the profile of the application needs to consider all possible execution paths that lead to interaction with the database. Each query in the application belongs to one of these paths and has a set of preconditions (i.e., constraints) in order to be issued.
  • Later, in the anomaly detection phase, whenever the application issues a query, the corresponding query signature and constraints are checked against the current context of the application. If there is a mismatch, the query is considered anomalous.
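
A minimal Java rendering of that detection check is sketched below: the profile maps each expected query signature to a precondition over the application context, and an intercepted query is flagged anomalous when its signature is unknown or its precondition fails. The Predicate-based profile is our simplification of the paper's signatures and constraints.

    import java.util.Map;
    import java.util.function.Predicate;

    // Detection-phase sketch: unknown signature or violated precondition
    // means the intercepted query is marked as anomalous.
    class DetAnomChecker {
        static boolean isAnomalous(
                String querySignature,
                Map<String, Object> appContext,
                Map<String, Predicate<Map<String, Object>>> profile) {
            Predicate<Map<String, Object>> precondition =
                profile.get(querySignature);
            if (precondition == null) return true;  // query never profiled
            return !precondition.test(appContext);  // constraint violated?
        }
    }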

ADVANTAGES OF PROPOSED SYSTEM:

  • The main advantage of our anomaly detection mechanism is that we do not need any knowledge about possible attacks to build the application profiles.
  • To the best of our knowledge, our approach is the first using software testing techniques for creating execution profiles of application programs for the purpose of detecting execution anomalies at run-time. Such anomalies may be indicative of application program tampering. Notice that our approach is complementary to techniques for static analysis.
  • Such techniques aim at analyzing programs to detect bugs that can be exploited by attacks at run-time, such as buffer vulnerabilities.
  • Our approach aims at preventing malicious changes to programs, after the completion of the static analysis, by insiders who have the ability to modify the application source code or the application binary.

SYSTEM ARCHITECTURE:

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS: 

  • System : Pentium Dual Core.
  • Hard Disk : 120 GB.
  • Monitor : 15’’ LED
  • Input Devices : Keyboard, Mouse
  • Ram : 1 GB

SOFTWARE REQUIREMENTS: 

  • Operating system : Windows 7.
  • Coding Language : JAVA/J2EE
  • Tool : Netbeans 7.2.1
  • Database : MYSQL

REFERENCE:

Lorenzo Bossi, Elisa Bertino, Fellow, IEEE, and Syed Rafiul Hussain, Member, IEEE, “A System for Profiling and Monitoring Database Access Patterns by Application Programs for Anomaly Detection”, IEEE Transactions on Software Engineering, 2017.

Two-Stage Friend Recommendation Based on Network Alignment and Series-Expansion of Probabilistic Topic Model

ABSTRACT:

Precise friend recommendation is an important problem in social media. Although most social websites provide some kind of automatic friend-searching function, their accuracy is not satisfactory. In this paper, we propose a more precise automatic friend recommendation method with two stages. In the first stage, by utilizing the information of the relationship between texts and users, as well as the friendship information between users, we align different social networks and choose some “possible friends”. In the second stage, with the relationship between image features and users, we build a topic model to further refine the recommendation results. Because some traditional methods such as variational inference and Gibbs sampling have their limitations in dealing with our problem, we develop a novel method to find the solution of the topic model based on series expansion. We conduct experiments on the Flickr dataset to show that the proposed algorithm recommends friends more precisely and faster than traditional methods.

EXISTING SYSTEM:

  • Existing multi-stage recommendations are usually applied to find some patterns of users or items.
  • In an existing system, a two-stage mobile recommendation is proposed to help users find the correct events. The first stage clusters people according to their profile similarity and the second stage discovers the event participating pattern.
  • Another existing system designs the first stage to find some related resources that a user requires, and the second stage to find some patterns that the user might prefer from the previous stage for further recommendation. Both systems can handle the cold-start problem well but do not consider much about the cross-domain problem.

DISADVANTAGES OF EXISTING SYSTEM:

  • Traditional friend recommendations widely applied by Facebook and Twitter are often based on common friends and similar profiles such as having the same hobbies or studying in the same fields. These methods usually provide a long ranked possible friend list, but the recommendation precision is usually not satisfactory due to its complexity.
  • The co-clustering method lacks the ability to tell the exact intimacy distance between two individuals; it can only group people roughly by similar properties, and thus cannot make precise recommendations.
  • The presence of so many unknown variables not only greatly increases the complexity of the algorithm, but also leads to other problems such as over-fitting or redundancies.

PROPOSED SYSTEM:

  • In this paper, we approach this recommendation problem in a different way by utilizing the multi-domain information in different stages for a more precise recommendation.
  • In the first stage, based on the correlation of different networks, we align the tag-similarity network to the friend network to obtain a possible-friend list. Specifically, we consider each user as a node in a graph, crawl the uploaded tags of each user, and calculate the tag similarity between any two users as the edges to form a tag-similarity network (a similarity sketch follows this list).
  • In the second stage, to overcome the problem that the mass election considering only the tag information might not be precise, we build a topic model to illustrate the relationship between users' friend-making behavior and the image features they have uploaded. This stage refines the list obtained in the first stage. The main reason for applying a topic model in our second stage lies in the fact that the topic model can tell with what probability a user would prefer a photo/item/friend.
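
For the first stage, the Java sketch below scores a user pair by the Jaccard similarity of their uploaded tag sets, which is one simple way to weight the edges of the tag-similarity network; the paper does not prescribe Jaccard, so this is our illustrative choice.

    import java.util.HashSet;
    import java.util.Set;

    // Edge weight of the tag-similarity network: Jaccard similarity
    // between the tag sets uploaded by two users.
    class TagSimilarity {
        static double jaccard(Set<String> tagsA, Set<String> tagsB) {
            if (tagsA.isEmpty() && tagsB.isEmpty()) return 0.0;
            Set<String> intersection = new HashSet<String>(tagsA);
            intersection.retainAll(tagsB);
            Set<String> union = new HashSet<String>(tagsA);
            union.addAll(tagsB);
            return (double) intersection.size() / union.size();
        }
    }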

ADVANTAGES OF PROPOSED SYSTEM:

  • Compared with some previous cross-domain topic models, our model is more compact with fewer parameters, which leads to some computational convenience.
  • Our proposed method provides a way to describe the whole distribution of the social network, to perform a better recommendation.
  • As far as we know, this is the first time a topic model has been solved from the aspect of integral series expansion. We also conduct comprehensive experiments to show the effectiveness of our method.

SYSTEM ARCHITECTURE:

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS: 

  • System : Pentium Dual Core.
  • Hard Disk : 120 GB.
  • Monitor : 15’’ LED
  • Input Devices : Keyboard, Mouse
  • Ram : 1 GB

SOFTWARE REQUIREMENTS: 

  • Operating system : Windows 7.
  • Coding Language : JAVA/J2EE
  • Tool : Netbeans 7.2.1
  • Database : MYSQL

REFERENCE:

Shangrong Huang, Jian Zhang, Dan Schonfeld, Lei Wang, and Xian-Sheng Hua, “Two-Stage Friend Recommendation Based on Network Alignment and Series-Expansion of Probabilistic Topic Model”, IEEE Transactions on Multimedia, 2017.

Tri-Clustered Tensor Completion for Social-Aware Image Tag Refinement

ABSTRACT:

Social image tag refinement, which aims to improve tag quality by automatically completing the missing tags and rectifying the noise-corrupted ones, is an essential component for social image search. Conventional approaches mainly focus on exploring the visual and tag information, without considering the user information, which often reveals important hints on the (in)correct tags of social images. Towards this end, we propose a novel tri-clustered tensor completion framework to collaboratively explore these three kinds of information to improve the performance of social image tag refinement. Specifically, the inter-relations among users, images and tags are modeled by a tensor, and the intra-relations between users, images and tags are explored by three regularizations respectively. To address the challenges of the super-sparse and large-scale tensor factorization that demands expensive computing and memory cost, we propose a novel tri-clustering method to divide the tensor into some sub-tensors by simultaneously clustering users, images and tags into a bunch of tri-clusters. And then we investigate two strategies to complete these sub-tensors by considering the (in)dependence between the sub-tensors. Experimental results on a real-world social image database demonstrate the superiority of the proposed method compared with the state-of-the-art methods.

EXISTING SYSTEM:

  • The prior works related to image tag refinement mainly focus on exploring semantic correlation among tags.
  • Jin et al. identified and filtered out the weakly irrelevant annotated tags by exploring tag semantic correlation on WordNet.
  • Xu et al. proposed a tag refinement scheme based on tag similarity and relevance by using LDA to mine latent topics.
  • Recently, matrix completion based on low-rank approximation has been explored, which refers to a process of inferring missing entries from a small part of the observed entries in the original matrix between the dyad data (such as word-document in text mining, user-item in recommendation system, and image feature in image processing).
  • Inspired by matrix completion, several approaches have been proposed to leverage a small number of observed noisy tags to simultaneously recover the missing tags, remove the noisy tags, and even re-rank the complete tag list. These methods have achieved impressive performance in tag refinement.

DISADVANTAGES OF EXISTING SYSTEM:

  • All the aforementioned methods explore only the visual and tag information, without considering the user information (e.g., user interests and backgrounds) that usually reveals important hints on the (in)correct tags of social images. Therefore, methods lacking the consideration of user information cannot achieve satisfactory performance when the visual content and label taxonomy (e.g., the WordNet taxonomy) are inconsistent.
  • Several “negative” tags must be selected before ranking, which brings in some incorrect correlations.
  • There are several problems in the tensor completion for real-world applications. First, the dimension of the constructed tensor is usually extremely large. The process of tensor completion generates large temporal matrices and tensors, which requires expensive computing and memory cost.
  • Existing works mainly explore parallel solutions to achieve low complexity and reduce memory cost.
  • Second, the associated 3rd-order tensor is usually very sparse, because the number of observed elements only accounts for a small ratio compared to the size of the tensor.

PROPOSED SYSTEM:

  • We explore the user information to assist social image tag refinement, especially for those images with context information, e.g., geo-related tags, event tags, etc.
  • To address the above issues, we propose a novel tri-clustered tensor completion (TTC) framework for social image tag refinement.
  • First, we utilize the clustering method to divide the original tensor into several sub-tensors to reduce the computing and memory cost. As to the clustering problem, existing approaches use the associated matrix to model the relationships between two types of data, and then cluster the rows and columns of this matrix simultaneously into co-clusters, which is known as the co-clustering.
  • Motivated by this, we propose an efficient tri-clustering method to identify the block structures in the rows, columns, and tubes. Specifically, the proposed tri-clustering divides the image-tag-user associated tensor into several sub-tensors based on the explicit associations and latent structure of the tensor. Second, to handle the super-sparsity problem of the tensor, we select the denser sub-tensors and then complete them.
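
The Java sketch below illustrates only the partitioning step, assuming the tri-cluster assignments of users, images, and tags are already available: each observed (user, image, tag) entry is routed to the sub-tensor named by its three cluster ids. Computing those assignments is the tri-clustering problem the paper actually solves.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Route each observed tensor entry to its (userCluster, imageCluster,
    // tagCluster) sub-tensor; the denser sub-tensors are then completed.
    class TensorPartitioner {
        static class Entry {
            final int user, image, tag;
            final double value;
            Entry(int user, int image, int tag, double value) {
                this.user = user; this.image = image;
                this.tag = tag; this.value = value;
            }
        }

        static Map<String, List<Entry>> partition(List<Entry> observed,
                int[] userCluster, int[] imageCluster, int[] tagCluster) {
            Map<String, List<Entry>> subTensors =
                new HashMap<String, List<Entry>>();
            for (Entry e : observed) {
                String key = userCluster[e.user] + "-"
                           + imageCluster[e.image] + "-"
                           + tagCluster[e.tag];
                if (!subTensors.containsKey(key)) {
                    subTensors.put(key, new ArrayList<Entry>());
                }
                subTensors.get(key).add(e);
            }
            return subTensors;
        }
    }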

ADVANTAGES OF PROPOSED SYSTEM:

  • The results of social image tag refinement on a real-world social image database demonstrate the superiority of the proposed method compared with the state-of-the-art methods.
  • A novel tri-clustered tensor completion (TTC) framework for social image tag refinement is proposed by solving the low-rank approximation problem of the image-tag-user associated tensor.
  • The tri-clustering method is proposed to divide the tensor into several sub-tensors, in order to overcome the challenges of large-scale tensor factorization.
  • The sub-tensor completion method is proposed to complete the denser sub-tensors, in order to effectively solve the super-sparse tensor completion problem.
  • Two variants of TTC are proposed respectively, by considering the two assumptions whether or not the sub-tensors are independent of each other.

SYSTEM ARCHITECTURE:

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS: 

  • System : Pentium Dual Core.
  • Hard Disk : 120 GB.
  • Monitor : 15’’ LED
  • Input Devices : Keyboard, Mouse
  • Ram : 1 GB

SOFTWARE REQUIREMENTS: 

  • Operating system : Windows 7.
  • Coding Language : JAVA/J2EE
  • Tool : ECLIPSE
  • Database : MYSQL

REFERENCE:

Jinhui Tang, Xiangbo Shu, Guo-Jun Qi, Zechao Li, Meng Wang, Shuicheng Yan, and Ramesh Jain, Life Fellow, IEEE, “Tri-Clustered Tensor Completion for Social-Aware Image Tag Refinement”, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017.

Image Re-ranking based on Topic Diversity

ABSTRACT:

Social media sharing websites allow users to annotate images with free tags, which significantly contribute to the development of web image retrieval. Tag-based image search is an important method for finding images shared by users in social networks. However, making the top-ranked results both relevant and diverse is challenging. In this paper, we propose a topic-diverse ranking approach for tag-based image retrieval with the consideration of promoting the topic coverage performance. First, we construct a tag graph based on the similarity between each pair of tags. Then a community detection method is conducted to mine the topic community of each tag. After that, inter-community and intra-community ranking are introduced to obtain the final retrieved results. In the inter-community ranking process, an adaptive random walk model is employed to rank the communities based on the multi-information of each topic community. Besides, we build an inverted index structure for images to accelerate the searching process. Experimental results on the Flickr and NUS-WIDE datasets show the effectiveness of the proposed approach.

EXISTING SYSTEM:

  • Currently, image clustering and duplicate removal are the major approaches to settling the diversity problem. However, most of the literature regards the diversity problem as promoting visual diversity, while the promotion of semantic coverage is often ignored.
  • To diversify the top-ranked search results from the semantic aspect, the topic community that each image belongs to should be considered.
  • Dang-Nguyen et al. first propose a clustering algorithm to obtain a topic tree, and then sort topics according to the number of images in each topic. In each cluster, the image uploaded by the user with the highest visual score is selected as the top-ranked image. The second image is the one with the largest distance to the first image, the third is the one with the largest distance to both previous images, and so on.
  • Most papers consider diversity from the visual perspective and achieve it by applying clustering on visual features.

DISADVANTAGES OF EXISTING SYSTEM:

  • Tag mismatch
  • Query ambiguity
  • Most of the above works view the diversity problem as promoting visual diversity rather than topic coverage.

PROPOSED SYSTEM:

  • In this paper, we focus on topic diversity. We first group all the tags in the initial retrieval image list so that tags with similar semantics fall into the same cluster, then assign images to the clusters. Images within the same cluster are viewed as having similar semantics. After ranking the clusters and the images in each cluster, we select one image from each cluster to achieve semantic diversity.
  • In this paper, we propose to construct the tag graph and mine the topic community to diversify the semantic information of the retrieval results. The contributions of this paper are summarized as follows:
  • We propose a topic-diverse ranking approach that considers the topic coverage of the retrieved images. Inter-community and intra-community ranking methods are proposed to achieve a good trade-off between diversity and relevance (a small interleaving sketch follows this list).
  • The tag graph construction based on each tag’s word vector and community mining approach are employed in our approach to detect topic community. The mined community can represent each sub-topic under the given query. Besides, in order to represent the relationship of tags better, we train the word vector of each tag based on the English Wikipedia corpus with the model word2vec.
  • We rank each mined community according to their relevance level to the query. In the inter-community ranking process, an adaptive random walk model is employed to accomplish the ranking based on the relevance of each community with respect to the query, pair-wise similarity between each community, and the image number in each community.
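
After both ranking passes, one natural way to assemble a topic-diverse result list is round-robin selection over the ranked communities, as in the Java sketch below. The interleaving policy is our illustration; the paper selects images according to its inter- and intra-community scores.

    import java.util.ArrayList;
    import java.util.List;

    // Assemble a diverse top-k list: walk the communities in rank order,
    // repeatedly taking each community's next-best image.
    class DiverseReranker {
        static List<String> interleave(List<List<String>> rankedCommunities,
                                       int k) {
            List<String> result = new ArrayList<String>();
            for (int depth = 0; result.size() < k; depth++) {
                boolean addedAny = false;
                for (List<String> community : rankedCommunities) {
                    if (depth < community.size() && result.size() < k) {
                        result.add(community.get(depth));
                        addedAny = true;
                    }
                }
                if (!addedAny) break;  // every community is exhausted
            }
            return result;
        }
    }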

ADVANTAGES OF PROPOSED SYSTEM:

  • Good trade-off between the diversity and relevance performance.
  • With the adaptive random walk model, the community that possesses the bigger semantic relevance value with the query and larger confidence value will be ranked higher.
  • To diversify the top ranked retrieval results
  • Computes the similarity between the user-oriented image set and query based on the co-occurrence tag mechanism.
  • We sort the communities based on relevance scores obtained by random walk.

SYSTEM ARCHITECTURE:

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS: 

  • System : Pentium Dual Core.
  • Hard Disk : 120 GB.
  • Monitor : 15’’ LED
  • Input Devices : Keyboard, Mouse
  • Ram : 1 GB

SOFTWARE REQUIREMENTS: 

  • Operating system : Windows 7.
  • Coding Language : JAVA/J2EE
  • Tool : ECLIPSE
  • Database : MYSQL

REFERENCE:

Xueming Qian, Member, IEEE, Dan Lu, Yaxiong Wang, Li Zhu, Yuan Yan Tang, Fellow, IEEE, and Meng Wang, “Image Re-ranking based on Topic Diversity”, IEEE Transactions on Image Processing, 2017.

Automatic Generation of Social Event Storyboard from Image Click-through Data

ABSTRACT:

Recent studies have shown that a noticeable percentage of web search traffic is about social events. While traditional websites can only show human-edited events, in this paper we present a novel system to automatically detect events from search log data and generate storyboards where the events are arranged chronologically. We chose image search logs as the resource for event mining, as search logs can directly reflect people’s interests. To discover events from log data, we present a Smooth Nonnegative Matrix Factorization framework (SNMF) which combines the information of query semantics, temporal correlations, search logs, and time continuity. Moreover, we consider the time factor an important element, since different events develop with different temporal tendencies. In addition, to provide a media-rich and visually appealing storyboard, each event is associated with a set of representative photos arranged along a timeline. These relevant photos are automatically selected from image search results by analyzing image content features. We use celebrities as our test domain, which accounts for a large percentage of image search traffic. Experiments on the web search traffic of 200 celebrities, over a period of six months, show very encouraging results compared with handcrafted editorial storyboards.

EXISTING SYSTEM:

  • The most related research topics to this paper are event/topic detection from Web. There have been quite a few works that examine related directions. The most typical data sources for event/topic mining are news articles and weblogs. Various statistical methods have been proposed to group documents sharing the same stories. Temporal analysis has also been involved to recover the development trend of an event.
  • The representative work on event/topic detection is the DARPA-sponsored research program called TDT (topic detection and tracking), which focuses on discovering events from streams of news documents. With the development of Web 2.0, weblogs have become another data source for event detection. Some of these research efforts develop new statistical methods, while others focus on recovering the temporal structure of events.

DISADVANTAGES OF EXISTING SYSTEM:

  • First, the coverage of human-centered domains is small. Typically, one website only focuses on celebrities in one or two domains (most of them entertainment and sports), and to the best of our knowledge, there are no general services yet for tracing celebrities over various domains.
  • Second, these existing services are not scalable. Even for specific domains, only a few top stars are covered, as the editing effort to cover more celebrities is not financially viable.
  • Third, reported event news may be biased by editors’ interests.
  • Discovering events from a search log is not a trivial task.
  • Existing work on log event mining mostly focuses on merging similar queries into groups and investigating whether these groups are related to semantic events like “Japan Earthquake” or “American Idol”. Basically, their goal is to distinguish salient topics from noisy queries. Directly applying these approaches will fail, as the discovered topics are more likely related to vast and common topics that may already be familiar to most users.

PROPOSED SYSTEM:

  • In this paper, we aim to build a scalable and unbiased solution to automatically detect social events especially related to celebrities along a timeline. This could be an attractive supplement to enrich the existing event description in search result pages.
  • In this paper, we focus on events happening at a certain time and favored by users as our celebrity-related social events. We would like to detect the more interesting social events to entertain users and fit their browsing taste, which could be supplementary to current knowledge bases.
  • A novel approach is proposed in this paper using Smooth Nonnegative Matrix Factorization (SNMF) for event detection, fully leveraging information from query semantics, temporal correlations, and search log records. We use SNMF rather than normal NMF or other MF methods to guarantee that the weights for each topic are non-negative while accounting for the time factor of event development (a small factorization sketch follows this list).
  • The basic idea is two-fold: 1) promote event queries by strengthening their connections based on all available features; 2) differentiate events from popular queries according to their temporal characteristics.
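
As a baseline for the factorization at the core of SNMF, the Java sketch below runs standard multiplicative-update NMF on a query-by-day matrix V ≈ W * H; the smoothness term that SNMF adds over adjacent time steps is deliberately omitted here, and all dimensions and iteration counts are illustrative.

    // Toy multiplicative-update NMF (Lee-Seung updates); W and H must be
    // initialized with small positive random values by the caller.
    class SimpleNmf {
        static void factorize(double[][] V, double[][] W, double[][] H,
                              int iterations) {
            int n = V.length, m = V[0].length, k = W[0].length;
            for (int it = 0; it < iterations; it++) {
                double[][] WH = multiply(W, H);
                // H <- H .* (W^T V) ./ (W^T W H)
                for (int a = 0; a < k; a++)
                    for (int j = 0; j < m; j++) {
                        double num = 0.0, den = 1e-9;
                        for (int i = 0; i < n; i++) {
                            num += W[i][a] * V[i][j];
                            den += W[i][a] * WH[i][j];
                        }
                        H[a][j] *= num / den;
                    }
                WH = multiply(W, H);
                // W <- W .* (V H^T) ./ (W H H^T)
                for (int i = 0; i < n; i++)
                    for (int a = 0; a < k; a++) {
                        double num = 0.0, den = 1e-9;
                        for (int j = 0; j < m; j++) {
                            num += V[i][j] * H[a][j];
                            den += WH[i][j] * H[a][j];
                        }
                        W[i][a] *= num / den;
                    }
            }
        }
        static double[][] multiply(double[][] A, double[][] B) {
            double[][] C = new double[A.length][B[0].length];
            for (int i = 0; i < A.length; i++)
                for (int a = 0; a < B.length; a++)
                    for (int j = 0; j < B[0].length; j++)
                        C[i][j] += A[i][a] * B[a][j];
            return C;
        }
    }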

ADVANTAGES OF PROPOSED SYSTEM:

  • To provide a comprehensive and vivid storyboard, in this paper, we also introduce an automatic way to attach a set of relevant photos to each piece of event news.
  • We propose a novel framework to detect interesting events by mining users' search log data. The framework consists of two components, i.e., Smooth Nonnegative Matrix Factorization event detection and representative event-related photo selection.
  • We have conducted comprehensive evaluations on large-scale real-world click-through data to validate the effectiveness.

SYSTEM ARCHITECTURE:

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS: 

  • System : Pentium Dual Core.
  • Hard Disk : 120 GB.
  • Monitor : 15’’ LED
  • Input Devices : Keyboard, Mouse
  • Ram : 1 GB

SOFTWARE REQUIREMENTS: 

  • Operating system : Windows 7.
  • Coding Language : JAVA/J2EE
  • Tool : Netbeans 7.2.1
  • Database : MYSQL

REFERENCE:

Jun Xu, Tao Mei, Senior Member, IEEE, Rui Cai, Member, IEEE, Houqiang Li, Senior Member, IEEE and Yong Rui, Fellow, IEEE, “Automatic Generation of Social Event Storyboard from Image Click-through Data”, IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2017.