Mining Competitors from Large Unstructured Datasets

ABSTRACT:

In any competitive business, success is based on the ability to make an item more appealing to customers than the competition. A number of questions arise in the context of this task: how do we formalize and quantify the competitiveness between two items? Who are the main competitors of a given item? What are the features of an item that most affect its competitiveness? Despite the impact and relevance of this problem to many domains, only a limited amount of work has been devoted toward an effective solution. In this paper, we present a formal definition of the competitiveness between two items, based on the market segments that they can both cover. Our evaluation of competitiveness utilizes customer reviews, an abundant source of information that is available in a wide range of domains. We present efficient methods for evaluating competitiveness in large review datasets and address the natural problem of finding the top-k competitors of a given item. Finally, we evaluate the quality of our results and the scalability of our approach using multiple datasets from different domains.
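
To make the notion of competitiveness above concrete, one minimal formalization consistent with this abstract (an illustrative assumption, not necessarily the paper's exact definition) scores a pair of items by the weight of the market segments they can both cover, where Q is the set of mined segments, w(q) is the fraction of customers in segment q, and cov(i, q) indicates that item i covers segment q:

    C(i, j) = \sum_{q \in Q} w(q) \cdot \mathbf{1}\big[\mathrm{cov}(i, q) \wedge \mathrm{cov}(j, q)\big]

The top-k competitors of item i are then the k items j \neq i with the largest C(i, j).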

EXISTING SYSTEM:

  • The management literature is rich with works that focus on how managers can manually identify competitors. Some of these works model competitor identification as a mental categorization process in which managers develop mental representations of competitors and use them to classify candidate firms. Other manual categorization methods are based on market- and resource-based similarities between a firm and candidate competitors.
  • Zheng et al. identify key competitive measures (e.g., market share, share of wallet) and show how a firm can infer the values of these measures for its competitors by mining (i) its own detailed customer transaction data and (ii) aggregate data for each competitor.

DISADVANTAGES OF EXISTING SYSTEM:

  • The frequency of textual comparative evidence can vary greatly across domains. For example, when comparing brand names at the firm level (e.g., “Google vs Yahoo” or “Sony vs Panasonic”), it is indeed likely that comparative patterns can be found by simply querying the web. However, it is easy to identify mainstream domains where such evidence is extremely scarce, such as shoes, jewelry, hotels, restaurants, and furniture.
  • The existing approach is not appropriate for evaluating the competitiveness between any two items or firms in a given market. Instead, the authors assume that the set of competitors is given and, thus, their goal is to compute the value of the chosen measures for each competitor. In addition, their approach depends on transactional data, a limitation that our approach does not share.
  • As a result, the applicability of such approaches is greatly limited.

PROPOSED SYSTEM:

  • We propose a new formalization of the competitiveness between two items, based on the market segments that they can both cover.
  • We describe a method for computing all the segments in a given market based on mining large review datasets. This method allows us to operationalize our definition of competitiveness and address the problem of finding the top-k competitors of an item in any given market. As we show in our work, this problem presents significant computational challenges, especially in the presence of large datasets with hundreds or thousands of items, such as those that are often found in mainstream domains. We address these challenges via a highly scalable framework for top-k computation, including an efficient evaluation algorithm and an appropriate index (a simplified sketch of the top-k computation follows this list).
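
The following Java sketch illustrates the top-k computation described above, assuming the segment weights and per-item coverage have already been mined from the reviews. The class name, the data layout, and the naive scoring loop are illustrative assumptions; they stand in for, and do not reproduce, the paper's scalable evaluation algorithm and index.

    import java.util.*;

    public class CompetitorMiner {

        // item -> ids of the market segments it covers (mined from reviews)
        private final Map<String, Set<Integer>> coverage;
        // segment id -> fraction of the market that belongs to that segment
        private final Map<Integer, Double> segmentWeight;

        public CompetitorMiner(Map<String, Set<Integer>> coverage,
                               Map<Integer, Double> segmentWeight) {
            this.coverage = coverage;
            this.segmentWeight = segmentWeight;
        }

        // Competitiveness of candidate j with respect to focal item i:
        // total weight of the segments that both items can cover.
        public double competitiveness(String i, String j) {
            double score = 0.0;
            Set<Integer> coveredByJ = coverage.getOrDefault(j, Collections.emptySet());
            for (int q : coverage.getOrDefault(i, Collections.emptySet())) {
                if (coveredByJ.contains(q)) {
                    score += segmentWeight.getOrDefault(q, 0.0);
                }
            }
            return score;
        }

        // Naive top-k: score every candidate and keep the k best with a min-heap.
        public List<String> topKCompetitors(String focal, int k) {
            PriorityQueue<Map.Entry<String, Double>> heap =
                    new PriorityQueue<>((a, b) -> Double.compare(a.getValue(), b.getValue()));
            for (String candidate : coverage.keySet()) {
                if (candidate.equals(focal)) continue;
                heap.offer(new AbstractMap.SimpleEntry<>(candidate,
                        competitiveness(focal, candidate)));
                if (heap.size() > k) heap.poll();   // evict the currently weakest candidate
            }
            List<String> result = new ArrayList<>();
            while (!heap.isEmpty()) result.add(heap.poll().getKey());
            Collections.reverse(result);            // strongest competitor first
            return result;
        }
    }

In the paper, the quadratic scan over candidates is replaced by the efficient evaluation algorithm and index mentioned above, so that the computation scales to datasets with thousands of items.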

ADVANTAGES OF PROPOSED SYSTEM:

  • To the best of our knowledge, our work is the first to address the evaluation of competitiveness via the analysis of large unstructured datasets, without the need for direct comparative evidence.
  • A formal definition of the competitiveness between two items, based on their appeal to the various customer segments in their market. Our approach overcomes the reliance of previous work on scarce comparative evidence mined from text.
  • A formal methodology for the identification of the different types of customers in a given market, as well as for the estimation of the percentage of customers that belong to each type.
  • A highly scalable framework for finding the top-k competitors of a given item in very large datasets.

SYSTEM ARCHITECTURE:

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS: 

  • System : Pentium Dual Core.
  • Hard Disk : 120 GB.
  • Monitor : 15" LED
  • Input Devices : Keyboard, Mouse
  • RAM : 1 GB

SOFTWARE REQUIREMENTS: 

  • Operating system : Windows 7.
  • Coding Language : JAVA/J2EE
  • Tool : NetBeans 7.2.1
  • Database : MySQL

REFERENCE:

George Valkanas, Theodoros Lappas, and Dimitrios Gunopulos, “Mining Competitors from Large Unstructured Datasets”, IEEE Transactions on Knowledge and Data Engineering, 2017.

l-Injection: Toward Effective Collaborative Filtering Using Uninteresting Items

ABSTRACT:

We develop a novel framework, named l-injection, to address the sparsity problem of recommender systems. By carefully injecting low values into a selected set of unrated user-item pairs in a user-item matrix, we demonstrate that the top-N recommendation accuracies of various collaborative filtering (CF) techniques can be significantly and consistently improved. We first adopt the notion of pre-use preferences of users toward the vast number of unrated items. Using this notion, we identify uninteresting items that have not been rated yet but are likely to receive low ratings from users, and selectively impute them with low values. As our proposed approach is method-agnostic, it can be easily applied to a variety of CF algorithms. Through comprehensive experiments with three real-life datasets (MovieLens, Ciao, and Watcha), we demonstrate that our solution consistently and universally enhances the accuracies of existing CF algorithms (e.g., item-based CF, SVD-based CF, and SVD++) by 2.5 to 5 times on average. Furthermore, our solution improves the running time of those CF methods by 1.2 to 2.3 times when its setting produces the best accuracy. The datasets and code used in the experiments are available at: https://goo.gl/KUrmip.

EXISTING SYSTEM:

  • Among existing solutions for recommender systems (RS), collaborative filtering (CF) methods in particular have been shown to be widely effective. Based on the past behavior of users, such as explicit user ratings and implicit click logs, CF methods exploit the similarities between users’ behavior patterns.
  • Most CF methods, despite their wide adoption in practice, suffer from low accuracy if most users rate only a few items (thus producing a very sparse rating matrix); this is known as the data sparsity problem. It arises because the number of unrated items is significantly larger than that of rated items.
  • To address this problem, some existing work attempted to infer users’ ratings on unrated items based on additional information such as clicks and bookmarks

DISADVANTAGES OF EXISTING SYSTEM:

  • These works require an overhead of collecting extra data, which itself may have another data sparsity problem.
  • Because 0-injection simply imputes all uninteresting items as zero, it may neglect the characteristics of users or items. In contrast, l-injection not only maximizes the impact of filling in missing ratings but also considers the characteristics of users and items by imputing uninteresting items with low pre-use preferences.

PROPOSED SYSTEM:

  • In this work, we develop a more general l-injection approach that infers different user preferences for uninteresting items, and show that l-injection mostly outperforms 0-injection.
  • The proposed l-injection approach can improve the accuracy of top-N recommendation based on two strategies: (1) preventing uninteresting items from being included in the top-N recommendation, and (2) exploiting both uninteresting and rated items to predict the relative preferences of unrated items more accurately.
  • With the first strategy, because users are aware of the existence of uninteresting items but do not like them, such uninteresting items are likely to be false positives if included in top-N recommendation. Therefore, it is effective to exclude uninteresting items from top-N recommendation results.
  • Next, the second strategy can be interpreted using the concept of typical memory-based CF methods; a simplified sketch of the overall l-injection procedure follows this list.
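
As a rough illustration of the two strategies above, the following Java sketch imputes a low value for unrated items whose estimated pre-use preference falls below a threshold and leaves every other cell untouched, so that a standard CF method can then be trained on the densified matrix. The PreUseModel interface, the threshold, and the imputed low value are hypothetical placeholders introduced for illustration; the paper itself identifies uninteresting items by solving an OCCF problem.

    /** Illustrative l-injection preprocessing of a user-item rating matrix.
     *  A value of 0 marks an unrated cell; ratings are assumed to be on a 1-5 scale. */
    public class LInjection {

        /** Hypothetical pre-use preference estimate for an unrated (user, item) pair,
         *  e.g. produced by a one-class collaborative filtering (OCCF) model. */
        public interface PreUseModel {
            double score(int user, int item);   // higher = more likely to be considered at all
        }

        /** Returns a copy of the matrix in which unrated cells with a low pre-use score
         *  are imputed with a low rating, so that (1) uninteresting items are pushed out
         *  of top-N lists and (2) CF models see additional low-preference signal. */
        public static double[][] inject(double[][] ratings, PreUseModel model,
                                        double threshold, double lowValue) {
            int users = ratings.length;
            int items = ratings[0].length;
            double[][] filled = new double[users][items];
            for (int u = 0; u < users; u++) {
                for (int i = 0; i < items; i++) {
                    if (ratings[u][i] != 0.0) {
                        filled[u][i] = ratings[u][i];        // keep observed ratings
                    } else if (model.score(u, i) < threshold) {
                        filled[u][i] = lowValue;             // uninteresting -> inject low value
                    }                                        // otherwise the cell stays unrated (0)
                }
            }
            return filled;
        }
    }

The filled matrix can then be handed to item-based CF, SVD-based CF, or SVD++ without modifying those algorithms, which is what makes the approach method-agnostic.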

ADVANTAGES OF PROPOSED SYSTEM:

  • We introduce a new notion of uninteresting items, and classify user preferences into pre-use and post-use preferences to identify uninteresting items.
  • We propose to identify uninteresting items via pre-use preferences by solving the OCCF problem and show its implications and effectiveness.
  • We propose low-value injection (called l-injection) to improve the accuracy of top-N recommendation in existing CF algorithms.
  • While existing CF methods employ only user preferences on rated items, the proposed approach employs both pre-use and post-use preferences. Specifically, the proposed approach first infers the pre-use preferences of unrated items and identifies uninteresting items.

SYSTEM ARCHITECTURE:

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS: 

  • System : Pentium Dual Core.
  • Hard Disk : 120 GB.
  • Monitor : 15" LED
  • Input Devices : Keyboard, Mouse
  • RAM : 1 GB

SOFTWARE REQUIREMENTS: 

  • Operating system : Windows 7.
  • Coding Language : JAVA/J2EE
  • Tool : NetBeans 7.2.1
  • Database : MySQL

REFERENCE:

Jongwuk Lee, Won-Seok Hwang, Juan Parc, Youngnam Lee, Sang-Wook Kim, and Dongwon Lee, “l-Injection: Toward Effective Collaborative Filtering Using Uninteresting Items”, IEEE Transactions on Knowledge and Data Engineering, 2017.

Influential Node Tracking on Dynamic Social Network: An Interchange Greedy Approach

ABSTRACT:

As both the social network structure and the strength of influence between individuals evolve constantly, it is necessary to track influential nodes under a dynamic setting. To address this problem, we explore the Influential Node Tracking (INT) problem as an extension to the traditional Influence Maximization (IM) problem under dynamic social networks. While the Influence Maximization problem aims at identifying a set of k nodes that maximizes the joint influence under one static network, the INT problem focuses on tracking a set of influential nodes that keeps maximizing the influence as the network evolves. Utilizing the smoothness of the evolution of the network structure, we propose an efficient algorithm, Upper Bound Interchange Greedy (UBI), and a variant, UBI+. Instead of constructing the seed set from scratch, we start from the influential seed set found previously and perform node replacement to improve the influence coverage. Furthermore, by using a fast method for updating the marginal gain of nodes, our algorithm can scale to dynamic social networks with millions of nodes. Empirical experiments on three real large-scale dynamic social networks show that UBI and its variant, UBI+, achieve better performance in terms of both influence coverage and running time.
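
In symbols, the INT objective described above can be stated as follows, where G_1, ..., G_T are the network snapshots, V_t and \sigma_t(\cdot) denote the node set and the influence function of snapshot t, and k is the seed-set size (this notation is introduced here for clarity and is not quoted from the paper):

    S_t^{*} = \operatorname*{arg\,max}_{S \subseteq V_t,\; |S| = k} \sigma_t(S), \qquad t = 1, \dots, T

UBI approximates each S_t^{*} by starting from the solution found for snapshot t-1 rather than from an empty set.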

EXISTING SYSTEM:

  • Zhuang et al. study the influence maximization under dynamic networks where the changes can be only detected by periodically probing some nodes. Their goal then is to probe a subset of nodes in a social network so that the actual influence diffusion process in the network can be best uncovered with the probing nodes.
  • Zhou et al. have achieved further acceleration by incorporating upper bound on the influence function.

DISADVANTAGES OF EXISTING SYSTEM:

  • Traditional algorithms for Influence Maximization become inefficient in this situation, as they fail to consider the connection between social networks at different times and have to solve many Influence Maximization problems independently for the social network at each time step.
  • All the previous methods aim to discover the influential nodes under one static network.

PROPOSED SYSTEM:

  • In this paper, we propose an efficient algorithm, Upper Bound Interchange Greedy (UBI), to tackle the Influence Maximization problem under a dynamic social network, which we term the Influential Node Tracking (INT) problem: tracking a set of influential nodes that keeps maximizing the influence under the social network at any time.
  • The main idea of our UBI algorithm is to leverage the similarity of social networks that are close in time and to discover the influential nodes directly from the seed set found for the previous social network, instead of constructing the solution from an empty set, since similarity in network structure leads to similar sets of nodes that maximize the influence.
  • In our UBI algorithm, we start from the seed set that maximizes the influence under the previous social network and then exchange the nodes in this set one by one to increase the influence under the current social network. Because the optimal seed sets differ only in a small number of nodes, a few rounds of node exchanges are enough to discover a seed set with large joint influence under the current social network; a simplified sketch of this interchange step follows this list.
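
The interchange step described above can be sketched in Java as follows, assuming an influence oracle sigma(S) for the current snapshot (for example, a Monte Carlo estimate of the expected spread). The swap rule and the stopping condition are simplified stand-ins for UBI's upper-bound machinery, so this is an illustrative sketch rather than the paper's algorithm.

    import java.util.*;
    import java.util.function.ToDoubleFunction;

    /** Illustrative interchange greedy: refine the previous snapshot's seed set for the
     *  current snapshot by swapping seeds while a swap still improves the influence. */
    public class InterchangeGreedy {

        public static Set<Integer> refine(Set<Integer> previousSeeds,
                                          Set<Integer> allNodes,
                                          ToDoubleFunction<Set<Integer>> sigma,
                                          int maxRounds) {
            Set<Integer> seeds = new HashSet<>(previousSeeds);   // warm start, not from scratch
            for (int round = 0; round < maxRounds; round++) {
                boolean improved = false;
                for (Integer out : new ArrayList<>(seeds)) {     // iterate over a snapshot copy
                    double base = sigma.applyAsDouble(seeds);
                    Integer bestIn = null;
                    double bestGain = 0.0;
                    for (Integer in : allNodes) {
                        if (seeds.contains(in)) continue;
                        Set<Integer> candidate = new HashSet<>(seeds);
                        candidate.remove(out);
                        candidate.add(in);
                        double gain = sigma.applyAsDouble(candidate) - base;
                        if (gain > bestGain) { bestGain = gain; bestIn = in; }
                    }
                    if (bestIn != null) {                        // take the best improving swap
                        seeds.remove(out);
                        seeds.add(bestIn);
                        improved = true;
                    }
                }
                if (!improved) break;   // no swap helps: the seed set has stabilized
            }
            return seeds;
        }
    }

Because consecutive snapshots are similar, only a few swaps are typically needed, which keeps the per-snapshot cost low; UBI additionally uses upper bounds on the marginal gain to avoid evaluating the influence function for every possible swap.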

ADVANTAGES OF PROPOSED SYSTEM:

  • We explore the Influential Node Tracking (INT) problem as an extension to the traditional Influence Maximization problem to maximize the influence coverage under a dynamic social network.
  • We propose an efficient algorithm, Upper Bound Interchange Greedy (UBI), to solve the INT problem. Our algorithm achieves results comparable to those of the hill-climbing greedy algorithm, with time complexity O(kn) and space complexity O(n), where n is the number of nodes and k is the size of the seed set.
  • We propose an algorithm, UBI+, based on UBI, that improves the computation of the node-replacement upper bound.
  • We evaluate the performance on large-scale real social networks. The experimental results confirm our theoretical findings and show that the UBI and UBI+ algorithms achieve better performance in terms of both influence coverage and running time.

SYSTEM ARCHITECTURE:

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS: 

  • System : Pentium Dual Core.
  • Hard Disk : 120 GB.
  • Monitor : 15" LED
  • Input Devices : Keyboard, Mouse
  • RAM : 1 GB

SOFTWARE REQUIREMENTS: 

  • Operating system : Windows 7.
  • Coding Language : JAVA/J2EE
  • Tool : NetBeans 7.2.1
  • Database : MySQL

REFERENCE:

Guojie Song, Yuanhao Li, Xiaodong Chen, and Xinran He, “Influential Node Tracking on Dynamic Social Network: An Interchange Greedy Approach”, IEEE Transactions on Knowledge and Data Engineering, 2017.