Personal Web Revisitation by Context and Content Keywords with Relevance Feedback

ABSTRACT:

Getting back to previously viewed web pages is a common yet difficult task for users, due to the large volume of personally accessed information on the web. This paper leverages the natural human recall process of using episodic and semantic memory cues, and presents a personal web revisitation technique, called WebPagePrev, that works through context and content keywords. Underlying techniques for the acquisition, storage, decay, and utilization of context and content memories for page re-finding are discussed. A relevance feedback mechanism is also incorporated to tailor the system to an individual's memory strength and revisitation habits. Our 6-month user study shows that: (1) Compared with the existing web revisitation tool Memento, the History List Searching method, and the Search Engine method, the proposed WebPagePrev delivers the best re-finding quality in finding rate (92.10%), average F1-measure (0.4318), and average rank error (0.3145). (2) Our dynamic management of context and content memories, including the decay and reinforcement strategy, can mimic users' retrieval and recall mechanisms. With relevance feedback, the finding rate of WebPagePrev increases by 9.82%, average F1-measure increases by 47.09%, and average rank error decreases by 19.44%, compared to a static memory management strategy. (3) Among the time, location, and activity context factors in WebPagePrev, activity is the best recall cue, and context+content based re-finding delivers the best performance, compared to context-based re-finding and content-based re-finding alone.

EXISTING SYSTEM:

  • In the literature, a number of techniques and tools, such as bookmarks, history tools, search engines, metadata annotation and exploitation, and contextual recall systems, have been developed to support personal web revisitation.
  • The work most closely related to this study is the Memento system, which unifies context and content to aid web revisitation. It defines the context of a web page as the other pages in the browsing session that immediately precede or follow it, and extracts topic phrases from these browsed pages based on the Wikipedia topic list.
  • Other closely related work lets users search for contextually related activities (e.g., time, location, concurrent activities, meetings, music playing, an interrupting phone call, or even other files or web sites that were open at the same time) and find a target piece of information (often not semantically related) from when that context was active. This body of research emphasizes episodic context cues in page recall.

DISADVANTAGES OF EXISTING SYSTEM:

  • Re-finding previously viewed pages is a difficult task for users.
  • The large volume of personally accessed data makes re-finding complex.
  • Poor finding rate.
  • Low F1-measure.

PROPOSED SYSTEM:

  • Preparation for web revisitation. When a user accesses a web page that is likely to be revisited later (i.e., the page access time exceeds a threshold), the context acquisition and management module captures the current access context (i.e., time, location, and activities inferred from the currently running computer programs) into a probabilistic context tree. Meanwhile, the content extraction and management module performs unigram-based extraction on the displayed page segments and obtains a list of probabilistic content terms.
  • The probabilities of the acquired context instances and extracted content terms reflect how likely the user is to refer to them as memory cues to get back to the previously focused page.
  • Web revisitation. Later, when the user requests to get back to a previously focused page through context and/or content keywords, the re-access by context keywords module and the re-access by content keywords module search the probabilistic context tree repository and the probabilistic term list repository, respectively, as shown in the sketch below.
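The following minimal Java sketch illustrates this two-repository lookup. The class and field names (PageMemory, score, wCtx) are hypothetical, and the weighted linear combination of matched context and content probabilities is an assumption for exposition, not the paper's exact ranking formula.

```java
import java.util.*;

// Minimal sketch of scoring previously visited pages against
// context and content keywords (hypothetical names throughout).
public class WebPagePrevSketch {

    // Per-page memory: probabilistic context instances and content terms.
    static class PageMemory {
        String url;
        Map<String, Double> context = new HashMap<>(); // e.g., "activity:editing" -> 0.9
        Map<String, Double> content = new HashMap<>(); // unigram -> probability
        PageMemory(String url) { this.url = url; }
    }

    // Assumed combined score: weighted sum of matched context and content probabilities.
    static double score(PageMemory m, Set<String> ctxKeys, Set<String> cntKeys, double wCtx) {
        double ctx = 0, cnt = 0;
        for (String k : ctxKeys) ctx += m.context.getOrDefault(k, 0.0);
        for (String k : cntKeys) cnt += m.content.getOrDefault(k, 0.0);
        return wCtx * ctx + (1 - wCtx) * cnt;
    }

    public static void main(String[] args) {
        PageMemory p = new PageMemory("http://example.org/paper");
        p.context.put("activity:editing-report", 0.9);
        p.context.put("time:morning", 0.6);
        p.content.put("revisitation", 0.7);

        double s = score(p, Set.of("activity:editing-report"), Set.of("revisitation"), 0.5);
        System.out.printf("score(%s) = %.2f%n", p.url, s); // score(...) = 0.80
    }
}
```

Ranking every stored page by such a score and returning the top results corresponds to the re-access step; the actual system searches the probabilistic context tree and term list repositories rather than a flat map.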

ADVANTAGES OF PROPOSED SYSTEM:

  • This paper explores how to leverage our natural recall process of using episodic and semantic memory cues to facilitate personal web revisitation. Considering how users differ in memorizing previous access context and page content cues, a relevance feedback mechanism is incorporated to enhance personal web revisitation performance.
  • We present a personal web revisitation technique, called WebPagePrev, that allows users to get back to their previously focused pages through access context and page content keywords. Underlying techniques for the acquisition, storage, and utilization of context and content memories for web page recall are discussed.
  • Dynamic tuning strategies that tailor the system to an individual's memorization strength and recall habits based on relevance feedback (e.g., weight preference calculation, decay rate adjustment) are developed for performance improvement; see the sketch after this list.
  • We evaluate the effectiveness of the proposed WebPagePrev technique, and report the findings (e.g., the importance of context and content factors) in web revisitation.
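The sketch below shows one way decay and feedback-driven reinforcement could interact. The exponential decay form, the reinforcement increment, and the decay-rate adjustment factor are all assumed for illustration, not the paper's tuned update equations.

```java
import java.util.*;

// Sketch of memory decay plus relevance-feedback reinforcement
// (assumed update rules and constants).
public class MemoryTuningSketch {
    static Map<String, Double> cues = new HashMap<>();
    static double decayRate = 0.05; // adjusted per user via relevance feedback

    // Decay every cue probability over the elapsed time (in days).
    static void decay(double days) {
        cues.replaceAll((cue, p) -> p * Math.exp(-decayRate * days));
    }

    // Reinforce a cue the user actually used to re-find a page (positive
    // feedback), and lower the decay rate so strong cues persist longer.
    static void reinforce(String cue) {
        cues.merge(cue, 0.1, (p, inc) -> Math.min(1.0, p + inc));
        decayRate = Math.max(0.01, decayRate * 0.9);
    }

    public static void main(String[] args) {
        cues.put("activity:editing-report", 0.9);
        decay(7);                               // a week passes
        reinforce("activity:editing-report");   // user confirms this cue
        System.out.println(cues + ", decayRate=" + decayRate);
    }
}
```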

SYSTEM ARCHITECTURE:

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS: 

  • System : Pentium Dual Core.
  • Hard Disk : 120 GB.
  • Monitor : 15’’ LED
  • Input Devices : Keyboard, Mouse
  • RAM : 1 GB

SOFTWARE REQUIREMENTS: 

  • Operating system : Windows 7.
  • Coding Language : JAVA/J2EE
  • Tool : NetBeans 7.2.1
  • Database : MySQL

REFERENCE:

Li Jin, Gangli Liu, Chaokun Wang, and Ling Feng, “Personal Web Revisitation by Context and Content Keywords with Relevance Feedback”, IEEE Transactions on Knowledge and Data Engineering, 2017.

Modeling Information Diffusion over Social Networks for Temporal Dynamic Prediction

ABSTRACT:

How to model the process of information diffusion in social networks is a critical research task. Although numerous attempts have been made at this problem, few of them can simulate and predict the temporal dynamics of the diffusion process. To address this, we propose a novel information diffusion model (the GT model), which regards the users in the network as intelligent agents. Each agent jointly considers all his interacting neighbors and calculates the payoffs of his different choices to make a strategic decision. We introduce a time factor into the user payoff, enabling the GT model not only to predict the behavior of a user but also to predict when he will perform it. Both global influence and social influence are explored in the time-dependent payoff calculation, where a new social influence representation method is designed to fully capture the temporal dynamic properties of social influence between users. Experimental results on Sina Weibo and Flickr validate the effectiveness of our methods.

EXISTING SYSTEM:

  • In “Scalable influence maximization for prevalent viral marketing in large-scale social networks”: influence maximization, defined by Kempe, Kleinberg, and Tardos (2003), is the problem of finding a small set of seed nodes in a social network that maximizes the spread of influence under certain influence cascade models. The scalability of influence maximization is a key factor for enabling prevalent viral marketing in large-scale online social networks.
  • Prior solutions, such as the greedy algorithm of Kempe et al. (2003) and its improvements, are slow and not scalable, while other heuristic algorithms do not provide consistently good performance on influence spread.
  • In the existing system, the authors design a new heuristic algorithm that easily scales to millions of nodes and edges. The system has a simple tunable parameter that lets users control the balance between the running time and the influence spread of the algorithm.

DISADVANTAGES OF EXISTING SYSTEM:

  • Modeling the process of information diffusion in social networks is a critical but challenging research task.
  • Although numerous models have been proposed, few of them can simulate and predict the temporal dynamics of the diffusion process.

PROPOSED SYSTEM:

  • In this paper, we propose a novel information diffusion model (the GT model) for temporal dynamic prediction. In contrast to traditional theory-centric models, the GT model regards the users in the network as intelligent agents. It can capture both the behavior of an individual agent and the strategic interactions among agents. By introducing time-dependent payoffs, the GT model is able to predict the temporal dynamics of the information diffusion process. Unlike most data-centric models, the GT model can predict not only whether a user will perform a behavior but also when he will perform it.
  • In the proposed GT model, the diffusion process unfolds in discrete time steps t, beginning from a given initial set of active users. When a user v observes a piece of information at time t, he calculates his payoffs for the different choices, depending on his neighbors' status, so as to make a strategic decision, as sketched below.
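A toy Java sketch of this per-step decision follows. The exponential decay of a neighbor's social influence and the fixed adoption threshold are assumptions for illustration; the paper derives the time-dependent payoff from both global influence and its own social influence representation.

```java
import java.util.*;

// Toy sketch of a GT-style agent decision at time step t: adopt iff the
// payoff (global influence + time-decayed social influence from active
// neighbors) exceeds the payoff of staying inactive (assumed threshold).
public class GtModelSketch {

    // Assumed time-decayed influence of a neighbor who adopted at time t0.
    static double socialInfluence(double edgeWeight, int t, int t0) {
        return edgeWeight * Math.exp(-0.5 * (t - t0));
    }

    static boolean adopts(double globalInfluence,
                          List<double[]> activeNeighbors, // {edgeWeight, adoptionTime}
                          int t, double threshold) {
        double payoff = globalInfluence;
        for (double[] n : activeNeighbors)
            payoff += socialInfluence(n[0], t, (int) n[1]);
        return payoff > threshold; // strategic choice: adopt vs. stay inactive
    }

    public static void main(String[] args) {
        List<double[]> neighbors = List.of(new double[]{0.4, 1}, new double[]{0.3, 2});
        for (int t = 2; t <= 6; t++)
            System.out.println("t=" + t + " adopts=" + adopts(0.1, neighbors, t, 0.5));
    }
}
```

Because the social term depends on t - t0, the model can answer not just whether v adopts but at which time step, which is the temporal prediction the GT model targets.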

ADVANTAGES OF PROPOSED SYSTEM:

  • We propose a novel information diffusion model (the GT model), in which, when choosing between different behaviors, a user jointly considers all his interacting neighbors' choices to make the strategic decision that maximizes his payoff.
  • We propose a time-dependent user payoff calculation method in the GT model that explores both global influence and social influence.
  • We propose a new social influence representation method, which can accurately capture the temporal dynamic properties of social influence between users.
  • We conduct experiments on real datasets. Comparison with closely related work indicates the superiority of the proposed GT model.

SYSTEM ARCHITECTURE:

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS: 

  • System : Pentium Dual Core.
  • Hard Disk : 120 GB.
  • Monitor : 15’’ LED
  • Input Devices : Keyboard, Mouse
  • RAM : 1 GB

SOFTWARE REQUIREMENTS: 

  • Operating system : Windows 7.
  • Coding Language : JAVA/J2EE
  • Tool : NetBeans 7.2.1
  • Database : MySQL

REFERENCE:

Dong Li, Zhiming Xu, Yishu Luo, Sheng Li, Anika Gupta, Katia Sycara, Shengmei Luo, Lei Hu, and Hong Chen, “Modeling Information Diffusion over Social Networks for Temporal Dynamic Prediction”, IEEE Transactions on Knowledge and Data Engineering, 2017.

Mining Competitors from Large Unstructured Datasets

ABSTRACT:

In any competitive business, success is based on the ability to make an item more appealing to customers than the competition. A number of questions arise in this context: How do we formalize and quantify the competitiveness between two items? Who are the main competitors of a given item? What are the features of an item that most affect its competitiveness? Despite the impact and relevance of this problem to many domains, only a limited amount of work has been devoted to an effective solution. In this paper, we present a formal definition of the competitiveness between two items, based on the market segments that they can both cover. Our evaluation of competitiveness utilizes customer reviews, an abundant source of information available in a wide range of domains. We present efficient methods for evaluating competitiveness in large review datasets and address the natural problem of finding the top-k competitors of a given item. Finally, we evaluate the quality of our results and the scalability of our approach using multiple datasets from different domains.

EXISTING SYSTEM:

  • The management literature is rich with works that focus on how managers can manually identify competitors. Some of these works model competitor identification as a mental categorization process in which managers develop mental representations of competitors and use them to classify candidate firms. Other manual categorization methods are based on market- and resource-based similarities between a firm and candidate competitors.
  • Zheng et al. identify key competitive measures (e.g., market share, share of wallet) and show how a firm can infer the values of these measures for its competitors by mining (i) its own detailed customer transaction data and (ii) aggregate data for each competitor.

DISADVANTAGES OF EXISTING SYSTEM:

  • The frequency of textual comparative evidence can vary greatly across domains. For example, when comparing brand names at the firm level (e.g., “Google vs Yahoo” or “Sony vs Panasonic”), comparative patterns can likely be found by simply querying the web. However, it is easy to identify mainstream domains where such evidence is extremely scarce, such as shoes, jewelry, hotels, restaurants, and furniture.
  • The existing approach is not appropriate for evaluating the competitiveness between any two items or firms in a given market. Instead, the authors assume that the set of competitors is given, so their goal is merely to compute the value of the chosen measures for each competitor. In addition, its dependency on transactional data is a limitation our approach does not have.
  • The applicability of such approaches is therefore greatly limited.

PROPOSED SYSTEM:

  • We propose a new formalization of the competitiveness between two items, based on the market segments that they can both cover.
  • We describe a method for computing all the segments in a given market by mining large review datasets. This method allows us to operationalize our definition of competitiveness and address the problem of finding the top-k competitors of an item in any given market. As we show in our work, this problem presents significant computational challenges, especially in the presence of large datasets with hundreds or thousands of items, such as those often found in mainstream domains. We address these challenges via a highly scalable framework for top-k computation, including an efficient evaluation algorithm and an appropriate index; a simplified sketch of the underlying idea follows this list.
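The sketch below illustrates the segment-coverage view of competitiveness with a naive top-k scan. The set representation and segment weights are hypothetical stand-ins; the paper's actual framework relies on an efficient evaluation algorithm and an index rather than this brute-force ranking.

```java
import java.util.*;

// Sketch: each item covers a set of market segments; the competitiveness of
// a candidate item w.r.t. a target is the total customer share of the
// segments both items cover (hypothetical representation).
public class CompetitorMiningSketch {

    // segmentWeight[s] = fraction of customers belonging to segment s
    static double competitiveness(Set<Integer> coverTarget, Set<Integer> coverCand,
                                  double[] segmentWeight) {
        double c = 0;
        for (int s : coverTarget)
            if (coverCand.contains(s)) c += segmentWeight[s];
        return c;
    }

    public static void main(String[] args) {
        double[] w = {0.5, 0.3, 0.2};
        Map<String, Set<Integer>> cover = Map.of(
            "hotelA", Set.of(0, 1),
            "hotelB", Set.of(0, 2),
            "hotelC", Set.of(1));

        // Naive top-k: rank every candidate against "hotelA" (k = 2 here).
        cover.keySet().stream()
             .filter(j -> !j.equals("hotelA"))
             .sorted(Comparator.comparingDouble(
                     (String j) -> -competitiveness(cover.get("hotelA"), cover.get(j), w)))
             .limit(2)
             .forEach(j -> System.out.println(j + " -> "
                     + competitiveness(cover.get("hotelA"), cover.get(j), w)));
    }
}
```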

ADVANTAGES OF PROPOSED SYSTEM:

  • To the best of our knowledge, our work is the first to address the evaluation of competitiveness via the analysis of large unstructured datasets, without the need for direct comparative evidence.
  • A formal definition of the competitiveness between two items, based on their appeal to the various customer segments in their market. Our approach overcomes the reliance of previous work on scarce comparative evidence mined from text.
  • A formal methodology for the identification of the different types of customers in a given market, as well as for the estimation of the percentage of customers that belong to each type.
  • A highly scalable framework for finding the top-k competitors of a given item in very large datasets.

SYSTEM ARCHITECTURE:

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS: 

  • System : Pentium Dual Core.
  • Hard Disk : 120 GB.
  • Monitor : 15’’ LED
  • Input Devices : Keyboard, Mouse
  • RAM : 1 GB

SOFTWARE REQUIREMENTS: 

  • Operating system : Windows 7.
  • Coding Language : JAVA/J2EE
  • Tool : NetBeans 7.2.1
  • Database : MySQL

REFERENCE:

George Valkanas, Theodoros Lappas, and Dimitrios Gunopulos, “Mining Competitors from Large Unstructured Datasets”, IEEE Transactions on Knowledge and Data Engineering, 2017.

l-Injection: Toward Effective Collaborative Filtering Using Uninteresting Items

ABSTRACT:

We develop a novel framework, named l-injection, to address the sparsity problem of recommender systems. By carefully injecting low values into a selected set of unrated user-item pairs in a user-item matrix, we demonstrate that the top-N recommendation accuracies of various collaborative filtering (CF) techniques can be significantly and consistently improved. We first adopt the notion of pre-use preferences of users toward the vast number of unrated items. Using this notion, we identify uninteresting items that have not been rated yet but are likely to receive low ratings from users, and selectively impute them with low values. As our approach is method-agnostic, it can be easily applied to a variety of CF algorithms. Through comprehensive experiments with three real-life datasets (MovieLens, Ciao, and Watcha), we demonstrate that our solution consistently and universally enhances the accuracies of existing CF algorithms (e.g., item-based CF, SVD-based CF, and SVD++) by 2.5 to 5 times on average. Furthermore, our solution improves the running time of those CF methods by 1.2 to 2.3 times when its setting produces the best accuracy. The datasets and code used in the experiments are available at: https://goo.gl/KUrmip.

EXISTING SYSTEM:

  • Among existing solutions in recommender systems (RS), collaborative filtering (CF) methods in particular have been shown to be widely effective. Based on the past behavior of users, such as explicit user ratings and implicit click logs, CF methods exploit the similarities between users' behavior patterns.
  • Most CF methods, despite their wide adoption in practice, suffer from low accuracy when most users rate only a few items (producing a very sparse rating matrix); this is called the data sparsity problem. It arises because the number of unrated items vastly exceeds the number of rated items.
  • To address this problem, some existing work attempts to infer users' ratings on unrated items based on additional information such as clicks and bookmarks.

DISADVANTAGES OF EXISTING SYSTEM:

  • These works require the overhead of collecting extra data, which may itself suffer from another data sparsity problem.
  • Because 0-injection simply treats all uninteresting items as zero, it may neglect the characteristics of users or items. In contrast, l-injection not only maximizes the impact of filling in missing ratings but also considers the characteristics of users and items, by imputing uninteresting items with low pre-use preference values.

PROPOSED SYSTEM:

  • In this work, we develop the more general l-injection approach to infer different user preferences for uninteresting items, and show that l-injection mostly outperforms 0-injection.
  • The proposed l-injection approach can improve the accuracy of top-N recommendation based on two strategies: (1) preventing uninteresting items from being included in the top-N recommendation, and (2) exploiting both uninteresting and rated items to predict the relative preferences of unrated items more accurately.
  • Regarding the first strategy, because users are aware of the existence of uninteresting items but do not like them, such items are likely to be false positives if included in the top-N recommendation. Therefore, it is effective to exclude uninteresting items from the top-N recommendation results.
  • The second strategy can be interpreted using the concepts of typical memory-based CF methods; a minimal sketch of the injection step follows this list.
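Below is a minimal sketch of the injection step. Here pre-use preferences are supplied as an input matrix and simply thresholded, whereas the paper infers them by solving an OCCF problem; the threshold and low value are illustrative parameters, not the paper's settings.

```java
import java.util.*;

// Sketch of l-injection: impute a low rating for unrated items whose
// estimated pre-use preference is low, then hand the densified matrix
// to any CF algorithm (method-agnostic by construction).
public class LInjectionSketch {

    // ratings[u][i] = observed rating, or 0 if unrated
    static double[][] inject(double[][] ratings, double[][] preUsePref,
                             double prefThreshold, double lowValue) {
        double[][] out = new double[ratings.length][];
        for (int u = 0; u < ratings.length; u++) {
            out[u] = ratings[u].clone();
            for (int i = 0; i < out[u].length; i++)
                if (out[u][i] == 0 && preUsePref[u][i] < prefThreshold)
                    out[u][i] = lowValue; // uninteresting: impute a low value
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] r = {{5, 0, 0}, {0, 4, 0}};
        double[][] p = {{0.9, 0.1, 0.8}, {0.2, 0.9, 0.1}};
        System.out.println(Arrays.deepToString(inject(r, p, 0.3, 1.0)));
        // [[5.0, 1.0, 0.0], [1.0, 4.0, 1.0]] -- the item with a high
        // pre-use preference (0.8) stays unrated rather than being imputed.
    }
}
```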

ADVANTAGES OF PROPOSED SYSTEM:

  • We introduce a new notion of uninteresting items, and classify user preferences into pre-use and post-use preferences to identify uninteresting items.
  • We propose to identify uninteresting items via pre-use preferences by solving the OCCF problem, and show its implications and effectiveness.
  • We propose low-value injection (called l-injection) to improve the accuracy of top-N recommendation in existing CF algorithms.
  • While existing CF methods only employ user preferences on rated items, the proposed approach employs both pre-use and post-use preferences. Specifically, it first infers the pre-use preferences of unrated items and then identifies uninteresting items.

SYSTEM ARCHITECTURE:

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS: 

  • System : Pentium Dual Core.
  • Hard Disk : 120 GB.
  • Monitor : 15’’ LED
  • Input Devices : Keyboard, Mouse
  • RAM : 1 GB

SOFTWARE REQUIREMENTS: 

  • Operating system : Windows 7.
  • Coding Language : JAVA/J2EE
  • Tool : NetBeans 7.2.1
  • Database : MySQL

REFERENCE:

Jongwuk Lee, Won-Seok Hwang, Juan Parc, Youngnam Lee, Sang-Wook Kim, and Dongwon Lee, “l-Injection: Toward Effective Collaborative Filtering Using Uninteresting Items”, IEEE Transactions on Knowledge and Data Engineering, 2017.

Influential Node Tracking on Dynamic Social Network: An Interchange Greedy Approach

ABSTRACT:

As both the social network structure and the strength of influence between individuals evolve constantly, influential nodes must be tracked under a dynamic setting. To address this problem, we explore the Influential Node Tracking (INT) problem as an extension of the traditional Influence Maximization (IM) problem to dynamic social networks. While the Influence Maximization problem aims at identifying a set of k nodes that maximizes the joint influence under one static network, the INT problem focuses on tracking a set of influential nodes that keeps maximizing the influence as the network evolves. Utilizing the smoothness of the evolution of the network structure, we propose an efficient algorithm, Upper Bound Interchange Greedy (UBI), and a variant, UBI+. Instead of constructing the seed set from the ground up, we start from the influential seed set found previously and perform node replacement to improve the influence coverage. Furthermore, by using a fast method to update the marginal gains of nodes, our algorithm can scale to dynamic social networks with millions of nodes. Empirical experiments on three real large-scale dynamic social networks show that UBI and its variant UBI+ achieve better performance in terms of both influence coverage and running time.

EXISTING SYSTEM:

  • Zhuang et al. study influence maximization under dynamic networks where changes can only be detected by periodically probing some nodes. Their goal is then to probe a subset of nodes in a social network so that the actual influence diffusion process in the network can be best uncovered from the probed nodes.
  • Zhou et al. achieve further acceleration by incorporating an upper bound on the influence function.

DISADVANTAGES OF EXISTING SYSTEM:

  • Traditional algorithms for Influence Maximization become inefficient in this setting, as they fail to consider the connection between social networks at different times and have to solve many Influence Maximization problems independently, one for the network at each time.
  • All the previous methods aim to discover the influential nodes under one static network.

PROPOSED SYSTEM:

  • In this paper, we propose an efficient algorithm, Upper Bound Interchange Greedy (UBI), to tackle the Influence Maximization problem under dynamic social networks, which we term the Influential Node Tracking (INT) problem: tracking a set of influential nodes that maximizes the influence under the social network at any time.
  • The main idea of our UBI algorithm is to leverage the similarity of social networks that are close in time and to discover the influential nodes directly from the seed set found for the previous social network, instead of constructing the solution from an empty set, since similarity in network structure leads to similar sets of influence-maximizing nodes.
  • In our UBI algorithm, we start from the seed set maximizing the influence under the previous social network. We then exchange the nodes in the set one by one to increase the influence under the current social network, as sketched below. Since the optimal seed sets differ in only a small number of nodes, a few rounds of node exchanges are enough to discover a seed set with large joint influence under the current network.
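Below is a simplified Java sketch of this interchange step. The influence function is a placeholder (real influence spread estimation is expensive), and UBI's upper-bound pruning and fast marginal-gain updates, which are what make the approach scale, are omitted here.

```java
import java.util.*;
import java.util.function.Function;

// Sketch of interchange greedy: start from the previous seed set and
// repeatedly apply a seed/non-seed swap that increases (estimated) influence.
public class InterchangeGreedySketch {

    static Set<Integer> interchange(Set<Integer> prevSeeds, Set<Integer> allNodes,
                                    Function<Set<Integer>, Double> influence) {
        Set<Integer> seeds = new HashSet<>(prevSeeds);
        boolean improved = true;
        while (improved) {
            improved = false;
            for (int out : new HashSet<>(seeds)) {
                for (int in : allNodes) {
                    if (seeds.contains(in)) continue;
                    Set<Integer> cand = new HashSet<>(seeds);
                    cand.remove(out);
                    cand.add(in);
                    if (influence.apply(cand) > influence.apply(seeds)) {
                        seeds = cand;      // accept the improving swap
                        improved = true;
                        break;
                    }
                }
                if (improved) break;
            }
        }
        return seeds;
    }

    public static void main(String[] args) {
        // Placeholder influence: sum of node ids stands in for expected spread.
        Function<Set<Integer>, Double> spread = s -> s.stream().mapToDouble(x -> x).sum();
        System.out.println(interchange(Set.of(1, 2), Set.of(1, 2, 3, 4, 5), spread));
        // prints [4, 5]: each original seed is swapped for a higher-spread node
    }
}
```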

ADVANTAGES OF PROPOSED SYSTEM:

  • We explore the Influential Node Tracking (INT) problem as an extension to the traditional Influence Maximization problem to maximize the influence coverage under a dynamic social network.
  • We propose an efficient algorithm, Upper Bound Interchange Greedy (UBI), to solve the INT problem. Our algorithm achieves results comparable to those of the hill-climbing greedy algorithm, which carries an approximation guarantee. The algorithm has time complexity O(kn) and space complexity O(n), where n is the number of nodes and k is the size of the seed set.
  • We propose an algorithm UBI+, based on UBI, that improves the computation of the node replacement upper bound.
  • We evaluate the performance on large-scale real social networks. The experimental results confirm our theoretical findings and show that the UBI and UBI+ algorithms achieve better performance in both influence coverage and running time.

SYSTEM ARCHITECTURE:

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS: 

  • System : Pentium Dual Core.
  • Hard Disk : 120 GB.
  • Monitor : 15’’ LED
  • Input Devices : Keyboard, Mouse
  • RAM : 1 GB

SOFTWARE REQUIREMENTS: 

  • Operating system : Windows 7.
  • Coding Language : JAVA/J2EE
  • Tool : NetBeans 7.2.1
  • Database : MySQL

REFERENCE:

Guojie Song, Yuanhao Li, Xiaodong Chen, and Xinran He, “Influential Node Tracking on Dynamic Social Network: An Interchange Greedy Approach”, IEEE Transactions on Knowledge and Data Engineering, 2017.