ViDE: A Vision-Based Approach for Deep Web Data Extraction

ABSTRACT:

Deep Web contents are accessed through queries submitted to Web databases, and the returned data records are enwrapped in dynamically generated Web pages (called deep Web pages in this paper). Extracting structured data from deep Web pages is a challenging problem due to the underlying intricate structures of such pages. Until now, a large number of techniques have been proposed to address this problem, but all of them have inherent limitations because they are Web-page-programming-language dependent. Since Web pages are a popular two-dimensional medium, their contents are always displayed in regular layouts for users to browse. This motivates us to seek a different way to perform deep Web data extraction that overcomes the limitations of previous works by utilizing some common visual features of deep Web pages. In this paper, a novel vision-based approach that is Web-page-programming-language independent is proposed. This approach primarily utilizes the visual features of deep Web pages to implement deep Web data extraction, including data record extraction and data item extraction. We also propose a new evaluation measure, revision, to capture the amount of human effort needed to produce perfect extraction. Our experiments on a large set of Web databases show that the proposed vision-based approach is highly effective for deep Web data extraction.


EXISTING SYSTEM:

All previous solutions are Web-page-programming-language dependent, or more precisely, HTML-dependent. Since Web pages are written in HTML, it is not surprising that all previous solutions are based on analyzing the HTML source code of statically programmed Web pages. The earliest approaches are manual ones, in which special languages were designed to assist programmers in constructing wrappers that identify and extract all the desired data items/fields. Semiautomatic techniques can be classified into sequence-based and tree-based; the latter parse the document into a hierarchical DOM tree and perform the extraction process on that tree. All of these approaches require manual effort.
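
For illustration, a minimal tree-based (DOM) wrapper might look like the sketch below, written against the jsoup HTML parser (an assumed dependency). The CSS selectors and field names are hypothetical and must be hand-crafted for every site, which is exactly the HTML-dependence limitation described above.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Minimal sketch of a tree-based (DOM) wrapper: parse the HTML source,
// then pull data records out of nodes matched by a hand-written selector.
// The selector ".result-item" and the field classes are hypothetical.
public class DomWrapperSketch {
    public static void main(String[] args) {
        String html = "<div class=\"result-item\">"
                    + "<span class=\"title\">Example Book</span>"
                    + "<span class=\"price\">$12.99</span></div>";
        Document doc = Jsoup.parse(html);                   // build the DOM tree
        for (Element record : doc.select(".result-item")) { // site-specific rule
            String title = record.selectFirst(".title").text();
            String price = record.selectFirst(".price").text();
            System.out.println(title + " | " + price);
        }
    }
}
```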

PROPOSED SYSTEM:

A novel technique is proposed to perform data extraction from deep Web pages using primarily visual features. The contributions are: 1) a promising research direction is opened in which visual features are utilized to extract deep Web data automatically; 2) a new performance measure, revision, is proposed to evaluate Web data extraction tools, reflecting how likely a tool is to fail to generate a perfect wrapper for a site; 3) a large data set consisting of 1,000 Web databases across 42 domains is used in the experimental study, whereas the data sets used in previous works seldom had more than 100 Web databases. Wrappers are generated that improve the efficiency of both data record extraction and data item extraction. Highly accurate experimental results provide strong evidence that the rich visual features on deep Web pages can serve as the basis for highly effective data extraction algorithms.
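
As a rough, non-authoritative illustration of the vision-based idea (not the paper's actual ViDE algorithm), the sketch below groups rendered page blocks into candidate data records using only layout features such as a shared left edge and similar heights; the block coordinates and tolerances are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only (not the ViDE algorithm itself): group rendered page
// blocks into candidate data records from purely visual/layout features --
// here, blocks sharing the same left edge with similar heights.
public class VisualRecordGrouping {

    static class Block {
        final String text; final int x, y, width, height;
        Block(String text, int x, int y, int width, int height) {
            this.text = text; this.x = x; this.y = y;
            this.width = width; this.height = height;
        }
    }

    static List<List<Block>> groupRecords(List<Block> blocks, int xTol, double hTol) {
        List<List<Block>> groups = new ArrayList<List<Block>>();
        for (Block b : blocks) {
            List<Block> target = null;
            for (List<Block> g : groups) {
                Block ref = g.get(0);
                boolean sameLeftEdge  = Math.abs(ref.x - b.x) <= xTol;
                boolean similarHeight = Math.abs(ref.height - b.height) <= hTol * ref.height;
                if (sameLeftEdge && similarHeight) { target = g; break; }
            }
            if (target == null) { target = new ArrayList<Block>(); groups.add(target); }
            target.add(b);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Block> blocks = new ArrayList<Block>();
        blocks.add(new Block("record 1",   40, 100, 600, 80));
        blocks.add(new Block("record 2",   40, 190, 600, 82));
        blocks.add(new Block("sidebar ad", 700, 100, 200, 300));
        for (List<Block> group : groupRecords(blocks, 5, 0.2)) {
            System.out.println(group.size() + " block(s) in this group");
        }
    }
}
```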

SYSTEM ARCHITECTURE:

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS:

 

  • System : Pentium IV 2.4 GHz.
  • Hard Disk : 40 GB.
  • Floppy Drive : 1.44 MB.
  • Monitor : 15" VGA Colour.
  • Mouse :
  • RAM : 512 MB.

SOFTWARE REQUIREMENTS: 

  • Operating system : Windows XP/7.
  • Coding Language : JAVA/J2EE
  • IDE : Netbeans 7.4
  • Database : MYSQL

REFERENCE:

Wei Liu, Xiaofeng Meng, Member, IEEE, and Weiyi Meng, Member, IEEE, “ViDE: A Vision-Based Approach for Deep Web Data Extraction”, IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 22, NO. 3, MARCH 2010.

An Efficient Distributed Trust Model for Wireless Sensor Networks

ABSTRACT:

Trust models have been recently suggested as an effective security mechanism for Wireless Sensor Networks (WSNs). Considerable research has been done on modeling trust. However, most current research work only takes communication behavior into account to calculate sensor nodes’ trust value, which is not enough for trust evaluation due to the widespread malicious attacks. In this paper, we propose an Efficient Distributed Trust Model (EDTM) for WSNs. First, according to the number of packets received by sensor nodes, direct trust and recommendation trust are selectively calculated. Then, communication trust, energy trust and data trust are considered during the calculation of direct trust. Furthermore, trust reliability and familiarity are defined to improve the accuracy of recommendation trust. The proposed EDTM can evaluate trustworthiness of sensor nodes more precisely and prevent the security breaches more effectively. Simulation results show that EDTM outperforms other similar models, e.g., NBBTE trust model.


EXISTING SYSTEM:

  • Various security mechanisms, e.g., cryptography, authentication, confidentiality, and message integrity, have been proposed to avoid security threats such as eavesdropping, message replay, and fabrication of messages.
  • However, these approaches still suffer from many security vulnerabilities, such as node capture attacks and denial-of-service (DoS) attacks.
  • The traditional security mechanisms can resist external attacks, but cannot effectively handle internal attacks, which are launched by captured nodes.
  • To establish secure communications, we need to ensure that all communicating nodes are trusted. Most existing studies only provide trust assessment for neighbor nodes.
  • However, in real applications, a sensor node sometimes needs to obtain the trust value of non-neighbor nodes.

DISADVANTAGES OF EXISTING SYSTEM:

  • DoS attacks may still occur
  • Internal attacks cannot be handled effectively
  • Trust assessment is provided only for neighbor nodes
  • The trust dynamics problem is not addressed

PROPOSED SYSTEM:

  • In this project, we propose an Efficient Distributed Trust Model (EDTM). The proposed EDTM can evaluate the trust relationships between sensor nodes more precisely and can prevent security breaches more effectively.
  • The network is a multi-hop network, which means that sensor nodes can communicate directly only with the neighbor nodes within their communication range.
  • Packets exchanged between any two non-neighbor nodes are forwarded by other nodes.
  • A forwarding node does not just "pass" packets from source nodes to destination nodes; it can also process the information based on its own judgment.
  • In general, a trust value is calculated from the subject's own observation of the object (direct trust) and from recommendations provided by third parties (a simplified sketch of combining the direct-trust components follows this list).
  • The third party that provides a recommendation is called a recommender.
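
The sketch referenced above is a minimal, assumption-laden illustration of how the direct-trust components named in the abstract (communication, energy, and data trust) could be combined; the weights and per-component formulas are illustrative choices, not the EDTM equations from the paper.

```java
// Hedged sketch of the idea behind EDTM's direct trust: combine communication,
// energy and data trust into one value. The weights and the way each component
// is estimated here are illustrative assumptions, not the paper's exact formulas.
public class DirectTrustSketch {

    // Communication trust from observed forwarding behaviour.
    static double communicationTrust(int packetsForwarded, int packetsExpected) {
        if (packetsExpected == 0) return 0.5;          // no evidence -> neutral
        return (double) packetsForwarded / packetsExpected;
    }

    // Energy trust: a nearly depleted node is less dependable.
    static double energyTrust(double residualEnergy, double initialEnergy) {
        return Math.max(0.0, Math.min(1.0, residualEnergy / initialEnergy));
    }

    // Data trust: fraction of reported readings consistent with neighbours.
    static double dataTrust(int consistentReports, int totalReports) {
        if (totalReports == 0) return 0.5;
        return (double) consistentReports / totalReports;
    }

    // Weighted combination; the weights (0.5, 0.3, 0.2) are assumptions.
    static double directTrust(double comm, double energy, double data) {
        return 0.5 * comm + 0.3 * energy + 0.2 * data;
    }

    public static void main(String[] args) {
        double comm   = communicationTrust(45, 50);
        double energy = energyTrust(0.6, 1.0);
        double data   = dataTrust(18, 20);
        System.out.printf("direct trust = %.3f%n", directTrust(comm, energy, data));
    }
}
```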

ADVANTAGES OF PROPOSED SYSTEM:

  • Prevents security breaches more effectively
  • Provides stronger overall security
  • Supports trusted key exchange
  • Increases the packet delivery ratio

SYSTEM ARCHITECTURE:

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS:

 

  • System : Pentium IV 2.4 GHz.
  • Hard Disk : 40 GB.
  • Floppy Drive : 1.44 MB.
  • Monitor : 15" VGA Colour.
  • Mouse :
  • RAM : 512 MB.

SOFTWARE REQUIREMENTS: 

  • Operating system : Windows XP/7.
  • Coding Language : JAVA
  • IDE : NETBEANS

 

REFERENCE:

Jinfang Jiang, Guangjie Han, Feng Wang, Lei Shu, Member, IEEE, and Mohsen Guizani, Fellow, IEEE, “An Efficient Distributed Trust Model for Wireless Sensor Networks”, IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 26, NO. 5, MAY 2015.

Throughput Maximization in Cognitive Radio Networks using Levenberg-Marquardt Algorithm

ABSTRACT:

Cognitive radio networks are a promising technology for next-generation communication networks: they enable secondary users (SUs) to use free spectrum bands originally licensed to primary users (PUs) without causing interference, thereby utilizing the spectrum more efficiently. Spectrum sensing must be carried out frequently so that an SU can transmit data successfully without causing significant interference to the PU and can achieve maximum throughput. In this paper, we propose an artificial neural network model based on the Levenberg-Marquardt (L-M) algorithm for spectrum prediction. The simulation results show a significant gain in throughput when compared with both the random and HMM-based methods, demonstrating the superiority of the algorithm; the L-M algorithm is also found to be faster than the other algorithms.
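
As background for the algorithm named above, the following is a hedged, generic sketch of a Levenberg-Marquardt iteration on a tiny two-parameter curve-fitting problem. It is not the paper's neural-network training setup; the model, data, and damping schedule are illustrative choices.

```java
// Hedged sketch of one way to implement the Levenberg-Marquardt (L-M) update
// for a tiny two-parameter model y = a * exp(b * x). This is a generic L-M
// demo, not the neural-network training procedure described in the paper.
public class LevenbergMarquardtSketch {

    static double model(double a, double b, double x) { return a * Math.exp(b * x); }

    static double sse(double[] x, double[] y, double a, double b) {
        double s = 0;
        for (int i = 0; i < x.length; i++) {
            double r = y[i] - model(a, b, x[i]);
            s += r * r;
        }
        return s;
    }

    public static void main(String[] args) {
        double[] x = {0, 1, 2, 3, 4};
        double[] y = {2.0, 2.7, 3.7, 5.0, 6.7};   // roughly 2 * exp(0.3 x)
        double a = 1.0, b = 0.0, lambda = 1e-3;

        for (int iter = 0; iter < 50; iter++) {
            // Build J^T J (2x2) and J^T r (2x1) for residuals r_i = y_i - f(x_i).
            double jtj00 = 0, jtj01 = 0, jtj11 = 0, jtr0 = 0, jtr1 = 0;
            for (int i = 0; i < x.length; i++) {
                double e  = Math.exp(b * x[i]);
                double fa = e;              // df/da
                double fb = a * x[i] * e;   // df/db
                double r  = y[i] - a * e;
                jtj00 += fa * fa; jtj01 += fa * fb; jtj11 += fb * fb;
                jtr0  += fa * r;  jtr1  += fb * r;
            }
            // Damped normal equations: (J^T J + lambda * I) * delta = J^T r.
            double m00 = jtj00 + lambda, m11 = jtj11 + lambda, m01 = jtj01;
            double det = m00 * m11 - m01 * m01;
            double da = ( m11 * jtr0 - m01 * jtr1) / det;
            double db = (-m01 * jtr0 + m00 * jtr1) / det;

            // Accept the step only if it reduces the error; adjust damping.
            if (sse(x, y, a + da, b + db) < sse(x, y, a, b)) {
                a += da; b += db; lambda *= 0.1;
            } else {
                lambda *= 10.0;
            }
        }
        System.out.printf("fitted a = %.3f, b = %.3f%n", a, b);
    }
}
```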


SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS: 

  • System : Pentium Dual Core.
  • Hard Disk : 120 GB.
  • Monitor : 15’’ LED
  • Input Devices : Keyboard, Mouse
  • Ram : 1 GB

SOFTWARE REQUIREMENTS: 

  • Operating system : Windows 7.
  • Coding Language : JAVA/J2EE
  • Tool : Netbeans 7.2.1
  • Database : MYSQL

REFERENCE:

Pavithra Roy P, Dr. Muralidhar M, “Throughput Maximization in Cognitive Radio Networks using Levenberg-Marquardt Algorithm”, International Journal of Advanced Research in Computer and Communication Engineering Vol. 4, Issue 2, February 2015.

SmartCrawler: A Two-stage Crawler for Efficiently Harvesting Deep-Web Interfaces

ABSTRACT:

As the deep web grows at a very fast pace, there has been increased interest in techniques that help efficiently locate deep-web interfaces. However, due to the large volume of web resources and the dynamic nature of the deep web, achieving wide coverage and high efficiency is a challenging issue. We propose a two-stage framework, namely SmartCrawler, for efficiently harvesting deep-web interfaces. In the first stage, SmartCrawler performs site-based searching for center pages with the help of search engines, avoiding visits to a large number of pages. To achieve more accurate results for a focused crawl, SmartCrawler ranks websites to prioritize highly relevant ones for a given topic. In the second stage, SmartCrawler achieves fast in-site searching by excavating the most relevant links with an adaptive link ranking. To eliminate bias towards visiting some highly relevant links in hidden web directories, we design a link tree data structure to achieve wider coverage for a website. Our experimental results on a set of representative domains show the agility and accuracy of our proposed crawler framework, which efficiently retrieves deep-web interfaces from large-scale sites and achieves higher harvest rates than other crawlers.

PURPOSE

The main purpose of the project Web Server Performance Testing Tool is to measure the performance of the web server or server-side application.

SCOPE

The scope of the Web Server Performance Testing Tool is as follows (a minimal sketch of this behaviour follows the list):

  • Act as a basic HTTP client
  • Send multiple requests to one or more given URLs
  • Log the request/response time and other details for each URL
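
A minimal sketch of that scope, assuming plain java.net.HttpURLConnection and a placeholder target URL and request count:

```java
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Minimal sketch of the testing-tool behaviour described above: act as a basic
// HTTP client, send several requests to a given URL and log the response time.
// The target URL and the request count are placeholders.
public class WsptSketch {
    public static void main(String[] args) throws Exception {
        String target = "http://localhost:8080/";   // placeholder URL
        int requests = 5;

        for (int i = 1; i <= requests; i++) {
            long start = System.nanoTime();
            HttpURLConnection con = (HttpURLConnection) new URL(target).openConnection();
            int status = con.getResponseCode();      // sends the request
            try (InputStream in = con.getInputStream()) {
                while (in.read() != -1) { /* drain the response body */ }
            }
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("request " + i + ": HTTP " + status + " in " + elapsedMs + " ms");
            con.disconnect();
        }
    }
}
```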

EXISTING SYSTEM:

  • In the existing system there is no performance testing tool that measures the maximum requests per second a web server can normally handle, and no proper way to recognize which resource (such as backend dependencies, CPU, or memory) prevents the requests per second from going higher. As a result, bottlenecks sometimes arise that cannot be easily diagnosed or resolved.
  • Previous work has proposed two types of crawlers: generic crawlers and focused crawlers.
  • Generic crawlers fetch all searchable forms and cannot focus on a specific topic. Focused crawlers such as the Form-Focused Crawler (FFC) and the Adaptive Crawler for Hidden-web Entries (ACHE) can automatically search online databases on a specific topic. FFC is designed with link, page, and form classifiers for focused crawling of web forms, and ACHE extends it with additional components for form filtering and adaptive link learning. The link classifiers in these crawlers play a pivotal role in achieving higher crawling efficiency than the best-first crawler. However, these link classifiers are used to predict the distance to a page containing searchable forms, which is difficult to estimate, especially for delayed-benefit links (links that eventually lead to pages with forms). As a result, the crawler can be inefficiently led to pages without targeted forms.

DISADVANTAGES OF EXISTING SYSTEM:

  • It is challenging to locate the deep web databases, because they are not registered with any search engines, are usually sparsely distributed, and keep constantly changing.
  • The set of retrieved forms is very heterogeneous.
  • It is crucial to develop smart crawling strategies that are able to quickly discover as many relevant content sources from the deep web as possible.

PROPOSED SYSTEM:

  • In the proposed system, the first task in performance testing is to use a tool to apply stress to the web site and measure the maximum requests per second that the web server can handle. This is a quantitative measurement. The second task is to determine which resource prevents the requests per second from going higher, such as CPU, memory, or backend dependencies. This second task is more of an art than a measurement.
  • In many situations, the web server processor is the bottleneck. Increase the stress to the point where the requests per second start to decrease, then back the stress off slightly; this is the maximum performance that the web site can achieve. Increasing the number of client machines will also produce a greater stress level.
  • The WSPT tool is an application developed in Java to measure the performance of a web server or server-side application. The parameters used for measurement are:
    • Request–response time
    • Number of requests successfully attended to by the web server
  • The system proposes a two-stage framework, namely SmartCrawler, for efficiently harvesting deep-web interfaces. In the first stage, SmartCrawler performs site-based searching for center pages with the help of search engines, avoiding visits to a large number of pages. To achieve more accurate results for a focused crawl, SmartCrawler ranks websites to prioritize highly relevant ones for a given topic. In the second stage, SmartCrawler achieves fast in-site searching by excavating the most relevant links with an adaptive link ranking (a simplified sketch of this prioritization follows the list). To eliminate bias towards visiting some highly relevant links in hidden web directories, we design a link tree data structure to achieve wider coverage for a website.
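
The link-prioritization sketch mentioned above is shown below as a plain priority queue over keyword-scored links. The scoring function, keywords, and URLs are invented for the example and stand in for SmartCrawler's learned classifiers; this is an illustration of the ranking idea, not the paper's algorithm.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Illustration of the in-site link-prioritisation idea (not SmartCrawler's
// actual classifiers): score each discovered link against topic keywords and
// always crawl the highest-scoring link next. Keywords and URLs are made up.
public class LinkRankingSketch {

    static class Link {
        final String url; final double score;
        Link(String url, double score) { this.url = url; this.score = score; }
    }

    static double relevance(String url, String[] keywords) {
        double score = 0;
        String u = url.toLowerCase();
        for (String k : keywords) if (u.contains(k)) score += 1.0;
        return score;
    }

    public static void main(String[] args) {
        String[] keywords = {"search", "query", "advanced"};
        PriorityQueue<Link> frontier = new PriorityQueue<Link>(
                11, new Comparator<Link>() {
                    public int compare(Link a, Link b) {
                        return Double.compare(b.score, a.score); // highest first
                    }
                });

        String[] discovered = {
            "http://example.com/about",
            "http://example.com/advanced-search",
            "http://example.com/query-form"
        };
        for (String url : discovered) {
            frontier.add(new Link(url, relevance(url, keywords)));
        }
        while (!frontier.isEmpty()) {
            Link next = frontier.poll();
            System.out.println("crawl next: " + next.url + " (score " + next.score + ")");
        }
    }
}
```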

ADVANTAGES OF PROPOSED SYSTEM:

  • The following are the advantages of using the Web Server Performance Testing Tool when developing and running websites:
    • Maximize Uptime: resolve performance-critical issues in the web server before they bring down the website.
    • Maximize Performance: make sure that websites and applications are given the server resources they need, when they need them, to guarantee a high-quality user experience.
    • Maximize ROI: get everything out of the investment in web server technology through consistent and in-depth testing and analysis.
    • Maximize Value: the Web Server Performance Testing Tool is a cost-effective solution for simulating performance and stress tests for a web server.
  • Our experimental results on a set of representative domains show the agility and accuracy of our proposed crawler framework, which efficiently retrieves deep-web interfaces from large-scale sites and achieves higher harvest rates than other crawlers.

SYSTEM ARCHITECTURE:

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS:

 

  • System : Pentium IV 2.4 GHz.
  • Hard Disk : 40 GB.
  • Floppy Drive : 1.44 MB.
  • Monitor : 15" VGA Colour.
  • Mouse :
  • RAM : 512 MB.

SOFTWARE REQUIREMENTS: 

  • Operating system : Windows XP/7.
  • Coding Language : JAVA/J2EE
  • IDE : Eclipse
  • Database : MYSQL

REFERENCE:

Feng Zhao, Jingyu Zhou, Chang Nie, Heqing Huang, Hai Jin, “SmartCrawler: A Two-stage Crawler for Efficiently Harvesting Deep-Web Interfaces”, IEEE Transactions on Services Computing, 2015.

CoCoWa: A Collaborative Contact-Based Watchdog for Detecting Selfish Nodes

ABSTRACT:

Mobile ad hoc networks (MANETs) assume that mobile nodes voluntarily cooperate in order to work properly. This cooperation is a cost-intensive activity, and some nodes can refuse to cooperate, leading to selfish node behaviour. Thus, the overall network performance can be seriously affected. The use of watchdogs is a well-known mechanism to detect selfish nodes. However, the detection process performed by watchdogs can fail, generating false positives and false negatives that can induce wrong operations. Moreover, relying on local watchdogs alone can lead to poor performance when detecting selfish nodes, in terms of precision and speed. This is especially important in networks with sporadic contacts, such as delay-tolerant networks (DTNs), where watchdogs sometimes lack enough time or information to detect the selfish nodes. Thus, we propose the Collaborative Contact-based Watchdog (CoCoWa), a collaborative approach based on the diffusion of local selfish-node awareness when a contact occurs, so that information about selfish nodes is quickly propagated. As shown in the paper, this collaborative approach reduces the detection time and increases the precision when detecting selfish nodes.


EXISTING SYSTEM:

The impact of node selfishness on MANETs has been studied in the context of credit-payment schemes. It has been shown that when no selfishness prevention mechanism is present, packet delivery rates degrade seriously, from about 80 percent when the selfish node ratio is 0 to 30 percent when the selfish node ratio is 50 percent, and the number of packet losses increases by 500 percent when the selfish node ratio grows from 0 to 40 percent. A more detailed study shows that even a moderate concentration of selfish nodes (starting from a 20 percent level) has a huge impact on the overall performance of MANETs, affecting the average hop count, the number of packets dropped, the offered throughput, and the probability of reachability. In DTNs, selfish nodes can seriously degrade the performance of packet transmission; for example, in two-hop relay schemes, if a packet is transmitted to a selfish node, the packet is not re-transmitted and is therefore lost.

DISADVANTAGES OF EXISTING SYSTEM:

  • The number of selfish nodes increases
  • Packet loss increases
  • Throughput is reduced
  • Overhead increases
  • In DTNs, selfish nodes can seriously degrade the performance of packet transmission. For example, in two-hop relay schemes, if a packet is transmitted to a selfish node, the packet is not re-transmitted, therefore being lost.

PROPOSED SYSTEM:

  • This project introduces the Collaborative Contact-based Watchdog (CoCoWa) as a new scheme for detecting selfish nodes that combines local watchdog detections with the dissemination of this information across the network. If one node has previously detected a selfish node, it can transmit this information to other nodes when a contact occurs; in this way, nodes obtain second-hand information about the selfish nodes in the network (a simplified sketch of this exchange follows the list).
  • The goal of our approach is to reduce the detection time and to improve the precision by reducing the effect of both false negatives and false positives. In general, the analytical evaluation shows a significant reduction in the detection time of selfish nodes, with a reduced overhead, when comparing CoCoWa against a traditional watchdog.
  • The impact of false negatives and false positives is also greatly reduced. Finally, the pernicious effect of malicious nodes can be reduced using the reputation detection scheme. We also evaluate CoCoWa in real mobility scenarios using well-known human and vehicular mobility traces.
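
The exchange sketched below is only an illustration of the contact-based dissemination idea; the three-valued state and the merge rule are simplifications I am assuming for the example, not CoCoWa's analytical model.

```java
import java.util.HashMap;
import java.util.Map;

// Hedged sketch of the collaborative idea in CoCoWa: a node combines its own
// watchdog detections with second-hand information received on contact.
// The state values and the update rule are illustrative, not the paper's model.
public class CollaborativeWatchdogSketch {

    enum State { NO_INFO, POSITIVE, NEGATIVE }   // POSITIVE = believed selfish

    final Map<String, State> knowledge = new HashMap<String, State>();

    // Local watchdog event: this node directly observed the behaviour.
    void localDetection(String nodeId, boolean selfish) {
        knowledge.put(nodeId, selfish ? State.POSITIVE : State.NEGATIVE);
    }

    // Contact event: merge second-hand information from another node.
    // Direct observations are kept; unknown entries adopt the neighbour's view.
    void onContact(CollaborativeWatchdogSketch neighbour) {
        for (Map.Entry<String, State> e : neighbour.knowledge.entrySet()) {
            if (!knowledge.containsKey(e.getKey())) {
                knowledge.put(e.getKey(), e.getValue());
            }
        }
    }

    public static void main(String[] args) {
        CollaborativeWatchdogSketch a = new CollaborativeWatchdogSketch();
        CollaborativeWatchdogSketch b = new CollaborativeWatchdogSketch();
        a.localDetection("node-7", true);    // A catches node-7 dropping packets
        b.onContact(a);                      // B learns about node-7 on contact
        System.out.println("B's view of node-7: " + b.knowledge.get("node-7"));
    }
}
```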

ADVANTAGES OF PROPOSED SYSTEM:

  • Reduces the impact of selfish nodes
  • Increases the throughput
  • Decreases the overhead

SYSTEM ARCHITECTURE:

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS: 

  • System : Pentium IV 2.4 GHz.
  • Hard Disk : 40 GB.
  • Floppy Drive : 1.44 MB.
  • Monitor : 15" VGA Colour.
  • Mouse :
  • RAM : 512 MB.

SOFTWARE REQUIREMENTS: 

  • Operating system : Windows XP/7.
  • Coding Language : JAVA
  • IDE : Eclipse Kepler

REFERENCE:

Enrique Hernández-Orallo, Member, IEEE, Manuel David Serrat Olmos, Juan-Carlos Cano, Carlos T. Calafate, and Pietro Manzoni, Member, IEEE, “CoCoWa: A Collaborative Contact-Based Watchdog for Detecting Selfish Nodes”, IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 14, NO. 6, JUNE 2015.

Discovering Latent Semantics in Web Documents using Fuzzy Clustering

ABSTRACT:

Web documents are heterogeneous and complex. Complicated associations exist within a single web document and through its links to others, and the rich interactions between terms in documents give rise to vague and ambiguous meanings. Efficient and effective clustering methods are therefore needed to discover latent and coherent meanings in context. This paper presents a fuzzy linguistic topological space along with a fuzzy clustering algorithm to discover the contextual meaning in web documents. The proposed algorithm extracts features from the web documents using conditional random field methods and builds a fuzzy linguistic topological space based on the associations of features. The associations of co-occurring features organize a hierarchy of connected semantic complexes called ‘CONCEPTS,’ wherein a fuzzy linguistic measure is applied on each complex to evaluate (1) the relevance of a document belonging to a topic, and (2) the difference from the other topics. Web contents can be clustered into topics in the hierarchy depending on their fuzzy linguistic measures, and web users can then explore the CONCEPTS of web contents accordingly. Besides its applicability in web text domains, the algorithm can be extended to other applications, such as data mining, bioinformatics, content-based or collaborative information filtering, and so forth.


EXISTING SYSTEM:

  • Fuzzy c-means and fuzzy hierarchical clustering algorithms have been deployed for document clustering. Both need prior knowledge about the number of clusters and the initial cluster centroids, which is considered a serious drawback of these approaches. To address this, ant-based fuzzy clustering algorithms and fuzzy k-means clustering algorithms were proposed that can deal with an unknown number of clusters. Even so, their similarity measures and bag-of-words representation remained a major limitation in capturing the semantics of a collection of documents.
  • Based on the Vector Space Model, the similarity between two documents is measured with a vector distance such as the Euclidean distance, Manhattan distance, and so on. These methods do not take contextual meaning into consideration. Ontology-based fuzzy document clustering schemes have been used to cluster documents with a limited subset of selected terms based on a defined ontology. Those methods restrict the application domains, which makes them difficult to generalize when the domain does not have a proper ontology.

DISADVANTAGES OF EXISTING SYSTEM:

  • Search engines are indispensable tools to find, filter, and extract the desired information, attempting to aid users in gathering relevant contents from the web.
  • The complex and rich interactions between terms in documents give rise to vague and ambiguous meanings. Polysemies, synonyms, homonyms, phrases, dependencies, and spam limit the capabilities of search technologies and strongly diminish the comprehensiveness of the results returned by search engines.

PROPOSED SYSTEM:

  • This paper presents a novel clustering algorithm to discover the latent semantics in a text corpus from a fuzzy linguistic perspective. Besides its applicability in text domains, it can be extended to other applications, such as data mining, bioinformatics, content-based or collaborative information filtering, and so forth.
  • The proposed system extracts features from the web documents using a semi-supervised learning scheme (named-entity extraction) and builds a fuzzy linguistic topological space based on the associations of features. The associations of co-occurring features organize a hierarchy of connected semantic complexes called ‘CONCEPTS,’ wherein a fuzzy linguistic measure is applied on each complex to evaluate (1) the relevance of a document belonging to a topic, and (2) the difference from the other topics. The general framework of our clustering method consists of two phases: the first phase, feature extraction, extracts key named entities from a collection of “indexed” documents; the second phase, fuzzy clustering, determines the relations between features and identifies their linguistic categories (a simplified sketch of a co-occurrence-based association measure follows this list).
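
The association sketch referenced above uses a simple co-occurrence (Jaccard-style) degree in [0, 1]. It is an illustrative stand-in for the kind of feature association that can drive the clustering, not the paper's fuzzy linguistic measure.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch (not the paper's FLSC measure): a simple fuzzy
// association between two features based on how often they co-occur in
// documents; values in [0,1] can then drive the clustering of features.
public class FuzzyAssociationSketch {

    static double association(List<Set<String>> docs, String a, String b) {
        int both = 0, either = 0;
        for (Set<String> d : docs) {
            boolean hasA = d.contains(a), hasB = d.contains(b);
            if (hasA && hasB) both++;
            if (hasA || hasB) either++;
        }
        return either == 0 ? 0.0 : (double) both / either;   // fuzzy degree in [0,1]
    }

    public static void main(String[] args) {
        List<Set<String>> docs = Arrays.asList(
            new HashSet<String>(Arrays.asList("cloud", "storage", "security")),
            new HashSet<String>(Arrays.asList("cloud", "computing")),
            new HashSet<String>(Arrays.asList("storage", "cloud")));
        System.out.printf("assoc(cloud, storage) = %.2f%n",
                association(docs, "cloud", "storage"));
    }
}
```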

ADVANTAGES OF PROPOSED SYSTEM:

  • We can effectively discover such maximal fuzzy simplexes and use them to cluster the collection of web documents. Based on our web site and our experiments, we find that FLSC is a very good way to organize unstructured and semi-structured data into several semantic topics.
  • It also illustrates that geometric complexes are an effective model for automatic web document clustering.

SYSTEM ARCHITECTURE:

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS: 

  • System : Pentium IV 2.4 GHz.
  • Hard Disk : 40 GB.
  • Floppy Drive : 1.44 MB.
  • Monitor : 15" VGA Colour.
  • Mouse :
  • RAM : 512 MB.

SOFTWARE REQUIREMENTS: 

  • Operating system : Windows XP/7.
  • Coding Language : JAVA/J2EE
  • IDE : Netbeans 7.4
  • Database : MYSQL

REFERENCE:

I-Jen Chiang, Member, IEEE, Charles Chih-Ho Liu, Yi-Hsin Tsai, and Ajit Kumar, “Discovering Latent Semantics in Web Documents using Fuzzy Clustering”, IEEE Transactions on Fuzzy Systems, 2015.

Privacy-Preserving Detection of Sensitive Data Exposure

ABSTRACT:

Statistics from security firms, research institutions, and government organizations show that the number of data-leak instances has grown rapidly in recent years. Among the various data-leak cases, human mistakes are one of the main causes of data loss. Solutions exist that detect inadvertent sensitive data leaks caused by human mistakes and provide alerts for organizations. A common approach is to screen content in storage and transmission for exposed sensitive information. Such an approach usually requires the detection operation to be conducted in secrecy. However, this secrecy requirement is challenging to satisfy in practice, as detection servers may be compromised or outsourced. In this paper, we present a privacy-preserving data-leak detection (DLD) solution to address this issue, in which a special set of sensitive data digests is used in detection. The advantage of our method is that it enables the data owner to safely delegate the detection operation to a semi-honest provider without revealing the sensitive data to the provider. We describe how Internet service providers can offer their customers DLD as an add-on service with strong privacy guarantees. The evaluation results show that our method supports accurate detection with a very small number of false alarms under various data-leak scenarios.
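
A hedged sketch of the digest idea described above: the data owner releases only hashes of fixed-length shingles of the sensitive data, and the provider reports shingle-hash collisions in the traffic it screens. The shingle length, hash function, and matching rule here are illustrative choices, not necessarily the paper's exact design.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HashSet;
import java.util.Set;

// Hedged sketch of digest-based leak detection: the provider only ever sees
// shingle hashes, never the sensitive plaintext released by the data owner.
public class LeakDigestSketch {

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) sb.append(String.format("%02x", b));
        return sb.toString();
    }

    // Hash every k-character sliding window ("shingle") of the input text.
    static Set<String> shingleDigests(String text, int k) throws Exception {
        MessageDigest sha = MessageDigest.getInstance("SHA-256");
        Set<String> digests = new HashSet<String>();
        for (int i = 0; i + k <= text.length(); i++) {
            byte[] h = sha.digest(text.substring(i, i + k).getBytes(StandardCharsets.UTF_8));
            digests.add(toHex(h));
        }
        return digests;
    }

    public static void main(String[] args) throws Exception {
        Set<String> sensitive = shingleDigests("card=4111111111111111", 8);
        Set<String> observed  = shingleDigests("POST /pay card=4111111111111111", 8);
        observed.retainAll(sensitive);          // overlap => possible leak
        System.out.println("matching shingle digests: " + observed.size());
    }
}
```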

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS: 

  • System : Pentium Dual Core.
  • Hard Disk : 120 GB.
  • Monitor : 15’’ LED
  • Input Devices : Keyboard, Mouse
  • Ram : 1 GB

SOFTWARE REQUIREMENTS: 

  • Operating system : Windows 7.
  • Coding Language : JAVA/J2EE
  • Tool :         Netbeans 7.2.1
  • Database : MYSQL

REFERENCE:

Xiaokui Shu, Danfeng Yao, Member, IEEE, and Elisa Bertino, Fellow, IEEE, “Privacy-Preserving Detection of Sensitive Data Exposure”, IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 10, NO. 5, MAY 2015.

Improving Privacy and Security in Decentralized Ciphertext-Policy Attribute-Based Encryption

ABSTRACT:

In previous privacy-preserving multi-authority attribute-based encryption (PPMA-ABE) schemes, a user can acquire secret keys from multiple authorities only with those authorities knowing his/her attributes, and furthermore a central authority is required. Notably, a user's identity information can be extracted from some of his/her sensitive attributes. Hence, existing PPMA-ABE schemes cannot fully protect users' privacy, as multiple authorities can collaborate to identify a user by collecting and analyzing his attributes. Moreover, ciphertext-policy ABE (CP-ABE) is a more efficient public-key encryption, in which the encryptor can select flexible access structures to encrypt messages. Therefore, a challenging and important task is to construct a PPMA-ABE scheme that requires no central authority and in which both the identifiers and the attributes are protected from being known by the authorities. In this paper, a privacy-preserving decentralized CP-ABE (PPDCP-ABE) scheme is proposed to reduce the trust on the central authority and protect users' privacy. In our PPDCP-ABE scheme, each authority can work independently, without any collaboration, to initialize the system and issue secret keys to users. Furthermore, a user can obtain secret keys from multiple authorities without them knowing anything about his global identifier and attributes.

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS: 

  • System : Pentium Dual Core.
  • Hard Disk : 120 GB.
  • Monitor : 15’’ LED
  • Input Devices : Keyboard, Mouse
  • Ram : 1 GB

SOFTWARE REQUIREMENTS: 

  • Operating system : Windows 7.
  • Coding Language : JAVA/J2EE
  • Tool : Netbeans 7.2.1
  • Database : MYSQL

REFERENCE:

Jinguang Han, Member, IEEE, Willy Susilo, Senior Member, IEEE, Yi Mu, Senior Member, IEEE, Jianying Zhou, and Man Ho Allen Au, Member, IEEE, “Improving Privacy and Security in Decentralized Ciphertext-Policy Attribute-Based Encryption”, IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 10, NO. 3, MARCH 2015.

Detection of Denial-of-Service Attacks Based on Computer Vision Techniques

ABSTRACT:

Detection of Denial-of-Service (DoS) attacks has attracted researchers since the 1990s, and a variety of detection systems has been proposed to achieve this task. Unlike the existing approaches based on machine learning and statistical analysis, the proposed system treats traffic records as images and the detection of DoS attacks as a computer vision problem. A multivariate correlation analysis approach is introduced to accurately depict network traffic records and to convert the records into their respective images. The images of network traffic records are used as the observed objects of our proposed DoS attack detection system, which is developed based on a widely used dissimilarity measure, namely the Earth Mover's Distance (EMD). EMD takes cross-bin matching into account and provides a more accurate evaluation of the dissimilarity between distributions than some other well-known dissimilarity measures, such as the Minkowski-form distance L_p and the χ² statistic. These merits give the proposed system effective detection capabilities. To evaluate the proposed EMD-based detection system, ten-fold cross-validations are conducted using the KDD Cup 99 dataset and the ISCX 2012 IDS evaluation dataset. The results presented in the system evaluation section show that our detection system can detect unknown DoS attacks and achieves 99.95 percent detection accuracy on the KDD Cup 99 dataset and 90.12 percent detection accuracy on the ISCX 2012 IDS evaluation dataset, with a processing capability of approximately 59,000 traffic records per second.
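
For the one-dimensional case with equal total mass and unit ground distance between adjacent bins, EMD reduces to the sum of absolute differences of the cumulative distributions. The sketch below computes that quantity for two toy histograms; it only illustrates the cross-bin nature of EMD, and the histograms and threshold are invented, not the paper's full multivariate-correlation detection system.

```java
// Hedged sketch: 1-D Earth Mover's Distance between two normalized histograms,
// computed as the sum of absolute differences of their cumulative sums.
public class EmdSketch {

    static double[] normalize(double[] h) {
        double sum = 0;
        for (double v : h) sum += v;
        double[] n = new double[h.length];
        for (int i = 0; i < h.length; i++) n[i] = h[i] / sum;
        return n;
    }

    // EMD between two normalized histograms of the same length (1-D case).
    static double emd1d(double[] p, double[] q) {
        double carry = 0, total = 0;
        for (int i = 0; i < p.length; i++) {
            carry += p[i] - q[i];     // running difference of the CDFs
            total += Math.abs(carry); // mass moved across this bin boundary
        }
        return total;
    }

    public static void main(String[] args) {
        double[] normalProfile  = normalize(new double[] {5, 9, 6, 2, 1});
        double[] observedTraffic = normalize(new double[] {1, 2, 5, 9, 6});
        double threshold = 0.3;   // illustrative threshold, not from the paper
        double d = emd1d(normalProfile, observedTraffic);
        System.out.printf("EMD = %.3f -> %s%n", d, d > threshold ? "flag as attack" : "normal");
    }
}
```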

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS: 

  • System : Pentium Dual Core.
  • Hard Disk : 120 GB.
  • Monitor : 15’’ LED
  • Input Devices : Keyboard, Mouse
  • Ram : 1 GB

SOFTWARE REQUIREMENTS: 

  • Operating system : Windows 7.
  • Coding Language : JAVA/J2EE
  • Tool : Netbeans 7.2.1
  • Database : MYSQL

REFERENCE:

Zhiyuan Tan, Member, IEEE, Aruna Jamdagni, Xiangjian He, Senior Member, IEEE, Priyadarsi Nanda, Senior Member, IEEE, Ren Ping Liu, Senior Member, IEEE, and Jiankun Hu, Member, IEEE, “Detection of Denial-of-Service Attacks Based on Computer Vision Techniques”, IEEE TRANSACTIONS ON COMPUTERS, VOL. 64, NO. 9, SEPTEMBER 2015.

Generating Searchable Public-Key Ciphertexts with Hidden Structures for Fast Keyword Search

ABSTRACT:

Existing semantically secure public-key searchable encryption schemes take search time linear with the total number of the ciphertexts. This makes retrieval from large-scale databases prohibitive. To alleviate this problem, this paper proposes Searchable Public-Key Ciphertexts with Hidden Structures (SPCHS) for keyword search as fast as possible without sacrificing semantic security of the encrypted keywords. In SPCHS, all keyword-searchable ciphertexts are structured by hidden relations, and with the search trapdoor corresponding to a keyword, the minimum information of the relations is disclosed to a search algorithm as the guidance to find all matching ciphertexts efficiently. We construct a SPCHS scheme from scratch in which the ciphertexts have a hidden star-like structure. We prove our scheme to be semantically secure in the Random Oracle (RO) model. The search complexity of our scheme is dependent on the actual number of the ciphertexts containing the queried keyword, rather than the number of all ciphertexts. Finally, we present a generic SPCHS construction from anonymous identity-based encryption and collision-free full-identity malleable Identity-Based Key Encapsulation Mechanism (IBKEM) with anonymity. We illustrate two collision-free full-identity malleable IBKEM instances, which are semantically secure and anonymous, respectively, in the RO and standard models. The latter instance enables us to construct an SPCHS scheme with semantic security in the standard model.


EXISTING SYSTEM:

  • One of the prominent works to accelerate the search over encrypted keywords in the public-key setting is deterministic encryption introduced by Bellare et al.
  • An encryption scheme is deterministic if the encryption algorithm is deterministic. Bellare et al. focus on enabling search over encrypted keywords to be as efficient as the search for unencrypted keywords, such that a ciphertext containing a given keyword can be retrieved in time complexity logarithmic in the total number of all ciphertexts. This is reasonable because the encrypted keywords can form a tree-like structure when stored according to their binary values.
  • Search on encrypted data has been extensively investigated in recent years. From a cryptographic perspective, the existing works fall into two categories, i.e., symmetric searchable encryption and public-key searchable encryption.

DISADVANTAGES OF EXISTING SYSTEM:

  • Existing semantically secure PEKS schemes take search time linear with the total number of all cipher texts. This makes retrieval from large-scale databases prohibitive. Therefore, more efficient search performance is crucial for practically deploying PEKS schemes.
  • Deterministic encryption has two inherent limitations. First, keyword privacy can be guaranteed only for keywords that are a priori hard to guess by the adversary (i.e., keywords with high min-entropy to the adversary); second, certain information of a message leaks inevitably via the ciphertext of the keywords since the encryption is deterministic. Hence, deterministic encryption is only applicable in special scenarios.
  • The linear search complexity of existing schemes is the major obstacle to their adoption.

PROPOSED SYSTEM:

  • We are interested in providing highly efficient search performance without sacrificing semantic security in PEKS.
  • We start by formally defining the concept of Searchable Public-key Ciphertexts with Hidden Structures (SPCHS) and its semantic security.
  • In this new concept, keyword-searchable ciphertexts with their hidden structures can be generated in the public-key setting; with a keyword search trapdoor, partial relations can be disclosed to guide the discovery of all matching ciphertexts.
  • Semantic security is defined for both the keywords and the hidden structures. It is worth noting that this new concept and its semantic security are suitable for keyword-searchable ciphertexts with any kind of hidden structure. In contrast, the concept of traditional PEKS does not contain any hidden structure among the PEKS ciphertexts; correspondingly, its semantic security is only defined for the keywords (a non-cryptographic analogy of structure-guided search follows this list).
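
Purely as a non-cryptographic analogy (none of the encryption, trapdoor generation, or security properties are modelled), the sketch below shows why structure-guided search costs time proportional to the number of matching ciphertexts: a per-keyword chain of pointers lets the search hop from one match to the next instead of scanning the whole database.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain data-structure analogy only -- none of the cryptography of SPCHS is
// modelled here. It illustrates why search cost depends on the number of
// matching ciphertexts: a per-keyword chain lets the search jump from one
// matching ciphertext to the next instead of scanning everything.
public class HiddenChainSearchAnalogy {

    // head pointer per keyword (disclosed only via that keyword's trapdoor)
    final Map<String, Integer> head = new HashMap<String, Integer>();
    // next pointer per ciphertext position (the "hidden relation")
    final Map<Integer, Integer> next = new HashMap<Integer, Integer>();
    final List<String> ciphertexts = new ArrayList<String>();

    void insert(String keyword, String ciphertext) {
        ciphertexts.add(ciphertext);
        int pos = ciphertexts.size() - 1;
        Integer oldHead = head.get(keyword);
        if (oldHead != null) next.put(pos, oldHead); // link new entry into the chain
        head.put(keyword, pos);
    }

    List<String> search(String keywordTrapdoor) {
        List<String> hits = new ArrayList<String>();
        Integer pos = head.get(keywordTrapdoor);
        while (pos != null) {               // cost ~ number of matches only
            hits.add(ciphertexts.get(pos));
            pos = next.get(pos);
        }
        return hits;
    }

    public static void main(String[] args) {
        HiddenChainSearchAnalogy store = new HiddenChainSearchAnalogy();
        store.insert("urgent", "ct-1");
        store.insert("budget", "ct-2");
        store.insert("urgent", "ct-3");
        System.out.println(store.search("urgent"));  // prints [ct-3, ct-1]
    }
}
```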

ADVANTAGES OF PROPOSED SYSTEM:

  • We build a generic SPCHS construction with Identity-Based Encryption (IBE) and collision-free full-identity malleable IBKEM.
  • The resulting SPCHS can generate keyword-searchable ciphertexts with a hidden star-like structure. Moreover, if both the underlying IBKEM and IBE have semantic security and anonymity (i.e. the privacy of receivers’ identities), the resulting SPCHS is semantically secure.

 

SYSTEM ARCHITECTURE:

MODULES:

  1. Data owner Module
  2. Data User Module
  3. Encryption Module
  4. Rank Search Module

MODULE DESCRIPTION: 

Data owner Module

In this module, the data owner generates Searchable Public-Key Ciphertexts with Hidden Structures (SPCHS) so that keyword search can be as fast as possible without sacrificing the semantic security of the encrypted keywords. In SPCHS, all keyword-searchable ciphertexts are structured by hidden relations, and with the search trapdoor corresponding to a keyword, the minimum information about the relations is disclosed to a search algorithm as the guidance to find all matching ciphertexts efficiently.

 

Data User Module

In this module, we develop the data user side. It starts by formally defining the concept of Searchable Public-Key Ciphertexts with Hidden Structures (SPCHS) and its semantic security. In this new concept, keyword-searchable ciphertexts with their hidden structures can be generated in the public-key setting; with a keyword search trapdoor, partial relations can be disclosed to guide the discovery of all matching ciphertexts.

Encryption Module

Anonymous identity-based broadcast encryption. A slightly more complicated application is anonymous identity-based broadcast encryption with efficient decryption. Analogous applications were proposed by Barth et al. and by Libert et al. in the traditional public-key setting. With collision-free full-identity malleable IBKEM, a sender generates an identity-based broadcast ciphertext ⟨C1, C2, (K^1_1 || SE(K^1_2; F_1)), ..., (K^N_1 || SE(K^N_2; F_N))⟩, where C1 and C2 are two IBKEM encapsulations.

Rank Search Module

This module allows the search to be processed in logarithmic time, although the keyword search trapdoor has length linear in the size of the database.

 

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS: 

  • System : Pentium IV 2.4 GHz.
  • Hard Disk : 40 GB.
  • Floppy Drive : 1.44 MB.
  • Monitor : 15" VGA Colour.
  • Mouse :
  • RAM : 512 MB.

SOFTWARE REQUIREMENTS: 

  • Operating system : Windows XP/7.
  • Coding Language : JAVA/J2EE
  • IDE : Netbeans 7.4
  • Database : MYSQL

REFERENCE:

Peng Xu, Member, IEEE, Qianhong Wu, Member, IEEE, Wei Wang, Member, IEEE, Willy Susilo, Senior Member, IEEE, Josep Domingo-Ferrer, Fellow, IEEE, Hai Jin, Senior Member, IEEE, “Generating Searchable Public-Key Ciphertexts with Hidden Structures for Fast Keyword Search”, IEEE Transactions on Information Forensics and Security, 2015.