Predicting Cyberbullying on Social Media in the Big Data Era Using Machine Learning Algorithms: Review of Literature and Open Challenges


ABSTRACT:

Prior to the innovation of information and communication technologies (ICT), social interactions evolved within small cultural boundaries such as geospatial locations. Recent developments in communication technologies have considerably transcended the temporal and spatial limitations of traditional communication. These social technologies have created a revolution in user-generated information, online human networks, and rich human behavior-related data. However, the misuse of social technologies such as social media (SM) platforms has introduced a new form of aggression and violence that occurs exclusively online. This paper highlights the new means of demonstrating aggressive behavior on SM websites and outlines the motivations for constructing prediction models to fight such behavior. We comprehensively review cyberbullying prediction models and identify the main issues related to their construction on SM. The paper provides insight into the overall cyberbullying detection process and, most importantly, overviews the methodology. While data collection and the feature engineering process are elaborated, most of the emphasis is on feature selection algorithms and on the use of various machine learning algorithms to predict cyberbullying behavior. Finally, open issues and challenges are highlighted, presenting new research directions for researchers to explore.
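As a rough illustration of the feature engineering and prediction step summarized above, the sketch below extracts a few hand-crafted features from a post and scores it with a simple linear model. It is only a toy example: the lexicon, the three features, and the weights are hypothetical placeholders, not the features or classifiers surveyed in the paper; in practice the weights would be learned from labeled data after a feature selection step.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Illustrative sketch only: a minimal hand-crafted feature extractor and linear scorer
// for flagging potentially aggressive posts. The lexicon, features, and weights are
// hypothetical placeholders, not the features or models surveyed in the paper.
public class ToyCyberbullyingScorer {

    private static final List<String> OFFENSIVE_LEXICON =
            Arrays.asList("idiot", "loser", "stupid", "hate");   // hypothetical word list

    // Three simple features: offensive-word count, second-person pronoun count, caps ratio.
    static double[] extractFeatures(String post) {
        String[] tokens = post.toLowerCase(Locale.ROOT).split("\\W+");
        double offensive = 0, secondPerson = 0, upper = 0, letters = 0;
        for (String t : tokens) {
            if (OFFENSIVE_LEXICON.contains(t)) offensive++;
            if (t.equals("you") || t.equals("your")) secondPerson++;
        }
        for (char c : post.toCharArray()) {
            if (Character.isLetter(c)) { letters++; if (Character.isUpperCase(c)) upper++; }
        }
        double capsRatio = letters == 0 ? 0 : upper / letters;
        return new double[]{offensive, secondPerson, capsRatio};
    }

    // Linear score with made-up weights; a real system would learn these from labeled data.
    static double score(double[] f) {
        double[] w = {1.5, 0.6, 2.0};
        double s = -1.0;                         // bias term
        for (int i = 0; i < f.length; i++) s += w[i] * f[i];
        return 1.0 / (1.0 + Math.exp(-s));       // squash to a pseudo-probability
    }

    public static void main(String[] args) {
        String post = "You are such a LOSER and everyone hates you";
        System.out.printf("bullying score = %.2f%n", score(extractFeatures(post)));
    }
}
```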

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS:

  • System : Pentium Dual Core.
  • Hard Disk : 120 GB.
  • Monitor : 15’’ LED
  • Input Devices : Keyboard, Mouse
  • RAM : 1 GB

SOFTWARE REQUIREMENTS:

  • Operating system : Windows 7.
  • Coding Language :
  • Tool : Netbeans 7.2.1
  • Database : MYSQL

REFERENCE:

Mohammed Ali Al-Garadi, Mohammad Rashid Hussain, Nawsher Khan, Ghulam Murtaza, Henry Friday Nweke, Ihsan Ali, Ghulam Mujtaba, Haruna Chiroma, Hasan Ali Khattak, and Abdullah Gani, “Predicting Cyberbullying on Social Media in the Big Data Era Using Machine Learning Algorithms: Review of Literature and Open Challenges”, IEEE Access, Volume 7, 2019.

 

Retrieving Hidden Friends: A Collusion Privacy Attack Against Online Friend Search Engine


ABSTRACT:

Online Social Networks (OSNs) provide a variety of applications for human users to interact with family, friends, and even strangers. One such application, the friend search engine, allows the general public to query individual users’ friend lists and has been gaining popularity recently. However, without proper design, this application may mistakenly disclose users’ private relationship information. Our previous work proposed a privacy preservation solution that can effectively boost OSNs’ sociability while protecting users’ friendship privacy against attacks launched by individual malicious requestors. In this paper, we propose an advanced collusion attack, in which a victim user’s friendship privacy can be compromised through a series of carefully designed queries launched in coordination by multiple malicious requestors. The effect of the proposed collusion attack is validated on synthetic and real-world social network data sets. In-depth research on such advanced collusion attacks will help us design a more robust and secure friend search engine on OSNs in the near future.
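The following toy sketch illustrates the intuition behind the collusion: each requestor’s query returns only a partial view of the victim’s friend list, yet the union of several coordinated views recovers far more. The “top-k per requestor” response model and the selection rule are assumptions made purely for illustration; they are not the friend search engine policy analyzed in the paper.

```java
import java.util.*;

// Illustrative sketch only: colluding requestors union their partial query results
// to reconstruct more of a victim's friend list than any single requestor sees.
// The response model below is a toy assumption, not the engine policy in the paper.
public class CollusionUnionDemo {

    // The engine reveals at most k friends per requestor, chosen here by a toy rule.
    static List<String> queryFriendList(List<String> fullList, int requestorId, int k) {
        List<String> view = new ArrayList<>();
        for (int i = 0; i < fullList.size() && view.size() < k; i++) {
            if ((i + requestorId) % 3 == 0) view.add(fullList.get(i));  // toy selection rule
        }
        return view;
    }

    public static void main(String[] args) {
        List<String> victimFriends =
                Arrays.asList("alice", "bob", "carol", "dave", "erin", "frank");
        Set<String> recovered = new HashSet<>();
        for (int requestor = 0; requestor < 3; requestor++) {
            recovered.addAll(queryFriendList(victimFriends, requestor, 2));
        }
        System.out.println("single requestor sees: " + queryFriendList(victimFriends, 0, 2));
        System.out.println("colluders jointly recover: " + recovered);
    }
}
```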

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS:

  • System : Pentium Dual Core.
  • Hard Disk : 120 GB.
  • Monitor : 15’’ LED
  • Input Devices : Keyboard, Mouse
  • RAM : 1 GB

SOFTWARE REQUIREMENTS:

  • Operating system : Windows 7.
  • Coding Language :
  • Tool : Netbeans 7.2.1
  • Database : MYSQL

REFERENCE:

Yuhong Liu and Na Li, “Retrieving Hidden Friends: A Collusion Privacy Attack Against Online Friend Search Engine”, IEEE Transactions on Information Forensics and Security, Volume 14, Issue 4, April 2019.

Towards Achieving Keyword Search over Dynamic Encrypted Cloud Data with Symmetric-Key Based Verification


ABSTRACT:

Verifiable Searchable Symmetric Encryption (SSE), as an important cloud security technique, allows users to retrieve encrypted data from the cloud through keywords and to verify the validity of the returned results. Dynamic update of cloud data is one of the most common and fundamental requirements for data owners in such schemes. To the best of our knowledge, the existing verifiable SSE schemes supporting dynamic data update are all based on asymmetric-key cryptographic verification, which involves time-consuming operations. The overhead of verification may become a significant burden due to the sheer amount of cloud data. Therefore, how to achieve keyword search over dynamic encrypted cloud data with efficient verification is a critical unsolved problem. To address this problem, we explore keyword search over dynamic encrypted cloud data with symmetric-key based verification and propose a practical scheme in this paper. In order to support efficient verification of dynamic data, we design a novel Accumulative Authentication Tag (AAT) based on symmetric-key cryptography to generate an authentication tag for each keyword. Benefiting from the accumulation property of the AAT, the authentication tag can be conveniently updated when dynamic operations on cloud data occur. In order to achieve efficient data update, we design a new secure index composed of a search table (ST) based on an orthogonal list and a verification list (VL) containing AATs. Owing to the connectivity and flexibility of the ST, update efficiency can be significantly improved. The security analysis and the performance evaluation results show that the proposed scheme is secure and efficient.
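To illustrate why a symmetric-key accumulator makes verification of dynamic data cheap, the sketch below maintains an XOR-accumulated HMAC tag per keyword: adding or deleting a file identifier is a single XOR, so the tag never has to be rebuilt. This is only an analogy under that assumption; it is not the paper’s AAT construction, and the key, keyword, and file identifiers are hypothetical.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;

// Illustrative sketch only: an XOR-accumulated HMAC tag per keyword, showing why a
// symmetric-key accumulator supports O(1) updates when a file is added or removed.
// This is NOT the paper's AAT construction; all identifiers are hypothetical.
public class XorAccumulatedTag {
    private final byte[] key;
    private final byte[] acc = new byte[32];      // running tag for one keyword (HmacSHA256 size)

    XorAccumulatedTag(byte[] key) { this.key = key; }

    private byte[] hmac(String msg) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(key, "HmacSHA256"));
        return mac.doFinal(msg.getBytes(StandardCharsets.UTF_8));
    }

    // Adding and removing a file ID are the same XOR, so updates never rebuild the tag.
    void toggleFile(String keyword, String fileId) throws Exception {
        byte[] h = hmac(keyword + "|" + fileId);
        for (int i = 0; i < acc.length; i++) acc[i] ^= h[i];
    }

    byte[] tag() { return acc.clone(); }

    public static void main(String[] args) throws Exception {
        byte[] k = "0123456789abcdef0123456789abcdef".getBytes(StandardCharsets.UTF_8);
        XorAccumulatedTag owner = new XorAccumulatedTag(k);
        owner.toggleFile("invoice", "file-1");
        owner.toggleFile("invoice", "file-7");
        owner.toggleFile("invoice", "file-1");    // delete file-1 by XORing it out again
        // Verifier recomputes the tag from the claimed result set {file-7} and compares.
        XorAccumulatedTag verifier = new XorAccumulatedTag(k);
        verifier.toggleFile("invoice", "file-7");
        System.out.println(java.util.Arrays.equals(owner.tag(), verifier.tag()));  // true
    }
}
```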

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS:

  • System : Pentium Dual Core.
  • Hard Disk : 120 GB.
  • Monitor : 15’’ LED
  • Input Devices : Keyboard, Mouse
  • RAM : 1 GB

SOFTWARE REQUIREMENTS:

  • Operating system : Windows 7.
  • Coding Language :
  • Tool : Netbeans 7.2.1
  • Database : MYSQL

REFERENCE:

Xinrui Ge, Jia Yu, Hanlin Zhang, Chengyu Hu, Zengpeng Li, Zhan Qin, and Rong Hao, “Towards Achieving Keyword Search over Dynamic Encrypted Cloud Data with Symmetric-Key Based Verification”, IEEE Transactions on Dependable and Secure Computing, 2019.

 

Transactional Behavior Verification in Business Process as a Service Configuration


ABSTRACT:

Business Process as a Service (BPaaS) is an emerging type of cloud service that offers configurable and executable business processes to clients over the Internet. As BPaaS is still in its early years of research, many open issues remain. Managing the configuration of BPaaS builds on areas such as software product lines and configurable business processes. The problem has concerns to consider from several perspectives, such as the different types of variable features, constraints between configuration options, and satisfying the requirements provided by the client. In our approach, we use temporal logic templates to elicit transactional requirements from clients that the configured service must adhere to. Feature models are used to formalize constraints over the configuration. To manage all these concerns during BPaaS configuration, we develop a structured process that applies formal methods while directing clients through specifying transactional requirements and selecting configurable features. Binary Decision Diagram (BDD) analysis is then used to verify that the selected configurable features do not violate any constraints. Finally, model checking is applied to verify the configured service against the transactional requirement set. We demonstrate the feasibility of our approach with several validation scenarios and performance evaluations.
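As a small illustration of the configuration-checking step, the sketch below evaluates a client’s feature selection against a couple of propositional constraints by direct evaluation. The features and constraints are hypothetical; real feature models of realistic size would be analyzed with BDDs rather than enumerated predicates, as described above.

```java
import java.util.*;
import java.util.function.Predicate;

// Illustrative sketch only: checking a selected feature set against simple
// "requires"/"excludes" constraints. Features and constraints are hypothetical.
public class FeatureConfigCheck {
    public static void main(String[] args) {
        Set<String> selected = new HashSet<>(Arrays.asList("onlinePayment", "autoRetry"));

        // Each constraint is a predicate over the selected feature set.
        Map<String, Predicate<Set<String>>> constraints = new LinkedHashMap<>();
        constraints.put("onlinePayment requires fraudCheck",
                s -> !s.contains("onlinePayment") || s.contains("fraudCheck"));
        constraints.put("autoRetry excludes manualApproval",
                s -> !(s.contains("autoRetry") && s.contains("manualApproval")));

        constraints.forEach((name, c) ->
                System.out.println(name + " : " + (c.test(selected) ? "satisfied" : "VIOLATED")));
    }
}
```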

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS:

  • System : Pentium Dual Core.
  • Hard Disk : 120 GB.
  • Monitor : 15’’ LED
  • Input Devices : Keyboard, Mouse
  • RAM : 1 GB

SOFTWARE REQUIREMENTS: 

  • Operating system : Windows 7.
  • Coding Language : JAVA/J2EE
  • Tool : ECLIPSE
  • Database : MYSQL

REFERENCE:

Scott Bourne, Claudia Szabo, and Quan Z. Sheng, “Transactional Behavior Verification in Business Process as a Service Configuration”, IEEE Transactions on Services Computing, 2019.

 

Trust-based Privacy-Preserving Photo Sharing in Online Social Networks


ABSTRACT:

With the development of social media technologies, sharing photos in online social networks has become a popular way for users to maintain social connections with others. However, the rich information contained in a photo makes it easy for a malicious viewer to infer sensitive information about those who appear in it. How to deal with the privacy disclosure problem incurred by photo sharing has attracted much attention in recent years. When sharing a photo that involves multiple users, the publisher of the photo should take all related users’ privacy into account. In this paper, we propose a trust-based privacy-preserving mechanism for sharing such co-owned photos. The basic idea is to anonymize the original photo so that users who may suffer a high privacy loss from the sharing of the photo cannot be identified from the anonymized photo. The privacy loss to a user depends on how much he trusts the receiver of the photo, and the user’s trust in the publisher is in turn affected by the privacy loss. The anonymization result of a photo is controlled by a threshold specified by the publisher. We propose a greedy method for the publisher to tune the threshold, with the aim of balancing the privacy preserved by anonymization against the information shared with others. Simulation results demonstrate that the trust-based photo sharing mechanism helps reduce privacy loss and that the proposed threshold tuning method can bring a good payoff to the user.
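A minimal sketch of the thresholding idea follows: every co-owner whose estimated privacy loss exceeds the publisher’s threshold is marked for anonymization. The loss model (sensitivity scaled by distrust of the receiver) and all numeric values are assumptions made for illustration, not the trust and loss functions defined in the paper.

```java
import java.util.*;

// Illustrative sketch only: anonymize (e.g., blur) every co-owner whose estimated
// privacy loss exceeds the publisher's threshold. The loss model and numbers are
// assumptions for illustration, not the paper's model.
public class TrustBasedAnonymization {

    record CoOwner(String name, double sensitivity, double trustInReceiver) {}

    static List<String> usersToAnonymize(List<CoOwner> coOwners, double threshold) {
        List<String> toBlur = new ArrayList<>();
        for (CoOwner u : coOwners) {
            double privacyLoss = u.sensitivity() * (1.0 - u.trustInReceiver());
            if (privacyLoss > threshold) toBlur.add(u.name());   // hide this face/tag
        }
        return toBlur;
    }

    public static void main(String[] args) {
        List<CoOwner> coOwners = List.of(
                new CoOwner("alice", 0.9, 0.2),   // sensitive photo, low trust -> blur
                new CoOwner("bob",   0.4, 0.8));  // low loss -> keep visible
        System.out.println(usersToAnonymize(coOwners, 0.5));     // prints [alice]
    }
}
```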

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS:

  • System : Pentium Dual Core.
  • Hard Disk : 120 GB.
  • Monitor : 15’’ LED
  • Input Devices : Keyboard, Mouse
  • RAM : 1 GB

SOFTWARE REQUIREMENTS:

  • Operating system : Windows 7.
  • Coding Language :
  • Tool : Netbeans 7.2.1
  • Database : MYSQL

REFERENCE:

Lei Xu, Ting Bao, Liehuang Zhu, and Yan Zhang, “Trust-based Privacy-Preserving Photo Sharing in Online Social Networks”, IEEE Transactions on Multimedia, Volume 21, Issue 3, March 2019.

 

Extending 3-bit Burst Error-Correction Codes With Quadruple Adjacent Error Correction


ABSTRACT:

The use of error-correction codes (ECCs) with advanced correction capability is a common system-level strategy to harden memories against multiple bit upsets (MBUs). Therefore, the construction of ECCs with advanced error correction and low redundancy has become an important problem, especially for adjacent ECCs. Existing codes for mitigating MBUs mainly focus on the correction of up to 3-bit burst errors. As the technology scales and the cell interval distance decreases, the number of affected bits can easily extend to more than 3 bits. The previous methods are therefore not enough to satisfy the reliability requirements of applications in harsh environments. In this paper, a technique to extend 3-bit burst error-correction (BEC) codes with quadruple adjacent error correction (QAEC) is presented. First, the design rules are specified, and then a searching algorithm is developed to find the codes that comply with those rules. The H matrices of the 3-bit BEC with QAEC obtained are presented. They do not require additional parity check bits compared with a 3-bit BEC code. By applying the new algorithm to previous 3-bit BEC codes, the performance of 3-bit BEC is also remarkably improved. The encoding and decoding procedure of the proposed codes is illustrated with an example. Then, the encoders and decoders are implemented using a 65-nm library, and the results show that our codes have moderate total area and delay overhead to achieve the correction ability extension.

SOFTWARE IMPLEMENTATION:

  • Modelsim
  • Xilinx 14.2

EXISTING SYSTEM:

Reliability is an important requirement for space applications. Memories, as the data-storing components, play a significant role in electronic systems. They are widely used in systems on a chip and in application-specific integrated circuits. In these applications, memories account for a large portion of the circuit area, which makes them suffer more space radiation than other components. Therefore, the sensitivity of memories to radiation has become a critical issue in ensuring the reliability of electronic systems. In modern static random access memories (SRAMs), radiation-induced soft errors in the form of the single event upset (SEU) and the multiple bit upset (MBU) are two prominent single event effects. As semiconductor technology develops from submicrometer to ultradeep submicrometer (UDSM) technology, memory cells become smaller and more cells fall within the radius affected by a particle, as shown in Fig. 1.

Fig. 1. Memory cell area at different technology nodes (cell area shape is simplified to a square, and Length is the length of a side).

When a particle from a cosmic ray hits a basic memory cell, it generates a radial distribution of electron–hole pairs along the transport track. These generated electron–hole pairs can cause soft errors by changing the values stored in the memory cell, leading to data corruption and system failure. For transistors with a large feature size, a radiation event affects just one memory cell, which means that only an SEU occurs. In this case, the use of single error-correction (SEC)-double error-detection (DED) codes is enough to protect the memory from radiation effects.

As the feature size enters the DSM range, the critical charge keeps decreasing and the area of the memory cell scales down with each successive technology node. This causes more memory cells to be affected by a particle hit, as shown in Fig. 2. For CMOS bulk technology, as the cell-to-cell spacing decreases, the electron–hole pairs generated in the substrate can diffuse to nearby cells and induce MBUs.

Fig. 2. Schematic description of memory cells included in the radiation effect with variation of the technology node.

This contrasts with the FDSOI technology, which isolates transistors and limits the multicollection mechanism. Therefore, the multicollection mechanism is more prominent for a bulk technology, and the MBU probability is higher. Although multiple bit error-correction codes (ECCs) can correct multiple bit errors in arbitrary error patterns, not limited to adjacent bits, the complexity of the decoding process and the limitation of the code block size restrict their use. Meanwhile, from the generation principle of MBUs, the type of MBU depends on the initial angle of incidence and the scattering angle of the secondary particles. Based on this, adjacent bit errors are the dominant error patterns among multiple bit errors. Therefore, adjacent-bit-correction ECCs have become popular in memory-hardened designs. Many codes have been proposed, and the capability of adjacent bit correction mainly focuses on double adjacent error correction (DAEC), triple adjacent error correction (TAEC), and 3-bit burst error correction (BEC). An alternative to codes that can correct adjacent errors is to use an SEC or SEC-DED code combined with interleaving of the memory cells. Interleaving ensures that cells that belong to the same logical word are placed physically apart. This means that an error on multiple adjacent cells affects multiple words, each having only one bit error that can be corrected by an SEC code, as illustrated by the sketch below. As noted in previous studies, interleaving makes the interconnections and routing of the memory more complex, and it leads to increased area and power consumption or to limitations in the aspect ratio. Therefore, whether it is better to use SEC plus interleaving or a code that can correct adjacent errors is design-dependent, and both alternatives are of interest.
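The following is a minimal illustration of that placement, assuming a simple column-interleaving scheme with degree 4; the parameters are toy values chosen only to show how a short burst lands in different logical words.

```java
// Illustrative sketch only: with an interleaving degree of 4, bits of the same logical
// word are stored 4 physical columns apart, so a burst hitting adjacent cells lands in
// different words and each word sees at most one error (correctable by a SEC code).
public class InterleavingDemo {
    public static void main(String[] args) {
        int degree = 4;          // number of interleaved words
        int wordLength = 8;      // bits per logical word
        // Physical column of bit b of word w under simple column interleaving: b*degree + w.
        for (int w = 0; w < degree; w++) {
            StringBuilder sb = new StringBuilder("word " + w + " -> columns:");
            for (int b = 0; b < wordLength; b++) sb.append(' ').append(b * degree + w);
            System.out.println(sb);
        }
        // A 3-cell burst at physical columns 8, 9, 10 hits words 0, 1, and 2 once each.
    }
}
```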

As the technology reaches the UDSM range, the area of the memory cell keeps decreasing, and even memories with atomic-dimension transistors appear. The ionization range of ions, on the order of micrometers, can include more memory cells in the word-line direction, as shown in Fig. 2, than the three bits previously considered. This means that SEC-DAEC-TAEC codes may not be effective in ensuring memory reliability. Codes with more advanced correction ability are needed. For example, codes designed with low redundancy for SEC-DAEC-TAEC and 3-bit BEC have been presented. Therefore, extending the correction ability to quadruple adjacent bit errors would be of interest, especially if it can be done without adding extra parity bits.

In this paper, we present an improvement of 3-bit BEC codes to also provide quadruple adjacent error correction (QAEC). The code design technique for QAEC with low redundancy is specified from two aspects: 1) error space satisfiability and 2) unique syndrome satisfiability. Codes with QAEC for 16, 32, and 64 data bits are presented. From the viewpoint of integrated circuit design, two criteria have been used to optimize decoder complexity and decoder delay at the ECC level: 1) minimizing the total number of ones in the parity check matrix and 2) minimizing the number of ones in the heaviest row of the parity check matrix. Additionally, based on the traditional recursive backtracking algorithm, an algorithm with column weight restriction and recording of the past search procedure is developed. The new algorithm not only reduces program run time, but also remarkably improves the performance of previous 3-bit BEC codes. The encoders and decoders for the QAEC codes are implemented in the Verilog hardware description language (HDL). Area overhead and delay overhead are obtained using a TSMC bulk 65-nm library. Compared with the previous 3-bit BEC codes, the area and delay overhead is moderate to achieve the correction ability extension.

DISADVANTAGES:

  • Not reliable in operation
  • Performance of 3-bit BEC codes is not improved.

PROPOSED SYSTEM:

Searching algorithms and tool development 

In this section, an algorithm is proposed to solve the Boolean satisfiability problem based on the discussion in the previous section. Based on this algorithm, a code optimization tool is developed to obtain the target H matrix with custom optimization restrictions. The introduction of the algorithm is divided into two subsections. First, the basic part of the algorithm is introduced to find solutions meeting the Boolean satisfiability requirement. Building on the basic part, a method with column weight restriction is designed to force the optimization process to use as few ones as possible, thus optimizing the total number of ones in the matrix and the number of ones in the heaviest row. This optimized version of the algorithm has been used to obtain all the codes presented in this paper.

Basic Part of Code Design Algorithm

In order to construct the expected codes, the first step is to determine the number of check bits. The number of check bits is seven for codes with 16 data bits, eight for codes with 32 data bits, and nine for codes with 64 data bits.

The main idea of the algorithm is based on the recursive backtracking algorithm. At first, an identity matrix with block size (n − k) is constructed as the initial matrix, and the corresponding syndromes of the error patterns are added to the syndrome set. Then, a column vector selected from the 2^(n−k) − 1 column candidates is added to the right side of the initial matrix. This process is defined as the column-added action. Meanwhile, the new syndromes that belong to the newly added column are calculated. If none of the new syndromes is equal to an element of the syndrome set, the column-added action succeeds, the corresponding new syndromes are added to the syndrome set, and the base matrix is updated to the previous matrix with the added column. Otherwise, the column-added action fails and another new column is selected from the candidates. If all the candidates have been tried and the column-added action still fails, one column from the right side of the previous matrix and its corresponding syndromes are removed from the base matrix and the syndrome set, respectively. The algorithm then continues the column-added action until the matrix dimension reaches the expected value. The flow of the code design algorithm is shown in Fig. 3.
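A simplified sketch of this column-added backtracking search is given below, restricted to a SEC-DAEC target (single plus double-adjacent error correction) rather than the full 3-bit burst and QAEC case. Columns of H are stored as integers, a candidate column is accepted only if its single-error and adjacent-pair syndromes remain unique, and the search backtracks otherwise; the parameters R and N are arbitrary demo values, not the code sizes reported in the paper.

```java
import java.util.*;

// Illustrative sketch only: a simplified column-added backtracking search for a
// SEC-DAEC parity check matrix. Syndromes of single errors (the column itself) and
// double-adjacent errors (XOR with the previous column) must stay unique and nonzero.
public class ColumnAddedSearch {
    static final int R = 5;        // parity check bits (demo value)
    static final int N = 9;        // total columns wanted: R parity + N-R data (demo value)

    static List<Integer> columns = new ArrayList<>();
    static Set<Integer> syndromes = new HashSet<>();

    static boolean search() {
        if (columns.size() == N) return true;
        for (int cand = 1; cand < (1 << R); cand++) {
            int single = cand;
            int pair = columns.isEmpty() ? -1 : (cand ^ columns.get(columns.size() - 1));
            boolean ok = !syndromes.contains(single)
                    && (pair == -1 || (pair != 0 && pair != single && !syndromes.contains(pair)));
            if (!ok) continue;
            columns.add(cand);                       // column-added action
            syndromes.add(single);
            if (pair != -1) syndromes.add(pair);
            if (search()) return true;
            columns.remove(columns.size() - 1);      // backtrack: undo the column-added action
            syndromes.remove(single);
            if (pair != -1) syndromes.remove(pair);
        }
        return false;
    }

    public static void main(String[] args) {
        // Start from the identity part of H; its singles and adjacent pairs are trivially unique.
        for (int i = 0; i < R; i++) {
            int col = 1 << i;
            if (!columns.isEmpty()) syndromes.add(col ^ columns.get(columns.size() - 1));
            columns.add(col);
            syndromes.add(col);
        }
        System.out.println(search() ? "H columns: " + columns : "no code found");
    }
}
```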

Normally, the recursive backtracking algorithm demands a large amount of computing resources and computing time. In order to accelerate the algorithm, we first adopt decimal operations instead of matrix operations by converting the column vectors into decimal numbers. Even so, it is not feasible for the algorithm to complete the execution of all conditions. In general, if the code we expect exists, it is easy to obtain the first solution. With different optimization criteria, the algorithm can obtain better solutions. However, searching for the best solution requires the complete result of the whole searching process, which is in most cases unfeasible with today’s computing resources. Therefore, it is more practical to use the best result obtained in a reasonable computation time. To be consistent with previous work, that time was set to one week for all the results presented in this paper. The flow of the algorithm with column weight restriction and past procedure recording is shown in Fig. 4.

It can be observed that the new algorithms are able to find better solutions than existing algorithms, such as the one used in previous work. Therefore, they can be applied to the search for QAEC codes. The solutions found for the QAEC codes are presented in the paper.

ADVANTAGES:

  • Reliable in operation
  • Performance of 3-bit BEC is improved.

 

REFERENCE:

Jiaqiang Li, Pedro Reviriego, Liyi Xiao, Costas Argyrides, and Jie Li, “Extending 3-bit Burst Error-Correction Codes With Quadruple Adjacent Error Correction”, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 26, No. 2, February 2018.

Spatially Adaptive Block-Based Super-Resolution


ABSTRACT:

Super-resolution technology provides an effective way to increase image resolution by incorporating additional information from successive input images or training samples. Various super-resolution algorithms have been proposed based on different assumptions, and their relative performances can differ in regions of different characteristics within a single image. Based on this observation, an adaptive algorithm is proposed in this paper to integrate a higher-level image classification task and a lower-level super-resolution process, in which we incorporate reconstruction-based super-resolution algorithms, single-image enhancement, and image/video classification into a single comprehensive framework. The target high-resolution image plane is divided into adaptive-sized blocks, and different suitable super-resolution algorithms are automatically selected for the blocks. Then, a deblocking process is applied to reduce block edge artifacts. A new benchmark is also utilized to measure the performance of super-resolution algorithms. Experimental results with real-life videos indicate encouraging improvements with our method.

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS:

  • System : Pentium Dual Core.
  • Hard Disk : 120 GB.
  • Monitor : 15’’ LED
  • Input Devices : Keyboard, Mouse
  • RAM : 1 GB

SOFTWARE REQUIREMENTS:

  • Operating system : Windows 7.
  • Coding Language :MATLAB
  • Tool : MATLAB R2013A

REFERENCE:

Heng Su, Liang Tang, Ying Wu, Daniel Tretter, and Jie Zhou, “Spatially Adaptive Block-Based Super-Resolution”, IEEE Transactions on Image Processing, Vol. 21, No. 3, March 2012.

Sparse Color Interest Points for Image Retrieval and Object Categorization


ABSTRACT:

Interest point detection is an important research area in the field of image processing and computer vision. In particular, image retrieval and object categorization heavily rely on interest point detection, from which local image descriptors are computed for image matching. In general, interest points are based on luminance, and color has been largely ignored. However, the use of color increases the distinctiveness of interest points. The use of color may therefore provide selective search, reducing the total number of interest points used for image matching. This paper proposes color interest points for sparse image representation. To reduce the sensitivity to varying imaging conditions, light-invariant interest points are introduced. Color statistics based on occurrence probability lead to color-boosted points, which are obtained through saliency-based feature selection. Furthermore, a principal component analysis-based scale selection method is proposed, which gives a robust scale estimation per interest point. Large-scale experiments show that the proposed color interest point detector has higher repeatability than a luminance-based one. Furthermore, in the context of image retrieval, a reduced and predictable number of color features shows an increase in performance compared to state-of-the-art interest points. Finally, in the context of object recognition, for the Pascal VOC 2007 challenge, our method gives comparable performance to state-of-the-art methods using only a small fraction of the features, reducing the computing time considerably.

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS:

  • System : Pentium Dual Core.
  • Hard Disk : 120 GB.
  • Monitor : 15’’ LED
  • Input Devices : Keyboard, Mouse
  • RAM : 1 GB

SOFTWARE REQUIREMENTS:

  • Operating system : Windows 7.
  • Coding Language :
  • Tool : MATLAB R2013A

REFERENCE:

Julian Stöttinger, Allan Hanbury, Nicu Sebe, and Theo Gevers, “Sparse Color Interest Points for Image Retrieval and Object Categorization”, IEEE Transactions on Image Processing, Vol. 21, No. 5, May 2012.

Sharpness Enhancement of Stereo Images Using Binocular Just-Noticeable Difference


ABSTRACT:

In this paper, we propose a new sharpness enhancement algorithm for stereo images. Although the stereo image and its applications are becoming increasingly prevalent, there has been very limited research on specialized image enhancement solutions for stereo images. Recently, a binocular just-noticeable-difference (BJND) model that describes the sensitivity of the human visual system to luminance changes in stereo images has been presented. We introduce a novel application of the BJND model for the sharpness enhancement of stereo images. To this end, an overenhancement problem in the sharpness enhancement of stereo images is newly addressed, and an efficient solution for reducing the overenhancement is proposed. The solution is found within an optimization framework with additional constraint terms to suppress the unnecessary increase in luminance values. In addition, the reliability of the BJND model is taken into account by estimating the accuracy of stereo matching. Experimental results demonstrate that the proposed algorithm can provide sharpness-enhanced stereo images without producing excessive distortion.

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS:

  • System : Pentium Dual Core.
  • Hard Disk : 120 GB.
  • Monitor : 15’’ LED
  • Input Devices : Keyboard, Mouse
  • RAM : 1 GB

SOFTWARE REQUIREMENTS:

  • Operating system : Windows 7.
  • Coding Language :
  • Tool : MATLAB R2013A

REFERENCE:

Seung-Won Jung, Jae-Yun Jeong, and Sung-Jea Ko, “Sharpness Enhancement of Stereo Images Using Binocular Just-Noticeable Difference”, IEEE Transactions on Image Processing, Vol. 21, No. 3, March 2012.

Semi-supervised Biased Maximum Margin Analysis for Interactive Image Retrieval


ABSTRACT:

With many potential practical applications, content-based image retrieval (CBIR) has attracted substantial attention during the past few years. A variety of relevance feedback (RF) schemes have been developed as a powerful tool to bridge the semantic gap between low-level visual features and high-level semantic concepts, and thus to improve the performance of CBIR systems. Among various RF approaches, support-vector-machine (SVM)-based RF is one of the most popular techniques in CBIR. Despite the success, directly using SVM as an RF scheme has two main drawbacks. First, it treats the positive and negative feedbacks equally, which is not appropriate since the two groups of training feedbacks have distinct properties. Second, most of the SVM-based RF techniques do not take into account the unlabeled samples, although they are very helpful in constructing a good classifier. To explore solutions to overcome these two drawbacks, in this paper, we propose a biased maximum margin analysis (BMMA) and a semisupervised BMMA (SemiBMMA) for integrating the distinct properties of feedbacks and utilizing the information of unlabeled samples for SVM-based RF schemes. The BMMA differentiates positive feedbacks from negative ones based on local analysis, whereas the SemiBMMA can effectively integrate information of unlabeled samples by introducing a Laplacian regularizer to the BMMA. We formally formulate this problem into a general subspace learning task and then propose an automatic approach of determining the dimensionality of the embedded subspace for RF. Extensive experiments on a large real-world image database demonstrate that the proposed scheme combined with the SVM RF can significantly improve the performance of CBIR systems.



EXISTING SYSTEM:

A variety of relevance feedback (RF) schemes have been developed as a powerful tool to bridge the semantic gap between low-level visual features and high-level semantic concepts, and thus to improve the performance of CBIR systems. Among various RF approaches, support-vector-machine (SVM)-based RF is one of the most popular techniques in CBIR.

DISADVANTAGES OF EXISTING SYSTEM:

Despite the success, directly using SVM as an RF scheme has two main drawbacks. First, it treats the positive and negative feedbacks equally, which is not appropriate since the two groups of training feedbacks have distinct properties. Second, most of the SVM-based RF techniques do not take into account the unlabeled samples, although they are very helpful in constructing a good classifier.

The low-level features captured from the images may not accurately characterize the high-level semantic concepts

PROPOSED SYSTEM:

The proposed scheme is mainly based on the following:

1) The effectiveness of treating positive examples and negative examples unequally

2) The significance of the optimal subspace or feature subset in interactive CBIR;

3) The success of graph embedding in characterizing intrinsic geometric properties of the data set in high-dimensional space

4) The convenience of the graph-embedding framework in constructing semi-supervised learning techniques.

ADVANTAGES OF PROPOSED SYSTEM:

To explore solutions to the two aforementioned problems, we propose a biased maximum margin analysis (BMMA) and a semisupervised BMMA (SemiBMMA) for the traditional SVM RF schemes, based on the graph-embedding framework. With the incorporation of BMMA, labeled positive feedbacks are mapped as close together as possible, whereas labeled negative feedbacks are separated from labeled positive feedbacks by a maximum margin in the reduced subspace.

The traditional SVM combined with BMMA can better model the RF process and reduce the performance degradation caused by the distinct properties of the two groups of feedbacks. The SemiBMMA can incorporate the information of unlabeled samples into the RF and effectively alleviate the overfitting problem caused by the small size of labeled training samples.

To show the effectiveness of the proposed scheme combined with the SVM RF, we will compare it with the traditional SVM RF and some other relevant existing techniques for RF on a real-world image collection.

Experimental results demonstrate that the proposed scheme can significantly improve the performance of the SVM RF for image retrieval.

MODULES:

  • Training and Indexing Module
  • Graph-Embedding Framework
  • Features Extraction Based on Different Methods
  • Visualization of the Retrieval Results
  • Experiments on a Large-Scale Image Database
  • Experiments on a Small-Scale Image Database

MODULES DESCRIPTION:

Training and Indexing Module

In this module, we index and train the system. Indexing the whole set of images is done to make the search efficient and less time-consuming; if we do not index the system, a search takes more time because it scans the whole disk space. Indexing is done using an implementation of the Document Builder interface. A simple approach is to use the Document Builder Factory, which creates Document Builder instances for all available features as well as for popular combinations of features (e.g., all JPEG features or all available features). In a content-based image retrieval (CBIR) system, target images are sorted by feature similarity with respect to the query. On this index, we propose to classify the feature set obtained from the CBIR system. First, the method randomly selects k of the objects, each of which initially represents a cluster mean or center. Each of the remaining objects is assigned to the cluster to which it is most similar, based on the distance between the object and the cluster mean. The new mean for each cluster is then computed.
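A minimal sketch of the k-means style clustering described above is shown below, using toy two-dimensional feature vectors in place of real image descriptors; the vectors, seeds, and k are hypothetical values.

```java
import java.util.Arrays;

// Illustrative sketch only: k-means style clustering (assign to nearest mean, recompute means)
// over toy 2-D feature vectors standing in for extracted image descriptors.
public class KMeansSketch {
    static double dist(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return s;
    }

    public static void main(String[] args) {
        double[][] features = {{0.1, 0.2}, {0.15, 0.22}, {0.9, 0.8}, {0.88, 0.85}};
        int k = 2, iterations = 10;
        double[][] means = {features[0].clone(), features[2].clone()};  // initial seeds
        int[] assign = new int[features.length];

        for (int it = 0; it < iterations; it++) {
            // Assignment step: each vector goes to the nearest of the two means.
            for (int i = 0; i < features.length; i++) {
                assign[i] = dist(features[i], means[0]) <= dist(features[i], means[1]) ? 0 : 1;
            }
            // Update step: recompute each cluster mean from its members.
            for (int c = 0; c < k; c++) {
                double[] sum = new double[2];
                int count = 0;
                for (int i = 0; i < features.length; i++) {
                    if (assign[i] == c) { sum[0] += features[i][0]; sum[1] += features[i][1]; count++; }
                }
                if (count > 0) { means[c][0] = sum[0] / count; means[c][1] = sum[1] / count; }
            }
        }
        System.out.println("assignments: " + Arrays.toString(assign));  // [0, 0, 1, 1]
    }
}
```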

Graph-Embedding Framework

In order to describe our proposed approach clearly, we first review the graph-embedding framework. Generally, for a classification problem, the sample set can be represented as a matrix X = [x_1, x_2, ..., x_N], where N indicates the total number of samples and m is the feature dimensionality. Let G = {X, W} be an undirected similarity graph, called an intrinsic graph, with vertex set X and similarity matrix W. The similarity matrix W is real and symmetric, and measures the similarity between each pair of vertices; W can be formed using various similarity criteria. The corresponding diagonal degree matrix D and the Laplacian matrix L of graph G are defined by D_ii = sum_j W_ij and L = D - W. Graph embedding of graph G is defined as an algorithm to determine the low-dimensional vector representations y_1, ..., y_N of the vertex set, where the embedding dimensionality is lower than m. The vector y_i is the embedding vector for vertex x_i, which preserves the similarities between pairs of vertices in the original high-dimensional space. Then, in order to characterize the differences between pairs of vertices in the original high-dimensional space, a penalty graph G^p = {X, W^p} is also defined, whose vertices are the same as those of G, but whose edge weight matrix W^p corresponds to the similarity characteristics that are to be suppressed in the low-dimensional feature space. For a dimensionality reduction problem, direct graph embedding requires an intrinsic graph, whereas a penalty graph is not a necessary input.
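The sketch below builds the two basic objects of this framework, the diagonal degree matrix D and the Laplacian L = D - W, from a small similarity matrix W; the 3x3 similarity values are arbitrary toy numbers, not data from the paper.

```java
import java.util.Arrays;

// Illustrative sketch only: computing D (degree matrix) and L = D - W from a toy
// similarity matrix W, the basic objects of the graph-embedding framework above.
public class LaplacianSketch {
    public static void main(String[] args) {
        double[][] W = {
                {0.0, 0.8, 0.1},
                {0.8, 0.0, 0.5},
                {0.1, 0.5, 0.0}};
        int n = W.length;
        double[][] L = new double[n][n];
        for (int i = 0; i < n; i++) {
            double degree = 0;
            for (int j = 0; j < n; j++) degree += W[i][j];   // D_ii = sum_j W_ij
            for (int j = 0; j < n; j++) L[i][j] = (i == j ? degree : 0) - W[i][j];
        }
        for (double[] row : L) System.out.println(Arrays.toString(row));
    }
}
```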

Features Extraction Based on Different Methods

Six experiments are conducted to compare the BMMA with the traditional LDA, the BDA method, and a graph-embedding approach, i.e., MFA, in finding the most discriminative directions. We plot the directions that correspond to the largest eigenvalue of the decomposed matrices for LDA, BDA, MFA, and BMMA, respectively. From these examples, we can clearly see that LDA can find the best discriminative direction when the data from each class are distributed as Gaussians with similar covariance matrices. Biased toward the positive samples, BDA can find the direction in which the positive samples are well separated from the negative samples when the positive samples have a Gaussian distribution, but it may be confused when the distribution of the positive samples is more complicated. Biased toward positive samples, the BMMA method can find the most discriminative direction in all six experiments based on local analysis, since it does not make any assumptions about the distributions of the positive and negative samples. It should be noted that BMMA is a linear method, and therefore, we only give the comparison results of the aforementioned linear methods.

Visualization of the Retrieval Results

In the previous subsections, we have presented some statistically quantitative results of the proposed scheme. Here, we show the visualization of retrieval results.

In experiments, we randomly select some images (e.g., bobsled, cloud, cat, and car) as the queries and perform the RF process based on the ground truth. For each query image, we do four RF iterations. For each RF iteration, we randomly select some relevant and irrelevant images as positive and negative feedbacks from the first screen, which contains 20 images in total. The number of selected positive and negative feedbacks is about 4, respectively. We choose them according to the ground truth of the images, i.e., whether they share the same concept with the query image or not. The query images are given as the first image of each row. We show the top one to ten images of initial results without feedback and Semi BMMA SVM after four feedback iterations, respectively, and incorrect results are highlighted by green boxes. From the results, we can notice that our proposed scheme can significantly improve the performance of the system. For the first, second, and fourth query images, our system produces ten relevant images out of the top ten retrieved images. For the third query image, our system produces nine relevant images out of the top ten retrieved images. Therefore, Semi BMMA SVM can effectively detect the homogeneous concept shared by the positive samples and hence improve the performance of the retrieval system.

Experiments on a Large-Scale Image Database:

Here, we evaluate the performance of the proposed scheme on a real-world image database. We use precision–scope curve, precision rate, and standard deviation to evaluate the effectiveness of the image retrieval algorithms. The scope is specified by number of top-ranked images presented to the user. The precision is the major evaluation criterion, which evaluates the effectiveness of the algorithms. The precision–scope curve describes the precision with various scopes and can give the overall performance evaluation of the approaches. The precision rate is the ratio of the number of relevant images retrieved to the top retrieved images, which emphasizes the precision at a particular value of scope. Standard deviation describes the stability of different algorithms. Therefore, the precision evaluates the effectiveness of a given algorithm, and the corresponding standard deviation evaluates the robustness of the algorithm. We designed a slightly different feedback scheme to model the real world retrieval process. In a real image retrieval system, a query image is usually not in the image database. To simulate such an environment, we use fivefold cross validation to evaluate the algorithms. More precisely, we divide the whole image database into five subsets of equal size. Thus, there are 20% images per category in each subset. At each run of cross validation, one subset is selected as the query set, and the other four subsets are used as the database for retrieval. Then, 400 query samples are randomly selected from the query subset, and the RF is automatically implemented by the system. For each query image, the system retrieves and ranks the images in the database, and nine RF iterations are automatically executed.
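For reference, the sketch below computes precision at a given scope (the number of top-ranked images presented to the user), the main evaluation measure described above; the ranked list and relevance labels are toy data, not results from the actual experiments.

```java
import java.util.List;

// Illustrative sketch only: precision at scope = relevant hits among the top "scope"
// ranked images, divided by scope. The ranked relevance labels below are toy data.
public class PrecisionAtScope {
    static double precisionAtScope(List<Boolean> relevanceOfRanked, int scope) {
        int hits = 0;
        for (int i = 0; i < Math.min(scope, relevanceOfRanked.size()); i++) {
            if (relevanceOfRanked.get(i)) hits++;
        }
        return (double) hits / scope;
    }

    public static void main(String[] args) {
        // true = the retrieved image shares the query's category (per ground truth)
        List<Boolean> ranked =
                List.of(true, true, false, true, false, true, false, false, true, false);
        System.out.println("P@10 = " + precisionAtScope(ranked, 10));   // 0.5
    }
}
```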

Experiments on a Small-Scale Image Database

In order to show how efficient the proposed BMMA combined with SVM is in dealing with the asymmetric properties of feedback samples, the first evaluation experiment is executed on a small-scale database, which includes 3899 images in 30 different categories. We use all 3899 images in the 30 categories as queries. To avoid the potential problem caused by an asymmetric amount of positive and negative feedbacks, we select an equal number of positive and negative feedbacks here. In practice, the first five query-relevant images and the first five irrelevant images among the top 20 retrieved images in the previous iterations were automatically selected as positive and negative feedbacks, respectively.

HARDWARE REQUIREMENTS

  • PROCESSOR :  PENTIUM 4 CPU 2.40GHZ
  • RAM :   128 MB
  • HARD DISK :    40 GB
  • KEYBOARD    :    STANDARD
  • MONITOR     :    15”

SOFTWARE REQUIREMENTS

  • FRONT END :     MATLAB
  • TOOL :  MATLAB TOOLBOX
  • OPERATING SYSTEM :   WINDOWS XP/7
  • DOCUMENTATION :   MS-OFFICE 2007

REFERENCE:

Lining Zhang, Lipo Wang, and Weisi Lin, “Semisupervised Biased Maximum Margin Analysis for Interactive Image Retrieval”, IEEE Transactions on Image Processing, Vol. 21, No. 4, April 2012.