Image Re-ranking based on Topic Diversity



Social media sharing websites allow users to annotate images with free-form tags, which has contributed significantly to the development of web image retrieval. Tag-based image search is an important way to find images shared by users on social networks. However, making the top-ranked results both relevant and diverse is challenging. In this paper, we propose a topic-diverse ranking approach for tag-based image retrieval that aims to improve topic coverage. First, we construct a tag graph based on the similarity between tags. Then a community detection method is applied to mine the topic community of each tag. After that, inter-community and intra-community ranking are used to obtain the final retrieved results. In the inter-community ranking process, an adaptive random walk model ranks the communities based on the multi-information of each topic community. In addition, we build an inverted index structure for images to accelerate the search process. Experimental results on the Flickr and NUS-Wide datasets show the effectiveness of the proposed approach.
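The inverted index mentioned in the abstract can be illustrated with a minimal sketch. The source does not describe the actual index layout, so the class name TagInvertedIndex, its methods, and the choice to lowercase tags are all assumptions made for illustration: each tag maps to the set of image identifiers annotated with it, so a tag query becomes a single map lookup instead of a scan over all images.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a tag -> images inverted index (not the paper's implementation).
final class TagInvertedIndex {
    // Posting lists: normalized tag -> ids of images carrying that tag, in insertion order.
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Registers one image under every tag it is annotated with.
    public void addImage(String imageId, Collection<String> tags) {
        for (String tag : tags) {
            postings.computeIfAbsent(normalize(tag), k -> new LinkedHashSet<>()).add(imageId);
        }
    }

    // Returns a read-only view of the images annotated with the tag (empty if the tag is unknown).
    public Set<String> lookup(String tag) {
        Set<String> ids = postings.get(normalize(tag));
        return ids == null ? Collections.<String>emptySet() : Collections.unmodifiableSet(ids);
    }

    // Assumption: tags are matched case-insensitively after trimming whitespace.
    private static String normalize(String tag) {
        return tag.trim().toLowerCase(Locale.ROOT);
    }
}
```

In this sketch, the initial retrieval list for a query tag is simply the result of lookup, which the later ranking stages would then re-order.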



  • Currently, image clustering and duplicate removal are the major approaches to the diversity problem. However, most of the literature treats the diversity problem as one of promoting visual diversity, and the promotion of semantic coverage is often ignored.
  • To diversify the top-ranked search results from the semantic aspect, the topic community to which each image belongs should be considered.
  • Dang-Nguyen et al. first propose a clustering algorithm to obtain a topic tree, and then sort topics according to the number of images in each topic. In each cluster, the image uploaded by the user with the highest visual score is selected as the top-ranked image. The second image is the one with the largest distance to the first image. The third image is the one with the largest distance to both previous images, and so on.
  • Most papers consider diversity from a visual perspective and achieve it by applying clustering to visual features.
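The within-cluster selection attributed to Dang-Nguyen et al. above resembles a greedy farthest-first traversal, which can be sketched as follows. This is an illustration under stated assumptions rather than their implementation: images are represented by hypothetical feature vectors, distance is Euclidean, and "largest distance to both previous images" is read as maximizing the minimum distance to every image already chosen.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of greedy farthest-first selection inside one cluster.
final class FarthestFirstSelector {
    // Euclidean distance between two feature vectors of equal length (assumed metric).
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Returns the indices of up to k images in selection order.
    static List<Integer> select(double[][] features, double[] scores, int k) {
        int n = features.length;
        List<Integer> picked = new ArrayList<>();
        if (n == 0 || k <= 0) {
            return picked;
        }
        // Step 1: start from the image with the highest score.
        int first = 0;
        for (int i = 1; i < n; i++) {
            if (scores[i] > scores[first]) {
                first = i;
            }
        }
        picked.add(first);
        boolean[] used = new boolean[n];
        used[first] = true;
        // minDist[i] = distance from image i to its nearest already-picked image.
        double[] minDist = new double[n];
        for (int i = 0; i < n; i++) {
            minDist[i] = distance(features[i], features[first]);
        }
        // Step 2: repeatedly pick the unused image farthest from everything picked so far.
        while (picked.size() < Math.min(k, n)) {
            int best = -1;
            for (int i = 0; i < n; i++) {
                if (!used[i] && (best == -1 || minDist[i] > minDist[best])) {
                    best = i;
                }
            }
            used[best] = true;
            picked.add(best);
            for (int i = 0; i < n; i++) {
                minDist[i] = Math.min(minDist[i], distance(features[i], features[best]));
            }
        }
        return picked;
    }
}
```

Because the distances here come from visual features, the resulting order is visually spread out but says nothing about how many distinct topics are covered.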


  • Tag mismatch
  • Query ambiguity
  • Most of the above literature views the diversity problem as one of promoting visual diversity rather than topic coverage.


  • In this paper, we focus on topic diversity. We first group all the tags in the initial retrieved image list so that tags with similar semantics fall into the same cluster, and then assign images to the clusters. Images within the same cluster are regarded as semantically similar. After ranking the clusters and the images within each cluster, we select one image from each cluster to achieve semantic diversity.
  • We propose to construct a tag graph and mine topic communities to diversify the semantic information of the retrieval results. The contributions of this paper are summarized as follows:
  • We propose a topic-diverse ranking approach that considers the topic coverage of the retrieved images. Inter-community and intra-community ranking methods are proposed to achieve a good trade-off between diversity and relevance.
  • Tag graph construction based on each tag’s word vector and a community mining approach are employed to detect topic communities. Each mined community can represent a sub-topic under the given query. In addition, to better represent the relationships between tags, we train the word vector of each tag on the English Wikipedia corpus with the word2vec model.
  • We rank the mined communities according to their relevance to the query. In the inter-community ranking process, an adaptive random walk model performs the ranking based on the relevance of each community to the query, the pairwise similarity between communities, and the number of images in each community.
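The tag graph construction described above can be sketched as follows, assuming the word vectors have already been trained elsewhere (for example with word2vec) and are passed in as plain arrays. The source does not give the similarity measure or the edge rule, so cosine similarity and the threshold parameter are assumptions chosen for illustration: two tags are connected when the cosine similarity of their vectors reaches the threshold, and the similarity becomes the edge weight.

```java
// Hypothetical sketch of building a weighted tag graph from pre-trained word vectors.
final class TagGraph {
    // Cosine similarity of two vectors; defined as 0 when either vector is all zeros.
    static double cosine(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns a symmetric adjacency matrix: weights[i][j] is the cosine similarity of
    // tags i and j when it reaches the (assumed) threshold, and 0 otherwise.
    static double[][] build(double[][] tagVectors, double threshold) {
        int n = tagVectors.length;
        double[][] weights = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double sim = cosine(tagVectors[i], tagVectors[j]);
                if (sim >= threshold) {
                    weights[i][j] = sim;
                    weights[j][i] = sim; // undirected edge, no self-loops
                }
            }
        }
        return weights;
    }
}
```

A community detection method would then run on this matrix to group tags into topic communities; that step is not shown because the source does not specify which algorithm is used.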


  • A good trade-off between diversity and relevance.
  • With the adaptive random walk model, a community with a larger semantic relevance to the query and a larger confidence value is ranked higher.
  • Diversifies the top-ranked retrieval results.
  • Computes the similarity between the user-oriented image set and the query based on the tag co-occurrence mechanism.
  • Communities are sorted by the relevance scores obtained from the random walk.
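The inter-community ranking idea can be illustrated with a generic random walk with restart. This is not the adaptive model from the paper, whose exact update rule is not given here; it is a simplified stand-in built from the three inputs the text names. The restart prior (relevance multiplied by community size, then normalized), the damping factor alpha, and the fixed iteration count are all assumptions.

```java
import java.util.Arrays;

// Hypothetical sketch: plain random walk with restart over topic communities.
final class CommunityRandomWalk {
    // Returns one score per community; scores sum to 1.
    static double[] rank(double[][] similarity, double[] relevance, int[] sizes,
                         double alpha, int iterations) {
        int n = relevance.length;
        // Restart prior (assumption): query relevance weighted by community size.
        double[] prior = new double[n];
        double total = 0.0;
        for (int i = 0; i < n; i++) {
            prior[i] = relevance[i] * sizes[i];
            total += prior[i];
        }
        for (int i = 0; i < n; i++) {
            prior[i] = total > 0.0 ? prior[i] / total : 1.0 / n;
        }
        // Row-normalize pairwise similarities into transition probabilities.
        double[][] p = new double[n][n];
        for (int i = 0; i < n; i++) {
            double rowSum = 0.0;
            for (int j = 0; j < n; j++) {
                rowSum += similarity[i][j];
            }
            for (int j = 0; j < n; j++) {
                p[i][j] = rowSum > 0.0 ? similarity[i][j] / rowSum : 1.0 / n;
            }
        }
        // Iterate r <- (1 - alpha) * prior + alpha * P^T r.
        double[] r = prior.clone();
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            for (int j = 0; j < n; j++) {
                double flow = 0.0;
                for (int i = 0; i < n; i++) {
                    flow += r[i] * p[i][j];
                }
                next[j] = (1.0 - alpha) * prior[j] + alpha * flow;
            }
            r = next;
        }
        return r;
    }

    // Community indices sorted by descending score.
    static Integer[] order(double[] scores) {
        Integer[] idx = new Integer[scores.length];
        for (int i = 0; i < idx.length; i++) {
            idx[i] = i;
        }
        Arrays.sort(idx, (a, b) -> Double.compare(scores[b], scores[a]));
        return idx;
    }
}
```

Given the sorted communities, a diversified result list could then be assembled by taking the best-ranked image from each community in turn.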




  • System : Pentium Dual Core.
  • Hard Disk : 120 GB.
  • Monitor : 15" LED
  • Input Devices : Keyboard, Mouse
  • RAM : 1 GB


  • Operating system : Windows 7.
  • Coding Language : Java/J2EE
  • Tool : Eclipse
  • Database : MySQL


Xueming Qian, Member, IEEE, Dan Lu, Yaxiong Wang, Li Zhu, Yuan Yan Tang, Fellow, IEEE, and Meng Wang, “Image Re-ranking based on Topic Diversity”, IEEE Transactions on Image Processing, 2017.
