Large-scale: A large number of images are available on the web.
Weakly labeled: The tags provided by Internet users may be incorrect or incomplete, and only a small subset of web images can be labeled by professionals, because manual annotation is time-consuming and labor-intensive.
Semantic richness: Real applications require more than one label to describe an image, and multiple labels usually interact with each other in the semantic space.
Semantic context: Learning semantic context from a large-scale weakly labeled image set is therefore of significance for multi-label annotation.

Figure 1. Framework of our approach.

We evaluate our method on the NUS-WIDE dataset, which comprises 269,648 images with over 5,000 user-provided tags. Since ground-truth annotations for 81 concepts are available for the entire dataset, we focus on these 81 concepts in the experiments. Two image pools are constructed from the dataset: the labeled pool contains 161,789 images, and the remaining images form the unlabeled pool. For each image, we first extract two types of visual features: 512-dimensional GIST and 1024-dimensional SIFT. 3,000 exemplar images are selected in the experiments. (An illustrative sketch of the per-concept F-score evaluation reported in Figure 2 is given below.)

Figure 2. Experimental results of our method compared with the baselines SVM and TagProp [Guillaumin et al., ICCV 2009] in terms of F-score for individual concepts on NUS-WIDE.

Figure 3. Semantic context learned from the NUS-WIDE dataset. Each concept is linked to its relevant concepts with larger edge weights.

A novel method is developed to predict multiple labels for images and to provide region-level labels for the objects. The term "weakly labeled" has three meanings: (i) only a small subset of the available images are labeled; (ii) even for labeled images, the given labels may be incorrect or incomplete; (iii) the given labels do not indicate the exact locations of the objects in the images.

Each image is first segmented into regions, and all regions are clustered into several groups. In the visual feature space, region exemplars are used to construct the visual context graph; in the semantic label space, the concepts form the semantic context graph, which captures the correlations between concepts. (Illustrative sketches of both graph-construction steps are given below.)

Figure 1. Overview.

The experiments are based on NUS-WIDE-SUB. We focus on evaluating our method when the number of labeled images is much smaller than that of unlabeled images. We randomly split the dataset into a 10% subset for training and a 90% subset for testing.
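Figure 2 reports per-concept F-scores over the 81 NUS-WIDE concepts. The following is a minimal sketch of how such an evaluation can be computed, assuming binary ground-truth and prediction matrices; the array names, the scikit-learn dependency, and the example concepts are illustrative placeholders, not the published evaluation code.

```python
# Minimal sketch of per-concept F-score evaluation (Figure 2 style).
# Assumes binary ground-truth / prediction matrices of shape (num_images, 81);
# names and the use of scikit-learn are illustrative assumptions.
import numpy as np
from sklearn.metrics import f1_score

def per_concept_f_scores(y_true, y_pred, concept_names):
    """Return a {concept: F-score} dictionary, one entry per concept."""
    scores = {}
    for k, name in enumerate(concept_names):
        scores[name] = f1_score(y_true[:, k], y_pred[:, k], zero_division=0)
    return scores

# Example with random placeholder labels (10 images, 3 of the 81 concepts).
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=(10, 3))
y_pred = rng.integers(0, 2, size=(10, 3))
print(per_concept_f_scores(y_true, y_pred, ["sky", "water", "animal"]))
```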
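The semantic context graph of Figure 3 links each concept to related concepts with weighted edges. Our papers learn these correlations from the weakly labeled image set; the sketch below uses a simple normalized label co-occurrence measure only to convey the idea of such a graph, and is not claimed to be the formulation used in the papers.

```python
# Illustrative sketch: weighted concept graph from label co-occurrence.
# A cosine-style normalized co-occurrence is used as a stand-in measure;
# it is shown only to convey the idea of a semantic context graph.
import numpy as np

def cooccurrence_graph(labels):
    """labels: binary (num_images, num_concepts) tag matrix.
    Returns a symmetric (num_concepts, num_concepts) weight matrix in [0, 1]."""
    labels = np.asarray(labels, dtype=float)
    co = labels.T @ labels                         # raw co-occurrence counts
    freq = np.diag(co).copy()                      # per-concept frequencies
    freq[freq == 0] = 1.0                          # guard against division by zero
    weights = co / np.sqrt(np.outer(freq, freq))   # cosine-style normalization
    np.fill_diagonal(weights, 0.0)                 # drop self-edges
    return weights

# Example: five images tagged with three concepts.
tags = np.array([[1, 1, 0],
                 [1, 0, 0],
                 [1, 1, 1],
                 [0, 0, 1],
                 [1, 1, 0]])
print(cooccurrence_graph(tags))
```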
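In the visual feature space, region exemplars are used to build the visual context graph. The following sketch shows one plausible way to obtain such exemplars, clustering region descriptors with k-means and keeping the region closest to each cluster center; the clustering algorithm, feature dimensionality, and parameter values are assumptions made for illustration, not the method's exact procedure.

```python
# Illustrative sketch: cluster region descriptors and keep one exemplar per cluster.
# k-means and all parameter values here are assumptions for illustration.
import numpy as np
from sklearn.cluster import KMeans

def select_region_exemplars(region_features, num_clusters=50, seed=0):
    """Cluster region descriptors and return the index of the region
    closest to each cluster center (one exemplar per cluster)."""
    km = KMeans(n_clusters=num_clusters, random_state=seed, n_init=10)
    km.fit(region_features)
    exemplar_ids = []
    for c in range(num_clusters):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(
            region_features[members] - km.cluster_centers_[c], axis=1)
        exemplar_ids.append(members[np.argmin(dists)])
    return np.array(exemplar_ids)

# Example with random placeholder descriptors (200 regions, 64-D).
feats = np.random.default_rng(0).normal(size=(200, 64))
print(select_region_exemplars(feats, num_clusters=5))
```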
Papers

X. Xue, W. Zhang, J. Zhang, B. Wu, J. Fan, Y. Lu. Correlative Multi-Label Multi-Instance Image Annotation. 13th International Conference on Computer Vision (ICCV), Barcelona, Spain, 2011. [PDF][Poster]
W. Zhang, Y. Lu, X. Xue. Automatic Image Annotation with Weakly Labeled Datasets. ACM Multimedia, Scottsdale, USA, 2011. [PDF]
Y. Lu, W. Zhang, K. Zhang, X. Xue. Semantic Context Learning with Large-Scale Weakly-Labeled Image Set. ACM Conference on Information and Knowledge Management (CIKM), Hawaii, USA, 2012. [PDF]