Research Article
Saliency Mapping Enhanced by Structure Tensor
Copyright © 2015 Zhiyong He et al. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
We propose a novel, efficient algorithm for computing visual saliency, based on the computational architecture of the Itti model. As one of the best-known bottom-up visual saliency models, the Itti method evaluates three low-level features, color, intensity, and orientation, and then generates multiscale activation maps; finally, a saliency map is aggregated by multiscale fusion. In our method, the orientation feature is replaced by edge and corner features extracted by a linear structure tensor. These features are used to generate a contour activation map, and then all activation maps are directly combined into a saliency map. Compared to the Itti method, our method is more computationally efficient, because the structure tensor is cheaper to compute than the Gabor filter used for the orientation feature, and because our aggregation is a direct combination rather than a multiscale operator. Experiments on Bruce's dataset show that our method is a strong contender for the state of the art.
[Figure 1: pipeline from the input image through the color, intensity, and contour feature maps and their activation maps to the saliency map.]
Figure 1: General architecture of our method. We call the activation map generated by the edge and corner features the contour activation map. The final saliency map combines these activation maps into the ST saliency map.
in Figure 1 and is the same as the Itti method in feature extraction and in the generation of the intensity and color activation maps. The edge and corner features are extracted by a structure tensor and directly combined into an activation map, called the contour map. After obtaining the three activation maps, we aggregate them into a saliency map by linear combination instead of multiscale combination and the winner-take-all rule.

This paper makes two major contributions:

(1) We propose a novel, efficient algorithm to calculate the saliency map. Compared with other methods evaluated on a challenging dataset, besides achieving the best performance, our method produces sharper boundaries, which are useful in further applications such as object segmentation and detection.

(2) Our work shows that edges and corners are two important low-level features in saliency generation.

The paper is organized as follows. Section 2 briefly reviews the state-of-the-art methods with particular emphasis on saliency algorithms related to the Itti method, and Section 3 introduces some background on the structure tensor and formally describes our saliency map algorithm. In Section 4, we present our experimental results and quantitative evaluations on a challenging dataset and discuss them. The paper closes with a conclusion of our work in Section 5.

2. Related Work

Visual saliency methods are generally categorized into biologically inspired methods and computationally oriented methods. There is an extensive literature in these areas, but here we mention just a few relevant papers. Surveys can be found in [17–19], and some recent progress is reported in [20].

Koch and Ullman [11] proposed a basic architecture for biologically inspired methods and defined a saliency map as a topographic map that represents the conspicuousness of scene locations. Their work also introduced a winner-take-all neural network that selects the most salient location and employs an inhibition-of-return mechanism to allow the focus of attention to shift to the next most salient location. Itti et al. then presented a computational model to implement and verify the Koch and Ullman model. Since then, work related to saliency maps has quickly become one of the hottest research fields.

The Itti method employs a Difference of Gaussians (DoG) operator to evaluate the color, intensity, and orientation features.
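As a rough illustration of this building block (a hedged sketch, not the authors' code; the two scales are arbitrary), a DoG response can be computed as the difference of two Gaussian blurs of a feature channel:

```python
from scipy.ndimage import gaussian_filter

def dog_response(channel, sigma_center=1.0, sigma_surround=4.0):
    """Difference-of-Gaussians (DoG) band-pass response of one feature channel.

    Approximates the center-surround contrast that Itti-style models
    evaluate on the color, intensity, and orientation channels.
    """
    channel = channel.astype(float)
    center = gaussian_filter(channel, sigma_center)      # fine (center) scale
    surround = gaussian_filter(channel, sigma_surround)  # coarse (surround) scale
    return center - surround
```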
(1) Input:
(2)   Input image I: three channels, size (m, n)
(3) Output:
(4)   Edge feature map A: one channel, size (m1, n1)
(5)   Corner feature map B: one channel, size (m1, n1)
(6)   Contour activation map CT: one channel, size (m1, n1)
(7) Begin
(8)   Resize the input image I to (m1, n1), giving Im-Re
(9)   for j ← 1, n1 do
(10)    for i ← 1, m1 do
(11)      For Im-Re at (i, j), calculate the structure tensor Jσ using (6)
(12)      Calculate the eigenvalues λ1 and λ2 using (7) and (8), respectively
(13)      A(i, j) = λ1 − λ2
(14)      B(i, j) = λ1 + λ2
(15)    end for
(16)  end for
(17)  Normalize A and B into the fixed range [0, 1]
(18)  Combine the normalized A and B into CT
(19) End
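Equations (7) and (8), referenced in step (12) above, are not reproduced in this extract. For a symmetric 2 × 2 structure tensor written in the shorthand $\begin{bmatrix} G & F \\ F & H \end{bmatrix}$ introduced just below, the standard closed form, presumably what (7) and (8) compute, is

$$\lambda_{1,2} = \frac{G + H}{2} \pm \frac{1}{2}\sqrt{(G - H)^{2} + 4F^{2}},$$

so that the edge feature of step (13) is $\lambda_1 - \lambda_2 = \sqrt{(G - H)^{2} + 4F^{2}}$, large where the gradient has one dominant orientation, and the corner feature of step (14) is $\lambda_1 + \lambda_2 = G + H$, the total smoothed gradient energy.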
Based on (4), some types of structure tensor have been constructed. In our work, we use a linear structure tensor to analyze the input image, and it is defined as

$$J_\sigma = \sum_{i=1}^{3}\begin{bmatrix} K_\sigma * \left(\frac{\partial I_i}{\partial x}\right)^{2} & K_\sigma * \left(\frac{\partial I_i}{\partial x}\cdot\frac{\partial I_i}{\partial y}\right) \\ K_\sigma * \left(\frac{\partial I_i}{\partial x}\cdot\frac{\partial I_i}{\partial y}\right) & K_\sigma * \left(\frac{\partial I_i}{\partial y}\right)^{2} \end{bmatrix}, \qquad (6)$$

where $K_\sigma$ is a Gaussian kernel with variance $\sigma$ and $*$ is the convolution operator. The index $i$ runs over the three image channels.

For any kind of structure tensor, we use $\begin{bmatrix} G & F \\ F & H \end{bmatrix}$ to simplify the notation.

In the final step, we combine the feature maps into a contour activation map CT as follows:

$$\mathrm{CT} = \frac{1}{2}\left(N(A) + N(B)\right), \qquad (10)$$

where $N(A)$ is the normalized edge feature map and $N(B)$ is the normalized corner feature map.

3.3. ST Saliency Map Generation. We assume that all features contribute equally to the ST saliency map generation. After obtaining the contour activation map, the intensity activation map, and the color activation map, we combine them into the final saliency map.
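For concreteness, the following is a minimal NumPy/SciPy sketch of the contour activation pipeline of (6)–(10) and the pseudocode above. The Sobel derivative, the single smoothing scale sigma, and min-max normalization are our assumptions rather than choices fixed by the text, and the authors' own implementation was written in MATLAB.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def contour_activation(image, sigma=2.0):
    """Contour activation map CT from a linear structure tensor, per (6)-(10).

    image: float array of shape (h, w, 3), already resized to (m1, n1).
    Returns CT with values in [0, 1].
    """
    h, w = image.shape[:2]
    G = np.zeros((h, w))  # smoothed Ix*Ix, summed over channels
    F = np.zeros((h, w))  # smoothed Ix*Iy, summed over channels
    H = np.zeros((h, w))  # smoothed Iy*Iy, summed over channels
    for c in range(3):
        ix = sobel(image[:, :, c], axis=1)  # derivative along x (columns)
        iy = sobel(image[:, :, c], axis=0)  # derivative along y (rows)
        G += gaussian_filter(ix * ix, sigma)
        F += gaussian_filter(ix * iy, sigma)
        H += gaussian_filter(iy * iy, sigma)

    # Closed-form eigenvalues of the symmetric 2x2 tensor [G F; F H].
    root = np.sqrt((G - H) ** 2 + 4.0 * F ** 2)
    lam1 = 0.5 * (G + H + root)
    lam2 = 0.5 * (G + H - root)

    A = lam1 - lam2  # edge feature map, step (13) of the pseudocode
    B = lam1 + lam2  # corner feature map, step (14) of the pseudocode

    def normalize(x):
        # Min-max rescaling into [0, 1], as in step (17).
        rng = x.max() - x.min()
        return (x - x.min()) / rng if rng > 0 else np.zeros_like(x)

    # Equation (10): CT = (N(A) + N(B)) / 2.
    return 0.5 * (normalize(A) + normalize(B))
```

The final ST saliency map then combines CT with the intensity and color activation maps; given the equal-contribution assumption above, an equal-weight average of the three maps is the natural reading.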
Figure 3: Saliency maps of our method. The odd rows show input images from the Bruce dataset, and the even rows show the saliency maps generated by our method. The saliency maps of our method clearly have sharp edges.
Figure 4: Saliency maps on the Bruce dataset. (a) Input image, (b) our method, (c) Itti method, (d) AIM method, (e) DVA method, (f) GBVS method, and (g) IS method using the LAB color space. Since our method includes edge and corner information, its saliency maps have sharp edges, which are useful for further steps in some computer vision tasks.
the codes on the authors' websites. Saliency maps are shown in Figure 4.

4.2. Analysis of Performance. We evaluated our method on the Bruce dataset, which contains 120 natural images with eye-fixation ground-truth data. In the Bruce dataset, all images are 681 × 511. Some methods are sensitive to the size of the input image; consequently, in order to evaluate the results of the different methods fairly, we resize the input images to the same size (170 × 128) for each method.

Results from perceptual research [29, 30] have shown that human fixations have a strong center bias, which may affect the measured performance of a saliency algorithm. To remove this center bias, following the procedure of Tatler et al.'s work [29], Hou et al. [28] introduced the ROC Area Under the Curve (AUC) score to quantitatively evaluate the performance of different algorithms (sketched below); good results should maximize the ROC AUC score. To compare ROC AUC scores, we follow the computation method provided by [28], although our size (170 × 128) differs from the two input image sizes used in [28]. The comparison of the ROC AUC scores is shown in Figure 5.

We conducted our tests on a laptop with an Intel dual-core i5-4210U 1.7 GHz CPU and 4 GB of RAM. All code was written in MATLAB. The execution times of the methods are summarized in Figure 6, where each time is the average over the 120 images.
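The fixation-AUC scoring just described can be sketched as follows; this is our reading of the standard shuffled-AUC computation [28, 29], not necessarily the exact evaluation code used in the experiments:

```python
import numpy as np

def fixation_auc(saliency, fixation_points, control_points):
    """ROC AUC of a saliency map against eye fixations.

    saliency: 2D array; fixation_points and control_points are (n, 2) integer
    arrays of (row, col) positions. Drawing the control points from fixations
    on *other* images gives the shuffled AUC that removes center bias [28, 29].
    """
    pos = saliency[fixation_points[:, 0], fixation_points[:, 1]]
    neg = saliency[control_points[:, 0], control_points[:, 1]]
    # The AUC equals the probability that a random fixated location outranks
    # a random control location (Mann-Whitney U), counting ties as one half.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)
```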
[Figure 5: ROC score dependence on blur, averaged across the 120 images; comparison of the ROC AUC scores of the evaluated methods.]

Figure 6: Results of the performance of these different methods (Proposed, Itti, AIM, DVA, GBVS, and Itti-orientation). Time measurements are given in seconds; the results are the average times over the 120 images of the Bruce dataset.

The figure shows that our method is about twice as fast as the Itti method and outperforms the other state-of-the-art methods. The reason lies in two parts. First, the structure tensor is an efficient feature-extraction algorithm. Second, we directly aggregate the three activation maps into a saliency map. The performance would clearly increase further if our method were implemented in C/C++, and it should then satisfy most real-time applications.

5. Conclusion

In this paper we have proposed an efficient algorithm for computing the saliency map, which has a distinct boundary that contributes to further computer vision applications such as segmentation and detection. The computational architecture of our method is close to the Itti method, but we have made two improvements, in low-level feature extraction and in the combination of activation maps. Since edge and corner features are important cues in visual saliency, we use a linear structure tensor to extract these features. The reason that our algorithm is efficient lies in the following: (1) the linear structure tensor is more computationally efficient than the Gabor filter used to compute the orientation feature, and (2) the three activation maps are aggregated directly rather than through a multiscale operator.

References

[1] M. Carrasco, "Visual attention: the past 25 years," Vision Research, vol. 51, no. 13, pp. 1484–1525, 2011.
[2] J. Han, K. N. Ngan, M. Li, and H.-J. Zhang, "Unsupervised extraction of visual attention objects in color images," IEEE Transactions on Circuits and Systems for Video Technology, vol. 16, no. 1, pp. 141–145, 2006.
[3] E. Rahtu, J. Kannala, M. Salo, and J. Heikkilä, "Segmenting salient objects from images and videos," in Computer Vision—ECCV 2010, vol. 6315 of Lecture Notes in Computer Science, pp. 366–379, Springer, Berlin, Germany, 2010.
[4] S. Avidan and A. Shamir, "Seam carving for content-aware image resizing," ACM Transactions on Graphics, vol. 26, no. 3, article 10, 2007.
[5] S. Goferman, L. Zelnik-Manor, and A. Tal, "Context-aware saliency detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 10, pp. 1915–1926, 2012.
[6] D. Vaquero, M. Turk, K. Pulli, M. Tico, and N. Gelfand, "A survey of image retargeting techniques," in Applications of Digital Image Processing XXXIII, vol. 7798 of Proceedings of SPIE, pp. 779–814, SPIE Optical Engineering + Applications, San Diego, Calif, USA, August 2010.
[7] A. Oliva, A. Torralba, M. S. Castelhano, and J. M. Henderson, "Top-down control of visual attention in object detection," in Proceedings of the International Conference on Image Processing (ICIP '03), pp. I-253–I-256, September 2003.
[8] X. Shen and Y. Wu, "A unified approach to salient object detection via low rank matrix recovery," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '12), pp. 853–860, Providence, RI, USA, June 2012.
[9] M.-M. Cheng, N. J. Mitra, X. Huang, and S.-M. Hu, "SalientShape: group saliency in image collections," The Visual Computer, vol. 30, no. 4, pp. 443–453, 2014.
[10] U. Rutishauser, D. Walther, C. Koch, and P. Perona, "Is bottom-up attention useful for object recognition?" in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '04), vol. 2, pp. II-37–II-44, IEEE, July 2004.
[11] C. Koch and S. Ullman, "Shifts in selective visual attention: towards the underlying neural circuitry," Human Neurobiology, vol. 4, no. 4, pp. 219–227, 1985.
[12] L. Itti, C. Koch, and E. Niebur, "A model of saliency-based visual attention for rapid scene analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 11, pp. 1254–1259, 1998.
[13] J. Harel, C. Koch, and P. Perona, "Graph-based visual saliency," in Proceedings of the Advances in Neural Information Processing Systems (NIPS '06), pp. 545–552, Vancouver, Canada, December 2006.
[14] M.-M. Cheng, G.-X. Zhang, N. J. Mitra, X. Huang, and S.-M. Hu, "Global contrast based salient region detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '11), pp. 409–416, Providence, RI, USA, June 2011.
[15] T. Liu, Z. Yuan, J. Sun et al., "Learning to detect a salient object," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 2, pp. 353–367, 2011.
[16] R. Valenti, N. Sebe, and T. Gevers, "Image saliency by isocentric curvedness and color," in Proceedings of the 12th IEEE International Conference on Computer Vision, pp. 2185–2192, IEEE, Kyoto, Japan, September-October 2009.
[17] A. Borji and L. Itti, "State-of-the-art in visual attention modeling," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 1, pp. 185–207, 2013.
[18] A. Borji, H. R. Tavakoli, D. N. Sihite, and L. Itti, "Analysis of scores, datasets, and models in visual saliency prediction," in Proceedings of the 14th IEEE International Conference on Computer Vision (ICCV '13), pp. 921–928, Sydney, Australia, December 2013.
[19] S. Frintrop, E. Rome, and H. I. Christensen, "Computational visual attention systems and their cognitive foundations: a survey," ACM Transactions on Applied Perception, vol. 7, no. 1, article 6, 2010.
[20] Z. Bylinskii, T. Judd, A. Borji et al., "MIT Saliency Benchmark," 2015, http://saliency.mit.edu/index.html.
[21] Y.-F. Ma and H.-J. Zhang, "Contrast-based image attention analysis by using fuzzy growing," in Proceedings of the 11th ACM International Conference on Multimedia (MM '03), pp. 374–381, ACM, November 2003.
[22] L. Itti and P. F. Baldi, "Bayesian surprise attracts human attention," in Advances in Neural Information Processing Systems, pp. 547–554, MIT Press, 2005.
[23] T. Brox, J. Weickert, B. Burgeth, and P. Mrázek, "Nonlinear structure tensors," Image and Vision Computing, vol. 24, no. 1, pp. 41–55, 2006.
[24] U. Köthe, "Edge and junction detection with an improved structure tensor," in Pattern Recognition, pp. 25–32, Springer, Berlin, Germany, 2003.
[25] N. Bruce and J. Tsotsos, "Saliency based on information maximization," in Proceedings of the Advances in Neural Information Processing Systems (NIPS '05), pp. 155–162, Vancouver, Canada, December 2005.
[26] N. D. B. Bruce and J. K. Tsotsos, "Saliency, attention and visual search: an information theoretic approach," Journal of Vision, vol. 9, no. 3, article 5, 2009.
[27] X. Hou and L. Zhang, "Dynamic visual attention: searching for coding length increments," in Advances in Neural Information Processing Systems, pp. 681–688, MIT Press, 2009.
[28] X. Hou, J. Harel, and C. Koch, "Image signature: highlighting sparse salient regions," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 1, pp. 194–201, 2012.
[29] B. W. Tatler, R. J. Baddeley, and I. D. Gilchrist, "Visual correlates of fixation selection: effects of scale and time," Vision Research, vol. 45, no. 5, pp. 643–659, 2005.
[30] L. Zhang, M. H. Tong, T. K. Marks, H. Shan, and G. W. Cottrell, "SUN: a Bayesian framework for saliency using natural statistics," Journal of Vision, vol. 8, no. 7, article 32, 2008.