

Deep-PRWIS: Periocular Recognition Without the Iris and Sclera Using Deep Learning Frameworks

Hugo Proença, Senior Member, IEEE, and João C. Neves, Member, IEEE

Abstract— This paper is based on a disruptive hypothesis for periocular biometrics: in visible-light data, the recognition performance is optimized when the components inside the ocular globe (the iris and the sclera) are simply discarded, and the recognizer's response is based exclusively on information from the surroundings of the eye. As a major novelty, we describe a processing chain based on convolutional neural networks (CNNs) that defines the regions of interest in the input data that should be privileged in an implicit way, i.e., without masking out any areas in the learning/test samples. By using an ocular segmentation algorithm exclusively on the learning data, we separate the ocular from the periocular parts. Then, we produce a large set of "multi-class" artificial samples by interchanging the periocular and ocular parts from different subjects. These samples are used for data augmentation purposes and feed the learning phase of the CNN, always considering as label the ID of the periocular part. This way, for every periocular region, the CNN receives multiple samples of different ocular classes, forcing it to conclude that such regions should not be considered in its response. During the test phase, samples are provided without any segmentation mask and the network naturally disregards the ocular components, which contributes to improvements in performance. Our experiments were carried out on the full versions of two widely known data sets (UBIRIS.v2 and FRGC) and show that the proposed method consistently advances the state-of-the-art performance in the closed-world setting, reducing the EERs by about 82% (UBIRIS.v2) and 85% (FRGC) and improving the Rank-1 by over 41% (UBIRIS.v2) and 12% (FRGC).

Index Terms— Soft biometrics, visual surveillance, homeland security.

Manuscript received May 17, 2017; revised September 13, 2017; accepted October 29, 2017. Date of publication November 8, 2017; date of current version January 3, 2018. This work was supported by the FCT Project under Grant UID/EEA/50008/2013. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Christoph Busch. (Corresponding author: Hugo Proença.) The authors are with the Department of Computer Science, IT: Instituto de Telecomunicações, University of Beira Interior, 6201-001 Covilhã, Portugal (e-mail: hugomcp@di.ubi.pt; jcneves@di.ubi.pt). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TIFS.2017.2771230

I. INTRODUCTION

CONVOLUTIONAL neural networks (CNNs) have become extremely popular in many computer vision tasks, from image segmentation [10] to detection [23] and classification [9]. The property of shift/space invariance gives them their biological inspiration and simultaneously keeps the number of weights relatively small, making learning a feasible task. Being data-driven models, CNNs do not depend on human efforts to specify the image features, provided that large amounts of learning data are available.

Fig. 1. Schema of the components in the ocular/periocular regions, with the three major factors that reduce the reliability of the ocular components for biometric recognition in covert mode: 1) eye gaze; 2) iris/sclera occlusions; and 3) corneal reflections.

In the biometrics domain, the covert recognition of humans (outdoor and non-cooperative) remains to be achieved, and will be a breakthrough in security/forensics applications. Here, the periocular region is a trade-off between using the iris and the face, with encouraging performance levels reported in the literature. However, as illustrated in Fig. 1, it should be considered that:

• when imaged under visible light, the iris (particularly) and the sclera are prone to corneal reflections, resulting in the so-called Purkinje images;
• along with body and head movements, the components in the ocular globe are subjected to an additional motion source (eye gaze) that increases the probability of acquiring blurred data;
• the iris and the sclera are often partially occluded, due to eyelid and eyelash movements.

According to the above points, this paper describes a periocular recognition algorithm designed to work on poor-quality visible-light data, relying on CNNs to model complex data patterns. The key is a data augmentation strategy based on multi-class region swapping, which implicitly induces the CNN to consider that some regions in the input data are not reliable for classification purposes. This is seen as a novel way to provide prior knowledge to this kind of network, considerably improving performance without requiring extra amounts of learning data. Note that this strategy can be easily generalized to other object classification problems, i.e., to any case where the discriminability provided by the different image components varies substantially and there is not enough learning data available to expect the network to infer such a conclusion autonomously.

The workflow is illustrated in Fig. 2 (Learning box): by using an ocular segmentation algorithm [17], we create
a binary mask B that discriminates between the ocular O (iris and sclera) and the remaining components P (henceforth designated as periocular, including the eyebrows, eyelids, eyelashes and skin) in each learning sample. Next, a set of artificial samples is created, interchanging the ocular and periocular parts from different subjects, but always considering as label the ID provided by the periocular part. This way, during the learning phase, the CNN receives, for each periocular part, samples of different ocular classes, forcing it to conclude that such regions should not be considered in its response (i.e., the ID). During the test phase (Test box), samples are provided to the network without any segmentation mask, yielding four key properties: 1) the CNN testing performance is not conditioned by the effectiveness of the segmentation step, known to be a primary error source in computer vision tasks; 2) the CNN naturally ignores the ocular components, focusing on the most discriminating information; 3) the learning and test data have a similar appearance, which contributes to the CNN's generalization capability; and 4) from a data augmentation perspective, the set of artificial samples provided to the network also improves the CNN performance. As shown in the bottom part of Fig. 2, any other combination of learning/test data (using explicit region masking) will not keep these four properties simultaneously.

Fig. 2. Schema of the strategy used to implicitly force the CNN to disregard regions in the input data. Creating artificial "multi-class" samples that keep as label the ID of the periocular part leads the network to consider that ocular patterns are meaningless for biometric recognition. This yields four properties (given at the top-right corner), which will not be verified for any other combination of learning/testing strategies (given at the bottom part of the figure).

As an outcome of this work, the resulting periocular recogniser consistently outperforms the state-of-the-art, decreasing the EERs and improving the Rank-1 values with respect to the baseline methods. Note that these results were obtained on two widely known data sets and using the entire set of images in both sets, i.e., without disregarding even the poorest-quality samples.

The remainder of this paper is organised as follows: Section II summarises the periocular biometrics research, Section III describes our method, Section IV discusses the obtained results, and the conclusions are given in Section V.

II. RELATED WORK

The pioneering work on periocular biometrics was due to Park et al. [14] (extended in [15]). They consider the iris as the reference for defining the ROI, described by HoG, LBP and SIFT descriptors. The ℓ2 norm is the distance measure for each descriptor, with results fused at the score level by linear combination. This work provided the basis for a large number of subsequent methods: Mahalingam and Ricanek Jr. [11] apply multi-scale, patch-based LBP descriptors, using the iris center for data alignment. Ross et al. [21] use HoGs to extract the global image information, SIFT to extract local edge anomalies, and probabilistic deformation models to handle non-linear deformations, with the sum rule combining the dissimilarity scores. Bharadwaj et al. [2] apply global descriptors (GIST and circular LBPs), each one compared using the Chi-square distance. Scores are also linearly combined. Woodard et al. [26] fuse local appearance-based feature descriptors with 2D color histograms (red and green channels), compared using the city-block (LBP) and Bhattacharyya (color histograms) distances. Joshi et al. [8] describe the periocular information by means of a bank of complex Gabor filters, while Tan and Kumar [24] evaluate the effectiveness of SIFT, GIST, LBP, HoG and Leung-Malik filter texture descriptors to provide discriminating information on periocular data. The singularity of Nie et al.'s [12] work is to combine this kind of classical approach with a convolutional restricted Boltzmann machine, which enables obtaining the probability distributions in the periocular data, discriminated by metric learning and SVMs.

Fig. 3. Structure of the convolutional neural network used in image classification. Six convolutional layers, three max-pooling and dropout layers are used
before the (three) fully connected and the soft-max layer, that estimates the sample identity. “s:_” denotes the stride, “p:_” specifies the padding, “w:_” is the
square neighborhood used in max-pooling layers and “r:_” defines the dropout rate. Note that all convolution layers also include "ReLU" non-linear transfer
functions.

Additional approaches are due to Chen and Ferryman [5], who fuse 2D and 3D data, masking out the ocular region from the encoding and comparison process. Raghavendra et al. [20] exploit light-field data acquisition technology to produce sharp images for iris and periocular recognizers, with scores linearly combined. Aiming at cross-spectral recognition, Cao and Schmid [4] convolve the periocular region with a bank of Gabor filters, from which phase and magnitude components are described by HoGs and histograms of LBP descriptors. Features are concatenated and compared using the I-divergence measure. As an anti-counterfeit measure, Proença [19] proposes an ensemble made of two disparate experts: one analysing the iris texture and the other parameterizing the shape of the eyelids and analysing the surrounding skin. Both experts provide independent responses and do not share particular sensitivity to any image covariate.

In terms of deep learning-based approaches, Zhao and Kumar [27] use a CNN for periocular recognition (as we do). The novelty is to consider explicit semantic information to extract more comprehensive periocular features, helping the CNN to improve performance. Refer to the surveys on periocular biometrics due to Alonso-Fernandez and Bigun [1] and Nigam et al. [13] for additional information about the periocular biometrics research.

Recently, particular attention has been paid to the recognition of cross-spectral iris/periocular data, i.e., when the pairs of images to be compared were acquired using different light wavelengths (typically near infra-red and visible). Several approaches were published in this field, with some results and relevant methods described in [22].

III. PROPOSED METHOD

A. Deep Learning Architecture

We use one of the most popular deep learning architectures for image classification: Convolutional Neural Networks (CNNs), which are a biologically inspired variant of multilayer perceptron networks (MLPs) particularly suitable for image classification. By making some assumptions about the nature of the input data (e.g., stationarity of statistics and locality of pixel dependencies), CNNs have much fewer connections than MLPs, making learning a feasible task. In particular, we adopt a CNN architecture based on AlexNet [9], shown in Fig. 3. This classical architecture boosted the popularity of deep learning frameworks for image classification, and is known to constitute a good trade-off between the number of model parameters and the generalisation capabilities of the final solution. The idea is to extract features of increasing complexity at the deeper layers of the network (using convolution layers), which feed the final fully connected layers that provide the final response. At the same time, max-pooling and dropout layers keep the number of parameters relatively low without compromising the generalization capabilities of the network. Our input data are 150 × 200 × 3 RGB images that pass through convolution (at first), max-pooling, dropout and fully connected layers. All the convolutional layers are adjacent to Rectified Linear Unit (ReLU) activation functions, with the i-th output channel y^{(i)} given by:

$$y^{(i)} = \max\Big(b^{(ij)} + \sum_{j=1}^{k} w^{(ij)} \ast x^{(j)},\; \mathbf{0}_p\Big), \qquad (1)$$

where max(·, 0_p) is the component-wise maximum operator, 0_p = [0, …, 0]^T is a p × 1 vector with all elements equal to zero, b^{(·)} and w^{(·)} are the bias and weight terms tuned during the learning phase, and x represents the layer inputs. The max-pooling layers operate independently on each depth slice of the input and take the maximum value over square patches. Finally, dropout layers set to zero the output of each neuron during the learning step with probability r, preventing it from contributing to the forward pass and participating in back-propagation.

In our model, the first convolutional layer has 128 kernels (5 × 5), using stride and padding of two pixels. Next, a max-pooling and a dropout layer feed the second and third convolutional layers, composed of 256 kernels (5 × 5, two pixels of stride and padding). Again, a max-pooling layer shrinks the data volume and then two convolutional layers with output size equal to the input are applied (256 kernels of size 3 × 3, stride and padding equal to one). Before the fully connected layers, data pass through a convolution layer (with 512 kernels of size 3 × 3, stride and padding equal to one), a max-pooling and a dropout layer, yielding 9 × 12 × 512 = 55,296 features entering the fully connected layers. Another dropout layer is used before the soft-max layer, which produces a vector of c positive elements corresponding to the probability of each class label:

$$P(y = j \mid x) = \frac{e^{x^{T} w_{j}}}{\sum_{k} e^{x^{T} w_{k}}}. \qquad (2)$$

According to the output of the soft-max layer, the label prediction is the class with the highest probability among the c possibilities: ŷ = arg max_j P(y = j | x). The CNNs were trained using the stochastic gradient descent (SGD) algorithm, with a batch size of 256 samples. As a preprocessing step, the mean of the learning data was subtracted from all samples. The learning rate was 1e−3, with a momentum of 0.9 and a weight decay of 5e−4. The number of iterations in each experiment was set to 100. All weights in the CNN were initialised according to Glorot and Bengio's [6] method.
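For readers who prefer code to prose, the layer stack and the optimisation settings described above can be summarised in the following sketch. It is written in PyTorch purely for illustration (the authors' implementation used MATLAB/MatConvNet, cf. Sec. IV-B); the dropout rates, pooling windows and fully connected widths are only specified in Fig. 3, so the values below are assumptions, and the flattened feature size is computed dynamically instead of being fixed to 9 × 12 × 512.

```python
# Illustrative PyTorch re-creation of the network of Sec. III-A (not the released code).
import torch
import torch.nn as nn

class DeepPRWISNet(nn.Module):
    def __init__(self, num_classes, dropout_rate=0.5):   # dropout rate assumed (only in Fig. 3)
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 128, 5, stride=2, padding=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2), nn.Dropout2d(dropout_rate),      # pooling window assumed
            nn.Conv2d(128, 256, 5, stride=2, padding=2), nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, 5, stride=2, padding=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(256, 256, 3, stride=1, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, 3, stride=1, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 512, 3, stride=1, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2), nn.Dropout2d(dropout_rate),
        )
        # The paper reports a 9 x 12 x 512 map before the fully connected layers; the exact
        # strides/pooling windows that produce it are only given in Fig. 3, so the flattened
        # size is measured here with a dummy forward pass instead of being hard-coded.
        with torch.no_grad():
            n_feats = self.features(torch.zeros(1, 3, 150, 200)).numel()
        self.classifier = nn.Sequential(
            nn.Linear(n_feats, 4096), nn.ReLU(inplace=True),     # fc widths assumed
            nn.Linear(4096, 4096), nn.ReLU(inplace=True),
            nn.Dropout(dropout_rate),
            nn.Linear(4096, num_classes),    # soft-max of Eq. (2) is applied inside the loss
        )
        for m in self.modules():             # Glorot initialisation, as stated in the paper
            if isinstance(m, (nn.Conv2d, nn.Linear)):
                nn.init.xavier_uniform_(m.weight)
                nn.init.zeros_(m.bias)

    def forward(self, x):                    # x: (batch, 3, 150, 200), mean-subtracted
        return self.classifier(self.features(x).flatten(1))

model = DeepPRWISNet(num_classes=522)        # e.g., the 522 UBIRIS.v2 classes
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3,
                            momentum=0.9, weight_decay=5e-4)
criterion = nn.CrossEntropyLoss()
```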

B. Data Augmentation

1) Ocular/Periocular Regions Swapping: Let I_i and I_j be 150 × 200 × 3 RGB images from two different subjects. Using the segmentation method described in [17], we obtain two binary masks B_i and B_j (150 × 200 pixels) that discriminate between the ocular (iris and sclera) and the periocular components in I_. . Let O_. and P_. denote the ocular and periocular parts of I_. . The goal is to create an artificial sample P_i O_j, composed of the periocular region of I_i overlapping the ocular part of I_j, which requires finding the scale and translation parameters such that O_j optimally fits the ocular hole of P_i. Let b_. be the n × 1 vectorized version of B_. (n = 30 000). The convolution "∗" between b_i and b_j is given in matrix form by:

$$b_i \ast b_j = \mathbf{T}(b_i)\, b_j, \qquad (3)$$

with T(b_i) the (2n − 1) × n Toeplitz matrix of b_i:

$$\mathbf{T}(b_i) = \begin{bmatrix} b_i & 0 & \cdots & 0 \\ 0 & b_i & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & 0 & \cdots & b_i \end{bmatrix}. \qquad (4)$$

Let 1_{2n−1} = [1, …, 1]^T be the (2n − 1) × 1 vector having all elements equal to one. According to this formulation, the value of:

$$\mathbf{1}_{2n-1}^{T}\big(\mathbf{T}(b_i)\, b_j\big), \qquad (5)$$

directly corresponds to the agreement of the ocular parts of B_i and B_j (i.e., their white regions). Let ¬b_. be the negative version of b_. . As we are interested in maximising the "ocular" ↔ "ocular" and "periocular" ↔ "periocular" agreements, while minimising the "ocular" ↔ "periocular" disagreements between the masks, the unknown scale and translation parameters α = [α_s, α_x, α_y] that optimally overlap P_i and O_j are found by:

$$\hat{\boldsymbol{\alpha}} = \arg\min_{\boldsymbol{\alpha}} \; \mathbf{1}_{2n-1}^{T}\Big[\big(\mathbf{T}(\neg b_i) - \mathbf{T}(b_i)\big)\big(b_j^{(\boldsymbol{\alpha})} - \neg b_j^{(\boldsymbol{\alpha})}\big)\Big], \quad \text{s.t. } \big\|[\alpha_x, \alpha_y]\big\|_{\infty} \leq \kappa_1 \,\wedge\, \tfrac{1}{\kappa_2} \leq \alpha_s \leq \kappa_2, \qquad (6)$$

where b_j^{(α)} is the translated and scaled version of b_j, and the κ_i avoid anatomically bizarre solutions (κ_1 = 50, κ_2 = 3 in our experiments). According to this formulation, (6) is a constrained optimization problem with inequality constraints, solved as described in [3]. In practice, we find the displacement (α_x, α_y) of the scaled (by α_s) version of O_j that optimally fits P_i, yielding artificial samples that are realistic and visually pleasant. Examples of this overlapping procedure are given in Fig. 4, with the leftmost column showing samples of the UBIRIS.v2 set, and the remaining columns displaying artificial samples composed of the periocular region at left and different ocular parts.

Fig. 4. Left column: UBIRIS.v2 samples. At right: artificial "multi-class" samples composed of the periocular region given at left and the ocular parts given below each image. Note that the periocular region in these "multi-class" samples is the same in each row.
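The following NumPy sketch illustrates the region-swapping idea under stated assumptions. The paper solves (6) with an interior-point method [3]; here a coarse grid search over (α_s, α_x, α_y) is used instead, scaling is performed about the image origin for simplicity, and the helper names (overlap_cost, warp, swap_regions) are hypothetical, not taken from the released code.

```python
# Illustrative sketch of the ocular/periocular swapping augmentation (not the authors' code).
import numpy as np
import cv2  # assumed available, used only for the affine warps


def overlap_cost(B_i, B_j_warped):
    """Disagreement minus agreement between the two binary ocular masks (cf. Eqs. (5)-(6))."""
    agree = np.sum(B_i * B_j_warped) + np.sum((1 - B_i) * (1 - B_j_warped))
    disagree = np.sum(B_i * (1 - B_j_warped)) + np.sum((1 - B_i) * B_j_warped)
    return disagree - agree


def warp(img, s, tx, ty, shape=(150, 200)):
    """Scale by s and translate by (tx, ty), keeping the 150 x 200 canvas."""
    M = np.float32([[s, 0, tx], [0, s, ty]])
    return cv2.warpAffine(img.astype(np.float32), M, (shape[1], shape[0]))


def swap_regions(I_i, B_i, I_j, B_j, kappa1=50, kappa2=3.0):
    """Build the artificial sample P_i O_j, keeping the ID of the periocular part (subject i)."""
    best_cost, best_alpha = np.inf, (1.0, 0, 0)
    for s in np.linspace(1.0 / kappa2, kappa2, 9):          # coarse search grid over alpha_s
        for tx in range(-kappa1, kappa1 + 1, 10):           # and over (alpha_x, alpha_y)
            for ty in range(-kappa1, kappa1 + 1, 10):
                cost = overlap_cost(B_i, warp(B_j, s, tx, ty) > 0.5)
                if cost < best_cost:
                    best_cost, best_alpha = cost, (s, tx, ty)
    s, tx, ty = best_alpha
    O_j = warp(I_j, s, tx, ty) * (warp(B_j, s, tx, ty) > 0.5)[..., None]  # aligned ocular part
    P_i = I_i * (1 - B_i)[..., None]                                      # periocular part of I_i
    return (P_i + O_j * B_i[..., None]).astype(I_i.dtype)                 # label: ID of subject i
```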
2) Spatial and Color Transforms: Additionally, two other label-preserving transformations were used for data augmentation purposes. First, to simulate scale and translation inconsistencies between samples, patches of scale [0.75, 0.90] (values drawn uniformly) were randomly cropped from the learning set, as illustrated in the upper row of Fig. 5. Second, to obtain a color transform, we found the principal components of the RGB values over all pixels of the learning data and created new versions of the images by adding to each pixel multiples
of the largest eigenvectors, with magnitude equal to the corresponding eigenvalues [9]:

$$x^{(new)} = x^{(old)} + [\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3]\big(\boldsymbol{\alpha} \odot [\lambda_1, \lambda_2, \lambda_3]^{T}\big), \qquad (7)$$

with ⊙ denoting the element-wise multiplication, v_. and λ_. denoting the eigenvectors and eigenvalues of the learning data covariance matrix, and α ∈ R³ being randomly drawn from the Gaussian N(0, 0.1). Examples of the resulting images are given in the bottom row of Fig. 5.

Fig. 5. Examples of the scale, translation and color transforms used. The upper row illustrates the randomly cropped patches, and the bottom row shows changes in color, obtained by adding multiples of the principal component vectors to each image pixel.
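Both label-preserving transforms described above can be sketched in a few lines of NumPy. The helper names below are illustrative; the eigen-decomposition of the RGB covariance and the per-pixel shift follow (7), i.e., the "fancy PCA" augmentation of [9].

```python
# Sketch of the spatial and color transforms of Sec. III-B.2 (illustrative helper names).
import numpy as np

rng = np.random.default_rng()


def random_crop(img, scale_range=(0.75, 0.90)):
    """Crop a random patch whose relative side length is drawn uniformly from scale_range."""
    h, w = img.shape[:2]
    s = rng.uniform(*scale_range)
    ch, cw = int(h * s), int(w * s)
    y0 = rng.integers(0, h - ch + 1)
    x0 = rng.integers(0, w - cw + 1)
    return img[y0:y0 + ch, x0:x0 + cw]          # resized back to 150 x 200 before learning


def fit_rgb_pca(images):
    """Eigenvectors/eigenvalues of the RGB covariance over all learning pixels."""
    pixels = np.concatenate([im.reshape(-1, 3) for im in images]).astype(np.float64)
    cov = np.cov(pixels, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)      # columns of eigvecs are v_1..v_3
    return eigvals, eigvecs


def pca_color_shift(img, eigvals, eigvecs, sigma=0.1):
    """Add alpha-weighted multiples of the principal components to every pixel, as in Eq. (7)."""
    alpha = rng.normal(0.0, sigma, size=3)
    shift = eigvecs @ (alpha * eigvals)         # 3-vector added to each RGB pixel
    return np.clip(img.astype(np.float64) + shift, 0, 255).astype(np.uint8)
```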

IV. RESULTS AND DISCUSSION

A. Open vs. Closed-World Settings

When using classification models such as the ones in this paper (CNNs), an important decision concerns the working mode most suitable for the model. In particular, it should be defined whether the resulting system is expected to work in the open-world or closed-world mode, i.e., depending on whether the system possesses, at learning time, samples from all the classes that will be seen at runtime. In the case of CNN-based classification tasks, the closed-world mode enables using the output of the neurons in the final soft-max layer as the probabilities for each class label. In opposition, in the open-world mode, the number of different classes seen at runtime is not known, and the soft-max layer cannot be used. Instead, the output of the final convolution layer is typically used as a feature descriptor and the ℓ2 norm gives the distance between two feature sets, discriminating between genuine and impostor comparisons. In our experiments, having observed some preliminary results about the recognition performance of our method in both modes, we decided to focus exclusively on the closed-world scenario, i.e., assuming that the set of identities to be recognized is known at learning time. As an example, this setting corresponds to a watch-list identification problem, where the goal is to find the subjects of a short list among a crowd.

Fig. 7. Left plot: variations in recognition performance with respect to the amount of augmented data, relative to the number of samples in the data set. Right plot: decision environment of the responses given by the neurons in the final layer of the CNN, distinguishing between the genuine (green) and impostor (red) class scores. The zoomed-in region makes the recogniser bias particularly evident. The upper row regards the UBIRIS.v2 data set, whereas the bottom row gives the corresponding values for the FRGC set.
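The two operating modes translate into two different scoring rules, sketched below for the illustrative DeepPRWISNet network given in Sec. III-A. The function names are hypothetical, and the open-world variant simply flattens the output of the last convolutional block as the feature descriptor, which is one plausible reading of the description above rather than the authors' exact procedure.

```python
# Closed-world vs. open-world scoring, sketched for the network of Sec. III-A.
import torch
import torch.nn.functional as F


def closed_world_identify(model, x):
    """Closed-world: the soft-max output directly gives per-class probabilities (Eq. (2))."""
    with torch.no_grad():
        probs = F.softmax(model(x), dim=1)
    return probs.argmax(dim=1), probs            # predicted ID and class scores


def open_world_distance(model, x_a, x_b):
    """Open-world: compare deep features of two samples with the l2 norm."""
    with torch.no_grad():
        f_a = model.features(x_a).flatten(1)     # output of the last convolutional block
        f_b = model.features(x_b).flatten(1)
    return torch.norm(f_a - f_b, p=2, dim=1)     # small distance -> genuine comparison
```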
B. Datasets and Experimental Protocol

Two datasets were selected for our experiments: 1) the UBIRIS.v2 [18], which is typically used for iris and periocular recognition experiments. All images of this set were used (11 102 images from 522 different eyes), regardless of the extremely poor quality of some of them. Images have 150 × 200 × 3 pixels and are represented in the RGB color space; and 2) the Face Recognition Grand Challenge [16] (FRGC) set, released by the National Institute of Standards and Technology (NIST). Again, all the 24 946 RGB samples in this set (with periocular regions cropped and resized to 150 × 200 × 3 pixels) were considered. Cropping the left/right eye regions from each image yields a total of 894 classes. Examples of some of the poorest-quality images used in our evaluation are given in Fig. 6.

Fig. 6. Datasets used in our empirical evaluation. The upper row regards the UBIRIS.v2 set, with five major degradation factors: iris occlusions, reflections, varying pose, glasses and poor lighting conditions. The bottom row regards the FRGC set, where the major degradation factors are image blur, poor resolution and bad lighting.

All experiments were conducted according to a bootstrapping-like strategy, which is widely adopted in biometric recognition experiments (e.g., [7]). Having n images available, the bootstrap randomly selects (without replacement, in our case) 0.9n images, creating a sample composed of 90% of the available data. This sample is disjointly divided into two subsets: 80% for learning purposes and the remaining 20% for performance evaluation. Note that we manually verified that the learning subsets, both for UBIRIS.v2 and FRGC, contain images from all the subjects (classes) in the data set, assuring that the closed-world assumption is satisfied.

The bootstrapping-like draw was repeated 10 times per data set, creating 10 subsets of each one. Next, the recognition experiments (model learning and performance evaluation) were carried out in each subset, which enabled us to obtain the average and standard deviation performance values at all operating points for both the UBIRIS.v2 and FRGC sets. These are the values reported in Table I and in all ROC and Rank-N plots (with the lines providing the average performance and the shadowed regions denoting the standard deviations at each position).

For all our experiments, the MATLAB programming language was chosen, with the MatConvNet [25] toolbox used to implement our CNN models. Also, an NVIDIA Titan X GPU, with 12 GB of memory and 3,072 CUDA cores, was used to speed up the learning processes.

Fig. 8. Comparison between the performance attained by the method proposed in this paper and three baselines that represent the state-of-the-art. Results
are given for the full UBIRIS.v2 and FRGC data sets, i.e., without disregarding any sample of these sets.
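Fig. 8 and Table I report EERs and Rank-1 accuracies. For reference, the sketch below shows one standard way of computing both indicators from genuine/impostor scores and from a probe × gallery score matrix; it is generic and not tied to the paper's evaluation scripts.

```python
# Generic reference implementations of the two indicators reported in Fig. 8 and Table I.
import numpy as np


def eer(genuine_scores, impostor_scores):
    """Equal Error Rate: operating point where false accept and false reject rates meet."""
    thresholds = np.sort(np.concatenate([genuine_scores, impostor_scores]))
    far = np.array([(impostor_scores >= t).mean() for t in thresholds])
    frr = np.array([(genuine_scores < t).mean() for t in thresholds])
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0


def rank1_accuracy(score_matrix, probe_labels, gallery_labels):
    """Fraction of probes whose top-scoring gallery identity is the true one."""
    best = np.asarray(gallery_labels)[np.argmax(score_matrix, axis=1)]
    return (best == np.asarray(probe_labels)).mean()
```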

C. Data Augmentation: Performance Optimization

For performance optimisation, one important point is the amount of artificial data required with respect to the original number of images, avoiding the unrealistic "as large as possible" paradigm. Having three types of data augmentation strategies (scale/translation transforms, color transforms and region swapping), the goal here is to determine the amounts of data above which performance improvements become residual, if any. To find that threshold, we augmented the data from one to 64× (considering the original number of images per data set), and repeated the learning / performance evaluation steps. As given in the left part of Fig. 7, in the case of the UBIRIS.v2 set, performance consistently improves up to the point where the augmented data is about 32× the original samples (i.e., using approximately 350 000 artificial images), above which improvements in performance decrease and start to be residual. Regarding the FRGC set, the stabilization in performance was observed slightly earlier, i.e., when the amount of augmented data was 8× to 16× the number of original images (corresponding to approximately 400 000 artificial images).

In terms of the typical scores generated by the CNNs, the right side of Fig. 7 plots the genuine/impostor score likelihood densities for the UBIRIS.v2 (upper row) and FRGC sets. The zoomed-in region makes the classifier bias particularly evident, in which errors are most times due to false negative responses, i.e., the genuine distribution has non-zero densities along the unit interval, which does not happen for the impostor scores, where non-residual densities appear exclusively near the zero value. In practice, this yields one important requirement for biometric systems working on degraded data: a residual probability of observing false matches. In these cases, regardless of the system's sensitivity, it can be stated with full confidence that any reported match is genuine.

According to these results, in all subsequent experiments we kept the amount of augmented data at 32× the original data set and compared our algorithm's performance to three baseline strategies: the works due to Zhao and Kumar [27], Tan and Kumar [24] and Proença [19]. These techniques are summarized in Sec. II and were selected because they report the state-of-the-art performance ([27] and [24]), use techniques that are similar to ours ([27]) and were designed to work in conditions similar to our method's ([19]). However, note that Zhao and Kumar's [27] method was designed to work in a more challenging scenario, corresponding to the open-world operating mode.

D. All vs. Periocular vs. Ocular CNNs

As stated above, the underlying hypothesis in this paper is that periocular recognition performance improves when the less reliable components (the iris and the sclera) are discarded by the CNN. Fig. 9 compares the performance attained
when using all the image components (iris, sclera, eyelids, eyelashes, eyebrows and skin), and when the components inside the ocular globe are implicitly discarded (according to the data augmentation strategy described in Sec. III-B.1). As a reference, we also show the performance obtained by the complementary configuration (i.e., using only the iris and the sclera), which is done simply by using the ID of the ocular part in each augmented sample. As can be seen both in the ROC and Rank-N curves, the best performance is attained when the ocular components are discarded, with solid differences in performance and non-overlapping confidence intervals. The small reliability of the iris and sclera for biometric recognition in visible-light environments is confirmed by the performance attained by the Ocular classifier, with performance levels dramatically poorer than the other two configurations (All and Periocular). Results in Fig. 9 regard exclusively the UBIRIS.v2 set, even though almost overlapping differences in performance were observed for FRGC. As these results are clearly redundant with those provided for UBIRIS.v2, we decided not to include them in the paper.

Fig. 9. Comparison between the recognition performance obtained by the CNNs when using all the information available (All series, represented by blue lines), when discarding the components inside the ocular globe (Periocular series, represented by yellow lines), and when considering exclusively the components in the ocular globe (Ocular series, represented by red lines).

Moreover, the different features learned by the CNNs when using only some of the components are evident by analyzing the average magnitude of the 512 (9 × 12) filters tuned by the SGD algorithm immediately before the fully connected layers, i.e., the first point in the CNN where the filter coefficients have a bijective correspondence to input image positions. Results are given in Fig. 10 for three types of CNNs: in a) the CNN learns from all the regions of the input data, i.e., without using the image overlapping strategy described in Sec. III-B.1; in b) only the ocular regions are considered by the CNN; and in c) only the periocular regions are considered. It can be seen that the average magnitude of the coefficients spreads evenly in a) and has obvious valleys in the regions that are implicitly demanded to be discarded, according to the data augmentation strategy used. This confirms that the CNNs are actually disregarding or, at least, giving less importance to the information in these regions.

Fig. 10. Comparison between the average magnitude of the 512 (9 × 12) CNN filters learned immediately before the fully connected layers, i.e., the first point in the CNN where the filter coefficients have a bijective correspondence to input image regions (interpolated 45 × 60 grids are shown, for visualization purposes). Here, the filter magnitude corresponds directly to the relevancy of the corresponding regions in the input data. Results regard the UBIRIS.v2 set and are identical to those observed for the FRGC data (not included to avoid redundancy).
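One plausible way of computing such relevance maps is to average the absolute coefficients of the first fully connected layer over its output units and input channels, leaving a 9 × 12 spatial map that is then interpolated to 45 × 60. The sketch below encodes this interpretation; it is not the authors' code, and the channel-major flattening order is an assumption.

```python
# Illustrative relevance map in the spirit of Fig. 10 (interpretation, not the authors' code).
import numpy as np
from scipy.ndimage import zoom


def fc_relevance_map(fc_weight, channels=512, height=9, width=12):
    """fc_weight: (out_units, channels*height*width) weight matrix of the first fc layer."""
    w = np.abs(fc_weight).mean(axis=0)                    # average over output units
    w = w.reshape(channels, height, width).mean(axis=0)   # average over the 512 channels
    return zoom(w, (45 / height, 60 / width), order=1)    # 45 x 60 grid, as in Fig. 10
```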

E. State-of-the-Art Results Comparison

The ROC curves and the Rank-N plots are given in Fig. 8, for the four methods and the UBIRIS.v2 and FRGC sets. In all cases, the proposed method¹ solidly outperformed its competitors, with solid differences in performance with respect to any other strategy. The differences in performance are particularly evident for small levels of false acceptances, which is exactly the most valuable operating range for security applications. Regarding the UBIRIS.v2, the proposed method attained EERs around 1.9%, decreasing the state-of-the-art rate by over 80%, and by over 88% in terms of Rank-1 accuracy. Results observed for the FRGC set were substantially better than those for UBIRIS.v2, which accords with the previous research (e.g., [1]) and is justified by the lower number of degradation factors in this set (essentially blur and poor resolution). Again, the proposed method got the best performance among its competitors, with the true identity being reported at the first position (Rank-1) over 92% of the times. In all performance measurements, the differences with respect to the second-best method (Zhao and Kumar [27]) were evident, particularly in the most important range of the performance space (FAR values less than 10⁻²). Table I summarizes the performance indicators observed in our experiments, for the four algorithms and two data sets considered.

¹ MATLAB source available at http://www.di.ubi.pt/~hugomcp/DeepPeriocular.zip

F. Improvements and Further Work

As insight for further improvements, Fig. 11 illustrates the samples where the proposed method obtained its worst results
in terms of the Rank-n positions (UBIRIS.v2). In most cases, failures were due to: 1) large differences in phase (when the eye centre is deviated from the image centre); and 2) cropped eye regions that are too narrow, when the eyebrows and the skin are not available. In such cases, images contain almost exclusively the ocular regions, which, considering that our method disregards such information, justifies its poor performance. These problems can be attenuated if more accurate eye detection modules are used, or by considering (in a way similar to the work of Zhao and Kumar [27]) semantic information about the narrowness of the detected eyes, in which the narrowest samples (containing almost exclusively the ocular part) can be classified by a CNN that also considers the ocular components (corresponding to the All configuration results given in Sec. IV-D). Even though this network got worse performance than its Periocular counterpart, its performance on those narrowest samples was typically the best among all methods tested.

TABLE I
COMPARISON BETWEEN THE PERFORMANCE OBTAINED BY THE METHOD PROPOSED IN THIS PAPER WITH RESPECT TO THREE STATE-OF-THE-ART STRATEGIES

Fig. 11. Examples of the UBIRIS.v2 images where the proposed method got its worst performance. Two major error sources were detected: 1) eyes misaligned with the image centers; and 2) cases where the skin and eyebrows are badly visible.

V. CONCLUSIONS

This paper describes a periocular recognition algorithm for visible-light data that is based on convolutional neural networks (CNNs). The novelty is that, by augmenting the learning data using multi-class artificial samples, it is possible to implicitly transmit prior information to the network about the regions in the input data that are not reliable for biometric recognition. Such a conclusion, if left to be autonomously drawn by the CNN, would require additional amounts of learning data, which might not be available.

With respect to the periocular biometrics domain, there are two important conclusions: 1) for visible-light data, performance improves when the information in the ocular globe is disregarded, and the recogniser's response is solely based on the surrounding eye components; and 2) disregarding the iris/sclera regions can be done without explicitly segmenting these regions during the recognition step. As the main result, the proposed method advances the state-of-the-art performance in the closed-world scenario for two of the most used data sets in this field (UBIRIS.v2 and FRGC). It should be noted that these results were observed when considering even the poorest-quality samples in both data sets, i.e., without disregarding any image or using any friendly versions of the datasets.

ACKNOWLEDGEMENTS

The authors acknowledge the support of NVIDIA Corporation®, with the donation of one Titan X GPU. This work was supported by the PEst-OE/EEI/LA0008/2013 research program.

REFERENCES

[1] F. Alonso-Fernandez and J. Bigun, "A survey on periocular biometrics research," Pattern Recognit. Lett., vol. 82, pp. 92–105, Oct. 2016.
[2] S. Bharadwaj, H. S. Bhatt, M. Vatsa, and R. Singh, "Periocular biometrics: When iris recognition fails," in Proc. IEEE Int. Conf. Biometrics, Theory, Appl. Syst., Sep. 2010, pp. 1–6, doi: 10.1109/BTAS.2010.5634498.
[3] R. H. Byrd, M. E. Hribar, and J. Nocedal, "An interior point algorithm for large-scale nonlinear programming," SIAM J. Optim., vol. 9, no. 4, pp. 877–900, 1999.
[4] Z. Cao and N. A. Schmid, "Fusion of operators for heterogeneous periocular recognition at varying ranges," Pattern Recognit. Lett., vol. 82, pp. 170–180, Oct. 2016.
[5] L. Chen and J. Ferryman, "Combining 3D and 2D for less constrained periocular recognition," in Proc. IEEE Int. Conf. Biometrics Theory, Appl. Syst., Sep. 2015, pp. 1–6, doi: 10.1109/BTAS.2015.7358753.
[6] X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks," in Proc. Int. Conf. Artif. Intell. Stat., 2010, pp. 249–256.
[7] K. P. Hollingsworth, K. W. Bowyer, and P. J. Flynn, "Improved iris recognition through fusion of Hamming distance and fragile bit distance," IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 12, pp. 2465–2476, Dec. 2011.
[8] A. Joshi, A. Gangwar, R. Sharma, A. Singh, and Z. Saquib, "Periocular recognition based on Gabor and Parzen PNN," in Proc. IEEE Int. Conf. Image Process., Oct. 2014, pp. 4977–4981, doi: 10.1109/ICIP.2014.7026008.
[9] A. Krizhevsky, I. Sutskever, and G. Hinton, "ImageNet classification with deep convolutional neural networks," in Proc. Adv. Neural Inf. Process. Syst. Conf., 2012, pp. 1097–1105.
[10] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2015, pp. 3431–3440.
[11] G. Mahalingam and K. Ricanek, Jr., "LBP-based periocular recognition on challenging face datasets," EURASIP J. Image Video Process., vol. 36, pp. 1–13, Dec. 2013.
[12] L. Nie, A. Kumar, and S. Zhan, "Periocular recognition using unsupervised convolutional RBM feature learning," in Proc. 22nd Int. Conf. Pattern Recognit., 2014, pp. 399–404.
[13] I. Nigam, M. Vatsa, and R. Singh, "Ocular biometrics: A survey of modalities and fusion approaches," Inf. Fusion, vol. 26, pp. 1–35, Nov. 2015.
[14] U. Park, A. Ross, and A. K. Jain, "Periocular biometrics in the visible spectrum: A feasibility study," in Proc. 3rd IEEE Int. Conf. Biometrics, Theory, Appl. Syst., Sep. 2009, pp. 153–158.
[15] U. Park, R. Jillela, A. Ross, and A. K. Jain, "Periocular biometrics in the visible spectrum," IEEE Trans. Inf. Forensics Security, vol. 6, no. 1, pp. 96–106, Mar. 2011.

[16] P. J. Phillips, "Overview of the face recognition grand challenge," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., vol. 1, Jun. 2005, pp. 947–954.
[17] H. Proença, "Iris recognition: On the segmentation of degraded images acquired in the visible wavelength," IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 8, pp. 1502–1516, Aug. 2010.
[18] H. Proença, S. Filipe, R. Santos, J. Oliveira, and L. A. Alexandre, "The UBIRIS.v2: A database of visible wavelength iris images captured on-the-move and at-a-distance," IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 8, pp. 1529–1535, Aug. 2010.
[19] H. Proença, "Ocular biometrics by score-level fusion of disparate experts," IEEE Trans. Image Process., vol. 23, no. 12, pp. 5081–5093, Dec. 2014.
[20] R. Raghavendra, K. B. Raja, B. Yang, and C. Busch, "Combining iris and periocular recognition using light field camera," in Proc. IAPR Asian Conf. Pattern Recognit., Nov. 2013, pp. 155–159, doi: 10.1109/ACPR.2013.22.
[21] A. Ross et al., "Matching highly non-ideal ocular images: An information fusion approach," in Proc. 5th IAPR Int. Conf. Biometrics, Mar./Apr. 2012, pp. 446–453, doi: 10.1109/ICB.2012.6199791.
[22] A. Sequeira et al., "Cross-eyed—Cross-spectral iris/periocular recognition database and competition," in Proc. Int. Conf. Biometrics Special Interest Group (BIOSIG), Sep. 2016, pp. 1–5, doi: 10.1109/BIOSIG.2016.7736915.
[23] C. Szegedy, A. Toshev, and D. Erhan, "Deep neural networks for object detection," in Proc. Adv. Neural Inf. Process. Syst. Conf., 2013, pp. 2553–2561.
[24] C.-W. Tan and A. Kumar, "Towards online iris and periocular recognition under relaxed imaging constraints," IEEE Trans. Image Process., vol. 22, no. 10, pp. 3751–3765, Oct. 2013.
[25] A. Vedaldi and K. Lenc, "MatConvNet: Convolutional neural networks for MATLAB," in Proc. 23rd ACM Int. Conf. Multimedia, 2015, pp. 689–692.
[26] D. L. Woodard, S. J. Pundlik, J. R. Lyle, and P. E. Miller, "Periocular region appearance cues for biometric identification," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Workshops, Jun. 2010, pp. 162–169, doi: 10.1109/CVPRW.2010.5544621.
[27] Z. Zhao and A. Kumar, "Accurate periocular recognition under less constrained environment using semantics-assisted convolutional neural network," IEEE Trans. Inf. Forensics Security, vol. 12, no. 5, pp. 1017–1030, May 2016, doi: 10.1109/TIFS.2016.2636093.

Hugo Proença received the B.Sc., M.Sc., and Ph.D. degrees in 2001, 2004, and 2007, respectively. He is currently an Associate Professor with the Department of Computer Science, University of Beira Interior, where he is involved in researching mainly about biometrics and visual surveillance. He served as a Guest Editor of special issues of the Pattern Recognition Letters, Image and Vision Computing, and Signal, Image and Video Processing journals. He is the Coordinating Editor of the IEEE BIOMETRICS COUNCIL NEWSLETTER and the Area Editor (ocular biometrics) of the IEEE BIOMETRICS COMPENDIUM JOURNAL. He is a member of the Editorial Boards of the Image and Vision Computing and the International Journal of Biometrics.

João C. Neves received the B.Sc. and M.Sc. degrees in computer science from the University of Beira Interior, Portugal, in 2011 and 2013, respectively, where he is currently pursuing the Ph.D. degree in the area of biometrics. His research interests broadly include computer vision and pattern recognition, with a particular focus on biometrics and surveillance.
