Deep-PRWIS: Periocular Recognition Without The Iris and Sclera Using Deep Learning Frameworks
Fig. 2. Schema of the strategy used to implicitly force the CNN to disregard regions in the input data. Creating artificial "multi-class" samples that keep as label the ID of the periocular part leads the network to conclude that ocular patterns are meaningless for biometric recognition. This yields four properties (given at the top-right corner), which do not hold simultaneously for any other combination of learning/testing strategies (given at the bottom part of the figure).
a binary mask B that discriminates between the ocular O (iris and sclera) and the remaining components P (henceforth designated as periocular, including the eyebrows, eyelids, eyelashes and skin) in each learning sample. Next, a set of artificial samples is created, interchanging the ocular and periocular parts from different subjects, but always considering as label the ID provided by the periocular part. This way, during the learning phase, the CNN receives, for each periocular part, samples of different ocular classes, forcing it to conclude that such regions should not be considered in its response (i.e., the ID). During the test phase (Test box), samples are provided to the network without any segmentation mask, yielding four key properties: 1) the CNN testing performance is not conditioned by the effectiveness of the segmentation step, known to be a primary error source in computer vision tasks; 2) the CNN naturally ignores the ocular components, focusing on the most discriminating information; 3) the learning and test data have similar appearance, which contributes to the CNN's generalization capability; and 4) from a data augmentation perspective, the set of artificial samples provided to the network also improves the CNN performance. As shown in the bottom part of Fig. 2, any other combination of learning/test data (using explicit region masking) will not keep these four properties simultaneously.

As the outcome of this work, the resulting periocular recogniser consistently outperforms the state-of-the-art, decreasing the EERs and improving the Rank-1 values with respect to the baseline methods. Note that these results were obtained in two widely known data sets and using the entire set of images in both sets, i.e., without disregarding even the poorest quality samples.

The remainder of this paper is organised as follows: Section II summarises the periocular biometrics research, and Section III describes our method. In Section IV we discuss the obtained results, and the conclusions are given in Section V.

II. RELATED WORK

The pioneering work on periocular biometrics was due to Park et al. [14] (extended in [15]). They consider the iris as the reference for defining the ROI, described by HoG, LBP and SIFT descriptors. The ℓ2 norm is the distance measure for each descriptor, with results fused at the score level by linear combination. This work provided the basis for a large number of subsequent methods: Mahalingam and Ricanek Jr. [11] apply multi-scale, patch-based LBP descriptors, using the iris center for data alignment. Ross et al. [21] use HoGs to extract the global image information, SIFT to extract local edge anomalies, and probabilistic deformation models to handle non-linear deformations, with the sum rule combining the dissimilarity scores. Bharadwaj et al. [2] apply global descriptors (GIST and circular LBPs), each one compared using the Chi-square distance. Scores are also linearly combined. Woodard et al. [26] fuse local appearance-based feature descriptors with 2D color histograms (red and green channels), compared using the city-block (LBP) and Bhattacharya (color histograms) distances. Joshi et al. [8] describe the periocular information by means of a bank of complex Gabor filters, while Tan and Kumar [24] evaluate the effectiveness of SIFT, GIST, LBP, HoG and Leung-Malik Filters texture descriptors to provide discriminating information on periocular data. The singularity of Nie et al.'s [12] work is to combine this kind of classical approach with a convolutional restricted Boltzmann machine, which enables obtaining the probability distributions in the periocular data, discriminated by metric learning and SVMs.
Fig. 3. Structure of the convolutional neural network used in image classification. Six convolutional layers, three max-pooling and dropout layers are used
before the (three) fully connected and the soft-max layer, that estimates the sample identity. “s:_” denotes the stride, “p:_” specifies the padding, “w:_” is the
square neighborhood used in max-pooling layers and “r:_” defines the dropout rate. Note that all convolution layers also include "ReLU" non-linear transfer
functions.
Additional approaches are due to Chen and Ferryman [5], who fuse 2D and 3D data, masking out the ocular region from the encoding and comparison process. Raghavendra et al. [20] exploit light-field data acquisition technology to produce sharp images for iris and periocular recognizers, with scores linearly combined. Aiming at cross-spectral recognition, Cao and Schmid [4] convolve the periocular region with a bank of Gabor filters, from which phase and magnitude components are described by HoGs and histograms of LBP descriptors. Features are concatenated and compared using the I-divergence measure. As an anti-counterfeit measure, Proença [19] proposes an ensemble made of two disparate experts: one analysing the iris texture and the other parameterizing the shape of the eyelids and analysing the surrounding skin. Both experts provide independent responses and do not share particular sensitivity to any image covariate.

In terms of deep learning-based approaches, Zhao and Kumar [27] use a CNN for periocular recognition (as we do). The novelty is to consider explicit semantic information to extract more comprehensive periocular features, helping the CNN to improve performance. Refer to the surveys on periocular biometrics due to Alonso-Fernandez and Bigun [1] and Nigam et al. [13] for additional information about the periocular biometrics research.

Recently, particular attention has been paid to the recognition of cross-spectral iris/periocular data, i.e., when the pairs of images to be compared were acquired using different light wavelengths (typically near infra-red and visible). Several approaches were published in this field, with some results and relevant methods described in [22].

III. PROPOSED METHOD

A. Deep Learning Architecture

We use one of the most popular deep learning architectures for image classification: Convolutional Neural Networks (CNNs), which are a biologically inspired variant of multilayer perceptron networks (MLPs) particularly suitable for image classification. By making some assumptions about the nature of the input data (e.g., stationarity of statistics and locality of pixel dependencies), CNNs have much fewer connections than MLPs, making learning a feasible task. In particular, we adopt a CNN architecture based on AlexNet [9], shown in Fig. 3. This classical architecture boosted the popularity of deep learning frameworks for image classification, and is known to constitute a good trade-off between the number of model parameters and the generalisation capabilities of the final solution. The idea is to start by extracting features of increasing complexity at the deeper layers of the network (using convolution layers), which feed the final fully connected layers that provide the final response. At the same time, max-pooling and dropout layers keep the number of parameters relatively low while not compromising the generalization capabilities of the network. Our input data are 150 × 200 × 3 RGB images that pass through convolution (at first), max-pooling, dropout and fully connected layers. All the convolutional layers are adjacent to Rectified Linear Unit (ReLU) activation functions, being the i-th output channel y^(i) given by:

y^{(i)} = \max\Big( b^{(ij)} + \sum_{j=1}^{k} w^{(ij)} * x^{(j)},\; 0_p \Big), \qquad (1)

where max(·, 0_p) is the component-wise maximum operator, 0_p = [0, ..., 0]^T is a p × 1 vector with all elements equal to zero, b^(·) and w^(·) are the bias and weight terms tuned during the learning phase and x represents the layer inputs. The max-pooling layers operate independently in each depth slice of the input and take the maximum value over square patches. Finally, dropout layers set to zero the output of each neuron during the learning step with probability r, preventing them from contributing to the forward pass and participating in back-propagation.

In our model, the first convolutional layer has 128 kernels (5 × 5), using stride and padding of two pixels. Next, a max-pooling and a dropout layer feed the second and third convolutional layers, composed of 256 kernels (5 × 5, two pixels of stride and padding). Again, a max-pooling shrinks the volume data and then two convolutional layers with output size equal to the input are applied (256 kernels of size 3 × 3, stride and padding equal to one). Before the fully connected layers, data pass through a convolution layer (with 512 kernels of size 3 × 3, stride and padding equal to one), a max-pooling and a dropout layer, yielding 9 × 12 × 512 = 55,296 features entering the fully connected layers. Another dropout layer is used before the soft-max layer, which produces a vector of c positive elements corresponding to the probability for each class label:

P(y = j \mid x) = \frac{e^{x^T w_j}}{\sum_k e^{x^T w_k}}. \qquad (2)
According to the output of the soft-max layer, the label prediction is the class with the highest probability among the c possibilities: ŷ = arg max_j P(y = j | x). The CNNs were trained using the stochastic gradient descent (SGD) algorithm, with a batch size of 256 samples. As a preprocessing step, the mean of the learning data was subtracted from all samples. The learning rate was 1e−3, with a momentum of 0.9 and a weight decay of 5e−4. The number of iterations in each experiment was set to 100. All weights in the CNN were initialised according to Glorot and Bengio's [6] method.
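For concreteness, the following is a minimal NumPy sketch of the decision rule described above: a component-wise ReLU as in Eq. (1), the soft-max posteriors of Eq. (2), and the arg-max label prediction. The names (features, W, the number of classes) are merely illustrative and are not taken from our implementation.

```python
import numpy as np

def relu(z):
    # Component-wise maximum against the zero vector, as in Eq. (1).
    return np.maximum(z, 0.0)

def softmax_predict(x, W):
    """Soft-max class posteriors (Eq. (2)) and arg-max label decision.

    x : (d,) feature vector entering the soft-max layer.
    W : (d, c) weight matrix, one column w_j per class.
    """
    logits = x @ W                                  # x^T w_j for every class j
    logits -= logits.max()                          # numerical stabilisation only
    probs = np.exp(logits) / np.exp(logits).sum()   # Eq. (2)
    return probs, int(np.argmax(probs))             # P(y = j | x) and y_hat

# Toy usage with random values (illustrative only; c = 10 classes assumed here).
rng = np.random.default_rng(0)
x = relu(rng.standard_normal(55_296))               # 9 x 12 x 512 features, flattened
W = rng.standard_normal((55_296, 10))
probs, y_hat = softmax_predict(x, W)
```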
B. Data Augmentation
1) Ocular/Periocular Regions Swapping: Let I_i and I_j be 150 × 200 × 3 RGB images from two different subjects. Using the segmentation method described in [17], we obtain two binary masks B_i and B_j (150 × 200 pixels) that discriminate between the ocular (iris and sclera) and the periocular components in I_·. Let O_· and P_· denote the ocular and periocular parts of I_·. The goal is to create an artificial sample P_i O_j composed of the periocular region of I_i overlapping the ocular part of I_j, which requires finding the scale and translation parameters such that O_j optimally fits the ocular hole of P_i. Let b_· be the n × 1 vectorized version of B_· (n = 30 000). The convolution "*" between b_i and b_j is given in matrix form by:

b_i * b_j = T(b_i)\, b_j, \qquad (3)

being T(b_i) the Toeplitz matrix of b_i:
T(b_i) = \begin{bmatrix} b_i & 0 & \cdots & 0 \\ 0 & b_i & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & 0 & 0 & b_i \end{bmatrix}, \quad \text{of size } (2n-1) \times n. \qquad (4)

Fig. 4. Left column: UBIRIS.v2 samples. At right: artificial "multi-class" samples composed of the periocular region given at left and the ocular parts given below each image. Note that the periocular region in these "multi-class" samples in each row is the same.

Fig. 5. Examples of the scale, translation and color transforms used. The upper row illustrates the randomly cropped patches, and the bottom row shows changes in color, obtained by adding multiples of the principal component vectors to each image pixel.
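As an illustration of the region-swapping step of Sec. III-B.1, the sketch below builds one artificial P_i O_j sample with NumPy/SciPy. It stands in for the Toeplitz formulation of Eqs. (3)-(4) by directly cross-correlating the two binary masks to estimate the best translation (the scale search is omitted for brevity); the function and variable names are ours and merely illustrative.

```python
import numpy as np
from scipy.signal import fftconvolve

def swap_ocular_region(img_i, mask_i, img_j, mask_j):
    """Artificial "multi-class" sample: periocular part of img_i + ocular part of img_j.

    img_*  : (150, 200, 3) float arrays, as in the paper.
    mask_* : (150, 200) binary arrays, 1 on the ocular (iris + sclera) region.
    """
    # Cross-correlate the two binary masks; the peak offset from the array centre
    # approximates the translation that best fits O_j into the ocular hole of P_i.
    corr = fftconvolve(mask_i.astype(float),
                       mask_j[::-1, ::-1].astype(float), mode="same")
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    shift = (dy - mask_i.shape[0] // 2, dx - mask_i.shape[1] // 2)

    # Translate the donor ocular region and its mask accordingly.
    mask_j_aligned = np.roll(mask_j, shift, axis=(0, 1)).astype(bool)
    img_j_aligned = np.roll(img_j, shift, axis=(0, 1))

    # Compose: keep the periocular pixels of img_i and paste the aligned ocular
    # pixels of img_j wherever both ocular masks agree.
    out = img_i.copy()
    paste = mask_i.astype(bool) & mask_j_aligned
    out[paste] = img_j_aligned[paste]
    return out
```

During learning, such samples enter the training set labelled with the subject ID of the periocular donor I_i, which is what implicitly teaches the network to disregard the ocular region.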
Fig. 8. Comparison between the performance attained by the method proposed in this paper and three baselines that represent the state-of-the-art. Results
are given for the full UBIRIS.v2 and FRGC data sets, i.e., without disregarding any sample of these sets.
C. Data Augmentation: Performance Optimization

For performance optimisation, one important point is the amount of artificial data required with respect to the original number of images, avoiding the unrealistic "as large as possible" paradigm. Having three types of data augmentation strategies (scale/translation transforms, color transforms and regions swapping), the goal here is to find the amounts of data above which performance improvements are residual, if any. To get that threshold, we augmented the data from one to 64× (considering the original number of images per data set), and repeated the learning / performance evaluation steps. As given in the left part of Fig. 7, in the case of the UBIRIS.v2 set, performance consistently improves up to the point where the augmented data is about 32× the original samples (i.e., using approximately 350 000 artificial images), above which improvements in performance decrease and start to be residual. Regarding the FRGC set, the stabilization in performance was observed slightly earlier, i.e., when the amount of augmented data was 8× to 16× the number of original images (corresponding to approximately 400 000 artificial images).

In terms of the typical scores generated by the CNNs, the right side of Fig. 7 plots the genuine/impostor score likelihood densities for the UBIRIS.v2 (upper row) and FRGC sets. The zoomed-in region makes the classifier bias particularly evident: errors are most of the time due to false negative responses, i.e., the genuine distribution has non-zero densities along the unit interval, which does not happen for the impostor scores, where non-residual densities appear exclusively near the zero value. In practice, this yields one important requirement for biometric systems to work on degraded data: a residual probability of observing false matches. In these cases, regardless of the system's sensitivity, it can be stated with full confidence that any reported match is genuine.

According to these results, in all subsequent experiments we kept the amount of augmented data at 32× the original data set and compared our algorithm's performance to three baseline strategies: the works due to Zhao and Kumar [27], Tan and Kumar [24] and Proença [19]. These techniques are summarized in Sec. II and were selected because they report state-of-the-art performance ([27] and [24]), use techniques that are similar to ours ([27]) and were designed to work in conditions similar to our method's ([19]). However, note that Zhao and Kumar's [27] method was designed to work in a more challenging scenario, corresponding to the open-world operating mode.

D. All vs. Periocular vs. Ocular CNNs

As stated above, the underlying hypothesis in this paper is that periocular recognition performance improves when the less reliable components (the iris and the sclera) are discarded by the CNN. Fig. 9 compares the performance attained when using all the image components (iris, sclera, eyelids, eyelashes, eyebrows and skin), and when the components inside the ocular globe are implicitly discarded (according to the data augmentation strategy described in Sec. III-B.1). As a reference, we also show the performance obtained by the complementary configuration (i.e., using only the iris and the sclera), which is done simply by using the ID of the ocular part in each augmented sample. As can be seen both in the ROC and Rank-N curves, the best performance is attained when the ocular components are discarded, with solid differences in performance and non-overlapping confidence intervals. The low reliability of the iris and sclera for biometric recognition in visible-light environments is confirmed by the performance attained by the Ocular classifier, with performance levels dramatically poorer than the other two configurations (All and Periocular). Results in Fig. 9 regard exclusively the UBIRIS.v2 set, even though almost overlapping differences in performance were observed for FRGC. As these results are clearly redundant to those provided for UBIRIS.v2, we decided not to include them in the paper.

Moreover, the different features learned by the CNNs when using only some of the components are evident by analyzing the average magnitude of the 512 (9 × 12) filters tuned by the SGD algorithm immediately before the fully connected layers, i.e., the first point in the CNN where the filter coefficients have a bijective correspondence to input image positions. Results are given in Fig. 10 for three types of CNNs: in a) the CNN learns from all the regions of the input data, i.e., without using the image overlapping strategy described in Sec. III-B.1; in b) only the ocular regions are considered by the CNN; and in c) only the periocular regions are considered. It can be seen that the average magnitude of the coefficients spreads evenly in a) and has obvious valleys in the regions that are implicitly demanded to be discarded, according to the data augmentation strategy used. This confirms that the CNNs are actually disregarding or, at least, giving less importance to the information in these regions.

Fig. 9. Comparison between the recognition performance obtained by the CNNs when using all the information available (All series, represented by blue lines), when discarding the components inside the ocular globe (Periocular series, represented by yellow lines), and when considering exclusively the components in the ocular globe (Ocular series, represented by red lines).

Fig. 10. Comparison between the average magnitude of the 512 (9 × 12) CNN filters learned immediately before the fully connected layers, i.e., the first point in the CNN where the filter coefficients have a bijective correspondence to input image regions (interpolated 45 × 60 grids are shown, for visualization purposes). Here, the filter magnitude corresponds directly to the relevancy of the corresponding regions in the input data. Results regard the UBIRIS.v2 set and are identical to those observed for the FRGC data (not included to avoid redundancy).

E. State-of-the-Art Results Comparison

The ROC curves and the Rank-N plots are given in Fig. 8, for the four methods and the UBIRIS.v2 and FRGC sets. In all cases, the proposed method¹ solidly outperformed its competitors, with solid differences in performance with respect to any other strategy. The differences in performance are particularly evident for small levels of false acceptances, which is exactly the most valuable operating range for security applications. Regarding the UBIRIS.v2, the proposed method attained EERs around 1.9% (decreasing the state-of-the-art rate by over 80%) and a Rank-1 accuracy of over 88%. Results observed for the FRGC set were substantially better than those for UBIRIS.v2, which accords with the previous research (e.g., [1]) and is justified by the lower number of degradation factors in this set (essentially blur and poor resolution). Again, the proposed method got the best performance among its competitors, with the true identity being reported at the first position (Rank-1) over 92% of the times. In all performance measurements, the differences with respect to the second best method (Zhao and Kumar [27]) were evident, particularly in the most important range of the performance space (FAR values less than 10⁻²). Table I summarizes the performance indicators observed in our experiments, for the four algorithms and two data sets considered.

¹ MATLAB source available at http://www.di.ubi.pt/~hugomcp/DeepPeriocular.zip
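For reference, the two indicators used throughout this section (EER and Rank-1 accuracy) can be computed from raw genuine/impostor scores as in the minimal NumPy sketch below. It is a generic illustration, not the evaluation code used in our experiments, and assumes that higher scores denote better matches.

```python
import numpy as np

def eer(genuine, impostor):
    """Equal Error Rate from genuine/impostor score samples (higher = better match)."""
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    far = np.array([(impostor >= t).mean() for t in thresholds])  # false accept rate
    frr = np.array([(genuine < t).mean() for t in thresholds])    # false reject rate
    i = np.argmin(np.abs(far - frr))                              # closest crossing point
    return (far[i] + frr[i]) / 2.0

def rank1(score_matrix, gallery_ids, probe_ids):
    """Rank-1 accuracy: fraction of probes whose best-scoring gallery entry has the true ID."""
    best = np.argmax(score_matrix, axis=1)        # one row of scores per probe
    return (gallery_ids[best] == probe_ids).mean()
```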
F. Improvements and Further Work

As insight for further improvements, Fig. 11 illustrates the samples where the proposed method obtained its worst results in terms of the Rank-n positions (UBIRIS.v2). In most cases, failures were due to: 1) large differences in phase (when the eye centre is deviated from the image centre); and 2) cropped eye regions that are too narrow, when the eyebrows and the skin are not available. In such cases, images contain almost exclusively the ocular regions, which - considering that our method disregards such information - justifies its poor performance. These problems can be attenuated if more accurate eye detection modules are used, or by considering (in a way similar to the work of Zhao and Kumar [27]) semantic information about the narrowness of the detected eyes, in which the narrowest samples (containing almost exclusively the ocular part) can be classified by a CNN that also considers the ocular components (corresponding to the All configuration results given in Sec. IV-D). Even though this network got worse performance than its Periocular counterpart, its performance on those narrowest samples was typically the best among all methods tested.

Fig. 11. Examples of the UBIRIS.v2 images where the proposed method got its worst performance. Two major error sources were detected: 1) eyes misaligned with the image centers; and 2) cases where the skin and eyebrows are barely visible.

V. CONCLUSIONS

This paper describes a periocular recognition algorithm for visible-light data that is based on convolutional neural networks (CNNs). The novelty is that, by augmenting the learning data using multi-class artificial samples, it is possible to implicitly transmit prior information to the network about the regions in the input data that are not reliable for biometric recognition. Such a conclusion, if left to be autonomously drawn by the CNN, would require additional amounts of learning data, which might not be available.

ACKNOWLEDGEMENTS

The authors acknowledge the support of NVIDIA Corporation®, with the donation of one Titan X GPU. This work was supported by the PEst-OE/EEI/LA0008/2013 research program.

REFERENCES

[1] F. Alonso-Fernandez and J. Bigun, "A survey on periocular biometrics research," Pattern Recognit. Lett., vol. 82, pp. 92–105, Oct. 2016.
[2] S. Bharadwaj, H. S. Bhatt, M. Vatsa, and R. Singh, "Periocular biometrics: When iris recognition fails," in Proc. IEEE Int. Conf. Biometrics, Theory, Appl. Syst., Sep. 2010, pp. 1–6, doi: 10.1109/BTAS.2010.5634498.
[3] R. H. Byrd, M. E. Hribar, and J. Nocedal, "An interior point algorithm for large-scale nonlinear programming," SIAM J. Optim., vol. 9, no. 4, pp. 877–900, 1999.
[4] Z. Cao and N. A. Schmid, "Fusion of operators for heterogeneous periocular recognition at varying ranges," Pattern Recognit. Lett., vol. 82, pp. 170–180, Oct. 2016.
[5] L. Chen and J. Ferryman, "Combining 3D and 2D for less constrained periocular recognition," in Proc. IEEE Int. Conf. Biometrics Theory, Appl. Syst., Sep. 2015, pp. 1–6, doi: 10.1109/BTAS.2015.7358753.
[6] X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks," in Proc. Int. Conf. Artif. Intell. Stat., 2010, pp. 249–256.
[7] K. P. Hollingsworth, K. W. Bowyer, and P. J. Flynn, "Improved iris recognition through fusion of Hamming distance and fragile bit distance," IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 12, pp. 2465–2476, Dec. 2011.
[8] A. Joshi, A. Gangwar, R. Sharma, A. Singh, and Z. Saquib, "Periocular recognition based on Gabor and Parzen PNN," in Proc. IEEE Int. Conf. Image Process., Oct. 2014, pp. 4977–4981, doi: 10.1109/ICIP.2014.7026008.
[9] A. Krizhevsky, I. Sutskever, and G. Hinton, "ImageNet classification with deep convolutional neural networks," in Proc. Adv. Neural Inf. Process. Syst. Conf., 2012, pp. 1097–1105.
[10] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2015, pp. 3431–3440.
[11] G. Mahalingam and K. Ricanek, Jr., "LBP-based periocular recognition on challenging face datasets," EURASIP J. Image Video Process., vol. 36, pp. 1–13, Dec. 2013.
[12] L. Nie, A. Kumar, and S. Zhan, "Periocular recognition using unsupervised convolutional RBM feature learning," in Proc. 22nd Int. Conf. Pattern Recognit., 2014, pp. 399–404.
[13] I. Nigam, M. Vatsa, and R. Singh, "Ocular biometrics: A survey of modalities and fusion approaches," Inf. Fusion, vol. 26, pp. 1–35, Nov. 2015.
[14] U. Park, A. Ross, and A. K. Jain, "Periocular biometrics in the visible spectrum: A feasibility study," in Proc. 3rd IEEE Int. Conf. Biometrics, Theory, Appl. Syst., Sep. 2009, pp. 153–158.
[15] U. Park, R. Jillela, A. Ross, and A. K. Jain, "Periocular biometrics in the visible spectrum," IEEE Trans. Inf. Forensics Security, vol. 6, no. 1, pp. 96–106, Mar. 2011.
[16] P. J. Phillips, "Overview of the face recognition grand challenge," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., vol. 1, Jun. 2005, pp. 947–954.
[17] H. Proença, "Iris recognition: On the segmentation of degraded images acquired in the visible wavelength," IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 8, pp. 1502–1516, Aug. 2010.
[18] H. Proença, S. Filipe, R. Santos, J. Oliveira, and L. A. Alexandre, "The UBIRIS.v2: A database of visible wavelength iris images captured on-the-move and at-a-distance," IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 8, pp. 1529–1535, Aug. 2010.
[19] H. Proença, "Ocular biometrics by score-level fusion of disparate experts," IEEE Trans. Image Process., vol. 23, no. 12, pp. 5081–5093, Dec. 2014.
[20] R. Raghavendra, K. B. Raja, B. Yang, and C. Busch, "Combining iris and periocular recognition using light field camera," in Proc. IAPR Asian Conf. Pattern Recognit., Nov. 2013, pp. 155–159, doi: 10.1109/ACPR.2013.22.
[21] A. Ross et al., "Matching highly non-ideal ocular images: An information fusion approach," in Proc. 5th IAPR Int. Conf. Biometrics, Mar./Apr. 2012, pp. 446–453, doi: 10.1109/ICB.2012.6199791.
[22] A. Sequeira et al., "Cross-eyed—Cross-spectral iris/periocular recognition database and competition," in Proc. Int. Conf. Biometrics Special Interest Group (BIOSIG), Sep. 2016, pp. 1–5, doi: 10.1109/BIOSIG.2016.7736915.
[23] C. Szegedy, A. Toshev, and D. Erhan, "Deep neural networks for object detection," in Proc. Adv. Neural Inf. Process. Syst. Conf., 2013, pp. 2553–2561.
[24] C.-W. Tan and A. Kumar, "Towards online iris and periocular recognition under relaxed imaging constraints," IEEE Trans. Image Process., vol. 22, no. 10, pp. 3751–3765, Oct. 2013.
[25] A. Vedaldi and K. Lenc, "MatConvNet: Convolutional neural networks for MATLAB," in Proc. 23rd ACM Int. Conf. Multimedia, 2015, pp. 689–692.
[26] D. L. Woodard, S. J. Pundlik, J. R. Lyle, and P. E. Miller, "Periocular region appearance cues for biometric identification," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Workshops, Jun. 2010, pp. 162–169, doi: 10.1109/CVPRW.2010.5544621.
[27] Z. Zhao and A. Kumar, "Accurate periocular recognition under less constrained environment using semantics-assisted convolutional neural network," IEEE Trans. Inf. Forensics Security, vol. 12, no. 5, pp. 1017–1030, May 2016, doi: 10.1109/TIFS.2016.2636093.

Hugo Proença received the B.Sc., M.Sc., and Ph.D. degrees in 2001, 2004, and 2007, respectively. He is currently an Associate Professor with the Department of Computer Science, University of Beira Interior, where he is involved in researching mainly about biometrics and visual surveillance. He served as a Guest Editor of special issues of the Pattern Recognition Letters, Image and Vision Computing, and Signal, Image and Video Processing journals. He is the Coordinating Editor of the IEEE Biometrics Council Newsletter and the Area Editor (ocular biometrics) of the IEEE Biometrics Compendium Journal. He is a member of the Editorial Boards of Image and Vision Computing and the International Journal of Biometrics.

João C. Neves received the B.Sc. and M.Sc. degrees in computer science from the University of Beira Interior, Portugal, in 2011 and 2013, respectively, where he is currently pursuing the Ph.D. degree in the area of biometrics. His research interests broadly include computer vision and pattern recognition, with a particular focus on biometrics and surveillance.