Publications

D-ESRGAN: A Dual-Encoder GAN with Residual CNN and Vision Transformer for Iris Image Super-Resolution

Iris images captured in less-constrained environments, especially at long distances, often suffer from low resolution, resulting in the loss of much valid iris texture information for iris recognition. In this paper, we propose a dual-encoder super-resolution generative adversarial network (D-ESRGAN) that compensates for the texture lost in the raw image while keeping the newly generated textures natural. Specifically, the proposed D-ESRGAN not only integrates a residual CNN encoder to extract local features, but also employs an emerging vision transformer encoder to capture global associative information. The local and global features from the two encoders are further fused for the subsequent reconstruction of high-resolution features. During training, we develop a three-stage strategy to alleviate the tendency of generative adversarial networks to collapse. Moreover, to boost iris recognition performance, we introduce a triplet loss that pushes apart super-resolved iris images with different IDs and pulls those with the same ID closer together. Experimental results on the public CASIA-Iris-Distance and CASIA-Iris-M1 datasets show that D-ESRGAN achieves better performance than state-of-the-art baselines in terms of both super-resolution image quality metrics and the iris recognition metric.
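
Below is a minimal, illustrative sketch of the identity-preserving triplet term described above, applied to embeddings of super-resolved iris images; the encoder, margin, and normalization are assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

def identity_triplet_loss(encoder, sr_anchor, sr_positive, sr_negative, margin=0.5):
    """Pull super-resolved images of the same ID together, push different IDs apart."""
    f_a = F.normalize(encoder(sr_anchor), dim=1)
    f_p = F.normalize(encoder(sr_positive), dim=1)
    f_n = F.normalize(encoder(sr_negative), dim=1)
    d_ap = (f_a - f_p).pow(2).sum(dim=1)   # same-ID distance
    d_an = (f_a - f_n).pow(2).sum(dim=1)   # different-ID distance
    return F.relu(d_ap - d_an + margin).mean()
```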

AIF-LFNet: All-in-Focus Light Field Super-Resolution Method Considering the Depth-Varying Defocus

As an aperture-divided computational imaging system, microlens array (MLA)-based light field (LF) imaging is playing an increasingly important role in computer vision. To address the trade-off between spatial and angular resolution, deep learning (DL)-based image super-resolution (SR) methods have been applied to enhance the spatial resolution. However, existing DL-based methods consider the depth-varying defocus in neither dataset development nor algorithm design, which restricts many applications such as depth estimation and object recognition. To overcome this shortcoming, we propose a super-resolution task that reconstructs all-in-focus high-resolution (HR) LF images from low-resolution (LR) LF images, together with a large dataset and a convolutional neural network (CNN)-based SR method. The dataset is constructed using Blender and consists of 150 light field images for training and 15 light field images for validation and testing. The proposed network introduces a dilated deformable convolutional network (DCN)-based feature extraction block and an LF sub-aperture image (SAI) Deblur-SR block. The experimental results demonstrate that the proposed method achieves more appealing results both quantitatively and qualitatively.
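
The sketch below illustrates a dilated feature extraction block of the kind mentioned above, using plain dilated convolutions only; the deformable sampling of the paper's DCN-based block is omitted, and all layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class DilatedFeatureBlock(nn.Module):
    """Stacks convolutions with increasing dilation to enlarge the receptive field
    without losing spatial resolution; a stand-in for the paper's dilated DCN block."""
    def __init__(self, channels=64, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, kernel_size=3, padding=d, dilation=d)
            for d in dilations
        ])
        self.fuse = nn.Conv2d(channels * len(dilations), channels, kernel_size=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        feats = [self.act(b(x)) for b in self.branches]
        return x + self.fuse(torch.cat(feats, dim=1))   # residual fusion
```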

Contextual Measures for Iris Recognition

Human iris patterns contain a large number of randomly distributed and irregularly shaped microstructures. These microstructures make the iris an informative biometric trait. To learn identity representations from them, this paper regards each iris region as a potential microstructure and proposes contextual measures (CM) to model the correlations between regions. CM adopts two parallel branches to learn global and local contexts in the iris image. The first is the globally contextual measure branch. It measures the global context involving the relationships between all regions for feature aggregation and is robust to local occlusions. We also improve its spatial perception to account for the positional randomness of the microstructures. The second is the locally contextual measure branch. It considers the role of local details in the phenotypic distinctiveness of iris patterns and learns a series of relationship atoms to capture contextual information from a local perspective. In addition, we develop a perturbation bottleneck to ensure that the two branches learn divergent contexts. It introduces perturbations to limit the information flow from input images to identity features, forcing CM to learn discriminative contextual information for iris recognition. Experimental results suggest that global and local contexts are two different clues critical for accurate iris recognition. The superior performance on four benchmark iris datasets demonstrates the effectiveness of the proposed approach in within-database and cross-database scenarios.
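
As a rough illustration of the globally contextual measure, the sketch below lets every iris region attend to all other regions before aggregation; the attention form and dimensions are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class GlobalContextAggregation(nn.Module):
    """Toy global context: every iris region attends to all regions, so the
    aggregated feature reflects pairwise correlations between regions."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, region_feats):          # (B, N_regions, dim)
        ctx, _ = self.attn(region_feats, region_feats, region_feats)
        return ctx.mean(dim=1)                # (B, dim) aggregated identity feature
```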

An Empirical Comparative Analysis of Africans with Asians Using DCNN Facial Biometric Models

Recently, the problem of racial bias in facial biometric systems has generated considerable attention from the media and the biometric community. Many investigative studies have been published on estimating the bias between Caucasians and Asians, Caucasians and Africans, and other racial comparisons. These studies have reported inferior performance for both Asians and Africans compared with other races. However, very few studies have highlighted the comparative differences in performance, as a function of race, between Africans and Asians. Moreover, those previous studies were mainly concentrated on a single aspect of facial biometrics and were usually conducted with images potentially captured with multiple camera sensors, thereby confounding their findings. This paper presents a comparative racial bias study of Asians and Africans on various facial biometric tasks. The images used were captured with the same camera sensor and under controlled conditions. We examine the performance of many DCNN-based models on face detection, facial landmark detection, quality assessment, verification, and identification. The results suggest that most algorithms perform better on Asians than on Africans under the same imaging and testing conditions.

Exploring Bias in Sclera Segmentation Models: A Group Evaluation Approach

Bias and fairness of biometric algorithms have been key topics of research in recent years, mainly due to the societal, legal and ethical implications of potentially unfair decisions made by automated decision-making models. A considerable amount of work has been done on this topic across different biometric modalities, aiming at better understanding the main sources of algorithmic bias or devising mitigation measures. In this work, we contribute to these efforts and present the first study investigating bias and fairness of sclera segmentation models. Although sclera segmentation techniques represent a key component of sclera-based biometric systems with a considerable impact on the overall recognition performance, the presence of different types of biases in sclera segmentation methods is still underexplored. To address this limitation, we describe the results of a group evaluation effort (involving seven research groups), organized to explore the performance of recent sclera segmentation models within a common experimental framework and study performance differences (and bias), originating from various demographic as well as environmental factors. Using five diverse datasets, we analyze seven independently developed sclera segmentation models in different experimental configurations. The results of our experiments suggest that there are significant differences in the overall segmentation performance across the seven models and that among the considered factors, ethnicity appears to be the biggest cause of bias. Additionally, we observe that training with representative and balanced data does not necessarily lead to less biased results. Finally, we find that in general there appears to be a negative correlation between the amount of bias observed (due to eye color, ethnicity and acquisition device) and the overall segmentation performance, suggesting that advances in the field of semantic segmentation may also help with mitigating bias.

PDVN: A Patch-based Dual-view Network for Face Liveness Detection using Light Field Focal Stack

A Light Field Focal Stack (LFFS) can be efficiently rendered from a light field (LF) image captured by a plenoptic camera. Differences in the 3D surface and texture of biometric samples are internally reflected in the defocus blur and local patterns between the rendered slices of the LFFS. This unique property makes the LFFS well suited to differentiating presentation attack instruments (PAIs) from bona fide samples. A patch-based dual-view network (PDVN) is proposed in this paper to leverage the merits of the LFFS for face presentation attack detection (PAD). First, the original LFFS data are divided into local patches along the spatial dimensions, which keeps the model from learning useless facial semantics and greatly relieves the problem of insufficient samples. A dual-view branch strategy is then proposed, wherein the original view and a microscopic view simultaneously contribute to liveness detection. Separable 3D convolution on the focal dimension is shown to be more effective than vanilla 3D convolution for extracting discriminative features from LFFS data. A voting mechanism over the predictions of patch LFFS samples further strengthens the robustness of the proposed framework. PDVN is compared with other face PAD methods on the IST LLFFSD dataset and achieves perfect performance, i.e., the ACER drops to 0.
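
A minimal sketch of separable 3D convolution on the focal dimension, factorized into a focal-only and a spatial-only convolution; channel counts and kernel sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SeparableFocalConv3d(nn.Module):
    """Factorizes a 3D convolution over a focal stack (D = focal slices) into a
    focal-only convolution followed by a spatial-only convolution."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.focal = nn.Conv3d(in_ch, out_ch, kernel_size=(k, 1, 1), padding=(k // 2, 0, 0))
        self.spatial = nn.Conv3d(out_ch, out_ch, kernel_size=(1, k, k), padding=(0, k // 2, k // 2))
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):                     # x: (B, C, D_focal, H, W)
        return self.act(self.spatial(self.act(self.focal(x))))

# Example on a 64x64 patch with 7 focal slices:
# y = SeparableFocalConv3d(3, 16)(torch.randn(2, 3, 7, 64, 64))
```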

Combining 2D texture and 3D geometry features for Reliable iris presentation attack detection using light field focal stack

Iris presentation attack detection (PAD) is still an unsolved problem, mainly due to the variety of spoof attack strategies and poor generalisation to unseen attacks. In this paper, the merits of both light field (LF) imaging and deep learning (DL) are leveraged to combine 2D texture and 3D geometry features for iris liveness detection. By exploring off-the-shelf deep features of planar-oriented and sequence-oriented deep neural networks (DNNs) on the rendered focal stack, the proposed framework excavates the differences in 3D geometric structure and 2D spatial texture between bona fide and spoofing irises captured by LF cameras. A group of pre-trained DL models are adopted as feature extractors, and the parameters of SVM classifiers are optimised on a limited number of samples. Moreover, two-branch feature fusion further strengthens the framework’s robustness and reliability against severe motion blur, noise, and other degradation factors. The results of comparative experiments indicate that variants of the proposed framework significantly surpass PAD methods that take 2D planar images or the LF focal stack as input, even recent state-of-the-art (SOTA) methods fine-tuned on the adopted database. Presentation attacks, including printed papers, printed photos, and electronic displays, can be accurately detected without fine-tuning a bulky CNN. In addition, ablation studies validate the effectiveness of fusing geometric structure and spatial texture features. The results of multi-class attack detection experiments also verify the good generalisation ability of the proposed framework on unseen presentation attacks.
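
The sketch below shows the general "off-the-shelf deep features + SVM" recipe under assumed components (a frozen ResNet-18 backbone and an RBF SVM with a small grid search); it is not the exact feature extractors or settings used in the paper.

```python
import torch
import torchvision.models as models
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Frozen ImageNet backbone used purely as a feature extractor (torchvision >= 0.13).
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()            # expose 512-d global features
backbone.eval()

@torch.no_grad()
def embed(images):                           # images: (N, 3, 224, 224) float tensor
    return backbone(images).numpy()

def fit_pad_classifier(train_images, train_labels):
    """Fit an SVM on frozen deep features; grid search substitutes for the
    per-branch SVM parameter optimisation described above."""
    feats = embed(train_images)
    grid = GridSearchCV(SVC(kernel="rbf"), {"C": [0.1, 1, 10], "gamma": ["scale", 0.01]})
    return grid.fit(feats, train_labels)
```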

Perturbation Inactivation Based Adversarial Defense for Face Recognition

Deep learning-based face recognition models are vulnerable to adversarial attacks. To counter these attacks, most defense methods aim to improve the robustness of recognition models against adversarial perturbations. However, the generalization capacity of these methods is quite limited: in practice, they are still vulnerable to unseen adversarial attacks. Deep learning models are fairly robust to general perturbations, such as Gaussian noise, so a straightforward approach is to inactivate adversarial perturbations so that they can be handled as easily as general perturbations. In this paper, a plug-and-play adversarial defense method, named perturbation inactivation (PIN), is proposed to inactivate adversarial perturbations for adversarial defense. We discover that perturbations in different subspaces have different influences on the recognition model. There should be a subspace, called the immune space, in which perturbations have a smaller adverse impact on the recognition model than perturbations in other subspaces. Hence, our method estimates this immune space and inactivates adversarial perturbations by restricting them to it. The proposed method generalizes to unseen adversarial perturbations since it does not rely on a specific kind of adversarial attack. Exhaustive experiments show that this approach not only outperforms several state-of-the-art adversarial defense methods but also demonstrates superior generalization capacity. Moreover, the proposed method can be successfully applied to four commercial APIs without additional training, indicating that it can be easily generalized to existing face recognition systems.
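
As a loose illustration of restricting perturbations to an estimated subspace, the sketch below uses PCA as a stand-in for the learned immune-space estimator; the subspace dimension and estimator choice are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

def fit_immune_space(clean_images, n_components=128):
    """Estimate a low-dimensional subspace from clean images (N >= n_components)."""
    X = clean_images.reshape(len(clean_images), -1)      # (N, H*W*C)
    return PCA(n_components=n_components).fit(X)

def inactivate(pca, images):
    """Project incoming images onto the subspace, discarding perturbation energy
    that lies outside it, before feeding them to the recognition model."""
    X = images.reshape(len(images), -1)
    X_proj = pca.inverse_transform(pca.transform(X))     # keep only in-subspace content
    return X_proj.reshape(images.shape)
```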

FedIris: Towards More Accurate and Privacy-Preserving Iris Recognition via Federated Template Communication

As privacy concerns over biometric data grow rapidly, building large-scale datasets has become more difficult. Unfortunately, current iris databases are mostly small in scale, e.g., thousands of iris images from hundreds of identities. Worse, the heterogeneity among decentralized iris datasets hinders current deep learning (DL) frameworks from achieving recognition performance with robust generalization. This motivates us to leverage the merits of federated learning (FL) to solve these problems. However, traditional FL algorithms often employ model sharing for knowledge transfer, wherein the simple averaging aggregation lacks interpretability and the divergent optimization directions of clients lead to performance degradation. To overcome this interference, we propose FedIris, with solid theoretical foundations, which employs the iris template as the communication carrier and formulates a federated triplet (Fed-Triplet) loss for knowledge transfer. Furthermore, the massive heterogeneity among iris datasets may induce negative transfer and unstable optimization. A modified Wasserstein distance is therefore embedded into the Fed-Triplet loss to reweight global aggregation, which drives clients with similar data distributions to contribute more to each other. Extensive experimental results demonstrate that the proposed FedIris outperforms SOLO training, model-sharing-based FL training, and even centralized training.
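
A heavily simplified sketch of distribution-aware aggregation: clients are reweighted by a closed-form 1-D Gaussian Wasserstein distance to the cohort statistics. The summary statistics, weighting rule, and distance are illustrative stand-ins for the paper's modified Wasserstein reweighting.

```python
import numpy as np

def w2_gaussian(m1, s1, m2, s2):
    """2-Wasserstein distance between two 1-D Gaussians N(m1, s1^2), N(m2, s2^2)."""
    return np.sqrt((m1 - m2) ** 2 + (s1 - s2) ** 2)

def aggregate(client_updates, client_stats):
    """client_updates: list of parameter vectors; client_stats: list of (mean, std)
    summarizing each client's iris feature distribution."""
    m_ref = np.mean([m for m, _ in client_stats])
    s_ref = np.mean([s for _, s in client_stats])
    dists = np.array([w2_gaussian(m, s, m_ref, s_ref) for m, s in client_stats])
    weights = np.exp(-dists)                 # closer distributions contribute more
    weights /= weights.sum()
    return sum(w * np.asarray(u) for w, u in zip(weights, client_updates))
```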

Towards Interpretable Defense Against Adversarial Attacks via Causal Inference

Deep learning-based models are vulnerable to adversarial attacks, and defending against such attacks is essential for sensitive and safety-critical scenarios. However, deep learning methods still lack effective and efficient defense mechanisms: most existing methods are just stopgaps for specific adversarial samples. The main obstacle is that how adversarial samples fool deep learning models is still unclear; their underlying working mechanism has not been well explored, and this is the bottleneck of adversarial attack defense. In this paper, we build a causal model to interpret the generation and performance of adversarial samples, adopting self-attention/transformer modules as a powerful tool within this causal model. Compared to existing methods, causality enables us to analyze adversarial samples more naturally and intrinsically. Based on this causal model, the working mechanism of adversarial samples is revealed and instructive analysis is provided. We then propose simple and effective adversarial sample detection and recognition methods according to the revealed working mechanism. The causal insights enable us to detect and recognize adversarial samples without any extra model or training. Extensive experiments demonstrate the effectiveness of the proposed methods, which outperform state-of-the-art defense methods under various adversarial attacks.

Contextual Measures for Iris Recognition

The uncontrollable acquisition process limits the performance of iris recognition. In the acquisition process, various inevitable factors, including eyes, devices, and environment, hinder the iris recognition system from learning a discriminative identity representation. This leads to severe performance degradation. In this paper, we explore uncertain acquisition factors and propose uncertainty embedding (UE) and uncertainty-guided curriculum learning (UGCL) to mitigate the influence of acquisition factors. UE represents an iris image using a probabilistic distribution rather than a deterministic point (binary template or feature vector) that is widely adopted in iris recognition methods. Specifically, UE learns identity and uncertainty features from the input image, and encodes them as two independent components of the distribution, mean and variance. Based on this representation, an input image can be regarded as an instantiated feature sampled from the UE, and we can also generate various virtual features through sampling. UGCL is constructed by imitating the progressive learning process of newborns. Particularly, it selects virtual features to train the model in an easy-to-hard order at different training stages according to their uncertainty. In addition, an instance-level enhancement method is developed by utilizing local and global statistics to mitigate the data uncertainty from image noise and acquisition conditions in the pixel-level space. The experimental results on six benchmark iris datasets verify the effectiveness and generalization ability of the proposed method on same-sensor and cross-sensor recognition.
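
The sketch below illustrates the core idea of a probabilistic (mean/variance) embedding with reparameterized virtual features and an uncertainty-based easy-to-hard ordering; the heads, dimensions, and ordering rule are assumptions rather than the paper's exact design.

```python
import torch
import torch.nn as nn

class UncertaintyEmbedding(nn.Module):
    """Encode an iris image as a Gaussian (mu, sigma) and sample virtual features
    with the reparameterization trick."""
    def __init__(self, backbone, feat_dim=256):
        super().__init__()
        self.backbone = backbone                 # any CNN producing feat_dim features
        self.mu_head = nn.Linear(feat_dim, feat_dim)
        self.logvar_head = nn.Linear(feat_dim, feat_dim)

    def forward(self, x, n_samples=4):
        h = self.backbone(x)
        mu, logvar = self.mu_head(h), self.logvar_head(h)
        std = torch.exp(0.5 * logvar)
        noise = torch.randn(mu.size(0), n_samples, mu.size(1), device=mu.device)
        virtual = mu.unsqueeze(1) + std.unsqueeze(1) * noise
        return mu, std, virtual                  # virtual: (B, n_samples, feat_dim)

def curriculum_order(uncertainty):
    """Easy-to-hard ordering: train first on samples with the smallest uncertainty."""
    return torch.argsort(uncertainty.mean(dim=1))
```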

Multitask deep active contour-based iris segmentation for off-angle iris images

Iris recognition is considered a secure and reliable biometric technology. However, iris images are prone to being off-angle or partially occluded when captured with less user cooperation, and as a consequence iris recognition, especially iris segmentation, suffers a serious performance drop. To solve this problem, we propose a multitask deep active contour model for off-angle iris image segmentation. Specifically, the proposed approach combines coarse and fine localization results. The coarse localization detects the approximate position of the iris area and initializes the iris contours through a series of robust preprocessing operations. Iris contours are then represented by 40 ordered, isometrically sampled polar points, and their corresponding offset vectors are regressed via a convolutional neural network multiple times to obtain precise inner and outer iris boundaries. Next, the predicted iris boundaries are used as a constraint to limit the segmentation range of the noise-free iris mask. Besides, an efficient channel attention module is introduced into the mask prediction to make the network focus on the valid iris region, and a differentiable, fast, and efficient SoftPool operation is used in place of traditional pooling to keep more details for more accurate pixel classification. Finally, the proposed iris segmentation approach is combined with off-the-shelf iris feature extraction models, including the traditional OM and the deep learning-based FeatNet, for iris recognition. Experimental results on two NIR datasets, CASIA-Iris-off-angle and CASIA-Iris-Africa, and a VIS dataset, SBVPI, show that the proposed approach achieves a significant performance improvement in segmentation and recognition for both regular and off-angle iris images.
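
A small sketch of initializing 40 isometric polar contour points from a coarse circle and applying regressed offsets; the initialization shape and the single refinement step shown here are simplifications of the iterative regression described above.

```python
import numpy as np

def init_polar_points(center, radius, n_points=40):
    """Place n_points isometric polar points on a circle from the coarse localization."""
    angles = np.linspace(0.0, 2.0 * np.pi, n_points, endpoint=False)
    return np.stack([center[0] + radius * np.cos(angles),
                     center[1] + radius * np.sin(angles)], axis=1)   # (n_points, 2)

def refine(points, predicted_offsets):
    """One refinement step; in the paper the offsets are regressed multiple times."""
    return points + predicted_offsets
```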

Learning Instance-level Spatial-Temporal Patterns for Person Re-identification

Person re-identification (Re-ID) aims to match pedestrians across disjoint cameras. Most Re-ID methods formulate the task as visual representation learning and image search, so accuracy is greatly affected by the search space. Spatial-temporal information has proven effective for filtering irrelevant negative samples and significantly improves Re-ID accuracy. However, existing spatial-temporal person Re-ID methods are still coarse and do not exploit spatial-temporal information sufficiently. In this paper, we propose a novel instance-level and spatial-temporal disentangled Re-ID method (InSTD) to improve Re-ID accuracy. In our framework, personalized information such as moving direction is explicitly considered to further narrow down the search space. Besides, the spatial-temporal transferring probability is disentangled from a joint distribution into marginal distributions, so that outliers can also be well modeled. Abundant experimental analyses on two datasets demonstrate the superiority of the method and provide further insights. The proposed method achieves an mAP of 90.8% on Market-1501 and 89.1% on DukeMTMC-reID, improving on the baselines of 82.2% and 72.7%, respectively. In addition, to provide a better benchmark for person re-identification, we release a cleaned data list of DukeMTMC-reID with this paper: https://github.com/RenMin1991/cleaned-DukeMTMC-reID.
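
A rough sketch of fusing appearance similarity with a spatial-temporal term factorized into marginal camera-transfer and time-gap probabilities; the binning, probability tables, and fusion rule are illustrative assumptions, not the InSTD formulation.

```python
import numpy as np

def st_probability(cam_transfer_prob, time_gap_hist, cam_a, cam_b, dt, bin_width=60):
    """cam_transfer_prob: (n_cams, n_cams) marginal over camera pairs;
    time_gap_hist: 1-D marginal over binned time gaps (seconds)."""
    p_cam = cam_transfer_prob[cam_a, cam_b]
    p_dt = time_gap_hist[min(int(dt // bin_width), len(time_gap_hist) - 1)]
    return p_cam * p_dt                                     # factorized marginals

def joint_score(visual_sim, p_st, eps=1e-6):
    """One simple fusion choice: scale the appearance score by the ST probability."""
    return visual_sim * (p_st + eps)
```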

Cross-spectral Iris Recognition by Learning Device-specific Band

Cross-spectral recognition is still an open challenge in iris recognition. In cross-spectral iris recognition, distinct device-specific bands exist between near-infrared (NIR) and visible (VIS) images, resulting in a distribution gap between samples from different spectra and thus severe degradation in recognition performance. To tackle this problem, we propose a new cross-spectral iris recognition method that learns spectral-invariant features by estimating device-specific bands. In the proposed method, a Gabor Trident Network (GTN) first utilizes the Gabor function’s priors to perceive iris textures under different spectra, and then codes the device-specific band as a residual component to assist the generation of spectral-invariant features. By investigating the device-specific band, GTN effectively reduces its impact on identity features. Besides, we make three further efforts to reduce the distribution gap. First, a Spectral Adversarial Network (SAN) adopts a class-level adversarial strategy to align feature distributions. Second, a Sample-Anchor (SA) loss upgrades the triplet loss by pulling samples toward their class center and pushing them away from other class centers. Third, we develop a higher-order alignment loss that measures the distribution gap according to space bases and distribution shapes. Extensive experiments on five iris datasets demonstrate the efficacy of the proposed method for cross-spectral iris recognition.

Flexible Iris Matching Based on Spatial Feature Reconstruction

In an iris recognition-at-a-distance (IAAD) system used in surveillance scenarios, the camera usually captures a large number of low-quality images. These images exhibit partial occlusions due to eyelids and eyelashes, specular reflections, and severe deformations caused by pupil dilation and contraction. Recognizing these low-quality images is a challenging yet dominant problem in IAAD. To mitigate this issue, current iris recognition systems mostly filter out low-quality images using strict criteria based on image quality evaluation. This strategy, however, wastes device capabilities and produces a low throughput of subjects; other systems require highly cooperative users. In this work, we propose a novel occlusion-robust, deformation-robust, and alignment-free framework for low-quality iris matching, which integrates the merits of deep features and sparse representation in an end-to-end learning process known as iris spatial feature reconstruction (ISFR). Each probe image can be sparsely reconstructed from appropriate feature maps of high-quality gallery images. ISFR uses the error of this robust reconstruction over spatial pyramid features to measure the similarity between two iris images, which naturally avoids the time-consuming alignment step. In summary, the distinctiveness of deep features, the robustness of sparse reconstruction, and the flexibility of a multiscale matching strategy are unified in a general framework to attain more accurate and reasonable iris matching. Extensive experimental results on four public iris image databases demonstrate that the proposed method significantly outperforms both traditional and deep learning-based iris recognition methods.
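
The sketch below conveys the reconstruction-based matching idea: a probe feature is reconstructed from a gallery subject's features and the residual norm serves as the distance. Ridge regression stands in for the sparse solver, and the feature layout is an assumption.

```python
import numpy as np

def reconstruction_distance(probe_feat, gallery_feats, lam=0.1):
    """probe_feat: (d,) vector; gallery_feats: (d, k) basis of one gallery subject.
    Regularized least squares: w = (G^T G + lam I)^{-1} G^T f."""
    G = gallery_feats
    w = np.linalg.solve(G.T @ G + lam * np.eye(G.shape[1]), G.T @ probe_feat)
    residual = probe_feat - G @ w
    return np.linalg.norm(residual)

# The gallery subject with the smallest reconstruction distance is the predicted identity.
```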

CASIA-Face-Africa: A Large-Scale African Face Image Database

Face recognition is a popular and well-studied area with wide applications in our society. However, racial bias has been shown to be inherent in most state-of-the-art (SOTA) face recognition systems. Many investigative studies of face recognition algorithms have reported higher false positive rates for African subject cohorts than for other cohorts. The lack of large-scale African face image databases in the public domain is one of the main obstacles to studying the racial bias problem of face recognition. To this end, we collect a face image database, CASIA-Face-Africa, which contains 38,546 images of 1,183 African subjects. Multi-spectral cameras are utilized to capture the face images under various illumination settings. Demographic attributes and facial expressions of the subjects are also carefully recorded. For landmark detection, each face image in the database is manually labeled with 68 facial keypoints. A group of evaluation protocols is constructed according to different applications, tasks, partitions, and scenarios. The performances of SOTA face recognition algorithms without re-training are reported as baselines. The proposed database, together with its face landmark annotations, evaluation protocols, and preliminary results, forms a good benchmark for studying the essential aspects of face biometrics for African subjects, especially face image preprocessing, face feature analysis and matching, facial expression recognition, sex/age estimation, ethnic classification, and face image generation. The database can be downloaded from our website.

Contrastive Uncertainty Learning for Iris Recognition with Insufficient Labeled Samples

Cross-database recognition is still an unavoidable challenge when deploying an iris recognition system in a new environment. In this paper, we present a compromise problem that resembles the real-world scenario, named iris recognition with insufficient labeled samples. This new problem aims to improve recognition performance by utilizing partially labeled or unlabeled data. To address it, we propose Contrastive Uncertainty Learning (CUL), which integrates the merits of uncertainty learning and contrastive self-supervised learning. CUL makes two efforts to learn a discriminative and robust feature representation. On the one hand, CUL explores uncertain acquisition factors and adopts a probabilistic embedding to represent the iris image. In this probabilistic representation, identity information and acquisition factors are disentangled into the mean and variance, avoiding the impact of uncertain acquisition factors on the identity information. On the other hand, CUL utilizes the probabilistic embeddings to generate virtual positive and negative pairs, and builds its contrastive loss on them to group similar samples closely and push dissimilar samples apart. The experimental results demonstrate the effectiveness of the proposed CUL for iris recognition with insufficient labeled samples.

Overview of biometrics research

Biometrics, such as face, iris, and fingerprint recognition, have become digital identity proof for people to enter the “Internet of Everything”. For example, one may be asked to present a biometric identifier to unlock a mobile phone, pass access control at airports and rail stations, or pay at supermarkets and restaurants. Biometric recognition empowers a machine to automatically detect, capture, process, analyze, and recognize digital physiological or behavioral signals with advanced intelligence. Thus, biometrics requires interdisciplinary research of science and technology involving optical engineering, mechanical engineering, electronic engineering, machine learning, pattern recognition, computer vision, digital image processing, signal analysis, cognitive science, neuroscience, human-computer interaction, and information security. Biometrics is a typical and complex pattern recognition problem, which is a frontier research direction of artificial intelligence. In addition, biometric identification is a key development area of Chinese strategies, such as the Development Plan on the New Generation of Artificial Intelligence and the “Internet Plus” Action Plan. The development of biometric identification involves public interest, privacy, ethics, and law issues; thus, it has also attracted widespread attention from society. This article systematically reviews the development status, emerging directions, existing problems, and feasible ideas of biometrics, and comprehensively summarizes the research progress of face, iris, fingerprint, palm print, finger/palm vein, voiceprint, gait recognition, person re-identification, and multimodal biometric fusion. The overview of face recognition includes face detection, facial landmark localization, 2D face feature extraction and recognition, 3D face feature extraction and recognition, facial liveness detection, and face video based biological signal measurement. The overview of iris recognition includes iris image acquisition, iris segmentation and localization, iris liveness detection, iris image quality assessment, iris feature extraction, heterogeneous iris recognition, fusion of iris and other modalities, security problems of iris biometrics, and future trends of iris recognition. The overview of fingerprint recognition includes latent fingerprint recognition, fingerprint liveness detection, distorted fingerprint recognition, 3D fingerprint capturing, and challenges and trends of fingerprint biometrics. The overview of palm print recognition mainly introduces databases, feature models, matching strategies, and open problems of palm print biometrics. The overview of vein biometrics introduces the main datasets and algorithms for finger vein, dorsal hand vein, and palm vein, and then points out the remaining unsolved problems and development trends of vein recognition. The overview of gait recognition introduces model-based and model-free methods for gait feature extraction and matching. The overview of person re-identification introduces research progress of new methods under supervised, unsupervised, and weakly supervised conditions, gait database virtualization, generative gait models, and new problems such as clothes changing, black clothes, and partial occlusions. The overview of voiceprint recognition introduces the history of speaker recognition, robustness of voiceprint, spoofing attacks, and antispoofing methods.
The overview of multibiometrics introduces image-level, feature-level, score-level, and decision-level information fusion methods and deep learning based fusion approaches. Taking face as the exemplar biometric modality, new research directions that have received great attention in the field of biometric recognition in recent years, i.e., adversarial attack and defense as well as Deepfake and anti-Deepfake, are also introduced. Finally, we analyze and summarize the three major challenges in the field of biometric recognition: “the blind spot of biometric sensors”, “the decision errors of biometric algorithms”, and “the red zone of biometric security”. Research on the sensing, cognition, and security mechanisms of biometrics is therefore necessary to achieve a fundamental breakthrough in academic research and technological applications of biometrics in complex scenarios, to address the shortcomings of existing biometric technologies, and to move towards the overall goal of developing a new generation of “perceptible”, “robust”, and “trustworthy” biometric identification technology.

A Novel Deep-learning Pipeline for Light Field Image Based Material Recognition

The primitive basis of image-based material recognition is that discrepancies in the reflectances of distinct materials lead to imaging differences under multiple viewpoints. LF cameras can capture multiple sub-aperture views (SAIs) within one exposure, which provides appropriate multi-view sources for material recognition. In this paper, a unified Factorize-Connect-Merge (FCM) deep-learning pipeline is proposed for light field image based material recognition. The 4D light-field input is first decomposed into consecutive 3D light-field slices, and a shallow CNN is leveraged to extract low-level visual features of each view inside these slices. To establish correspondences between these SAIs, a Bidirectional Long Short-Term Memory (Bi-LSTM) network is built upon these low-level features to model the imaging differences. After feature selection, including concatenation and dimension reduction, effective and robust feature representations for material recognition can be extracted from 4D light-field data. Experimental results indicate that the proposed pipeline obtains remarkable performance on both single-pixel material classification and whole-image material segmentation. In addition, the pipeline can potentially benefit and inspire other researchers who take LF images as input and need to extract 4D light-field representations for computer vision tasks such as object classification, semantic segmentation, and edge detection.
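
A compact sketch of the Factorize-Connect-Merge idea under assumed sizes: a shallow CNN embeds each sub-aperture view and a bidirectional LSTM connects them before merging; it is not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ViewSequenceEncoder(nn.Module):
    """Shallow per-view CNN (factorize), Bi-LSTM across views (connect),
    mean pooling plus a classifier (merge)."""
    def __init__(self, feat_dim=128, hidden=128, n_classes=10):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(32 * 4 * 4, feat_dim),
        )
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden, n_classes)

    def forward(self, views):                       # views: (B, V, 3, H, W)
        b, v = views.shape[:2]
        feats = self.cnn(views.flatten(0, 1)).view(b, v, -1)
        seq, _ = self.lstm(feats)                   # model inter-view differences
        return self.classifier(seq.mean(dim=1))     # merge over views
```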

High-fidelity View Synthesis for Light Field Imaging with Extended Pseudo 4DCNN

The multi-view properties of light field (LF) imaging enable exciting applications such as auto-refocusing, depth estimation, and 3D reconstruction. However, limited angular resolution has become the main bottleneck of microlens-based plenoptic cameras for more practical vision applications. Existing view synthesis methods mainly break the task into two steps, i.e., depth estimation and view warping, which are usually inefficient and produce artifacts around depth ambiguities. We previously proposed an end-to-end deep learning framework named Pseudo 4DCNN to solve these problems in a conference paper. Rethinking the overall paradigm, we further extend Pseudo 4DCNN and propose a novel loss function applicable to all light field reconstruction tasks, the EPI Structure Preserving (ESP) loss, which attenuates the blurry edges and artifacts caused by the averaging effect of L2-norm-based losses. Furthermore, the extended Pseudo 4DCNN is compared with recent state-of-the-art (SOTA) approaches on more publicly available light field databases, as well as self-captured light field biometrics and microscopy datasets. Experimental results demonstrate that the proposed framework achieves better performance than the vanilla Pseudo 4DCNN and other SOTA methods, especially in terms of visual quality under occlusions. The source codes and self-collected datasets are available online for reproducibility.

Towards Complete and Accurate Iris Segmentation Using Deep Multi-Task Attention Network for Non-Cooperative Iris Recognition

Iris images captured in non-cooperative environments often suffer from adverse noise, which challenges many existing iris segmentation methods. To address this problem, this paper proposes a high-efficiency deep learning based iris segmentation approach, named IrisParseNet. Different from many previous CNN-based iris segmentation methods, which only focus on predicting accurate iris masks by following popular semantic segmentation frameworks, the proposed approach is a complete iris segmentation solution, i.e., iris mask and parameterized inner and outer iris boundaries are jointly achieved by actively modeling them into a unified multi-task network. Moreover, an elaborately designed attention module is incorporated into it to improve the segmentation performance. To train and evaluate the proposed approach, we manually label three representative and challenging iris databases, i.e., CASIA.v4-distance, UBIRIS.v2, and MICHE-I, which involve multiple illumination (NIR, VIS) and imaging sensors (long-range and mobile iris cameras), along with various types of noises. Additionally, several unified evaluation protocols are built for fair comparisons. Extensive experiments are conducted on these newly annotated databases, and results show that the proposed approach achieves state-of-the-art performance on various benchmarks. Further, as a general drop-in replacement, the proposed iris segmentation method can be used for any iris recognition methodology, and would significantly improve the performance of non-cooperative iris recognition.

Recognition Oriented Iris Image Quality Assessment in the Feature Space

A large portion of iris images captured in real-world scenarios are of poor quality due to the uncontrolled environment and non-cooperative subjects. To ensure that the recognition algorithm is not affected by low-quality images, traditional methods based on hand-crafted quality factors discard most images, which causes system timeouts and disrupts the user experience. In this paper, we propose a recognition-oriented quality metric and assessment method for iris images to deal with this problem. The method regards the Distance in Feature Space (DFS) between iris image embeddings as the quality metric, and the prediction is made by a deep neural network with an attention mechanism. The proposed quality metric can significantly improve the performance of the recognition algorithm while reducing the number of images discarded for recognition, which is advantageous over iris quality assessment methods based on hand-crafted factors. The relationship between the Image Rejection Rate (IRR) and the Equal Error Rate (EER) is proposed to evaluate the performance of quality assessment algorithms under the same image quality distribution and the same recognition algorithm. Compared with methods based on hand-crafted factors, the proposed method is an attempt to bridge the gap between image quality assessment and biometric recognition.

Dynamic Graph Representation for Occlusion Handling in Biometrics

The generalization ability of convolutional neural networks (CNNs) for biometrics drops greatly due to the adverse effects of various occlusions. To this end, we propose a novel unified framework, called Dynamic Graph Representation (DGR), that integrates the merits of both CNNs and graphical models to learn dynamic graph representations for occlusion problems in biometrics. Convolutional features over certain regions are re-crafted by a graph generator to establish connections among the spatial parts of the biometric sample and to build feature graphs based on these node representations. Each node of a feature graph corresponds to a specific part of the input image, and the edges express the spatial relationships between parts. By analyzing the similarities between nodes, the framework is able to adaptively remove the nodes representing occluded parts. During dynamic graph matching, we propose a novel strategy to measure the distances of both nodes and adjacency matrices. In this way, the proposed method is more convincing than CNN-based methods because the dynamic graph implies a more illustrative and reasonable inference of the biometric decision. Experiments conducted on iris and face data demonstrate the superiority of the proposed framework, which boosts the accuracy of occluded biometric recognition by a large margin compared with baseline methods.
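
The sketch below illustrates dynamic graph matching in spirit: nodes whose best cross-graph similarity is low are treated as occluded and dropped, and the remaining node features and adjacency matrices are compared. The threshold and the way the two distances are combined are assumptions.

```python
import numpy as np

def dynamic_graph_distance(feats_a, adj_a, feats_b, adj_b, occl_thresh=0.3):
    """feats_*: (N, d) L2-normalized node features for corresponding spatial parts;
    adj_*: (N, N) adjacency matrices."""
    sim = feats_a @ feats_b.T                          # pairwise node similarities
    keep = (sim.max(axis=1) > occl_thresh) & (sim.max(axis=0) > occl_thresh)
    if not keep.any():
        return np.inf                                  # no reliable (unoccluded) nodes
    node_dist = np.linalg.norm(feats_a[keep] - feats_b[keep], axis=1).mean()
    adj_dist = np.abs(adj_a[np.ix_(keep, keep)] - adj_b[np.ix_(keep, keep)]).mean()
    return node_dist + adj_dist
```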

Cross-sensor iris recognition using adversarial strategy and sensor-specific information

Due to the growing demand for iris biometrics, many new sensors are being developed for high-quality image acquisition. However, upgrading the sensor and re-enrolling users is expensive and time-consuming. This leads to a dilemma in which users enroll on one type of sensor but are recognized on others. In this cross-sensor matching, the large gap between the distributions of enrollment and recognition images usually degrades recognition performance. To alleviate this degradation, we propose the Cross-Sensor Iris Network (CSIN), which applies an adversarial strategy and weakens the interference of sensor-specific information. Specifically, there are three valuable efforts towards learning discriminative iris features. First, the proposed CSIN adds extra feature extractors to generate residual components containing sensor-specific information and then utilizes these components to narrow the distribution gap. Second, an adversarial strategy borrowed from Generative Adversarial Networks is used to align feature distributions and further reduce the discrepancy between images caused by different sensors. Finally, we extend the triplet loss and propose an instance-anchor loss to pull instances of the same class together and push them away from others. It is worth mentioning that the proposed method does not need paired data or triplets, which reduces the cost of data preparation. Experiments on two real-world datasets validate the effectiveness of the proposed method for cross-sensor iris recognition.
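
A minimal sketch of an instance-anchor style loss that pulls each instance toward its own class center and pushes it from the nearest other-class center, avoiding explicit triplet mining; the margin, normalization, and center handling are assumptions.

```python
import torch
import torch.nn.functional as F

def instance_anchor_loss(features, labels, class_centers, margin=0.3):
    """features: (B, d) embeddings; labels: (B,) class indices;
    class_centers: (C, d) learnable or running class centers."""
    f = F.normalize(features, dim=1)
    c = F.normalize(class_centers, dim=1)
    dists = torch.cdist(f, c)                                        # (B, C)
    d_own = dists.gather(1, labels.view(-1, 1)).squeeze(1)           # own-center distance
    d_other = dists.scatter(1, labels.view(-1, 1), float("inf")).min(dim=1).values
    return F.relu(d_own - d_other + margin).mean()
```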

Alignment Free and Distortion Robust Iris Recognition

Iris recognition is a reliable personal identification method, but there is still much room to improve its accuracy, especially in less-constrained situations. For example, free movement of head pose may cause large rotation differences between iris images, and illumination variations may cause irregular distortion of iris texture. To match intra-class iris images with head rotation robustly, existing solutions usually need a precise alignment operation, either by exhaustive search within a determined range during iris image preprocessing or by brute-force search for the minimum Hamming distance during iris feature matching. In the wild, iris rotation is of much greater uncertainty than in constrained situations, and exhaustive search within a determined range is impracticable. This paper presents a unified feature-level solution to both alignment-free and distortion-robust iris recognition in the wild. A new deep learning based method named Alignment Free Iris Network (AFINet) is proposed, which utilizes a trainable VLAD (Vector of Locally Aggregated Descriptors) encoder called NetVLAD [18] to decouple the correlations between local representations and their spatial positions. Deformable convolution [5] is further leveraged to overcome iris texture distortion by dense adaptive sampling. The results of extensive experiments on three public iris image databases and simulated degradation databases show that AFINet significantly outperforms state-of-the-art iris recognition methods.
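
The sketch below is a generic NetVLAD-style layer of the kind AFINet builds on: local descriptors are softly assigned to learned clusters and their residuals are accumulated, decoupling the representation from spatial position. Cluster count and dimensions are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NetVLADLayer(nn.Module):
    """Trainable VLAD encoder: soft-assign local descriptors to learned clusters
    and accumulate residuals to the cluster centers."""
    def __init__(self, dim=128, n_clusters=32):
        super().__init__()
        self.assign = nn.Conv2d(dim, n_clusters, kernel_size=1)   # soft assignment
        self.centers = nn.Parameter(torch.randn(n_clusters, dim))

    def forward(self, x):                                         # x: (B, dim, H, W)
        b, d = x.shape[:2]
        a = F.softmax(self.assign(x), dim=1).flatten(2)           # (B, K, HW)
        feats = x.flatten(2)                                      # (B, D, HW)
        residuals = feats.unsqueeze(1) - self.centers.view(1, -1, d, 1)  # (B, K, D, HW)
        vlad = (a.unsqueeze(2) * residuals).sum(dim=-1)           # (B, K, D)
        vlad = F.normalize(vlad, dim=2).flatten(1)                # intra-normalize
        return F.normalize(vlad, dim=1)                           # (B, K*D) descriptor
```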

LFNet: A Novel Bidirectional Recurrent Convolutional Neural Network for Light-Field Image Super-Resolution

The low spatial resolution of light-field images poses significant difficulties in exploiting their advantages. To reduce the dependency on accurate depth or disparity information as priors for light-field image super-resolution, we propose an implicit multi-scale fusion scheme that accumulates contextual information from multiple scales for super-resolution reconstruction. The scheme is incorporated into a bidirectional recurrent convolutional neural network, which iteratively models spatial relations between horizontally or vertically adjacent sub-aperture images of light-field data. Within the network, the recurrent convolutions are modified to be more effective and flexible in modeling the spatial correlations between neighboring views. A horizontal sub-network and a vertical sub-network of the same structure are ensembled for final outputs via stacked generalization. Experimental results on synthetic and real-world datasets demonstrate that the proposed method outperforms other state-of-the-art methods by a large margin in peak signal-to-noise ratio and gray-scale structural similarity indexes, and also achieves superior quality for human visual perception. Furthermore, the proposed method can enhance the performance of light field applications such as depth estimation.

Hierarchical Multi-class Iris Classification for Liveness Detection

In modern society, iris recognition has become increasingly popular, and its security risk is increasing rapidly because of attacks by various patterns of fake iris. A German hacker organization called the Chaos Computer Club recently cracked the iris recognition system of the Samsung Galaxy S8. In view of these risks, iris liveness detection has shown its significant importance to iris recognition systems. State-of-the-art algorithms mainly rely on hand-crafted texture features, which can only identify fake iris images with a single pattern. In this paper, we propose Hierarchical Multi-class Iris Classification (HMC), a CNN-based liveness detection method that focuses on multi-pattern fake iris. The proposed method learns the features of different fake iris patterns with a CNN and classifies genuine and fake iris images by hierarchical multi-class classification, taking the characteristics of different fake iris patterns into account. All fake iris patterns are divided into two categories by their fake areas, and the process is designed as two steps that identify the two categories of fake iris images respectively. Experimental results demonstrate much higher iris liveness detection accuracy than other state-of-the-art algorithms. HMC achieves the best results, with nearly 100% accuracy, on the ND-Contact, CASIA-Iris-Interval, CASIA-Iris-Syn, and LivDet-Iris-2017-Warsaw datasets, and also achieves the best result, with 100% accuracy, on a hybrid dataset consisting of ND-Contact and LivDet-Iris-2017-Warsaw.
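
A toy sketch of the two-step hierarchical decision rule described above; the two category names and the classifier objects (assumed to expose predict()) are hypothetical placeholders, not the paper's implementation.

```python
def hierarchical_liveness(image, stage1_clf, stage2_clf):
    """Step 1 targets fake irises whose whole area is fake (e.g., printed or synthetic);
    step 2 inspects the remainder for partially fake areas (e.g., textured contact lenses)."""
    if stage1_clf.predict([image])[0] == "whole_fake":
        return "fake"
    if stage2_clf.predict([image])[0] == "partial_fake":
        return "fake"
    return "genuine"
```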

A simple and robust super resolution method for light field images
